Frequent “Checkpoint not complete” messages in your Oracle alert log are more than just a nuisance — they’re a clear signal that your redo log files are undersized. This can severely affect database performance, especially under heavy workloads.
In this post, we’ll understand why this happens, how to detect it, and how to determine the optimal redo log size to keep your Oracle database running smoothly.
Understanding the Issue
Redo logs play a crucial role in Oracle’s architecture — they record all changes made to the database, ensuring recovery in case of a failure.
However, if redo log files are too small, they fill up quickly, forcing Oracle to perform frequent log switches and checkpoints. When the database can’t complete one checkpoint before the next one starts, you’ll see messages like:
Checkpoint not complete
This means the Database Writer (DBWR) cannot flush dirty buffers fast enough for checkpoints to keep pace with LGWR’s log switches. Oracle will not reuse a redo log group until the checkpoint covering it completes, so sessions stall and throughput suffers.
⚠️ Signs of Undersized Redo Log Files
Here are the typical indicators that your redo log files are too small:
1. Frequent “Checkpoint Not Complete” Messages
If you see two or more of these messages per hour in your alert log, it indicates that the redo logs are filling up faster than the checkpoint process can handle.
2. High “Log File Sync” Wait Events
The wait event log file sync appears when user sessions wait for LGWR to write their commit records to disk.
These waits usually reflect slow redo I/O or a very high commit rate, but frequent log switches caused by small redo logs add to LGWR’s workload and can make them worse.
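As a rough check, you can look at the cumulative waits for this event in V$SYSTEM_EVENT (a sketch; the numbers accumulate since instance startup, so interpret them relative to uptime):

```sql
-- Cumulative redo-related wait events since instance startup
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1e6) AS time_waited_sec
FROM   v$system_event
WHERE  event IN ('log file sync', 'log file parallel write');
```

Comparing log file sync against log file parallel write helps distinguish slow redo I/O from other causes of commit waits.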
3. Excessive Log Switching
If your database performs more than a handful of log switches per hour (a common guideline is one switch every 15 to 20 minutes at most), the repeated checkpoint and archiving activity adds overhead and impacts performance.
Larger redo logs reduce this frequency, stabilizing I/O operations and overall throughput.
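To see how often your database actually switches logs, you can bucket V$LOG_HISTORY by hour (a simple sketch; adjust the time window to suit your environment):

```sql
-- Redo log switches per hour over the last 24 hours
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
       COUNT(*)                               AS switches
FROM   v$log_history
WHERE  first_time > SYSDATE - 1
GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY hour;
```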
How to Determine the Optimal Redo Log Size
Oracle provides a data-driven way to find the ideal redo log size using the FAST_START_MTTR_TARGET parameter and the V$INSTANCE_RECOVERY view.
Step 1: Enable FAST_START_MTTR_TARGET
This parameter defines the Mean Time To Recover (MTTR) — the target duration (in seconds) for crash recovery.
Set it to a reasonable value (for example, 60 seconds):
ALTER SYSTEM SET FAST_START_MTTR_TARGET = 60;
This enables Oracle’s automatic tuning mechanism to determine the optimal redo log size based on your workload.
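You can confirm the setting took effect by querying V$PARAMETER (or with SHOW PARAMETER in SQL*Plus):

```sql
SELECT name, value
FROM   v$parameter
WHERE  name = 'fast_start_mttr_target';
```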
Step 2: Check Current Redo Log Sizes
Before making any changes, it’s important to know the current size of your redo log files.
Run the following query to check existing redo log groups and their sizes:
SELECT
l.group#,
l.thread#,
l.bytes/1024/1024 AS size_mb,
l.members,
lf.member AS logfile_name
FROM
v$log l
JOIN v$logfile lf ON l.group# = lf.group#
ORDER BY
l.group#;
This query shows each redo log group’s size in MB, the number of members (mirrored files), and their file names.
If you notice sizes like 50MB, 100MB, or 200MB on busy databases, these are often too small for modern workloads.
Step 3: Run a Typical Workload
Let your database run under normal workload conditions for some time.
This allows Oracle’s background processes to collect sufficient data about redo generation and checkpoint performance.
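While the workload runs, it can be useful to sanity-check how much redo you are generating. The cumulative counters in V$SYSSTAT give a rough picture (sample them twice and take the delta to get a rate):

```sql
-- Cumulative redo statistics since instance startup;
-- 'redo log space requests' counts sessions that had to
-- wait for a log switch, a direct symptom of small logs
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('redo size', 'redo writes', 'redo log space requests');
```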
Step 4: Query V$INSTANCE_RECOVERY for Recommended Size
Now, query the following view to find the optimal redo log file size:
SELECT optimal_logfile_size
FROM v$instance_recovery;
This returns the recommended redo log file size in megabytes, based on your system’s recovery performance and workload. Note that OPTIMAL_LOGFILE_SIZE is NULL until FAST_START_MTTR_TARGET has been set and the instance has run long enough to gather statistics.
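For more context, the same view also exposes the recovery targets Oracle is tuning against, which helps you judge whether the recommendation is based on realistic data:

```sql
SELECT target_mttr,          -- effective MTTR target (seconds)
       estimated_mttr,       -- current estimated recovery time (seconds)
       optimal_logfile_size  -- recommended redo log size (MB)
FROM   v$instance_recovery;
```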
⚙️ Step 5: Resize or Add New Redo Log Files
Once you know the optimal size, you can create new redo log groups with the recommended size.
For example, if the query suggests 1 GB, you can add new log groups like this:
ALTER DATABASE ADD LOGFILE GROUP 4 ('/u01/oradata/redo04.log') SIZE 1G;
ALTER DATABASE ADD LOGFILE GROUP 5 ('/u01/oradata/redo05.log') SIZE 1G;
Then, drop the smaller old log groups after confirming the new ones are in use:
ALTER DATABASE DROP LOGFILE GROUP 1;
ALTER DATABASE DROP LOGFILE GROUP 2;
Ensure you do not drop a group that is still CURRENT or ACTIVE.
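Before dropping anything, verify each group’s status in V$LOG; only groups with STATUS = 'INACTIVE' are safe to drop:

```sql
-- Check which groups are safe to drop
SELECT group#, status, bytes/1024/1024 AS size_mb
FROM   v$log
ORDER  BY group#;

-- If an old group is still CURRENT or ACTIVE, force a switch
-- and a checkpoint, then re-check the status:
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;
```

Also note that DROP LOGFILE GROUP removes the group from the control file but does not delete the operating-system files (unless you are using Oracle Managed Files), so clean those up separately.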
💡 Best Practices
- Maintain at least 3 redo log groups to ensure smooth switching.
- Keep redo logs on fast storage (preferably SSD or ASM).
- Monitor alert logs regularly for “checkpoint not complete” messages.
- Periodically check V$LOG_HISTORY to analyze switch frequency.
- Re-evaluate redo size after significant workload changes.
✅ Conclusion
Frequent “checkpoint not complete” messages are a performance red flag — usually caused by undersized redo logs.
By using FAST_START_MTTR_TARGET, checking current log sizes, and analyzing V$INSTANCE_RECOVERY, you can determine and configure the optimal redo log size for your environment.
With properly sized redo logs, you’ll notice:
- Fewer checkpoint stalls
- Reduced log file sync waits
- Smoother overall performance