Frequent “Checkpoint not complete” messages in your Oracle alert log are more than just a nuisance — they’re a clear signal that your redo log files are undersized. This can severely affect database performance, especially under heavy workloads.
In this post, we’ll understand why this happens, how to detect it, and how to determine the optimal redo log size to keep your Oracle database running smoothly.
Understanding the Issue
Redo logs play a crucial role in Oracle’s architecture — they record all changes made to the database, ensuring recovery in case of a failure.
However, if redo log files are too small, they fill up quickly, forcing Oracle to perform frequent log switches and checkpoints. When the database can’t complete one checkpoint before the next one starts, you’ll see messages like:
Checkpoint not complete
This means the Database Writer (DBWR) cannot flush dirty buffers fast enough for checkpoints to keep pace with LGWR’s log switches. Oracle will not reuse a redo log group until the checkpoint covering it completes, so sessions stall and throughput suffers.
⚠️ Signs of Undersized Redo Log Files
Here are the typical indicators that your redo log files are too small:
1. Frequent “Checkpoint Not Complete” Messages
If you see two or more of these messages per hour in your alert log, it indicates that the redo logs are filling up faster than the checkpoint process can handle.
2. High “Log File Sync” Wait Events
The wait event log file sync appears when user sessions wait for LGWR to write their commit records to disk.
These waits usually reflect slow redo I/O or a very high commit rate, but frequent log switches caused by small redo logs add to LGWR’s workload and can make them worse.
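As a rough check, you can look at the cumulative waits for this event in V$SYSTEM_EVENT (a sketch; the numbers accumulate since instance startup, so interpret them relative to uptime):

```sql
-- Cumulative redo-related wait events since instance startup
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1e6) AS time_waited_sec
FROM   v$system_event
WHERE  event IN ('log file sync', 'log file parallel write');
```

Comparing log file sync against log file parallel write helps distinguish slow redo I/O from other causes of commit waits.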
3. Excessive Log Switching
If your database performs more than a handful of log switches per hour (a common guideline is one switch every 15 to 20 minutes at most), the repeated checkpoint and archiving activity adds overhead and impacts performance.
Larger redo logs reduce this frequency, stabilizing I/O operations and overall throughput.
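To see how often your database actually switches logs, you can bucket V$LOG_HISTORY by hour (a simple sketch; adjust the time window to suit your environment):

```sql
-- Redo log switches per hour over the last 24 hours
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
       COUNT(*)                               AS switches
FROM   v$log_history
WHERE  first_time > SYSDATE - 1
GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY hour;
```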
How to Determine the Optimal Redo Log Size
Oracle provides a data-driven way to find the ideal redo log size using the FAST_START_MTTR_TARGET parameter and the V$INSTANCE_RECOVERY view.
Step 1: Enable FAST_START_MTTR_TARGET
This parameter defines the Mean Time To Recover (MTTR) — the target duration (in seconds) for crash recovery.
Set it to a reasonable value (for example, 60 seconds):
ALTER SYSTEM SET FAST_START_MTTR_TARGET = 60;
This enables Oracle’s automatic tuning mechanism to determine the optimal redo log size based on your workload.
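You can confirm the setting took effect by querying V$PARAMETER (or with SHOW PARAMETER in SQL*Plus):

```sql
SELECT name, value
FROM   v$parameter
WHERE  name = 'fast_start_mttr_target';
```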
Step 2: Check Current Redo Log Sizes
Before making any changes, it’s important to know the current size of your redo log files.
Run the following query to check existing redo log groups and their sizes:
SELECT
l.group#,
l.thread#,
l.bytes/1024/1024 AS size_mb,
l.members,
lf.member AS logfile_name
FROM
v$log l
JOIN v$logfile lf ON l.group# = lf.group#
ORDER BY
l.group#;
This query shows each redo log group’s size in MB, the number of members (mirrored files), and their file names.
If you notice sizes like 50MB, 100MB, or 200MB on busy databases, these are often too small for modern workloads.
Step 3: Run a Typical Workload
Let your database run under normal workload conditions for some time.
This allows Oracle’s background processes to collect sufficient data about redo generation and checkpoint performance.
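While the workload runs, it can be useful to sanity-check how much redo you are generating. The cumulative counters in V$SYSSTAT give a rough picture (sample them twice and take the delta to get a rate):

```sql
-- Cumulative redo statistics since instance startup;
-- 'redo log space requests' counts sessions that had to
-- wait for a log switch, a direct symptom of small logs
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('redo size', 'redo writes', 'redo log space requests');
```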
Step 4: Query V$INSTANCE_RECOVERY for Recommended Size
Now, query the following view to find the optimal redo log file size:
SELECT optimal_logfile_size
FROM v$instance_recovery;
This returns the recommended redo log file size in megabytes, based on your system’s recovery performance and workload. Note that OPTIMAL_LOGFILE_SIZE is NULL until FAST_START_MTTR_TARGET has been set and the instance has run long enough to gather statistics.
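For more context, the same view also exposes the recovery targets Oracle is tuning against, which helps you judge whether the recommendation is based on realistic data:

```sql
SELECT target_mttr,          -- effective MTTR target (seconds)
       estimated_mttr,       -- current estimated recovery time (seconds)
       optimal_logfile_size  -- recommended redo log size (MB)
FROM   v$instance_recovery;
```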
⚙️ Step 5: Resize or Add New Redo Log Files
Once you know the optimal size, you can create new redo log groups with the recommended size.
For example, if the query suggests 1 GB, you can add new log groups like this:
ALTER DATABASE ADD LOGFILE GROUP 4 ('/u01/oradata/redo04.log') SIZE 1G;
ALTER DATABASE ADD LOGFILE GROUP 5 ('/u01/oradata/redo05.log') SIZE 1G;
Then, drop the smaller old log groups after confirming the new ones are in use:
ALTER DATABASE DROP LOGFILE GROUP 1;
ALTER DATABASE DROP LOGFILE GROUP 2;
Ensure you do not drop a group that is still CURRENT or ACTIVE.
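Before dropping anything, verify each group’s status in V$LOG; only groups with STATUS = 'INACTIVE' are safe to drop:

```sql
-- Check which groups are safe to drop
SELECT group#, status, bytes/1024/1024 AS size_mb
FROM   v$log
ORDER  BY group#;

-- If an old group is still CURRENT or ACTIVE, force a switch
-- and a checkpoint, then re-check the status:
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;
```

Also note that DROP LOGFILE GROUP removes the group from the control file but does not delete the operating-system files (unless you are using Oracle Managed Files), so clean those up separately.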
💡 Best Practices
- Maintain at least 3 redo log groups to ensure smooth switching.
- Keep redo logs on fast storage (preferably SSD or ASM).
- Monitor alert logs regularly for “checkpoint not complete” messages.
- Periodically check V$LOG_HISTORY to analyze switch frequency.
- Re-evaluate redo size after significant workload changes.
✅ Conclusion
Frequent “checkpoint not complete” messages are a performance red flag — usually caused by undersized redo logs.
By using FAST_START_MTTR_TARGET, checking current log sizes, and analyzing V$INSTANCE_RECOVERY, you can determine and configure the optimal redo log size for your environment.
With properly sized redo logs, you’ll notice:
- Fewer checkpoint stalls
- Reduced log file sync waits
- Smoother overall performance