4. Oracle Clusterware – Heart of Oracle RAC

Oracle Clusterware is the backbone of Oracle Real Application Clusters (RAC): it manages the cluster nodes and resources and provides high availability across multiple database instances. In this chapter, we explore the critical components of Oracle Clusterware and cover configuration, management, and troubleshooting, along with the essential commands crsctl and srvctl. Each section includes examples and the commands needed to manage Clusterware effectively.

1. Oracle Clusterware: Heart of Oracle RAC

Oracle Clusterware is essential for managing and enabling high availability within Oracle RAC environments. It coordinates the components of RAC, handling node communication, cluster membership, and failover, and it manages the listeners and services that RAC relies on for load balancing. Without Clusterware, the instances on the individual nodes cannot operate together as a single clustered database.

Key functions provided by Oracle Clusterware:

  • Node Management: Tracks which nodes are members of the cluster and evicts nodes that stop communicating.
  • Resource Management: Manages resources like databases, listeners, services, and applications across the cluster.
  • Failover and Recovery: Restarts failed resources and relocates resources such as VIPs and services to surviving nodes when a node fails.

2. Overview of Oracle Clusterware

Oracle Clusterware consists of several important components that enable it to manage resources across nodes efficiently. These components include the Voting Disk, Oracle Cluster Registry (OCR), and several background processes.

Key Components of Oracle Clusterware:

  1. Oracle Cluster Registry (OCR):
    • Stores cluster configuration information such as node list, services, and cluster resources.
  2. Voting Disk:
    • Keeps track of active nodes in the cluster and helps resolve split-brain situations (when cluster nodes lose contact with each other).
  3. Cluster Synchronization Services (CSS):
    • Synchronizes cluster node activities and membership.
  4. Cluster Ready Services (CRS):
    • Manages cluster resources and makes sure they are running as expected.

Commands to Check Clusterware Status:

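# Check the Clusterware stack (OHAS, CRS, CSS, EVM) on the local node
crsctl check crs

# Check CRS, CSS, and EVM on the local node; add -all to check every node
crsctl check cluster
crsctl check cluster -all

# Show the status of all cluster resources in tabular form
crsctl status resource -t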
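On a healthy node, crsctl check crs prints one CRS-nnnn line per daemon, each ending in "is online" (for example, CRS-4537: Cluster Ready Services is online). As a minimal sketch, assuming bash and that message format, the hypothetical function crs_stack_online below turns that output into an exit status that monitoring scripts can test. It reads standard input so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# crs_stack_online (hypothetical helper): exit 0 only when the text on stdin,
# as printed by `crsctl check crs`, contains at least one CRS-nnnn line and
# every such line ends in "is online".
crs_stack_online() {
  awk '
    /^CRS-[0-9]+:/ {
      seen++
      if ($0 !~ /is online[ \t]*$/) bad++
    }
    END { if (seen > 0 && bad == 0) exit 0; exit 1 }
  '
}

# Usage: only runs on a host where crsctl is on the PATH.
if command -v crsctl >/dev/null 2>&1; then
  if crsctl check crs | crs_stack_online; then
    echo "Clusterware stack is online on this node"
  else
    echo "One or more Clusterware daemons are not online on this node" >&2
  fi
fi
```

Any message that does not end in "is online" (for instance, a "Could not contact" error when the stack is down) makes the function fail.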

3. Configuring and Managing Oracle Clusterware

The management of Oracle Clusterware revolves around configuring voting disks, managing OCR, and ensuring the health of cluster nodes. It also involves configuring network settings and setting up the shared storage that Oracle RAC relies upon.

Managing Voting Disks:

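The voting disk helps determine which nodes are active members of the cluster. Each node must be able to access a majority (more than half) of the configured voting disks; a node that cannot is evicted from the cluster to prevent split-brain. For this reason, voting disks are configured in odd numbers, such as one, three, or five.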

Check the current Voting Disk configuration:

crsctl query css votedisk
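The majority rule is simple arithmetic: with N voting disks, a node must be able to access at least floor(N/2) + 1 of them to stay in the cluster, so three disks tolerate the loss of one and five tolerate the loss of two. The sketch below, assuming bash and hypothetical function names, computes that threshold and counts the ONLINE entries in crsctl query css votedisk output. It assumes the usual numbered data lines of the form " 1. ONLINE <File Universal Id> (<path>) [<disk group>]".

```shell
#!/usr/bin/env bash
# votedisk_majority (hypothetical): minimum number of voting disks a node must
# access, i.e. more than half of the configured total: floor(N/2) + 1.
votedisk_majority() {
  local total="$1"
  echo $(( total / 2 + 1 ))
}

# votedisk_online_count (hypothetical): count numbered data lines whose state
# column is ONLINE, reading `crsctl query css votedisk` output from stdin.
votedisk_online_count() {
  awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}

# votedisk_quorum_ok (hypothetical): succeed when $2 online disks out of $1
# configured disks still form a majority.
votedisk_quorum_ok() {
  local total="$1" online="$2"
  [ "$online" -ge "$(votedisk_majority "$total")" ]
}

# Usage on a cluster node: compare ONLINE disks with the majority threshold.
if command -v crsctl >/dev/null 2>&1; then
  listing=$(crsctl query css votedisk)
  total=$(printf '%s\n' "$listing" | awk '$1 ~ /^[0-9]+\.$/ { n++ } END { print n + 0 }')
  online=$(printf '%s\n' "$listing" | votedisk_online_count)
  echo "$online of $total voting disks online; majority needed: $(votedisk_majority "$total")"
fi
```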

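Move the voting disks into an ASM disk group (when voting disks are stored in ASM, they are replaced as a set rather than added one at a time):

crsctl replace votedisk +DATA

Remove a voting disk stored outside ASM, identified by the File Universal Id shown in the crsctl query css votedisk output:

crsctl delete css votedisk <File_Universal_Id>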

Managing Oracle Cluster Registry (OCR):

OCR is a repository that holds configuration information about cluster resources. Keeping OCR backups is essential for recovery in case of corruption.

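Check OCR integrity (run as root so that the logical corruption check is included):

ocrcheck

Back up OCR manually (as root):

ocrconfig -manualbackup

List OCR backup files:

ocrconfig -showbackup

Restore OCR from a backup (as root, after stopping Oracle Clusterware on all nodes):

ocrconfig -restore /path_to_backup/ocrbackup_file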
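Before taking a manual backup or planning a restore, it helps to confirm that ocrcheck actually reports a clean registry. A minimal sketch, assuming bash and the message texts ocrcheck prints in recent releases ("Cluster registry integrity check succeeded", and "Logical corruption check bypassed ..." when it is not run as root). The function name ocr_integrity_ok is hypothetical.

```shell
#!/usr/bin/env bash
# ocr_integrity_ok (hypothetical): read `ocrcheck` output on stdin and succeed
# only if it reports a successful cluster registry integrity check. Warns when
# the logical corruption check was skipped because ocrcheck was not run as root.
ocr_integrity_ok() {
  local out
  out=$(cat)
  if printf '%s\n' "$out" | grep -q 'Logical corruption check bypassed'; then
    echo "warning: run ocrcheck as root to include the logical corruption check" >&2
  fi
  printf '%s\n' "$out" | grep -q 'Cluster registry integrity check succeeded'
}

# Usage (as root on a cluster node): back up OCR only if it checks out clean.
if [ "$(id -u)" -eq 0 ] && command -v ocrcheck >/dev/null 2>&1; then
  if ocrcheck | ocr_integrity_ok; then
    ocrconfig -manualbackup
  else
    echo "OCR integrity check failed; investigate before backing up or restoring" >&2
  fi
fi
```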

Managing Cluster Resources:

Resources like databases, services, and listeners are all managed by Oracle Clusterware. These can be started, stopped, or checked using the crsctl or srvctl commands.

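Check the status of cluster resources:

crsctl status resource -t

Stop a cluster resource (e.g., a RAC database). Resources whose names begin with ora. are managed with srvctl; crsctl start resource and crsctl stop resource are intended for resources you register yourself:

srvctl stop database -d racdb

Start a cluster resource:

srvctl start database -d racdb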
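To find resources that are not where Clusterware wants them, one approach is to compare each resource's TARGET with its STATE. Without -t, crsctl status resource prints NAME=, TYPE=, TARGET= and STATE= lines with one comma-separated entry per server or instance, which is easier to parse than the table. A minimal sketch, assuming bash and that layout; crs_resource_mismatches is a hypothetical name.

```shell
#!/usr/bin/env bash
# crs_resource_mismatches (hypothetical): read plain `crsctl status resource`
# output on stdin and print one line per entry whose STATE differs from TARGET,
# e.g. TARGET=ONLINE, ONLINE with STATE=ONLINE on node1, OFFLINE.
crs_resource_mismatches() {
  awk -F= '
    $1 == "NAME"   { name = $2 }
    $1 == "TARGET" { split($2, tgt, ",") }
    $1 == "STATE"  {
      n = split($2, st, ",")
      for (i = 1; i <= n; i++) {
        t = tgt[i]; gsub(/^ +/, "", t); gsub(/ +$/, "", t)
        s = st[i];  gsub(/^ +/, "", s); gsub(/ +$/, "", s)
        split(s, word, " ")   # first word is the state: ONLINE, OFFLINE, ...
        if (word[1] != t) print name ": target " t ", state " s
      }
    }
  '
}

# Usage on a cluster node:
if command -v crsctl >/dev/null 2>&1; then
  crsctl status resource | crs_resource_mismatches
fi
```

An empty result means every resource entry is in its target state.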

4. Understanding Voting Disk and OCR (Oracle Cluster Registry)

Voting Disk:

The Voting Disk is a shared storage component that helps the cluster keep track of which nodes are part of the cluster. It is crucial for avoiding split-brain scenarios where nodes become unaware of each other’s existence.

View current voting disks:

crsctl query css votedisk

Add a voting disk (only when voting disks are stored outside ASM):

crsctl add css votedisk /dev/raw/raw1

Remove a voting disk (again, only outside ASM):

crsctl delete css votedisk /dev/raw/raw1
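Which voting disk commands apply depends on where the voting disks live: with ASM storage they are moved as a set with crsctl replace votedisk, while add and delete apply to voting disks outside ASM. Assuming the usual crsctl query css votedisk layout, in which the last column names the disk group in brackets (for example [DATA]) and is empty ([]) for non-ASM voting disks, the hypothetical function votedisk_in_asm below tells the two cases apart (bash assumed).

```shell
#!/usr/bin/env bash
# votedisk_in_asm (hypothetical): read `crsctl query css votedisk` output on
# stdin and succeed if any numbered voting disk line ends in a non-empty
# disk group column such as [DATA].
votedisk_in_asm() {
  awk '
    $1 ~ /^[0-9]+\.$/ && $NF ~ /^\[/ && $NF != "[]" { asm = 1 }
    END { if (asm) exit 0; exit 1 }
  '
}

# Usage on a cluster node: suggest the matching command family.
if command -v crsctl >/dev/null 2>&1; then
  if crsctl query css votedisk | votedisk_in_asm; then
    echo "Voting disks are in ASM: use crsctl replace votedisk +<diskgroup>"
  else
    echo "Voting disks are outside ASM: use crsctl add/delete css votedisk"
  fi
fi
```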

Oracle Cluster Registry (OCR):

OCR holds information about the entire cluster configuration, such as which nodes are part of the cluster, what services are running, and resource information.

Check OCR configuration:

ocrcheck

Back up OCR manually (as root):

ocrconfig -manualbackup

List the automatic and manual OCR backups:

ocrconfig -showbackup

5. Troubleshooting Clusterware Issues

Oracle Clusterware issues can be caused by network failures, node failures, disk issues, or misconfigurations. Common troubleshooting commands allow you to check the health of cluster components and services, and to access logs for more detailed analysis.

Common Troubleshooting Steps:

Check the Clusterware alert log for errors (this path assumes the Grid Infrastructure ORACLE_BASE is /u01/app/grid and the host is node1):

tail -f /u01/app/grid/diag/crs/node1/crs/trace/alert.log
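Following the log live is useful during an incident; afterwards it is often quicker to pull out only the CRS-nnnn messages. A minimal sketch, assuming bash and grep with -E support; crs_alert_messages is a hypothetical name, and its optional second argument is an extended regular expression such as a single message code.

```shell
#!/usr/bin/env bash
# crs_alert_messages (hypothetical): print the lines of a Clusterware alert log
# that carry a CRS-nnnn message. An optional second argument narrows the match
# to an extended regular expression, e.g. a single message code.
crs_alert_messages() {
  local log="$1" pattern="${2:-CRS-[0-9]+}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi
  grep -E -- "$pattern" "$log"
}
```

For example, crs_alert_messages /u01/app/grid/diag/crs/node1/crs/trace/alert.log | tail -n 20 shows the twenty most recent CRS messages for node1.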

Check the status of Clusterware components:

crsctl check crs

Check the network interface configuration:

oifcfg getif
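Each line of oifcfg getif output lists an interface name, a subnet, a scope, and one or more roles, for example eth1 10.0.0.0 global cluster_interconnect. A RAC cluster needs at least one public and one private interconnect network, so a quick sanity check is to confirm that both roles appear. A sketch assuming bash and that four-column layout; oifcfg_roles_ok is a hypothetical name.

```shell
#!/usr/bin/env bash
# oifcfg_roles_ok (hypothetical): read `oifcfg getif` output on stdin and
# succeed when at least one interface has the public role and at least one has
# the cluster_interconnect role. The fourth column may hold several roles
# separated by commas, e.g. cluster_interconnect,asm.
oifcfg_roles_ok() {
  awk '
    NF >= 4 {
      n = split($4, role, ",")
      for (i = 1; i <= n; i++) {
        if (role[i] == "public") pub = 1
        if (role[i] == "cluster_interconnect") priv = 1
      }
    }
    END { if (pub && priv) exit 0; exit 1 }
  '
}

# Usage on a cluster node:
if command -v oifcfg >/dev/null 2>&1; then
  oifcfg getif | oifcfg_roles_ok || echo "public or interconnect network missing from oifcfg getif" >&2
fi
```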

Check the status of a resource (e.g., the resource for the database racdb):

crsctl status resource ora.racdb.db -t

Restarting Clusterware Services:

For critical issues, restarting the Clusterware stack on the affected node may resolve the problem. Both commands below must be run as root and act only on the local node.

Stop Clusterware:

crsctl stop crs

Start Clusterware:

crsctl start crs
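crsctl start crs returns once Oracle High Availability Services has been started, while the remaining daemons come up in the background, so scripted restarts usually poll until the stack reports online. A minimal sketch, assuming bash and the "is online" message format of crsctl check crs; wait_for_crs and its default attempt count and interval are hypothetical choices.

```shell
#!/usr/bin/env bash
# wait_for_crs (hypothetical): poll `crsctl check crs` until every CRS-nnnn line
# ends in "is online", or give up.
#   $1 = maximum attempts (default 30)   $2 = seconds between attempts (default 10)
wait_for_crs() {
  local attempts="${1:-30}" interval="${2:-10}" i=0
  while [ "$i" -lt "$attempts" ]; do
    i=$((i + 1))
    if crsctl check crs 2>/dev/null | awk '
         /^CRS-[0-9]+:/ { seen++; if ($0 !~ /is online[ \t]*$/) bad++ }
         END { if (seen > 0 && bad == 0) exit 0; exit 1 }'; then
      echo "Clusterware stack online after $i check(s)"
      return 0
    fi
    # Do not sleep after the final failed attempt.
    [ "$i" -lt "$attempts" ] && sleep "$interval"
  done
  echo "Clusterware stack not online after $attempts check(s)" >&2
  return 1
}
```

As root on the node being restarted, run crsctl stop crs, then crsctl start crs, then wait_for_crs 30 10 to wait up to about five minutes.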

Other Troubleshooting Commands:

Check network interfaces and status:

oifcfg getif

Check cluster health on every node:

crsctl check cluster -all
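With -all, crsctl check cluster prints a block per node: the node name followed by a colon, then one CRS-nnnn line each for Cluster Ready Services, Cluster Synchronization Services, and Event Manager. A sketch, assuming bash and that layout, that lists the nodes where any of those lines does not end in "is online"; cluster_unhealthy_nodes is a hypothetical name.

```shell
#!/usr/bin/env bash
# cluster_unhealthy_nodes (hypothetical): read `crsctl check cluster -all`
# output on stdin and print, once each, the nodes whose section contains a
# CRS-nnnn line that does not end in "is online".
cluster_unhealthy_nodes() {
  awk '
    /^[A-Za-z0-9._-]+:[ \t]*$/ {          # a node header such as "node1:"
      node = $0
      sub(/:[ \t]*$/, "", node)
      next
    }
    /^CRS-[0-9]+:/ && node != "" && $0 !~ /is online[ \t]*$/ {
      if (!(node in reported)) { print node; reported[node] = 1 }
    }
  '
}

# Usage on any cluster node:
if command -v crsctl >/dev/null 2>&1; then
  crsctl check cluster -all | cluster_unhealthy_nodes
fi
```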

6. Using crsctl and srvctl to Manage Clusterware

Oracle provides two main command-line utilities to manage Oracle Clusterware: crsctl (for Clusterware control) and srvctl (for managing Oracle RAC databases and services). Below are the commonly used commands for managing and controlling cluster resources.

Using crsctl:

crsctl is used for low-level cluster operations, such as controlling the Clusterware stack, querying voting disks, and managing resources.

Check the status of Clusterware:

crsctl check crs

Stop Clusterware on a node (as root, on that node):

crsctl stop crs

Start Clusterware on a node (as root, on that node):

crsctl start crs

List all Clusterware resources:

crsctl status resource -t

Using srvctl:

srvctl is used to manage RAC databases, instances, and services across nodes. It provides commands to start, stop, or check the status of resources such as databases, listeners, and services.

Start a RAC database:

srvctl start database -d racdb

Stop a RAC database:

srvctl stop database -d racdb

Check the status of a RAC database:

srvctl status database -d racdb
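srvctl status database reports one line per instance, such as Instance racdb2 is not running on node node2. A sketch, assuming bash and that wording, that starts any instance reported as down with srvctl start instance. The helper names are hypothetical, and the database name is passed as an argument.

```shell
#!/usr/bin/env bash
# db_instances_down (hypothetical): read `srvctl status database -d <db>` output
# on stdin and print "instance node" for each instance that is not running.
db_instances_down() {
  awk '/^Instance .* is not running on node / { print $2, $NF }'
}

# start_down_instances (hypothetical): start every instance of database $1
# that srvctl reports as not running.
start_down_instances() {
  local db="$1" inst node
  srvctl status database -d "$db" | db_instances_down |
    while read -r inst node; do
      echo "starting instance $inst on $node"
      # Redirect stdin so srvctl cannot consume the remaining loop input.
      srvctl start instance -d "$db" -i "$inst" < /dev/null
    done
}
```

For the database used in this chapter, the call would be start_down_instances racdb.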

Manage Oracle Listener services:

srvctl start listener
srvctl stop listener
srvctl status listener

Manage Oracle RAC Services:

srvctl start service -d racdb -s myservice
srvctl stop service -d racdb -s myservice
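To see where a service is currently offered, srvctl status service -d racdb -s myservice prints a line such as Service myservice is running on instance(s) racdb1,racdb2. A sketch assuming bash and that wording; service_instances is a hypothetical name that turns the line into one service/instance pair per line.

```shell
#!/usr/bin/env bash
# service_instances (hypothetical): read `srvctl status service` output on
# stdin and print "service instance" for every instance a running service uses.
service_instances() {
  awk '/^Service .* is running on instance[(]s[)] / {
         n = split($NF, inst, ",")
         for (i = 1; i <= n; i++) print $2, inst[i]
       }'
}

# Usage on a cluster node (racdb and myservice as in the examples above):
if command -v srvctl >/dev/null 2>&1; then
  srvctl status service -d racdb -s myservice | service_instances
fi
```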

Conclusion

In this chapter, we explored the heart of Oracle RAC: Oracle Clusterware. We covered its fundamental components, such as the Voting Disk and OCR, and how to configure, manage, and troubleshoot them. The chapter also provided commands to monitor, start, stop, and check the health of cluster resources using crsctl and srvctl. Understanding how to manage Oracle Clusterware is crucial for maintaining a healthy, high-performing Oracle RAC environment.