Oracle RAC
from Wikipedia

Oracle Real Application Clusters (RAC) is an option[1] for the Oracle Database software produced by Oracle Corporation and introduced in 2001 with Oracle9i. It provides software for clustering and high availability in Oracle database environments. Oracle Corporation includes RAC with the Enterprise Edition, provided the nodes are clustered using Oracle Clusterware.[2]

Functionality

Oracle RAC allows multiple computers to run Oracle RDBMS software simultaneously while accessing a single database, thus providing clustering.

In a non-RAC Oracle database, a single instance accesses a single database. The database consists of a collection of data files, control files, and redo logs located on disk. The instance comprises the collection of Oracle-related memory and background processes that run on a computer system.

In an Oracle RAC environment, two or more instances concurrently access a single database. This allows an application or user to connect to any of the computers and have access to a single coordinated set of data. The instances are connected to each other through an interconnect, which keeps all the instances in sync when accessing the data.

Aims

The main aim of Oracle RAC is to implement a clustered database that provides performance, scalability, resilience, and high availability of data at the instance level.

Implementation

Oracle RAC depends on the infrastructure component Oracle Clusterware to coordinate multiple servers and their sharing of data storage.[3] The FAN (Fast Application Notification) technology detects down-states.[4] RAC administrators can use the srvctl tool to manage RAC configurations.[5]

Cache Fusion

Prior to Oracle 9, network-clustered Oracle databases used a storage device as the data-transfer medium (meaning that one node would write a data block to disk and another node would read that data from the same disk), which had the inherent disadvantage of lackluster performance. Oracle 9i addressed this issue: RAC uses a dedicated network connection for communications internal to the cluster.

Since all computers/instances in a RAC access the same database, the overall system must guarantee the coordination of data changes on different computers such that whenever a computer queries data, it receives the current version — even if another computer recently modified that data. Oracle RAC refers to this functionality as Cache Fusion. Cache Fusion involves the ability of Oracle RAC to "fuse" the in-memory data cached physically separately on each computer into a single, global cache.

Networking

The Oracle Grid Naming Service (GNS) handles name resolution in the cluster registry.[6]

Diagnostics

The Trace File Analyzer (TFA) aids in collecting RAC diagnostic data.[7]

Versions

  • Oracle Real Application Clusters 12c Release 1 Enterprise Edition.[8]
  • Oracle Real Application Clusters One Node (RAC One Node) applies RAC to single-node installations running Oracle Database 11g Release 2 Enterprise Edition.[9]

Evolution

Relative to the single-instance Oracle database, Oracle RAC adds additional complexity. While database automation makes sense for single-instance databases, it becomes even more necessary for clustered databases because of their increased complexity.

Oracle Real Application Clusters (RAC), introduced with Oracle 9i in 2001, supersedes the Oracle Parallel Server (OPS) database option. Oracle9i required external clusterware (known as vendor clusterware, such as TruCluster, Veritas Cluster Server, or Sun Cluster) for most Unix flavors; the exceptions were Linux and Windows, where Oracle provided free clusterware called Cluster Ready Services (CRS). As of Oracle 10g, Oracle's clusterware product was available for all operating systems. With the release of Oracle Database 10g Release 2 (10.2), Cluster Ready Services was renamed Oracle Clusterware. When using Oracle 10g or higher, Oracle Clusterware is the only clusterware needed on most platforms on which Oracle RAC operates (except for TruCluster, which requires vendor clusterware). Clusterware from other vendors can still be used if it is certified for Oracle RAC.

In RAC, a write transaction must take ownership of the relevant area of the database: typically, this involves a request across the cluster interconnect (a local IP network) to transfer ownership of the data block from another node to the one wishing to write. This takes a relatively long time (from a few to tens of milliseconds) compared to a single database node using in-memory operations. For many types of applications, the time spent coordinating block access across systems is low relative to the many operations on the system, and RAC will scale comparably to a single system.[citation needed] Moreover, read-heavy databases (such as data-warehousing applications) work very well under RAC, as they have no need for ownership transfer. (Oracle 11g has made a number of enhancements in this area and performs a lot better than earlier versions for read-only workloads.[citation needed])

The overhead on the resource mastering (or ownership-transfer) is minimal for fewer than three nodes, as the request for any resource in the cluster can be obtained in a maximum of three hops (owner-master-requestor).[citation needed] This makes Oracle RAC horizontally scalable with a number of nodes. Application vendors (such as SAP) use Oracle RAC to demonstrate the scalability of their application. Most of the biggest OLTP benchmarks are on Oracle RAC. Oracle RAC 11g supports up to 100 nodes.[10]
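
The three-hop bound can be illustrated with a toy model. The sketch below is not Oracle's mastering algorithm: the `ToyCluster` class, the modulo rule for choosing a master, and the message counting are all assumptions made for illustration. It only shows why the number of interconnect messages per request stays bounded as nodes are added.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model of resource mastering (hypothetical, not Oracle's algorithm)."""
    node_count: int
    # block_id -> node that currently holds the block in its cache
    holder: dict = field(default_factory=dict)

    def master_of(self, block_id: int) -> int:
        # Assumption: static modulo mastering, chosen only for simplicity.
        return block_id % self.node_count

    def request(self, requester: int, block_id: int) -> int:
        """Return how many interconnect messages the request needs."""
        owner = self.holder.get(block_id)
        if owner == requester:
            return 0  # block is already cached locally
        # Message path: requester -> master -> owner -> requester.
        path = [requester, self.master_of(block_id)]
        if owner is not None:
            path.append(owner)
        path.append(requester)
        # A step between the same node (e.g. requester is the master) is free.
        hops = sum(1 for a, b in zip(path, path[1:]) if a != b)
        self.holder[block_id] = requester
        return hops


if __name__ == "__main__":
    cluster = ToyCluster(node_count=8)
    print(cluster.request(1, 5))  # no holder yet: ask master, get grant
    print(cluster.request(2, 5))  # master forwards to holder, holder ships block
```

Under these assumptions the worst case is always three messages, whether the cluster has 8 nodes or 100.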

For some[which?] applications, RAC may require careful application partitioning to enhance performance. An application that scales linearly on an SMP machine may scale linearly under RAC. However, if the application cannot scale linearly on SMP, it will not scale when ported to RAC. In short, the application scalability is based on how well the application scales in a single instance.

Competitive context

Shared-nothing and shared-everything architectures each have advantages over the other. DBMS vendors and industry analysts regularly debate the matter; for example, Microsoft touts a comparison of its SQL Server 2005 with Oracle 10g RAC.[11]

Oracle Corporation offered a Shared Nothing architecture RDBMS with the advent of the IBM SP and SP2 with the release of 7.x MPP editions, in which virtual shared drives (VSD) were used to create a Shared Everything implementation on a Shared Nothing architecture.

Shared-Everything

Shared-everything architectures share both data on disk and data in memory between nodes in the cluster. This is in contrast to "shared-nothing" architectures that share none of them.

Some commercially available databases offer a "shared-everything" architecture. IBM Db2 for z/OS (the IBM mainframe operating-system) has provided a high-performance data-sharing option since the mid-1990s when IBM released its mainframe hardware and software-clustering infrastructure. In late 2009, IBM announced DB2 pureScale, a shared-disk clustering scheme for DB2 9.8 on AIX that mimics the parallel sysplex implementation behind Db2 data sharing on the mainframe.

In February 2008, Sybase released its Adaptive Server Enterprise, Cluster Edition. It resembles Oracle RAC in its shared-everything design.[12]

Although technically not shared-everything, Sybase also provides Sybase IQ, a column-based relational database focused on analytic and data warehouse applications, which can be configured to run in a shared-disk mode.

Cloud-native databases, such as Amazon Aurora and POLARDB from Alibaba Cloud, implement a "shared-everything" architecture on top of a cloud-based distributed file system.[13][14]

Shared-nothing

Shared-nothing architectures share neither the data on disk nor the data in memory between nodes in the cluster. This is in contrast to "shared-everything" architectures, which share both.

Competitive products offering shared-nothing architectures include:

from Grokipedia
Oracle Real Application Clusters (RAC) is an option for the Oracle Database that enables multiple server instances to access a single shared database simultaneously, allowing the database to function as a single logical unit while distributing workload across clustered nodes for enhanced scalability and reliability. This architecture contrasts with traditional single-instance Oracle Databases, where only one instance manages the database at a time, by leveraging shared storage and interconnect technology to ensure data consistency.

At its core, Oracle RAC relies on Oracle Clusterware as the foundational infrastructure to manage the cluster, binding multiple servers so they operate as a unified system, along with Oracle Automatic Storage Management (ASM) for efficient shared storage handling. Key mechanisms include the Global Cache Service (GCS) and Global Enqueue Service (GES), which facilitate Cache Fusion to transfer data blocks between instances over a private interconnect, maintaining cache coherency without constant disk access. This setup eliminates single points of failure, supports mission-critical applications without requiring application code modifications, and provides scalability by adding or removing nodes dynamically.

Oracle RAC delivers high availability through features like automatic failover, transparent application continuity, and zero-downtime rolling upgrades, protecting against hardware failures, software issues, and planned outages. It scales horizontally for diverse workloads, including online transaction processing (OLTP), analytics, AI, and complex enterprise applications, achieving high throughput and fast response times. Widely adopted in sectors such as banking and retail, Oracle RAC supports 24/7 operation for demanding, always-on environments.

Introduction

Definition and Core Concepts

Oracle Real Application Clusters (RAC) is Oracle's shared-everything clustering solution for the Oracle Database, enabling multiple server instances to operate against a single physical database for enhanced scalability and availability. Introduced in 2001 with Oracle Database 9i Release 1, RAC replaced the prior Oracle Parallel Server (OPS), which depended on disk-based locking mechanisms that limited performance in multi-node environments. This architecture allows a cluster of independent servers, or nodes, to function as a unified system, sharing access to all database resources without partitioning data across nodes.

The core structure of Oracle RAC centers on shared storage for critical database components, including data files, control files, and redo logs, which are accessible to every node via cluster-aware file systems or storage area networks. Each node runs its own database instance, complete with dedicated memory structures such as the System Global Area (SGA) for caching data blocks and the Program Global Area (PGA) for session-specific operations, along with background processes for local management. Instances coordinate and synchronize through a private, high-speed interconnect, which carries the communication needed to keep data consistent.

In contrast to a single-instance database, where a solitary instance handles all database operations, RAC supports active-active scaling across nodes, distributing workload dynamically to improve throughput and resilience without necessitating application code changes. This setup eliminates single points of failure, as the database remains operational on surviving nodes if one fails, with synchronization primarily managed via the Cache Fusion protocol for efficient block transfers between caches.

Primary Objectives and Benefits

Oracle Real Application Clusters (RAC) primarily aims to deliver high availability by tolerating node failures without interrupting database operations, enabling horizontal scalability through the addition of cluster nodes, and enhancing performance via parallel query processing across multiple instances. This architecture allows organizations to maintain continuous access to data even during hardware or software failures, supporting mission-critical applications that require uninterrupted service.

Key benefits include fault isolation, where a failure on one node does not impact others, so workloads continue on surviving instances; load balancing across nodes to optimize resource utilization and prevent bottlenecks; and integration with Oracle Data Guard for disaster recovery, allowing rapid failover to a standby site while maintaining data consistency. These features leverage shared storage for concurrent data access and Cache Fusion for low-latency block transfers between nodes, further minimizing disruptions.

In modern versions, such as 23ai (released 2024), Oracle RAC supports clusters with over 100 nodes, achieving near-linear scalability without application modifications, and reduces downtime to under 15 seconds through fast failover mechanisms. It is particularly suited to online transaction processing (OLTP) systems demanding 24/7 uptime, such as banking platforms, as well as large-scale analytics workloads that benefit from parallel execution without data partitioning.

Prerequisites

Hardware and Software Requirements

Oracle Real Application Clusters (RAC) requires robust hardware configurations to ensure high availability and scalability across multiple nodes. Each node must use symmetric multiprocessing (SMP) servers or comparable systems capable of supporting the cluster's workload, with a minimum of 8 GB RAM for Oracle Grid Infrastructure installations and at least 1 GB (recommended 2 GB or more) for the Oracle Database software. Shared storage is essential, typically implemented via a storage area network (SAN) or network-attached storage (NAS) with multipath I/O for redundancy and performance; Oracle Automatic Storage Management (ASM) is recommended for managing database files, control files, SPFILEs, redo logs, and recovery files. Networking requires at least one public network interface card (NIC) per node for client access and a dedicated private interconnect using high-speed Ethernet, with a minimum of 1 Gbps and a recommendation of 10 Gbps or faster to handle Cache Fusion traffic efficiently.

On the software side, Oracle RAC requires the Enterprise Edition of Oracle Database with the RAC option licensed, alongside Oracle Grid Infrastructure for cluster management, which must be installed in a separate Oracle home from the database software. For Oracle Database 21c and later (including the current 23ai release as of 2025), the multitenant architecture with a container database (CDB) is mandatory. Compatible operating systems are limited to certified versions, including Linux distributions and Windows Server (e.g., 2019, 2022, 2025), with identical OS configurations across all nodes to avoid compatibility issues.

All hardware and software must comply with Oracle's certification matrices to guarantee support. The Oracle Hardware Compatibility List (via My Oracle Support) verifies compatible servers, storage, and network adapters from vendors such as HP and Oracle/Sun, while the software certification matrix outlines supported OS patches and versions.

Virtualization environments are supported, including Oracle VM, KVM, and Microsoft Hyper-V, allowing RAC deployment on certified virtual machines, with performance considerations for shared resources. The minimum cluster configuration consists of 2 nodes for basic high availability, with scalability extending to 16 or more physical nodes depending on the storage protocol (up to 30 nodes is cited for some configurations over a private network), though practical limits are influenced by interconnect bandwidth and the storage subsystem.

Cluster Setup Essentials

Setting up an Oracle Real Application Clusters (RAC) environment requires meticulous planning to ensure high availability, scalability, and performance. Initial planning involves a capacity assessment to evaluate workload requirements, including I/O operations per second (IOPS) and throughput needs for the shared storage subsystem, which directly affect database performance under concurrent access from multiple nodes. Redundancy must be incorporated into the shared storage design, typically using RAID 1+0 configurations to provide both mirroring for fault tolerance and striping for balanced load distribution across disks. A backup strategy should also be integrated from the outset, using tools like Recovery Manager (RMAN) to support consistent backups of the shared database while accounting for cluster-wide synchronization.

Storage configuration in Oracle RAC centers on Automatic Storage Management (ASM), which simplifies the management of shared disks by creating disk groups that span multiple nodes for pooled storage resources. ASM disk groups store database files, redo logs, and control files, ensuring uniform access and automatic load balancing. Critical components include voting disks, which maintain cluster membership by recording node status; Oracle recommends configuring 3 to 5 voting disks in an ASM disk group with normal or high redundancy to tolerate failures. The Oracle Cluster Registry (OCR) stores cluster configuration data and must be placed in a separate ASM disk group or on a shared file system, with redundancy provided through mirroring to prevent single points of failure. Note that starting with 21c, third-party clusterware is no longer supported.

Network zoning is fundamental to isolating traffic types and protecting performance in Oracle RAC. The public network handles client connections and administrative access, requiring fast, low-latency links to support SQL*Net traffic. In contrast, the private interconnect is dedicated to internal cluster communication, such as Cache Fusion block transfers between instances, and should use a separate high-bandwidth, low-latency network to minimize contention. The private interconnect uses UDP ports (e.g., dynamically assigned ports in high ranges for Cache Fusion and fixed ports like 1638 for Cluster Synchronization Services). To avoid interference, VLANs are employed for separation, so that the private interconnect operates in an isolated segment and public traffic cannot affect cluster heartbeats or block transfers.

Security basics in cluster setup focus on node authentication and controlled access to prevent unauthorized participation. The Oracle Advanced Security Option (ASO) enables strong authentication mechanisms, such as Kerberos or PKI-based certificates, to verify node identities during cluster join operations. Firewall configurations must allow TCP port 1521 for SQL*Net listener connections from clients, and must keep the private interconnect's UDP ports open between nodes while blocking external access.
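
The recommendation of 3 to 5 voting disks follows from simple majority arithmetic. The sketch below assumes a strict-majority rule (a node must reach more than half of the configured voting disks); the function names are hypothetical and the code is an illustration of the arithmetic, not a description of Clusterware internals.

```python
def tolerated_voting_disk_failures(total: int) -> int:
    """How many voting disks can be lost while a strict majority survives."""
    if total < 1:
        raise ValueError("a cluster needs at least one voting disk")
    return (total - 1) // 2


def has_quorum(total: int, accessible: int) -> bool:
    """Assumed rule: more than half of the voting disks must be reachable."""
    return accessible > total // 2


if __name__ == "__main__":
    for disks in (1, 3, 5):
        print(disks, "disks tolerate",
              tolerated_voting_disk_failures(disks), "failure(s)")
```

Under this rule, an even count adds no tolerance over the next lower odd count (4 disks tolerate one failure, the same as 3), which is one reason odd numbers are the usual choice.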

Architecture

Key Components

Oracle Clusterware serves as the foundational cluster management software for Oracle Real Application Clusters (RAC), providing a portable infrastructure that binds multiple servers to operate as a single logical system. It manages essential cluster resources, including nodes, virtual IP (VIP) addresses, database services, and listeners, ensuring high availability and automated failover. Key subcomponents include Cluster Ready Services (CRS), which oversees the lifecycle of cluster resources stored in the Oracle Cluster Registry (OCR); Cluster Synchronization Services (CSS), which maintains node membership and synchronizes cluster operations to prevent split-brain scenarios; and Event Management (EVM), which monitors and propagates cluster events for proactive management. These elements collectively enable resource allocation and recovery across the cluster.

Shared resources form the core storage layer in RAC, allowing multiple database instances to access the same physical storage simultaneously. These include data files, control files, server parameter files (SPFILEs), and redo log files, all residing on cluster-aware shared disks that are accessible at high speed from all nodes. Automatic Storage Management (ASM) instances manage this shared storage, dynamically allocating disk groups and optimizing I/O performance for RAC environments. Oracle Net listeners, configured per node or as shared listeners, handle client connections, while the Single Client Access Name (SCAN) enhances scalability by resolving to three IP addresses for connection load balancing and failover across instances.

Oracle RAC introduces specialized background processes to handle distributed operations, distinguishing it from single-instance databases. Multiple Lock Manager Server (LMS) processes per instance manage buffer cache coordination, including the transfer of blocks between instances on which Cache Fusion relies. The Lock Manager Daemon (LMD) process oversees distributed lock management, processing remote enqueue requests, detecting deadlocks, and ensuring resource consistency across the cluster. Additional processes like the Lock Monitor (LMON) support global enqueue monitoring and instance recovery, while the Lock Element (LCK) process handles cross-instance calls. These processes operate alongside standard database background processes to maintain consistency and performance in a multi-node setup.

Oracle Grid Infrastructure integrates Oracle Clusterware with ASM, creating a unified platform for automated cluster and storage management in RAC deployments. Installed on each node, it simplifies administration by combining cluster management with disk provisioning, supporting up to 128 instances per database and enabling features like dynamic volume management. This integration reduces operational complexity, allowing administrators to treat the cluster as a single entity for provisioning and maintenance tasks.

Cache Fusion Protocol

Cache Fusion is the core protocol in Oracle Real Application Clusters (RAC) that synchronizes data blocks across multiple database instances by transferring them directly between buffer caches over the private cluster interconnect, thereby bypassing disk I/O and minimizing latency. This mechanism logically merges the buffer caches of all instances into a single, shared global cache, allowing instances to access data as if it were local without frequent disk contention. By shipping blocks between caches rather than reading them from shared storage, Cache Fusion significantly reduces reader/writer conflicts and overall system overhead, enhancing scalability for high-throughput workloads.

The protocol operates in two primary modes to handle read and write requests while maintaining data consistency. For read operations, Cache Fusion transfers consistent read (CR) blocks, which are versions of data blocks constructed to reflect a consistent view at the time of the request, avoiding the need for rollback application on the requesting instance. In write scenarios, it transfers current (dirty) blocks, meaning blocks that have been modified but not yet written to disk, accompanied by lock mode conversions, such as upgrading from shared (S) to exclusive (X) mode to ensure only one instance can modify the block at a time. These conversions prevent concurrent modifications and preserve block integrity across the cluster.

The Global Cache Service (GCS), a key component integrated with Cache Fusion, manages the modes and locations of data blocks to enforce cache coherency. GCS tracks blocks in modes including null (no holder), shared (multiple readers), exclusive (single modifier), and S/SSX (shared with shared past image for undo reconstruction). It coordinates access privileges by monitoring resource states and facilitating block transfers via Lock Manager Server (LMS) processes, which handle inter-instance messaging. Additionally, GCS employs heartbeat mechanisms through these processes to continuously track resource availability and instance liveness, ensuring prompt detection of failures and reallocation of resources.

Performance of Cache Fusion is evaluated using key metrics such as gc cr blocks received and gc cr blocks sent for consistent read transfers, and gc current blocks received and gc current blocks sent for current block transfers, which indicate the volume of inter-instance block traffic. These statistics, available in views like V$SYSSTAT, help identify bottlenecks in global cache activity. Fusion efficiency, often expressed as the ratio of CR blocks to total global cache blocks multiplied by 100, provides a measure of how effectively the protocol avoids disk access, with higher percentages signaling better cache utilization.

Cache Fusion was introduced in Oracle Database 9i to enable a true shared-cache architecture in RAC, replacing earlier disk-based coordination methods. Subsequent enhancements, particularly from Oracle Database 19c onward, have incorporated support for Remote Direct Memory Access (RDMA) over protocols like RoCE, allowing direct memory-to-memory transfers with reduced CPU involvement and further latency improvements on compatible hardware.
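
The fusion-efficiency ratio described above can be computed from the named statistics once they have been read from a view such as V$SYSSTAT. The sketch below is illustrative only: the function name is hypothetical, the statistics are passed in as a plain dictionary rather than queried, and "total global cache blocks" is assumed to mean CR blocks received plus current blocks received.

```python
def fusion_efficiency(stats: dict) -> float:
    """Percentage of received global cache blocks that were CR blocks.

    `stats` maps statistic names (as they appear in V$SYSSTAT) to values.
    Assumption: total = 'gc cr blocks received' + 'gc current blocks received'.
    """
    cr = stats.get("gc cr blocks received", 0)
    current = stats.get("gc current blocks received", 0)
    total = cr + current
    if total == 0:
        return 0.0  # no global cache traffic recorded yet
    return cr / total * 100


if __name__ == "__main__":
    sample = {  # made-up numbers, for illustration only
        "gc cr blocks received": 750,
        "gc current blocks received": 250,
    }
    print(f"{fusion_efficiency(sample):.1f}%")
```

Because these statistics are cumulative since instance startup, comparing two snapshots and applying the function to the differences gives a more meaningful figure for a given interval.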

Networking and Connectivity

Interconnect Design

The interconnect in Oracle Real Application Clusters (RAC) serves as a dedicated private network that provides a high-bandwidth, low-latency communication pathway between cluster nodes, carrying inter-instance data transfers via Cache Fusion, cluster heartbeats for node membership monitoring, and other essential cluster traffic. This infrastructure is crucial for maintaining data consistency across multiple database instances, so that the cluster behaves as a unified system. Common implementations use 10 or 25 Gigabit Ethernet (GbE) or InfiniBand to meet these performance demands.

To ensure high availability, Oracle RAC employs redundancy through dual or multiple interconnect interfaces per node, configured without traditional bonding but using High Availability IP (HAIP) for automatic failover and load balancing. HAIP dynamically assigns virtual IP addresses across available interfaces on different subnets, allowing traffic to be redirected if a physical link or switch fails, thus preventing single points of failure. The protocol stack primarily uses UDP for its efficiency in handling bursty, high-volume traffic patterns, with support for Reliable Datagram Sockets (RDS) over InfiniBand as an alternative.

Bandwidth requirements for the interconnect start at a minimum of 1 Gb/s, but Oracle recommends 10 Gb/s or higher to support intensive workloads without bottlenecks. Latency must be kept low, ideally under 1 ms round-trip, to optimize Cache Fusion performance and minimize global cache waits. These specifications allow the interconnect to function as the equivalent of a shared memory channel for the distributed database environment.

Configuration of the interconnect involves assigning a private IP address range exclusive to cluster communication, separate from public networks, to isolate and prioritize internal traffic. Enabling jumbo frames with a maximum transmission unit (MTU) of 9000 bytes is recommended across all components, including host adapters, drivers, and switches, to reduce overhead and improve throughput for large block transfers. Starting with Oracle Database 12c, integration with RDMA over Converged Ethernet (RoCE) enhances performance by enabling kernel-bypass direct memory access, particularly in environments supporting RDMA protocols like RDSv3.
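
The benefit of jumbo frames for block transfers can be seen with a little fragmentation arithmetic. The sketch below assumes an 8 KB database block sent as a single UDP datagram over IPv4 without options, and it ignores any message header the database adds on top of the block; the function name is hypothetical.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def ip_packets_for_udp_payload(payload_bytes: int, mtu: int) -> int:
    """Number of IP packets needed to carry one UDP datagram at a given MTU."""
    datagram = payload_bytes + UDP_HEADER      # what IP has to carry
    max_ip_payload = mtu - IPV4_HEADER
    if datagram <= max_ip_payload:
        return 1                               # fits without fragmentation
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = max_ip_payload // 8 * 8
    return math.ceil(datagram / per_fragment)


if __name__ == "__main__":
    block = 8192  # assumed 8 KB database block
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {ip_packets_for_udp_payload(block, mtu)} packet(s)")
```

Under these assumptions an 8 KB block needs six fragments at the standard 1500-byte MTU but fits in a single packet at 9000 bytes, which removes fragmentation and reassembly work from every block transfer.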

Name Resolution and VIPs

In Oracle Real Application Clusters (RAC), name resolution for client connections occurs over the public network, where clients access database instances without needing to know specific node details. This setup relies on virtual IP addresses (VIPs) and Domain Name System (DNS) configurations to provide connectivity and high availability. Node VIPs are floating IP addresses hosted by Oracle Clusterware on the public network interface of each cluster node, allowing clients to connect directly to an instance via these addresses. By using VIPs, Oracle RAC achieves sub-second failover for client connections; if a node fails, the VIP migrates to another available node, preventing the TCP timeout delays of up to several minutes that would occur with static IP binding.

To simplify client access and enable load balancing across the cluster, Oracle RAC introduced the Single Client Access Name (SCAN) in release 11g. SCAN is a DNS-based alias that resolves to three IP addresses, each associated with a SCAN VIP and listener, distributing incoming connections evenly without requiring clients to specify individual node VIPs. These SCAN VIPs function similarly to node VIPs but can relocate to any node in the cluster, providing a stable, single endpoint for clients regardless of node additions or failures. The SCAN configuration enhances availability, as it supports up to three addresses for redundancy and load distribution, and integrates with the cluster's public network zoning to maintain consistent access.

For automated management of VIPs and hostnames in larger or dynamic environments, Oracle RAC introduced the Grid Naming Service (GNS) in release 11g Release 2 (11.2). GNS operates as a cluster-managed DNS service, using a static GNS VIP delegated by the external DNS to handle dynamic resolution of cluster resources, including node VIPs and SCAN addresses. This integration reduces administrative overhead by automatically updating DNS records for VIP migrations during node additions, removals, or failures, so that clients always resolve to active resources without manual intervention.

Client applications connect to Oracle RAC using standard interfaces like JDBC or ODBC, configured to target the SCAN name for transparent node affinity and failover. For instance, a JDBC URL might specify the SCAN and service name, allowing the driver to select an optimal instance based on service preferences while receiving Fast Application Notification (FAN) events for real-time updates on cluster changes. FAN, part of the Oracle RAC framework, notifies applications via APIs or callbacks about service relocations or instance failures, enabling proactive reconnection without full session loss. ODBC drivers can be configured similarly, promoting workload balancing and rapid recovery in enterprise deployments.
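
As a minimal sketch of what targeting the SCAN looks like, the helpers below build a JDBC thin URL and an equivalent Oracle Net connect descriptor from a SCAN name and a service name. The helper names, the host `rac-scan.example.com`, and the service `sales_svc` are hypothetical placeholders; only the general string formats and the default listener port 1521 are taken as given.

```python
def jdbc_thin_url(scan_name: str, service_name: str, port: int = 1521) -> str:
    """Build a JDBC thin URL that targets the SCAN rather than a single node."""
    return f"jdbc:oracle:thin:@//{scan_name}:{port}/{service_name}"


def connect_descriptor(scan_name: str, service_name: str, port: int = 1521) -> str:
    """Build the equivalent Oracle Net connect descriptor."""
    return (
        "(DESCRIPTION="
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={scan_name})(PORT={port}))"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


if __name__ == "__main__":
    # Hypothetical SCAN and service names, for illustration only.
    print(jdbc_thin_url("rac-scan.example.com", "sales_svc"))
    print(connect_descriptor("rac-scan.example.com", "sales_svc"))
```

Because the client names only the SCAN and a service, nodes can be added or removed without changing either string.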

Implementation and Operations

Deployment Process

The deployment of Real Application Clusters (RAC) begins with preparing the operating system on all cluster nodes to meet the necessary prerequisites, such as installing required kernel packages, configuring user accounts like the grid and owners, setting kernel parameters for and semaphores, and ensuring SSH equivalence for passwordless communication between nodes. These steps ensure compatibility and secure inter-node operations before proceeding to software installation. Next, Oracle Grid Infrastructure is deployed using the Oracle Universal Installer (OUI), which is run on the first node in graphical or to install Oracle Clusterware and Oracle Automatic Storage Management (ASM) across the cluster. During the installation, OUI prompts for cluster node selection, virtual IP addresses, and storage options; it then copies the software to remote nodes and requires execution of the root.sh script on each node as the root user to configure system-level components like the Cluster Ready Services (CRS). Following Grid Infrastructure installation, ASM disk groups are created using the ASM Configuration Assistant (ASMCA) or through OUI options to provision shared storage for the database files, voting disks, and OCR. The software is then installed in RAC mode using OUI in "software only" configuration, selecting the Enterprise Edition and specifying the cluster nodes, with Oracle base and home directories distinct from the home. After software installation, the root.sh script is run on all nodes if not automated during OUI. To create the RAC database, the Database Configuration Assistant (DBCA) is invoked from the home, choosing the "Oracle Real Application Clusters database" template, specifying instances per node, and configuring parameters like and storage allocation to ASM disk groups. DBCA automates the creation of database instances, services, and across nodes. 
Cluster resources, including databases and instances, are managed post-deployment using the Server Control (SRVCTL) utility, which allows starting, stopping, and monitoring RAC components from any node. For maintenance, Oracle RAC supports rolling upgrades through a two-stage process: first the Grid Infrastructure software is upgraded node by node while cluster availability is maintained, and then patches or database upgrades are applied in batches to minimize downtime. This approach relies on quorum-based voting disk management, in which a majority of voting disks must remain accessible to sustain cluster operations during the upgrade.

Verification confirms that all components are operational. The Cluster Verification Utility (CLUVFY) is run with commands such as cluvfy stage -post crsinst -n all to check the post-installation status of Clusterware and the interconnect. Additionally, crsctl check crs confirms the status of the Cluster Ready Services, CSS, and EVM daemons on each node, while srvctl status database -d db_name verifies database instance availability across the cluster. Together these tools provide the diagnostics needed to validate the RAC environment before production use.
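
The three verification commands named above can be collected into a small script. The Python sketch below only assembles and prints them by default (a dry run); the function names and the database name orcl are hypothetical, and actually executing the commands assumes the Grid Infrastructure and database binaries are on the PATH of the account running the script.

```python
import shlex
import subprocess


def verification_commands(db_name: str) -> list[list[str]]:
    """Build the post-installation checks mentioned in the text as argument lists."""
    return [
        ["cluvfy", "stage", "-post", "crsinst", "-n", "all"],
        ["crsctl", "check", "crs"],
        ["srvctl", "status", "database", "-d", db_name],
    ]


def run_checks(db_name: str, dry_run: bool = True) -> list[str]:
    """Print each command line; execute it only when dry_run is False."""
    rendered = []
    for argv in verification_commands(db_name):
        line = shlex.join(argv)
        rendered.append(line)
        print(line)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(argv, check=True)
    return rendered


if __name__ == "__main__":
    run_checks("orcl")  # "orcl" is a placeholder database name
```

Keeping the commands as argument lists rather than shell strings avoids quoting problems if a database name ever contains unusual characters.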

Management and Diagnostics

Oracle Real Application Clusters (RAC) management involves a suite of command-line utilities and repositories designed to administer cluster resources, instances, and services. The Server Control Utility (SRVCTL) serves as the primary interface for managing Oracle RAC databases, enabling administrators to start, stop, and relocate database instances, services, and other cluster resources from a single command line. For instance, srvctl start database -d db_name starts all instances of the specified database across the cluster, while srvctl stop instance -d db_name -i instance_name stops one instance without affecting the others. Complementing SRVCTL, the Cluster Ready Services Control (CRSCTL) utility controls Oracle Clusterware components, supporting operations such as checking cluster status with crsctl check cluster -all, starting or stopping the Clusterware stack on a node with crsctl start crs and crsctl stop crs, and managing high-availability resources. These tools abstract complex cluster interactions into simple, scriptable commands.

The Grid Infrastructure Management Repository (GIMR) enhances diagnostics by storing cluster-wide operational data, including performance metrics, alerts, and historical logs from Oracle Clusterware and RAC components, in a dedicated multitenant database with one pluggable database (PDB) per cluster. Administrators access GIMR data through diagnostic tools or SQL queries to investigate issues such as resource failures or configuration drift, which supports proactive maintenance.

Monitoring in Oracle RAC leverages the Automatic Workload Repository (AWR) and the Automatic Database Diagnostic Monitor (ADDM) to capture and analyze cluster-specific performance data, with a focus on global cache (gc) wait events that indicate inter-node block transfer delays.
AWR snapshots, generated hourly by default, collect statistics such as gc cr block receive time (time spent waiting to receive consistent-read blocks from other instances) and gc buffer busy waits (contention for blocks in the global cache), enabling identification of interconnect bottlenecks or load imbalances across instances. ADDM processes these AWR snapshots to produce actionable recommendations, such as tuning SQL statements that contribute to excessive gc waits or adjusting instance parameters for better Cache Fusion efficiency. For real-time visualization, Oracle Enterprise Manager (OEM) offers dashboards that monitor RAC clusters at the node, instance, and service levels, displaying metrics such as CPU usage, I/O throughput, and cluster interconnect health through graphical interfaces and alerts. OEM integrates with Clusterware to provide end-to-end views, including proactive notifications for threshold breaches.

Diagnostics in RAC emphasize automated collection and analysis to expedite issue resolution. The Trace File Analyzer (TFA) automates the gathering of diagnostic logs, traces, and system information from Grid Infrastructure, RAC databases, and supporting OS components, and its collector daemon supports proactive detection of problems such as memory leaks or network faults. Administrators invoke TFA with commands such as tfactl diagcollect to bundle relevant files for support analysis. ORAchk performs comprehensive health checks, scanning configurations, patches, and best-practice compliance across the RAC stack to identify vulnerabilities or misconfigurations before they affect availability. It generates reports with severity ratings and remediation steps, often run periodically or on-demand via orachk -runallchecks.

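
To make the gc cr block receive time statistic above more concrete, the following Python sketch derives an average per-block receive latency from the difference between two snapshots. It assumes, as is commonly documented for this statistic, that the time value is reported in centiseconds, so it is multiplied by 10 to obtain milliseconds. The snapshot dictionaries, their numbers, and the function names are hypothetical; real values would come from AWR or the dynamic performance views.

```python
def avg_gc_cr_receive_ms(receive_time_cs: float, blocks_received: int) -> float:
    """Average consistent-read block receive time in milliseconds.

    Assumes receive_time_cs is in centiseconds (hence the factor of 10).
    Returns 0.0 when no blocks were received in the interval.
    """
    if blocks_received <= 0:
        return 0.0
    return 10.0 * receive_time_cs / blocks_received


def snapshot_delta(begin: dict, end: dict, name: str) -> float:
    """Difference of a cumulative statistic between two snapshots."""
    return end[name] - begin[name]


if __name__ == "__main__":
    # Hypothetical cumulative values at two consecutive snapshots.
    begin = {"gc cr block receive time": 1200, "gc cr blocks received": 40000}
    end = {"gc cr block receive time": 1700, "gc cr blocks received": 50000}

    time_cs = snapshot_delta(begin, end, "gc cr block receive time")
    blocks = snapshot_delta(begin, end, "gc cr blocks received")
    print(f"{avg_gc_cr_receive_ms(time_cs, blocks):.2f} ms per block")
```

Working on snapshot deltas rather than raw cumulative counters keeps the result tied to a specific interval, which matches how AWR reports are read.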
Event handling is streamlined through Fast Application Notification (FAN) and Fast Connection Failover (FCF): FAN notifies applications of cluster events such as node failures or service relocations, and FCF enables connection pools to redirect automatically to surviving instances in under a second, minimizing downtime without application code changes.

Common issues in Oracle RAC, such as split-brain scenarios in which network partitions lead multiple nodes to attempt concurrent control of the database, are resolved through Oracle Clusterware's fencing mechanisms, which evict errant nodes via node kill or power-off actions to maintain data integrity and prevent corruption. Interconnect latency, often manifesting as elevated gc waits, is diagnosed using standard OS utilities such as ping for round-trip times and traceroute for path analysis on the private cluster network, helping to isolate hardware or configuration problems.
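
As one way to script the ping-based interconnect check mentioned above, the Python sketch below extracts the round-trip summary from ping output in the format printed by common Linux ping implementations (rtt min/avg/max/mdev = ... ms). The sample output, the private address in it, and the names RttSummary and parse_ping_summary are illustrative assumptions; other platforms may print the summary differently.

```python
import re
from typing import NamedTuple, Optional


class RttSummary(NamedTuple):
    min_ms: float
    avg_ms: float
    max_ms: float
    mdev_ms: float


# Matches e.g. "rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms"
_RTT_LINE = re.compile(
    r"min/avg/max/\w+ = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_summary(output: str) -> Optional[RttSummary]:
    """Return the round-trip statistics from ping output, or None if absent."""
    match = _RTT_LINE.search(output)
    if match is None:
        return None
    return RttSummary(*(float(group) for group in match.groups()))


if __name__ == "__main__":
    # Hypothetical output captured from a ping across the private network.
    sample = (
        "--- 192.168.10.2 ping statistics ---\n"
        "5 packets transmitted, 5 received, 0% packet loss, time 4005ms\n"
        "rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms\n"
    )
    print(parse_ping_summary(sample))
```

Returning None when no summary line is found lets a caller treat total packet loss or an unexpected output format as a separate condition from high latency.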

History and Development

Evolutionary Milestones

The development of Oracle RAC originated from its predecessor, Oracle Parallel Server (OPS), which was introduced with Oracle Database 7 (1992) and enhanced in 7.3 (1996) and Oracle 8i (1998). OPS used a shared-disk architecture with disk-based locking through the Integrated Distributed Lock Manager (DLM), requiring frequent writes to a central lock file on shared storage to manage block access across instances. This approach led to I/O bottlenecks, particularly "pinging," in which data blocks were repeatedly flushed to and read from disk during inter-instance transfers, limiting scalability in high-concurrency environments.

In 2001, Oracle 9i introduced RAC as the successor to OPS, featuring Cache Fusion as a pivotal mechanism for cache coherency. Cache Fusion shifted block transfers from disk I/O to direct memory-to-memory communication over a low-latency interconnect, using the Global Cache Service (GCS) and Global Enqueue Service (GES) to maintain consistent data access without the performance degradation of pinging. This enabled linear scalability for read-write workloads, marking RAC's transition to a truly shared-cache model.

Oracle Database 10g, released in 2003, advanced RAC's infrastructure with the debut of Oracle Clusterware, a vendor-neutral clustering framework that provided essential services such as node membership and cluster management independently of vendor-specific cluster software such as Digital's TruCluster. Because RAC had previously been limited to specific platforms, this shift broadened its applicability across diverse operating systems. Concurrently, integration with Oracle Enterprise Manager 10g Grid Control introduced unified monitoring and provisioning tools for RAC clusters, simplifying administration in grid environments.

The 2007 release of Oracle Database 11g supported up to 100 nodes per cluster, continuing the limit introduced in 10g Release 2 while enhancing horizontal growth for large-scale deployments through optimized interconnect protocols.
In 11g Release 2 (2009), the Single Client Access Name (SCAN) was introduced, offering a single name for client connections that resolves to multiple IP addresses for load balancing and failover across nodes, reducing configuration complexity. Further enhancements, including policy-based management and workload isolation, improved response times and throughput in mixed-workload scenarios.

From Oracle Database 12c onward, starting in 2013, RAC incorporated multitenant architecture support, enabling container databases (CDBs) and pluggable databases (PDBs) to be distributed across multiple instances for consolidated yet isolated workloads. Flex ASM emerged as a key feature, allowing a reduced number of ASM instances (as few as one per cluster) to manage disk groups for numerous databases, minimizing overhead in large clusters.

In the 19c release (2019), RAC evolved with cloud-oriented adaptations, including integration with Oracle Cloud Infrastructure, automated provisioning via Exadata Cloud@Customer, and support for container orchestration to facilitate hybrid and public cloud migrations.

Versions and Release Updates

Oracle Real Application Clusters (RAC) has evolved through several key database releases, with 19c serving as the primary long-term support (LTS) version since its general availability in 2019. Premier Support for 19c extends until December 31, 2029, followed by Extended Support through December 31, 2032, providing a stable foundation for RAC deployments with ongoing security patches, bug fixes, and enhancements. Notable RAC improvements in 19c include architectural optimizations such as enhanced Cache Fusion performance, alongside general features such as automatic indexing, which creates and manages indexes to improve query performance in multi-node environments.

Innovation releases have introduced advanced capabilities with shorter support windows. Oracle Database 21c, released in August 2021, emphasized RAC features including improved connection management across nodes to minimize downtime. Support for 21c ends in July 2027, with no Extended Support available; its active innovation phase was originally slated to end in 2023 but was extended to accommodate ongoing adoption. Oracle Database 23ai, available since May 2024, integrates AI-driven features such as AI Vector Search for handling vector data alongside relational workloads, with RAC enhancements such as improved multi-node coordination for distributed vector operations.

In 2025, Oracle AI Database 26ai was introduced via Release Update 23.26.0, replacing 23ai without requiring database upgrades or application re-certification. This long-term support release extends Premier Support until December 2031 and includes RAC-related enhancements such as globally distributed databases using RAFT-based replication for multi-master active-active configurations, enabling failover in under 3 seconds with zero data loss.

In 2025, Release Updates (RUs) for 23ai addressed critical RAC needs.
The January 2025 RU 23.7 included security patches via the Critical Patch Update and fixes for RAC rolling upgrades, enabling two-stage patching for features such as vector support in external tables without full cluster downtime. The April 2025 RU 23.8 introduced performance optimizations for multi-node RAC, including sparse vector support and enhanced hybrid vector indexes, applied through RAC two-stage rolling updates to improve efficiency in large-scale deployments.

Support for older versions has concluded: Oracle Database 12c reached the end of Extended Support in July 2022, after which no further patches are provided. Oracle recommends migrating from 12c to 19c or 23ai/26ai using tools such as the Database Upgrade Assistant to preserve RAC configurations while adopting modern features.

Cloud adaptations of RAC extend its reach beyond on-premises setups. Oracle Cloud Infrastructure (OCI) supports full RAC deployments in the public cloud, offering scalable multi-node clusters with automated management. Exadata Cloud@Customer provides a hybrid option, delivering complete RAC functionality on dedicated Exadata hardware within customer data centers, managed via OCI for integration with cloud services.

Competitive Landscape

Shared-Everything Architectures

In shared-everything architectures, all nodes in a database cluster have concurrent access to the same shared storage resources, including data files, control files, and redo logs, while also maintaining coherency in their in-memory caches through specialized protocols. This design eliminates the need for data partitioning or sharding and allows all nodes to process workloads simultaneously, which supports high-throughput online transaction processing (OLTP) and other active-active scenarios without application modifications.

Oracle Real Application Clusters (RAC) is a prominent example of this architecture, in which multiple database instances share a single logical database on cluster-aware storage. Central to RAC's functionality is Cache Fusion, a technology that enables direct inter-instance transfer of modified data blocks over a high-speed private interconnect, effectively creating a unified global buffer cache and reducing reliance on slower disk I/O. This contrasts with traditional failover-only clusters, where inactive nodes merely provide redundancy rather than contributing to ongoing workload processing; in RAC, all nodes remain active and coordinate changes across buffer caches using the Global Cache Service (GCS) and Global Enqueue Service (GES) to ensure data consistency.

Other notable examples include IBM Db2 pureScale, a shared-disk clustering solution that uses a dedicated cluster caching facility (CF) to manage global locking and buffer pool sharing across members, providing similar cache coherency without data redistribution. An earlier system, IBM's Parallel Sysplex for Db2 on z/OS, introduced in 1994, also employs a shared-everything model, using Coupling Facility structures for lock management and group buffer pools so that multiple Db2 subsystems can access the same database concurrently with high-performance data sharing.

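
The Cache Fusion read path described above can be illustrated with a deliberately simplified toy model in Python. It is not Oracle's GCS protocol: it ignores block modes, past images, locking, and cache eviction, and the class name ToyCluster, the instance names, and the block contents are all hypothetical. It only shows the central idea that a block modified in one instance's cache can be served to another instance over the interconnect without first being written to disk.

```python
class ToyCluster:
    """Toy shared-cache model: per-instance caches plus a block directory."""

    def __init__(self, instance_names, disk):
        self.caches = {name: {} for name in instance_names}
        self.directory = {}     # block id -> instance known to cache the block
        self.disk = dict(disk)  # shared storage; never updated in this sketch

    def read(self, instance, block_id):
        """Return (data, source); source is 'local', 'interconnect', or 'disk'."""
        cache = self.caches[instance]
        if block_id in cache:
            return cache[block_id], "local"
        holder = self.directory.get(block_id)
        if holder is not None:
            # Another instance caches the block: copy it memory-to-memory.
            data, source = self.caches[holder][block_id], "interconnect"
        else:
            data, source = self.disk[block_id], "disk"
        cache[block_id] = data
        self.directory[block_id] = instance
        return data, source

    def write(self, instance, block_id, data):
        """Modify a block in one cache and invalidate the other copies."""
        self.read(instance, block_id)  # obtain the current image first
        for name, cache in self.caches.items():
            if name != instance:
                cache.pop(block_id, None)
        self.caches[instance][block_id] = data
        self.directory[block_id] = instance


if __name__ == "__main__":
    cluster = ToyCluster(["inst1", "inst2"], {1: "v0"})
    print(cluster.read("inst1", 1))   # first access comes from disk
    cluster.write("inst1", 1, "v1")   # change stays in inst1's cache
    print(cluster.read("inst2", 1))   # served from inst1 over the interconnect
```

In this model the disk copy of the block is still the old version after the second read, which is the behaviour that distinguishes a cache-to-cache transfer from the disk-based handoff used by earlier designs.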
These architectures offer advantages such as simplified administration: because there is no requirement for complex data partitioning or application-level sharding, capacity can be scaled transparently by adding nodes to handle increased load. RAC, for instance, supports on-demand scaling across commodity hardware while balancing workloads through integrated quality-of-service features. A key disadvantage, however, is that the shared storage subsystem is a potential single point of failure, which necessitates robust, cluster-aware storage solutions such as Oracle ASM or IBM GPFS to mitigate the risk. In terms of availability, RAC deployments have been reported to reach 99.999% uptime in production environments by using node addition and automatic failover to redistribute workloads upon instance failure.
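
To put the 99.999% figure above in perspective, the following Python sketch converts an availability percentage into the downtime it permits per year. The function name is hypothetical and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.2f} minutes per year")
```

At 99.999%, the budget is roughly five and a quarter minutes per year, which is why automatic failover rather than manual intervention is needed to meet such a target.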

Shared-Nothing and Hybrid Alternatives

Shared-nothing architectures represent a fundamental alternative to Oracle RAC's shared-everything model: each node operates independently with its own dedicated storage and memory, and data is partitioned across nodes to enable horizontal scaling without resource contention. Because nodes do not share disk or memory, there is no single point of failure from shared components, which allows massively parallel processing in distributed environments. For instance, Google Cloud Spanner employs a shared-nothing approach with range-based sharding to distribute data, supporting high scalability for globally distributed applications while maintaining strong consistency through synchronous replication. Similarly, Microsoft's SQL Server Analytics Platform System (formerly Parallel Data Warehouse) uses shared-nothing partitioning to handle large-scale data warehousing, minimizing data movement between nodes to optimize query performance.

PostgreSQL with the Citus extension exemplifies open-source shared-nothing scaling, transforming a single PostgreSQL instance into a distributed cluster by sharding tables across worker nodes coordinated by a central node. Data is distributed based on a hash of a distribution column value, enabling parallel query execution for workloads such as multi-tenant applications or real-time analytics.

Hybrid models blend elements of shared-nothing and shared-everything to balance scalability with accessibility, often combining a shared storage layer with local node resources. Amazon Aurora adopts this approach by using a shared cluster storage volume, essentially a distributed, log-structured storage layer replicated across Availability Zones, for persistent data, while each database instance maintains local caches for temporary operations such as sorting or indexing, avoiding the full independence of pure shared-nothing systems.
This design allows multiple instances to access the same data without partitioning, supporting up to 256 TiB of storage with automatic replication for high availability. Snowflake further hybridizes the approach with a central shared-data repository that uses micro-partitions for efficient access, combined with independent massively parallel processing (MPP) compute clusters that process local portions of the data in a shared-nothing manner, allowing scaling without manual data redistribution.

In comparison, Oracle RAC's shared-everything model, with Cache Fusion for inter-node block transfers, provides low-latency access suited to online transaction processing (OLTP) workloads that require frequent cross-node data sharing, whereas shared-nothing systems such as Spanner or Citus excel at large-scale analytics by eliminating global locks and enabling independent node parallelism, although their performance hinges on effective data partitioning. For example, shared-nothing avoids the overhead of shared storage I/O but demands upfront partitioning aligned with query patterns, in contrast to RAC's transparent access to unified data.

In the modern landscape as of 2025, serverless offerings such as Snowflake's shared-data architecture diminish the need for manual cluster configuration of the kind RAC requires, because compute resources scale independently of storage via virtual warehouses, simplifying operations for analytics-heavy use cases. Cost considerations favor open-source alternatives: Oracle RAC incurs significant licensing fees, often exceeding $47,500 per processor for Enterprise Edition plus add-ons, while PostgreSQL with Citus remains free, with optional commercial support available from vendors, making it attractive for cost-sensitive scaling.
A key drawback of shared-nothing and hybrid alternatives is that applications must be modified to accommodate partitioning, for example by selecting distribution columns in Citus or adjusting schemas for sharding, which can introduce complexity and limit transparency compared with RAC's application-agnostic access across nodes. In complex environments with thousands of tables, shared-nothing partitioning may require frequent data reorganization to maintain balance, potentially affecting operational efficiency.
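
The hash-based distribution discussed in this section can be sketched generically in Python as follows. This is a plain hash-modulo scheme under assumed names (shard_for, distribute, the tenant_id column), not the implementation used by Citus or any other product; production systems typically assign hash ranges to shards so that shards can be moved or split without rehashing every row.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a distribution-column value to a shard index in [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # A cryptographic digest is used only because it is stable across runs,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(rows, column, shard_count):
    """Group rows by the shard that owns their distribution-column value."""
    shards = {index: [] for index in range(shard_count)}
    for row in rows:
        shards[shard_for(str(row[column]), shard_count)].append(row)
    return shards


if __name__ == "__main__":
    # Hypothetical multi-tenant rows distributed on tenant_id.
    rows = [
        {"tenant_id": "acme", "order": 1},
        {"tenant_id": "globex", "order": 2},
        {"tenant_id": "acme", "order": 3},
    ]
    for index, members in distribute(rows, "tenant_id", 4).items():
        print(index, members)
```

All rows for one tenant land on the same shard, which keeps single-tenant queries local to one node; it also shows why the choice of distribution column is an application-level decision that a shared-everything system does not require.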

