POWER9
from Wikipedia
POWER9
Technician holding a POWER9 processor
General information
Launched: 2017
Designed by: IBM
Common manufacturer
Performance
Max. CPU clock rate: 4 GHz[1]
Cache
L1 cache: 32+32 KiB per core[1]
L2 cache: 512 KiB per core[1]
L3 cache: 120 MiB per chip[1]
L4 cache: via Centaur[1]
Architecture and classification
Technology node: 14 nm (FinFET)
Instruction set: Power ISA (Power ISA v.3.0)
Physical specifications
Cores: 12 SMT8 cores or 24 SMT4 cores on die[2][3][4]
History
Predecessor: POWER8
Successor: Power10

POWER9 is a family of superscalar, multithreading, multi-core microprocessors produced by IBM, based on the Power ISA. It was announced in August 2016.[2] POWER9-based processors are manufactured using a 14 nm FinFET process,[3] in 12- and 24-core versions, for scale-out and scale-up applications,[3] and possibly other variations, since the POWER9 architecture is open for licensing and modification by OpenPOWER Foundation members.[5]

Summit, the ninth fastest supercomputer in the world (based on the Top500 list as of June 2024[6]), is based on POWER9, while also using Nvidia Tesla GPUs as accelerators.[7]

Design

Core

The POWER9 core comes in two variants, a four-way multithreaded one called SMT4 and an eight-way one called SMT8.[1] The SMT4 and SMT8 cores are similar in that they consist of a number of so-called slices fed by common schedulers. A slice is a rudimentary 64-bit single-threaded processing core with a load store unit (LSU), an integer unit (ALU) and a vector scalar unit (VSU, doing SIMD and floating point). A super-slice is the combination of two slices. An SMT4 core consists of a 32 KiB L1 instruction cache (1 KiB = 1024 bytes), a 32 KiB L1 data cache, an instruction fetch unit (IFU) and an instruction sequencing unit (ISU) which feeds two super-slices. An SMT8 core has two sets of L1 caches, IFUs and ISUs to feed four super-slices. The result is that the 12-core and 24-core versions of POWER9 each consist of the same number of slices (96 each) and the same amount of L1 cache.
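
The slice arithmetic in the paragraph above can be checked with a short sketch. The dictionary layout and the function name `die_totals` are illustrative choices, not IBM terminology; the per-core figures are the ones quoted in the text.

```python
# Slice and L1 arithmetic for the two POWER9 core variants (figures from the text).
SLICES_PER_SUPER_SLICE = 2      # a super-slice combines two slices
L1_KIB_PER_SET = 32 + 32        # one L1 instruction cache plus one L1 data cache

# Hypothetical table layout: super-slices per core, L1 cache sets per core, cores per die.
VARIANTS = {
    "SMT4": {"super_slices": 2, "l1_sets": 1, "cores": 24},
    "SMT8": {"super_slices": 4, "l1_sets": 2, "cores": 12},
}


def die_totals(variant: str) -> tuple[int, int]:
    """Return (total slices, total L1 KiB) for a fully enabled die."""
    v = VARIANTS[variant]
    slices = v["cores"] * v["super_slices"] * SLICES_PER_SUPER_SLICE
    l1_kib = v["cores"] * v["l1_sets"] * L1_KIB_PER_SET
    return slices, l1_kib


for name in VARIANTS:
    print(name, die_totals(name))
```

Both variants come out at 96 slices and 1536 KiB of L1 cache per die, which is the equivalence the paragraph describes.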

A POWER9 core, whether SMT4 or SMT8, has a 12-stage pipeline (five stages shorter than its predecessor, the POWER8) but aims to retain a clock frequency of around 4 GHz.[1] It is the first to incorporate elements of Power ISA v.3.0, released in December 2015, including the VSX-3 instructions.[8] The POWER9 design is modular so that it can be used in more processor variants and licensed for manufacture on fabrication processes other than IBM's.[9] On-chip co-processors handle compression and cryptography, and the chip has a large low-latency eDRAM L3 cache.[3]

The POWER9 comes with a new interrupt controller architecture called "eXternal Interrupt Virtualization Engine" (XIVE) which replaces a much simpler architecture that was used in POWER4 through POWER8. XIVE will also be used in Power10.[10][11][12]

Scale out / scale up

  • IBM POWER9 SO – scale-out variant, optimized for dual socket computers with up to 120 GB/s bandwidth (1 GB = 1 billion bytes) to directly attached DDR4 memory[1][3][9] (targeted for release in 2017)
  • IBM POWER9 SU – scale-up variant, optimized for four sockets or more, for large NUMA machines with up to 230 GB/s bandwidth to buffered memory[1][9] (uses "25.6 GHz" signaling with the PowerAXON 25 GT/sec Link interface[13])

Both POWER9 variants can ship with some cores disabled for yield reasons. Raptor Computing Systems first sold 4-core chips, and even IBM initially sold its AC922 systems with chips of no more than 22 cores, even though both types of chips have 24 cores on their dies.[14][4]

I/O

Several on-chip facilities support high off-chip I/O performance:

  • The SO variant has integrated DDR4 controllers for directly attached RAM, while the SU variant will use the off-chip Centaur architecture introduced with POWER8 to include high performance eDRAM L4 cache and memory controllers for DDR4 RAM.[1][3]
  • The Bluelink interconnects for close attachment of graphics co-processors from Nvidia (over NVLink v.2) and OpenCAPI accelerators.[15]
  • General purpose PCIe v.4 connections for attaching regular ASICs, FPGAs and other peripherals as well as CAPI 2.0 and CAPI 1.0 devices designed for POWER8.
  • Multiprocessor (symmetric multiprocessor system) links to connect other POWER9 processors on the same motherboard, or in other closely attached enclosures.

Chip types

POWER9 chips can be made with two types of cores, and in a Scale Out or Scale Up configuration. POWER9 cores are either SMT4 or SMT8, with SMT8 cores intended for PowerVM systems, while the SMT4 cores are intended for PowerNV systems, which do not use PowerVM, and predominantly run Linux. With POWER9, chips made for Scale Out can support directly attached memory, while Scale Up chips are intended for use with machines with more than two CPU sockets, and use buffered memory.[16][1]

POWER9 chips
              PowerNV (24 × SMT4 core) | PowerVM (12 × SMT8 core)
Scale Out     Nimbus                   | unknown
Scale Up      Cumulus

Modules

The IBM Portal for OpenPOWER lists the three available modules for the Nimbus chip, although the Scale-Out SMT8 variant for PowerVM also uses the LaGrange module/socket:[17]

  • Sforza – 50 mm × 50 mm, 4 DDR4, 48 PCIe lanes, 1 XBus 4B[18]
  • Monza – 68.5 mm × 68.5 mm, 8 DDR4, 34 PCIe lanes, 1 XBus 4B, 48 OpenCAPI lanes[19]
  • LaGrange – 68.5 mm × 68.5 mm, 8 DDR4, 42 PCIe lanes, 2 XBus 4B, 16 OpenCAPI lanes[20]

Sforza modules use a land grid array (LGA) 2601-pin socket.[21]

Systems

Raptor Computing Systems / Raptor Engineering

  • Talos II – two-socket workstation/server platform using POWER9 SMT4 Sforza processors;[22] available as 2U server, 4U server, tower, or EATX mainboard. Marketed as secure and owner-controllable with free and open-source software and firmware. Initially shipping with 4-core,[23] 8-core,[24] 18-core,[25] and 22-core[26] chip options until chips with more cores are available.[27][28]
  • Talos II Lite – single-socket version of the Talos II mainboard, made using the same PCB.[29]
  • Blackbird – single-socket microATX platform using SMT4 Sforza processors (up to 8-core 160 W variant), 4–8 cores, 2 RAM slots (supporting up to 256 GiB total)[30]

Google–Rackspace partnership

  • Barreleye G2 / Zaius – two-socket server platform using LaGrange processors;[22] both the Barreleye G2 and Zaius chassis use the Zaius POWER9 motherboard[31][32][33]

IBM

  • Power System AC922 – 2U, 2× POWER9 SMT4 Monza, with up to 6× Nvidia Volta GPUs, 2× CAPI 2.0 attached accelerators and 1 TiB DDR4 RAM. AC here is an abbreviation for Accelerated Computing; this system is also known as "Witherspoon" or "Newell".[22][34][35][36][37]
  • Power System L922 – 2U, 1–2× POWER9 SMT8, 8–12 cores per processor, up to 4 TiB DDR4 RAM (1 TiB = 1024 GiB), PowerVM running Linux.[38][39]
  • Power System S914 – 4U, 1× POWER9 SMT8, 4–8 cores, up to 1 TiB DDR4 RAM, PowerVM running AIX/IBM i/Linux.[38][39]
  • Power System S922 – 2U, 1–2× POWER9 SMT8, 4–11 cores per processor, up to 4 TiB DDR4 RAM, PowerVM running AIX/IBM i/Linux.[40]
  • Power System S924 – 4U, 2× POWER9 SMT8, 8–12 cores per processor, up to 4 TiB DDR4 RAM, PowerVM running AIX/IBM i/Linux.[38][39][41]
  • Power System H922 – 2U, 1–2× POWER9 SMT8, 4–10 cores per processor, up to 4 TiB DDR4 RAM, PowerVM running SAP HANA (on Linux) with AIX/IBM i on up to 25% of the system.[38][39][42]
  • Power System H924 – 4U, 2× POWER9 SMT8, 8–12 cores per processor, up to 4 TiB DDR4 RAM, PowerVM running SAP HANA (on Linux) with AIX/IBM i on up to 25% of the system.[38][39][42]
  • Power System E950 – 4U, 2–4× POWER9 SMT8, 8–12 cores per processor, up to 16 TiB buffered DDR4 RAM[43]
  • Power System E980 – 1–4× 4U, 4–16× POWER9 SMT8, 8–12 cores per processor, up to 64 TiB buffered DDR4 RAM[44]
  • Hardware Management Console 7063-CR2 – 1U, 1× POWER9 SMT8, 6 cores, 64–128 GB DDR4 RAM.[45]

Penguin Computing

  • Magna PE2112GTX – 2U, two-socket server for high performance computing using LaGrange processors. Manufactured by Wistron.[46]

IBM supercomputers

POWER9 wafer with TOP500 certificates for Summit & Sierra
  • Summit and Sierra – The United States Department of Energy, together with Oak Ridge National Laboratory and Lawrence Livermore National Laboratory, contracted IBM and Nvidia to build two supercomputers, Summit and Sierra, which are based on POWER9 processors coupled with Nvidia's Volta GPUs. These systems were slated to go online in 2017.[47][48][49] Sierra is based on IBM's Power Systems AC922 compute node.[35] The first racks of Summit were delivered to Oak Ridge National Laboratory on 31 July 2017.[50]
  • MareNostrum 4 – One of the three clusters in the emerging technologies block of the fourth MareNostrum supercomputer is a POWER9 cluster with Nvidia Volta GPUs. This cluster is expected to provide more than 1.5 petaflops of computing capacity when installed. The emerging technologies block of the MareNostrum 4 exists to test if new developments might be "suitable for future versions of MareNostrum".[51]

Operating system support

As with its predecessor, POWER9 is supported by FreeBSD,[52] IBM AIX, IBM i, Linux (both running with and without PowerVM), and OpenBSD.[53]

Implementation of POWER9 support in the Linux kernel began with version 4.6 in March 2016.[54]

Red Hat Enterprise Linux (RHEL), SUSE Linux Enterprise (SLES), Debian Linux, Ubuntu Linux, and CentOS are supported as of August 2018.[55][56][57][58][59]

The GNU Guix package manager also supports POWER9; however, support for the Guix System Distribution is in Technology Preview.[60][61]
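
On Linux, a program can check whether it is running on POWER9 by inspecting the `cpu` field of `/proc/cpuinfo`. The sketch below is a minimal illustration, not an official detection method: the function name `is_power9` is hypothetical, and the sample text is an assumed example of what a ppc64le kernel prints, so real output may carry extra qualifiers such as `(raw)` or `(architected)`.

```python
import re


def is_power9(cpuinfo_text: str) -> bool:
    """Return True if any 'cpu' field in /proc/cpuinfo-style text names POWER9."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the 'cpu' key is inspected; other keys (clock, revision) are ignored.
        if sep and key.strip() == "cpu" and re.match(r"\s*POWER9\b", value):
            return True
    return False


# Assumed sample of /proc/cpuinfo content on a POWER9 machine (illustrative only).
SAMPLE = (
    "processor\t: 0\n"
    "cpu\t\t: POWER9, altivec supported\n"
    "clock\t\t: 3800.000000MHz\n"
)

print(is_power9(SAMPLE))
```

On a real system the same function could be fed the contents of `/proc/cpuinfo`, for example `is_power9(open("/proc/cpuinfo").read())`.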

from Grokipedia
POWER9 is a family of 64-bit central processing units (CPUs) developed by IBM and introduced in December 2017, implementing the Power Instruction Set Architecture (ISA) version 3.0 with a focus on high-performance computing, artificial intelligence, data analytics, and mission-critical enterprise applications. As of 2025, support for POWER9 systems ends on January 31, 2026.[1][2][3] Fabricated using a 14 nm FinFET silicon-on-insulator (SOI) process with approximately 8 billion transistors on an approximately 25 mm × 27 mm die (695 mm²), POWER9 processors employ a modular single-chip module (SCM) design that supports scalable configurations from single-socket scale-out systems to multi-node enterprise servers with up to 192 cores.[2][4]

The architecture emphasizes enhanced core performance and efficiency, with configurable core counts of 6, 8, 10, 11, or 12 active cores per processor module, each capable of up to 8-way simultaneous multithreading (SMT) for a maximum of 96 threads per module.[2] Clock speeds range from 2.8 GHz to 4.0 GHz depending on the variant and workload, paired with a hierarchical cache system including 32 KB L1 instruction cache and 32 KB L1 data cache per core, 512 KB L2 cache per core, 10 MB embedded DRAM (eDRAM) L3 cache per core (totaling 120 MB on-chip), and up to 128 MB off-chip L4 eDRAM cache.[2][5] Memory support includes DDR4 at up to 1600 MHz across 8 channels per processor, delivering up to 230 GB/s bandwidth and enabling system capacities of up to 64 TB in multi-node configurations.[2]

Key innovations in POWER9 include integrated PCIe Generation 4 I/O with up to 64 GB/s per slot and native NVLink 2.0 support for accelerator coherency, facilitating tight integration with GPUs and other devices via the Open Coherent Accelerator Processor Interface (OpenCAPI).[2] Advanced reliability, availability, and serviceability (RAS) features, such as processor instruction retry, core-contained checkstops, and dynamic sparing of failed components using Capacity on Demand resources, ensure high uptime for demanding environments.[2] Energy management is handled through IBM EnergyScale technology, offering variable frequency modes for optimized performance and power efficiency.[2] POWER9 powers a range of IBM Power Systems servers, including the scale-out LC models for AI and the enterprise E980 for large-scale transactional processing, supporting operating systems like AIX, IBM i, and Linux.[6][2]

History and Development

Announcement and Design Goals

IBM announced the POWER9 processor family on August 23, 2016, during a presentation at the Hot Chips 28 symposium in Cupertino, California.[7] The unveiling highlighted POWER9 as the next evolution in IBM's POWER architecture, succeeding the POWER8 and targeting the demands of the emerging "cognitive era" characterized by data-intensive computing.[8] This announcement came amid growing industry focus on artificial intelligence (AI), machine learning, and high-performance computing (HPC), positioning POWER9 to address bottlenecks in traditional processor designs for these workloads.[9]

The primary design goals for POWER9 centered on enhancing performance for analytics, AI, cognitive applications, HPC, cloud infrastructure, and enterprise environments.[8] Key objectives included significantly increasing memory bandwidth to handle massive datasets more efficiently, with scale-out variants delivering up to 120 GB/s and scale-up variants up to 230 GB/s, representing an effective doubling of compute resources per socket compared to POWER8.[8] Additionally, the architecture aimed to integrate advanced I/O capabilities, such as NVLink 2.0 for high-bandwidth GPU acceleration and PCIe Gen4 support with 48 lanes providing 192 GB/s duplex bandwidth, to reduce latency and improve data transfer rates for heterogeneous computing.[7] These features were intended to enable seamless scaling across single-socket scale-out systems and multi-socket scale-up configurations, optimizing for both density and capacity.[9]

Development of POWER9 involved close collaborations with key industry partners to foster an open ecosystem. IBM worked with NVIDIA to integrate NVLink 2.0, enabling direct, high-speed connections between POWER9 processors and NVIDIA GPUs for AI and HPC acceleration.[7] Through the OpenPOWER Foundation, which boasts over 200 members, IBM emphasized open innovation, including partnerships with Google and Rackspace for compliant server designs aligned with the Open Compute Project.[8] These efforts aimed to broaden adoption by supporting diverse workloads in enterprise servers, supercomputers like those for DOE initiatives, and cloud data centers.[9]

Release Timeline and Milestones

The development of POWER9 culminated in the finalization of the processor's architecture ahead of fabrication, enabling the production of initial silicon samples by the second half of 2017, following intensive validation efforts to ensure compatibility with advanced 14 nm FinFET processes. However, the project encountered significant delays stemming from complexities in the 14 nm manufacturing node, including yield challenges at GlobalFoundries and broader supply chain disruptions that pushed back early availability timelines.[10] IBM filed a lawsuit against GlobalFoundries in June 2021 over these and related roadmap failures, which was settled in January 2025 with undisclosed terms.[11]

IBM announced the first commercial POWER9-based system, the Power Systems AC922, in December 2017, with general availability beginning shortly thereafter and broader shipments ramping up in 2018. This server targeted high-performance computing (HPC) and AI workloads, integrating POWER9 processors with NVIDIA GPUs via NVLink interconnects. A key adoption milestone came in 2018 with the deployment of POWER9 in the Summit supercomputer at Oak Ridge National Laboratory, where it powered over 4,600 nodes delivering a theoretical peak of about 200 petaFLOPS for scientific simulations.[12][13][14]

As IBM shifted focus to its successor, POWER10 (announced in August 2020 and entering production in 2021), new manufacturing of POWER9 processors tapered off around 2020-2021, with end-of-sale dates for associated systems extending into 2023-2024. These transitions reflected the rapid evolution in IBM's Power roadmap, prioritizing next-generation capabilities for enterprise and HPC environments while maintaining support for existing POWER9 deployments through at least January 31, 2026.[15]

Architecture

Core Design

The POWER9 core employs a superscalar, out-of-order execution microarchitecture fabricated on a 14 nm FinFET process, designed to deliver enhanced single-thread performance while supporting simultaneous multithreading (SMT). Each core supports up to eight hardware threads via SMT8, allowing efficient resource sharing among threads for improved throughput in multithreaded workloads, with modes configurable from single-thread (ST) to SMT8.[16][17] The core features 12 execution pipelines, including four fixed-point arithmetic logic units (ALUs), four floating-point units (FPUs), vector/scalar units for 128-bit operations, and specialized units for division, cryptography, and permutation, enabling wide issue widths for compute-intensive tasks.[18][8] Additionally, it includes two symmetric load/store units and two dedicated load units, capable of handling up to four double-word loads or stores per cycle, which supports high-bandwidth data movement critical for data-centric applications.[2][16]

The pipeline design consists of 12 stages from fetch to completion for fixed-point operations, reduced by five stages compared to the POWER8, to balance frequency and latency while minimizing power consumption through agile local control and reduced hazard penalties.[8][18] Enhancements over the POWER8 include larger rename buffers (20 primary entries plus 96-entry secondary history buffers per execution slice for registers like GPRs, FPRs, and VSRs) and improved branch prediction with a TAGE-style predictor supporting up to eight branches per cycle, a 64-entry link stack, and 512-entry global count cache, enabling better handling of unoptimized code and interpretive languages.[16][8] These changes contribute to approximately 1.5 times the single-thread performance of the POWER8 core at equivalent frequencies.[18] Clock speeds vary by variant, reaching up to 3.4–4.0 GHz in high-performance configurations to sustain this efficiency.[19]

The cache hierarchy prioritizes low-latency access with a 32 KB eight-way set-associative L1 instruction cache and a 32 KB L1 data cache per core, both optimized for thread partitioning in SMT modes.[16][8] Each core has a dedicated 512 KB L2 cache, eight-way associative with 128-byte lines, while L3 cache is shared, providing 10 MB per core in a non-uniform cache architecture (NUCA) totaling up to 120 MB on-chip for a 12-core chip.[17][8] The core fully supports Vector Scalar Extensions (VSX) with four 128-bit SIMD pipelines, facilitating accelerated processing for AI and scientific workloads through operations like vector addition and matrix computations.[16][18]
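
The L2 parameters quoted above (512 KB, eight-way associative, 128-byte lines) determine the number of sets and how an address would split into offset and index bits. The following is a minimal sketch under two assumptions that the text does not confirm: "512 KB" is read as 512 × 1024 bytes, and the set index is taken directly from address bits without hashing. The function name `cache_geometry` is illustrative.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Derive set count and address-bit split for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        # bit_length() - 1 equals log2 for exact powers of two
        "offset_bits": line_bytes.bit_length() - 1,
        "index_bits": sets.bit_length() - 1,
    }


# L2 figures from the text; the KiB interpretation is an assumption.
l2 = cache_geometry(512 * 1024, ways=8, line_bytes=128)
print(l2)
```

Under these assumptions the L2 has 512 sets, addressed by 9 index bits above a 7-bit line offset.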

Scale-Out and Scale-Up Variants

The POWER9 processor family features distinct scale-out and scale-up variants, each optimized for specific deployment scenarios in datacenter and enterprise environments. Scale-out variants are engineered for cost-effective, high-density servers with directly attached memory, emphasizing dense packing and efficiency in clustered systems. PowerVM-oriented scale-out configurations employ SMT8 cores, supporting up to eight simultaneous threads per core, with available core counts of 4, 6, 8, 10, or 12 per chip, while Linux-oriented PowerNV systems use SMT4 cores with up to 24 cores on the die. This design facilitates configurations such as 18 to 24 cores per socket, ideal for scalable, Linux-oriented workloads in traditional datacenters.[19][7]

In contrast, scale-up variants target high-end enterprise and high-performance computing (HPC) applications, prioritizing per-socket throughput and multi-socket scalability with buffered memory. These use SMT8 cores, enabling up to eight threads per core, with core count options of 4, 6, 8, 10, 11, or 12 per chip. Such setups support up to 24 cores per socket in dual-chip modules (DCM) or 12 cores in single-chip modules (SCM), allowing larger system images with greater logical processor density.[19][7][20]

Core count variations across both variants (4, 6, 8, 10, 11, and 12 active cores per chip) are achieved by selectively disabling cores on the die, which holds 12 SMT8 cores or 24 SMT4 cores, balancing performance and power. Both variants provide 10 MB of L3 cache per SMT8 core; scale-up chips pair it with buffered memory to support data-intensive processing in expansive systems, while scale-out chips are optimized for lower latency in direct-attached memory setups.[19][21]

Power and thermal design further differentiates the variants to match their use cases. Scale-out chips operate at a thermal design power (TDP) of approximately 150–225 W per chip, enabling efficient cooling and power delivery in densely populated, cost-optimized racks. Scale-up variants, designed for more integrated, high-performance nodes, support TDPs up to 200–250 W per chip, accommodating the increased thermal demands of multi-socket configurations. These power profiles ensure reliability in scale-out's volume deployments versus scale-up's focused, high-impact systems.[22][23]

I/O and Interconnect Features

The POWER9 processor incorporates PCIe Generation 4 (Gen4) support, providing up to 48 lanes per chip operating at 16 GT/s, which enables high-bandwidth connectivity for peripherals including storage devices, network adapters, and expansion cards. This configuration delivers approximately 192 GB/s of aggregate (bidirectional) PCIe bandwidth, doubling the throughput of PCIe Gen3 while maintaining compatibility with existing ecosystems.[16][21]

NVLink 2.0 serves as the primary high-speed interconnect for GPU acceleration on POWER9, offering 25 GB/s of bandwidth in each direction per link and supporting up to 6 links per chip to facilitate direct, low-latency data transfer between the processor and attached GPUs such as the NVIDIA V100. This setup achieves an aggregate bidirectional bandwidth of up to 300 GB/s across all links, optimizing data movement in heterogeneous computing environments like AI and high-performance computing workloads.[16][19]

OpenCAPI provides a coherent interface for attaching accelerators, allowing custom ASICs and FPGAs to participate in the processor's cache coherence domain with up to 25 GB/s bandwidth per port and support for 3 ports per chip. Operating over the same 25 Gbps signaling as NVLink, OpenCAPI enables flexible integration of specialized hardware while sharing ports when needed, with effective throughput reaching approximately 22.5 GB/s per link after protocol overhead.[16][24]

The on-chip fabric in POWER9 ensures efficient cache coherence across its cores, L3 cache, memory controllers, and I/O units, delivering an aggregate bandwidth of up to 1.8 TB/s for coherence traffic to support scalable multi-core and multi-chip operations. This internal interconnect uses high-speed buses, including 8 data buses and 4 snoop buses operating at frequencies up to 2400 MHz, to minimize latency in data sharing and directory-based coherence protocols.[16]
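
The aggregate figures in this section follow from simple per-link arithmetic, sketched below. The helper names are illustrative. Two assumptions are made: the 300 GB/s NVLink aggregate is read as six links at 25 GB/s in each direction, and PCIe Gen4 is modeled with its standard 128b/130b line encoding, which puts usable bandwidth slightly below the raw 192 GB/s duplex figure.

```python
def pcie_gb_per_s(lanes: int, gt_per_s: float, encoding: float = 128 / 130) -> float:
    """Bandwidth per direction in GB/s; each lane carries one bit per transfer."""
    return lanes * gt_per_s * encoding / 8


def nvlink_aggregate_gb_per_s(links: int, gb_per_s_per_direction: float) -> float:
    """Bidirectional aggregate across all links (assumes symmetric links)."""
    return links * gb_per_s_per_direction * 2


raw_duplex = 2 * pcie_gb_per_s(48, 16.0, encoding=1.0)   # ignoring encoding overhead
usable_one_way = pcie_gb_per_s(48, 16.0)                  # with 128b/130b encoding
nvlink_total = nvlink_aggregate_gb_per_s(6, 25.0)
opencapi_efficiency = 22.5 / 25.0                         # effective vs. nominal per link

print(raw_duplex, round(usable_one_way, 1), nvlink_total, opencapi_efficiency)
```

The raw PCIe duplex value reproduces the 192 GB/s quoted in the text, and the OpenCAPI ratio corresponds to about 10% protocol overhead.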

Manufacturing and Variants

Process Technology

The POWER9 processor utilizes a 14 nm FinFET silicon-on-insulator (SOI) fabrication process developed in collaboration with GlobalFoundries.[7] This advanced node features a 17-layer metal interconnect stack and embedded dynamic random-access memory (eDRAM) elements, enabling high-speed signaling and efficient on-chip caching.[4] Each POWER9 chip integrates approximately 8 billion transistors, supporting complex multithreaded architectures while maintaining compatibility with high-performance computing workloads.[19]

For the scale-out variant optimized for single- and dual-socket systems, the die measures approximately 25.2 mm by 27.5 mm, yielding a total area of about 693 mm².[4] The scale-up variant, designed for multi-socket enterprise configurations with up to 12 cores per die, employs a refined layout to accommodate additional I/O interfaces and memory controllers, resulting in a die area of around 693 mm² while preserving transistor density.[5]

This 14 nm process marks a substantial evolution from the 22 nm SOI technology of the POWER8, delivering higher transistor integration and reduced power leakage through FinFET structures that improve gate control and drive current.[7] Combined with the shorter pipeline and optimized voltage scaling, the node transition lowers dynamic power dissipation without sacrificing clock speeds up to 4.0 GHz.[9] Initial production occurred at GlobalFoundries facilities, scaling to full volume output on the same foundry to meet demand for both scale-out and scale-up deployments.[7]

Chip Modules and Packaging

The POWER9 processor is implemented in both single-chip module (SCM) and dual-chip module (DCM) configurations to address different system density and performance needs. The SCM consists of a single die housed in a land grid array (LGA) package with 3899 pins at a 1.5 mm interstitial pitch, measuring 68.5 mm × 68.5 mm overall.[25] This design supports up to 12 cores in scale-up variants, enabling high-performance computing in enterprise environments with direct socket integration for multi-socket scalability.[5]

In contrast, the DCM integrates two dies within a single module to enhance core density for scale-out applications, connecting the dies via an X-Bus interconnect that provides 64 GB/s bandwidth per link for low-latency communication.[26] Each die in a DCM typically supports up to 12 cores, yielding a total of up to 24 cores per socket, which optimizes server density in rack-mounted systems without requiring additional sockets.[19] This modular approach allows POWER9-based systems like the S922 and S924 to scale to 20 or 24 cores across one or two sockets while maintaining efficient power and thermal management.[19]

The packaging technology for both SCM and DCM employs a 7-2-7 layer organic substrate with flip-chip micro-bumps for die-to-substrate interconnections, ensuring high I/O density and signal integrity.[25] Micro-bumps facilitate reliable electrical and thermal paths, supporting advanced features like eight DDR4 memory channels per die. This configuration enables up to 2 TB of DDR4 memory per socket using 128 GB DIMMs across 16 slots (eight channels with dual DIMMs), providing substantial bandwidth for data-intensive workloads.[27]

POWER9 modules include enterprise variants optimized for high-core-count SCMs in scale-up servers, such as the E980 with up to 12 active cores per die for demanding transactional processing.[2] In comparison, the CMG1 variant focuses on GPU-accelerated configurations, integrating NVLink 2.0 interfaces for coherent access to NVIDIA Volta GPUs in systems like the AC922, prioritizing AI and deep learning density over maximum core count.[28] These options allow tailored packaging for diverse implementations while leveraging the shared POWER9 architecture.
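
The per-socket memory ceiling quoted above is a product of channel count, DIMMs per channel, and DIMM size. A minimal sketch of that arithmetic follows; the function name is hypothetical, and treating 1 TB as 1024 GB is an assumption about how the figure was rounded.

```python
def socket_memory_gb(channels: int, dimms_per_channel: int, dimm_gb: int) -> int:
    """Maximum memory per socket in GB for a fully populated configuration."""
    return channels * dimms_per_channel * dimm_gb


slots = 8 * 2                                   # eight channels, two DIMMs each
capacity_gb = socket_memory_gb(8, 2, 128)       # 128 GB DIMMs, as in the text
print(slots, capacity_gb, capacity_gb / 1024)   # slots, GB, TB (binary)
```

Sixteen slots of 128 GB give 2048 GB, matching the 2 TB per socket stated in the text.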

Implementations

IBM Enterprise Systems

IBM's enterprise systems based on POWER9 processors form the core of its Power Systems lineup, designed for high-performance computing in data-intensive environments. The Power System AC922, introduced in 2018, targets AI and high-performance computing (HPC) workloads, featuring two POWER9 processors with up to 44 cores total, up to 2 TB of DDR4 memory, and support for up to six NVIDIA Tesla V100 GPUs connected via NVLink 2.0 for accelerated deep learning tasks.[24] This 2U rack-mounted server emphasizes PCIe Gen4 I/O and OpenCAPI interfaces to handle large-scale data analytics and model training efficiently.[24]

Building on this, the scale-out variants include the Power System S922 and S922L, launched in 2018, which provide flexible configurations for enterprise-scale deployments. The S922, a 2U system with up to two POWER9 sockets and 22 active cores, supports up to 4 TB of DDR4 memory and 11 PCIe Gen4 slots, making it suitable for database management and virtualization through PowerVM.[29] The S922L (also known as L922, model 9008-22L), optimized for Linux environments, extends this with up to 24 cores across two sockets and a focus on large memory footprints for in-memory databases, achieving up to 4 TB RAM to support analytics workloads.

For midrange needs, the Power System E950, a 4U server announced in 2018, offers up to four POWER9 sockets with 48 cores and 16 TB of memory, ideal for consolidated enterprise applications such as healthcare systems like Epic, where it delivers reliable performance for virtualization and data processing.[4] At the high end, the Power System E980, announced in 2018, represents IBM's scale-up flagship with modular scalability up to four nodes, providing up to 192 POWER9 cores and 64 TB of DDR4 memory in a 22U configuration.[2] This system integrates advanced RAS features like dynamic processor sparing and supports up to 32 PCIe Gen4 slots for expansive I/O, enabling high-availability clustering with PowerHA for mission-critical databases and analytics.[2]

Across these systems, POWER9 enables seamless integration with IBM Z mainframes in hybrid cloud architectures, allowing secure data sharing and workload portability between Power and Z environments for unified multicloud strategies.[30] Applications span transactional databases, real-time analytics, and AI inference, where the processors' high thread density and memory bandwidth accelerate tasks like pattern recognition in large datasets.[31]

IBM began transitioning enterprise offerings to POWER10 processors in 2022, with POWER9-based systems like the E980 and E950 seeing reduced new shipments thereafter.[32] Standard service support for select POWER9 models, including the AC922, S922, and E980, extends until January 31, 2026, after which customers are encouraged to migrate to newer generations for ongoing maintenance and features.[33] These systems run AIX, IBM i, and Linux distributions, ensuring broad compatibility for enterprise software stacks.[2]
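
The core totals for the E950 and E980 can be reproduced from node, socket, and core counts. This is a rough sketch: the function name is illustrative, the four-sockets-per-node value for the E980 is inferred from 192 cores across four nodes at 12 cores per socket rather than stated in this section, and the thread counts assume SMT8.

```python
def system_totals(nodes: int, sockets_per_node: int,
                  cores_per_socket: int, threads_per_core: int) -> tuple[int, int]:
    """Return (total cores, total hardware threads) for a fully configured system."""
    cores = nodes * sockets_per_node * cores_per_socket
    return cores, cores * threads_per_core


e950 = system_totals(nodes=1, sockets_per_node=4, cores_per_socket=12, threads_per_core=8)
e980 = system_totals(nodes=4, sockets_per_node=4, cores_per_socket=12, threads_per_core=8)
print("E950:", e950)
print("E980:", e980)
```

The results agree with the 48-core and 192-core maxima given in the text, and with 64 TB across four nodes the E980 works out to 16 TB per node.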

Third-Party and Specialized Systems

Raptor Computing Systems emerged as a prominent OpenPOWER partner by developing fully open-source hardware platforms centered on POWER9 processors. In 2018, the company released the Talos II, a dual-socket EATX workstation motherboard designed for security and performance, supporting up to two POWER9 CPUs in a PowerNV configuration without proprietary firmware.[34][35] This system emphasized auditable components from hardware to BMC firmware, appealing to users prioritizing transparency and customization. Complementing it, the Blackbird offered a more compact, single-socket variant for cost-effective POWER9 deployment, maintaining the open-source ethos while targeting developers and small-scale computing needs.[36][37] These platforms represented Raptor's commitment to free-software-friendly architectures, enabling widespread adoption in niche markets like secure workstations and embedded applications.

Google and Rackspace collaborated on the Zaius server design as an open architecture for cloud environments, leveraging POWER9's capabilities for high-performance workloads. Announced in 2016 with draft specifications released later that year, Zaius integrated dual POWER9 scale-out processors with OpenCAPI and NVLink interconnects, adhering to Open Compute Project standards for efficient data center scalability.[38][39] Optimized for OpenStack deployments, the platform supported Rackspace's private cloud initiatives and Google's hyperscale requirements, facilitating accelerated computing in virtualized settings. By 2018, Google had confirmed POWER9 integrations in its data centers, underscoring Zaius's role in broadening OpenPOWER's cloud footprint.[40][41]

Penguin Computing contributed to the OpenPOWER ecosystem with HPC-oriented systems incorporating POWER9, including variants in its Magna series based on reference designs like Barreleye. Launched around 2018, these servers targeted high-performance computing applications, offering configurations with liquid cooling options to handle dense GPU-accelerated workloads efficiently.[42][43] The Relion series extended this focus, providing flexible rack-mount solutions for enterprise HPC, with air and direct-to-chip liquid cooling to support sustained high-throughput operations in data centers.[44]

Wistron developed specialized POWER9-based servers for diverse applications, including edge computing scenarios. The P93D2-2P (MiHawk), a 2U dual-socket system using scale-out POWER9 processors, supported high-core-count configurations for demanding edge and data processing tasks.[45] Certified under OpenPOWER Ready, this platform integrated PCIe Gen4 for enhanced I/O performance, making it suitable for low-latency environments like industrial IoT and distributed analytics.[46]

Following IBM's withdrawal of POWER9 marketing in October 2023, many third-party systems entered end-of-support phases, with vendors like Raptor and Wistron providing limited extensions or migrations to POWER10 equivalents by 2026.[47][48]

Supercomputing Deployments

POWER9 processors played a pivotal role in advancing high-performance computing through their integration into large-scale supercomputer deployments, particularly those sponsored by the U.S. Department of Energy (DOE). The most prominent example is Summit, developed by IBM for the Oak Ridge National Laboratory (ORNL) and operational since 2018. Summit comprises 4,608 compute nodes, each equipped with two 22-core POWER9 CPUs clocked at 3.07 GHz and six NVIDIA Tesla V100 GPUs, delivering a theoretical peak performance of 200 petaFLOPS. This configuration made Summit the world's fastest supercomputer on the TOP500 list from June 2018 until June 2020, when it dropped to second place, and it maintained top-five positions through 2020. The system's NVLink interconnect facilitated high-bandwidth communication between POWER9 CPUs and GPUs, supporting diverse scientific workloads in areas such as climate modeling and materials science.[49][50][51] Summit was retired in November 2024.[52]

A companion system, Sierra, deployed at Lawrence Livermore National Laboratory (LLNL) in 2018 under the DOE's CORAL program, shares a similar architecture tailored for simulation-intensive applications like nuclear stockpile stewardship. Sierra features 4,320 nodes, with each node including two 22-core POWER9 CPUs at 3.1 GHz and four NVIDIA V100 GPUs, achieving a peak performance of approximately 125 petaFLOPS. Like Summit, Sierra leveraged POWER9's capabilities for accelerated computing, ranking second on the TOP500 list from November 2018 until it was displaced in June 2020 and contributing to breakthroughs in astrophysics and energy research. These DOE systems exemplified POWER9's scalability in exascale precursor environments, paving the way for subsequent generations of HPC infrastructure.[53][54][55] Sierra was retired in November 2025.[56]

Beyond Summit and Sierra, POWER9 powered several other notable supercomputing clusters that bolstered its presence in TOP500 rankings from 2018 to 2020, often occupying positions 2 through 5. For instance, systems like those at Japan's AIST and Italy's CINECA Marconi-100 used POWER9 with NVIDIA GPUs for AI and scientific simulations, reinforcing the processor's impact on the global HPC landscape. By 2023, however, the HPC field had shifted toward newer architectures, including POWER10-based systems and HPE Cray platforms like Frontier, which superseded POWER9 deployments in performance leadership while highlighting POWER9's foundational role in achieving petaflop-scale computing. Summit's node-level configuration, with 44 cores per node from dual POWER9 chips, underscored the processor's density in enabling these transitions.[57][58]
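The node counts quoted above imply system-wide totals that the text does not spell out. The following is a minimal sketch of that arithmetic, using only the per-node figures given in this section; the `System` class and its field names are illustrative and not taken from any vendor tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """Per-node configuration as quoted in the text (names are illustrative)."""
    name: str
    nodes: int
    cpus_per_node: int
    cores_per_cpu: int
    gpus_per_node: int
    peak_pflops: float  # theoretical peak quoted in the text

    @property
    def cpu_cores(self) -> int:
        # Total POWER9 cores across the whole machine.
        return self.nodes * self.cpus_per_node * self.cores_per_cpu

    @property
    def gpus(self) -> int:
        # Total V100 GPUs across the whole machine.
        return self.nodes * self.gpus_per_node

    @property
    def peak_tflops_per_node(self) -> float:
        # Quoted peak divided evenly over nodes (1 petaFLOPS = 1000 teraFLOPS).
        return self.peak_pflops * 1000 / self.nodes


SUMMIT = System("Summit", nodes=4608, cpus_per_node=2, cores_per_cpu=22,
                gpus_per_node=6, peak_pflops=200.0)
SIERRA = System("Sierra", nodes=4320, cpus_per_node=2, cores_per_cpu=22,
                gpus_per_node=4, peak_pflops=125.0)

if __name__ == "__main__":
    for s in (SUMMIT, SIERRA):
        print(f"{s.name}: {s.cpu_cores:,} POWER9 cores, {s.gpus:,} GPUs, "
              f"about {s.peak_tflops_per_node:.1f} TFLOPS peak per node")
```

Under these figures Summit works out to 202,752 POWER9 cores and 27,648 GPUs, and Sierra to 190,080 cores and 17,280 GPUs. The per-node peak is a simple division of the quoted totals and ignores any non-compute nodes.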

Software Ecosystem

Operating System Support

IBM AIX provides full support for POWER9 processors, with specific optimizations introduced in version 7.2 Technology Level 2 (released in 2017) and subsequent releases, enabling compatibility with POWER9-based servers such as the Power System S914, S922, and S924.[59] AIX 7.1 Technology Level 5 Service Pack 2 also offers support for these systems, though later versions include enhanced POWER9 features like improved performance monitoring and security updates tailored to the architecture.[60]

IBM i, IBM's integrated operating system for business applications, supports POWER9 hardware starting with version 7.2 Technology Refresh 8, including later Technology Refresh levels that enable deployment on Power Systems models like the S922 and E980.[12] Version 7.5 represents the final release with native POWER9 support for enterprise workloads on these platforms.[61]

Several Linux distributions offer certified support for POWER9 via the ppc64le architecture, leveraging kernel-level compatibility that began with Linux kernel version 4.6 and matured in subsequent releases. Red Hat Enterprise Linux versions 7.4 and 8.x provide full support, including installation images and updates optimized for POWER9 servers such as the AC922 and E980.[62][63] Ubuntu 18.04 LTS and later versions are certified for POWER9, with long-term support extending to hardware like the Power System AC922.[64] SUSE Linux Enterprise Server 12 SP4 and 15 also support POWER9, with features like radix page tables and performance monitoring units enabled for these processors.[65][66]

POWER9 hardware receives ongoing operating system support through at least January 31, 2026, after which standard IBM service ends for most models, though third-party Linux distributions may continue updates independently. Operating system releases issued after the POWER10 launch concentrate new enhancements on the newer architectures, with fewer POWER9-specific improvements.[67]
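The Linux minimums listed above can be expressed as a small lookup for scripted pre-installation checks. This is a sketch only: the `MIN_LEVELS` table, the distribution keys, and the encoding of SUSE service packs as a second version component (12 SP4 as `12.4`) are assumptions made for illustration, and the thresholds are the ones stated in this section rather than an official support matrix.

```python
# Minimum releases named in the text, as (major, minor) tuples.
# Keys and the "12 SP4" -> (12, 4) encoding are illustrative assumptions.
MIN_LEVELS = {
    "rhel": (7, 4),     # Red Hat Enterprise Linux 7.4
    "ubuntu": (18, 4),  # Ubuntu 18.04 LTS
    "sles": (12, 4),    # SUSE Linux Enterprise Server 12 SP4
}


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '18.04' into a tuple of ints: (18, 4)."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(distro: str, version: str) -> bool:
    """Return True if `version` is at or above the minimum quoted for `distro`."""
    minimum = MIN_LEVELS.get(distro.lower())
    if minimum is None:
        raise KeyError(f"no minimum recorded for {distro!r}")
    return parse_version(version) >= minimum


if __name__ == "__main__":
    for distro, version in [("rhel", "7.3"), ("rhel", "8.2"),
                            ("ubuntu", "18.04"), ("sles", "15")]:
        print(distro, version, meets_minimum(distro, version))
```

Tuple comparison handles releases with a missing minor component, so `sles 15` passes while a bare `sles 12` does not. A real check would also confirm that the machine architecture is ppc64le.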

Compatibility and Optimization

The POWER9 processor implements the Power ISA version 3.0, a 64-bit architecture that includes the Vector Scalar Extension (VSX) for enhanced floating-point and vector operations, as well as the Vector Multimedia Extension (VMX) for SIMD processing, enabling advanced computational capabilities in scientific and AI workloads.[29][68] This ISA version maintains backward compatibility with prior generations: software compiled for POWER8 systems, which are based on Power ISA 2.07, can execute on POWER9 hardware through dedicated processor compatibility modes such as POWER8 mode, which restricts the visible feature set to that of the earlier processor so that existing binaries run without recompilation.[69][70]

Software optimizations for POWER9 leverage its simultaneous multithreading (SMT) capability, which supports up to eight threads per core, through compiler options in tools like the IBM XL compilers. For instance, the -qtune=pwr9 option directs the optimizer to tune generated code for POWER9, and an SMT suboption such as -qtune=pwr9:smt8 biases that tuning toward a particular SMT mode for improved throughput in multithreaded applications.[71][72] In AI and machine learning contexts, frameworks such as TensorFlow and PyTorch have been tuned via IBM's PowerAI toolkit to use NVLink 2.0 interconnects, which provide high-bandwidth GPU acceleration of up to 25 GB/s in each direction (50 GB/s bidirectional), resulting in significant performance gains for deep learning training compared to PCIe-based systems.[73][74]

Development tools for POWER9 include the IBM Advance Toolchain, an open-source suite of compilers (e.g., GCC variants), runtime libraries, and profilers optimized for Power ISA 3.0 features, facilitating efficient code generation and debugging on Linux environments.[75] The OpenPOWER SDK complements this with simulators and utilities, such as the POWER9 Functional Simulator, for pre-silicon validation and porting.[76] POWER9's support for little-endian mode in Linux distributions, available since POWER8, aligns it closely with x86 conventions and eases portability for many applications.[77][62]

Porting software from x86 architectures to POWER9 presents challenges, including recompilation to handle differences in instruction sets, vector intrinsics, and alignment requirements, though little-endian support mitigates endianness issues. Developers often use tools like the Advance Toolchain to identify and resolve architecture-specific dependencies, such as 128-bit VSX registers versus x86's AVX.[78] Migration paths to POWER10 involve its POWER9 compatibility mode, which allows existing POWER9 binaries and applications to run without modification, supported by features like Live Partition Mobility for transitions between systems.[79][70]
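Porting scripts and build systems often need to confirm that they are running on POWER9 in little-endian mode before enabling architecture-specific paths. The sketch below parses `/proc/cpuinfo`-style text for the `cpu` field and reports the interpreter's byte order. The sample text and its exact layout are assumptions based on how Linux on Power commonly reports the processor, and the function names are hypothetical.

```python
import platform
import sys
from typing import Optional

# Illustrative sample only; real /proc/cpuinfo contents vary by kernel and system.
SAMPLE_CPUINFO = (
    "processor\t: 0\n"
    "cpu\t\t: POWER9, altivec supported\n"
    "clock\t\t: 3800.000000MHz\n"
    "revision\t: 2.2 (pvr 004e 1202)\n"
)


def cpu_model(cpuinfo_text: str) -> Optional[str]:
    """Return the value of the first line whose key is exactly 'cpu', if any."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu":
            return value.strip()
    return None


def is_power9(cpuinfo_text: str) -> bool:
    """True when the reported CPU model string starts with 'POWER9'."""
    model = cpu_model(cpuinfo_text)
    return model is not None and model.upper().startswith("POWER9")


if __name__ == "__main__":
    print("sample reports POWER9:", is_power9(SAMPLE_CPUINFO))
    # On a real system, platform.machine() is expected to return 'ppc64le'
    # for little-endian Linux on Power; sys.byteorder gives the byte order.
    print("machine:", platform.machine(), "byte order:", sys.byteorder)
```

Matching the key exactly avoids false positives from x86 fields such as `cpu family` or `cpu MHz`. To check a live system, pass the contents of `/proc/cpuinfo` instead of the sample string.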

References
