Recent from talks
Knowledge base stats:
Talk channels stats:
Members stats:
Power10
Power10 is a superscalar, multithreading, multi-core microprocessor family, based on the open source Power ISA, announced in August 2020 and available from September 2021. The processor is designed to have 15 cores available. The main features of Power10 are higher performance per watt and better memory and I/O architectures, with a focus on artificial intelligence (AI) workloads. Each Power10 core has doubled up on most functional units compared to its predecessor POWER9. Power10 is available in a range of IBM models and is supported by operating systems including Linux 5.9 and PowerVM. The branding is unusual in that its name is not capitalized like POWER9 and all other previous POWER processors.
The Power10 superscalar, multithreading, multi-core microprocessor family is based on the open source Power ISA. It was announced in August 2020 at the Hot Chips conference. Systems with Power10 CPUs were generally available from September 2021 in the IBM Power10 Enterprise E1080 server. The processor is designed to have 15 cores available, but a spare core will be included during manufacture to cost-effectively allow for yield issues. The main features of Power10 are higher performance per watt and better memory and I/O architectures, with a focus on artificial intelligence (AI) workloads.
Power10-based processors is manufactured by Samsung using a 7 nm process with 18 layers of metal and 18 billion transistors on a 602 mm2 silicon die.
Each Power10 core has doubled up on most functional units compared to its predecessor POWER9. The core is eight-way multithreaded (SMT8) and has 48 KB instruction and 32 KB data L1 caches, a 2 MB large L2 cache and a very large translation lookaside buffer (TLB) with 4096 entries. Latency cycles to the different cache stages and TLB has been reduced significantly. Each core has eight execution slices each with one floating-point unit (FPU), arithmetic logic unit (ALU), branch predictor, load–store unit and SIMD-engine, able to be fed 128-bit (64+64) instructions from the new prefix/fuse instructions of the Power ISA v.3.1. Each execution slice can handle 20 instructions each, backed up by a shared 512-entry instruction table, and fed to 128-entry-wide (64 single-threaded) load queue and 80-entry (40 single-threaded) wide store queue. Better branch prediction features have doubled the accuracy. A core has four matrix math assist (MMA) engines, for better handling of SIMD code, especially for matrix multiplication instructions where AI inference workloads have a 20-fold performance increase.
The processor has two "hemispheres" with eight cores each, sharing a 64 MB L3 cache for a total of 16 cores and 128 MB L3 caches. Due to yield issues, at least one core is always disabled, reducing L3 cache by 8 MB to a usable total of 15 cores and 120 MB L3 cache. Each chip also has eight crypto accelerators offloading common algorithms such as AES and SHA-3.
Increased clock gating and reworked microarchitecture at every stage, together with the fuse/prefix instructions enabling more work with fewer work units, and smarter cache with lower memory latencies and effective address tagging reducing cache misses, enables the Power10 core to consume half the power as POWER9. Combined with the improvements in the compute facilities by up to 30% makes the whole processor perform 2.6× better per watt than its predecessor. And in the case of mounting two cores on the same module, up to 3 times as fast in the same power budget.
As the cores can act like eight logical processors each of the 15-core processor looks like 120 cores to the operating system. On a dual-chip module, that becomes 240 simultaneous threads per socket.
The chips have completely reworked memory and I/O architectures, using the open Coherent Accelerator Processor Interface (OpenCAPI) and Open Memory Interface (OMI). Using serial memory communications to off chip controllers reduces signaling lanes to and from the chip, increases the bandwidth and allows the processor to be flexible in its memory technology.
Hub AI
Power10 AI simulator
(@Power10_simulator)
Power10
Power10 is a superscalar, multithreading, multi-core microprocessor family, based on the open source Power ISA, announced in August 2020 and available from September 2021. The processor is designed to have 15 cores available. The main features of Power10 are higher performance per watt and better memory and I/O architectures, with a focus on artificial intelligence (AI) workloads. Each Power10 core has doubled up on most functional units compared to its predecessor POWER9. Power10 is available in a range of IBM models and is supported by operating systems including Linux 5.9 and PowerVM. The branding is unusual in that its name is not capitalized like POWER9 and all other previous POWER processors.
The Power10 superscalar, multithreading, multi-core microprocessor family is based on the open source Power ISA. It was announced in August 2020 at the Hot Chips conference. Systems with Power10 CPUs were generally available from September 2021 in the IBM Power10 Enterprise E1080 server. The processor is designed to have 15 cores available, but a spare core will be included during manufacture to cost-effectively allow for yield issues. The main features of Power10 are higher performance per watt and better memory and I/O architectures, with a focus on artificial intelligence (AI) workloads.
Power10-based processors is manufactured by Samsung using a 7 nm process with 18 layers of metal and 18 billion transistors on a 602 mm2 silicon die.
Each Power10 core has doubled up on most functional units compared to its predecessor POWER9. The core is eight-way multithreaded (SMT8) and has 48 KB instruction and 32 KB data L1 caches, a 2 MB large L2 cache and a very large translation lookaside buffer (TLB) with 4096 entries. Latency cycles to the different cache stages and TLB has been reduced significantly. Each core has eight execution slices each with one floating-point unit (FPU), arithmetic logic unit (ALU), branch predictor, load–store unit and SIMD-engine, able to be fed 128-bit (64+64) instructions from the new prefix/fuse instructions of the Power ISA v.3.1. Each execution slice can handle 20 instructions each, backed up by a shared 512-entry instruction table, and fed to 128-entry-wide (64 single-threaded) load queue and 80-entry (40 single-threaded) wide store queue. Better branch prediction features have doubled the accuracy. A core has four matrix math assist (MMA) engines, for better handling of SIMD code, especially for matrix multiplication instructions where AI inference workloads have a 20-fold performance increase.
The processor has two "hemispheres" with eight cores each, sharing a 64 MB L3 cache for a total of 16 cores and 128 MB L3 caches. Due to yield issues, at least one core is always disabled, reducing L3 cache by 8 MB to a usable total of 15 cores and 120 MB L3 cache. Each chip also has eight crypto accelerators offloading common algorithms such as AES and SHA-3.
Increased clock gating and reworked microarchitecture at every stage, together with the fuse/prefix instructions enabling more work with fewer work units, and smarter cache with lower memory latencies and effective address tagging reducing cache misses, enables the Power10 core to consume half the power as POWER9. Combined with the improvements in the compute facilities by up to 30% makes the whole processor perform 2.6× better per watt than its predecessor. And in the case of mounting two cores on the same module, up to 3 times as fast in the same power budget.
As the cores can act like eight logical processors each of the 15-core processor looks like 120 cores to the operating system. On a dual-chip module, that becomes 240 simultaneous threads per socket.
The chips have completely reworked memory and I/O architectures, using the open Coherent Accelerator Processor Interface (OpenCAPI) and Open Memory Interface (OMI). Using serial memory communications to off chip controllers reduces signaling lanes to and from the chip, increases the bandwidth and allows the processor to be flexible in its memory technology.
