IBM Telum
from Wikipedia
Telum

General information
Launched: 2021
Designed by: IBM
Common manufacturer: Samsung

Performance
Max. CPU clock rate: 5.2 GHz

Cache
L2 cache: 32 MB per core

Architecture and classification
Technology node: 7 nm
Instruction set: z/Architecture

Physical specifications
Cores: 8

History
Predecessor: z15
Successor: Telum II
Both sides of the Telum microprocessor

Telum is a microprocessor made by IBM for the IBM z16 series mainframe computers.[2][3] The processor was announced at the Hot Chips 2021 conference on 23 August 2021.[2] Telum is IBM's first processor to contain on-chip acceleration for artificial intelligence inferencing while a transaction is taking place.[4][5]

Description

The chip contains eight processor cores with a deep superscalar, out-of-order pipeline running at clock frequencies above 5 GHz, optimized for the demands of heterogeneous enterprise-class workloads (e.g. finance, security-sensitive applications, and applications requiring extreme reliability). The cache and chip-interconnect infrastructure provides 32 MB of cache per core and can scale to 32 Telum chips.[6][3][7] The cache design was described in 2021 as "revolutionary",[6] because it creates a system in which the L2 cache of one core can be used as virtual L3 and L4 cache by other cores.[3][1] The Telum processor can be either water cooled or air cooled, but water cooling is required when running more than a few Telum processors in a single IBM compute drawer.[8][9] Unlike other processors, the IBM Telum does not respond to thermal limits by reducing clock speed; instead it inserts sleep-state instructions.[8][9]

Telum adds a new 16-bit floating-point format, NNP-Data-Type-1, and several new instructions. The Neural Network Processing Assist (NNPA)[10] instruction performs a variety of tensor operations useful for neural networks.
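
Published descriptions give the format 1 sign bit, 6 exponent bits, and 9 fraction bits. The Python sketch below converts a float32 value into such a 16-bit layout; the bias, rounding, and saturation choices here are illustrative assumptions, not the architected NNP-Data-Type-1 semantics.

```python
import struct

def f32_to_nnp16(x: float) -> int:
    """Pack a float into an assumed 1/6/9-bit layout (sign/exponent/fraction).

    Assumptions for illustration: exponent bias 31, round-to-nearest-even,
    saturation on overflow, flush-to-zero on underflow.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = (bits >> 31) & 0x1
    exp32 = (bits >> 23) & 0xFF          # float32 exponent field, bias 127
    frac32 = bits & 0x7FFFFF             # 23-bit fraction

    exp16 = exp32 - 127 + 31             # re-bias for a 6-bit exponent
    if exp16 >= 0x3F:                    # too large: saturate (assumption)
        return (sign << 15) | 0x7FFF
    if exp16 <= 0:                       # too small: flush to zero (assumption)
        return sign << 15

    frac16 = frac32 >> 14                # keep the top 9 of 23 fraction bits
    rem = frac32 & 0x3FFF                # bits dropped by the shift
    if rem > 0x2000 or (rem == 0x2000 and (frac16 & 1)):
        frac16 += 1                      # round to nearest, ties to even
        if frac16 == 0x200:              # fraction carry bumps the exponent
            frac16 = 0
            exp16 += 1
            if exp16 >= 0x3F:
                return (sign << 15) | 0x7FFF
    return (sign << 15) | (exp16 << 9) | frac16

print(hex(f32_to_nnp16(1.0)))   # sign=0, exponent=31, fraction=0 -> 0x3e00
```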

from Grokipedia
The IBM Telum is a central processing unit (CPU) designed by IBM for its IBM Z mainframe computers and LinuxONE servers, first introduced in the IBM z16 system in 2022. Fabricated using a 7 nm process, it features eight high-performance cores operating above a 5 GHz clock speed, making it the first mainframe processor with integrated on-chip acceleration for artificial intelligence (AI) inferencing directly within high-volume transactional workloads. This innovation enables real-time applications such as fraud detection in sectors like banking and finance, processing up to 100,000 transactions per second with sub-millisecond latency.

Telum's design emphasizes security, reliability, and availability for enterprise environments, incorporating 32 MB of private Level-2 (L2) cache per core for a total of 256 MB, a 256 MB virtual Level-3 (L3) cache, and a 2 GB Level-4 (L4) cache, representing 1.5 times more cache per core than its predecessor, the z15 processor. The on-chip AI accelerator delivers over 6 teraflops (TFLOPs) of performance per chip, scaling to 200 TFLOPs across a full system, and supports deep-learning models for inference without offloading data from the mainframe. Additional features include transparent memory encryption, Secure Execution environments for confidential computing, and a redesigned eight-channel memory interface that enhances availability by tolerating failures in channels or dual in-line memory modules (DIMMs).

In 2024, IBM announced the Telum II, a successor processor built on a 5 nm node using Samsung Foundry technology, featuring eight cores at 5.5 GHz, an improved AI accelerator, and an integrated data processing unit (DPU) for low-latency I/O operations; it powers the IBM z17 mainframe released in June 2025. Telum II expands cache capacity to 360 MB L3 and 2.88 GB L4 per chip while maintaining the focus on AI-infused transactional processing, with further enhancements for generative AI workloads via the companion IBM Spyre Accelerator, a PCIe-based system-on-chip available since October 2025. These processors underscore IBM's commitment to embedding AI at the hardware level for mission-critical computing, prioritizing data privacy and performance in hybrid cloud environments.

Overview

Description

The IBM Telum processor is a 7 nm microprocessor designed by IBM for its z/Architecture-based mainframes, marking the company's first commercial chip to integrate on-chip acceleration for AI inferencing directly during transaction processing. Introduced in August 2021 and powering the IBM z16 mainframe series, which became generally available in 2022, Telum represents a significant advancement in enterprise computing hardware. Telum plays a central role in enabling real-time AI analytics within high-volume transactional environments, particularly in sectors like banking and finance where rapid data processing is essential for fraud detection and risk assessment. By embedding AI capabilities at the processor level, it allows organizations to derive actionable insights from vast datasets without interrupting core business operations, supporting hybrid cloud architectures and mission-critical workloads. Among its primary innovations, Telum incorporates AI acceleration to facilitate on-the-fly inferencing, quantum-safe cryptography to safeguard against emerging threats, and an optimized cache design to handle high-throughput enterprise tasks efficiently. These features build on predecessors like the z15 processor, enhancing security and performance for modern data-intensive applications.

Key specifications

The IBM Telum processor is a high-performance microprocessor designed for enterprise mainframe systems, featuring eight high-performance cores that enable robust processing capabilities for mission-critical workloads. Each core operates at clock speeds exceeding 5 GHz, with a maximum frequency of 5.2 GHz, allowing for efficient execution of complex transactions. Fabricated on a 7 nm process using extreme ultraviolet (EUV) lithography, the processor achieves a balance of density and performance suitable for large-scale computing environments. Key cache configurations include 32 MB of private L2 cache per core, a 256 MB virtual L3 cache shared across cores, and a 2 GB virtual L4 cache, providing enhanced data access speeds and reduced latency for data-intensive applications. The processor die measures 530 mm² and incorporates approximately 22 billion transistors, supporting its advanced computational demands. Telum supports both air and water cooling options within its system packaging, enabling flexible deployment in various infrastructures without compromising performance. It maintains operational efficiency through dynamic sleep states that prevent thermal throttling by inserting idle instructions rather than reducing clock speeds, ensuring consistent high-frequency operation under load. The processor provides full compatibility with the z/Architecture instruction set, incorporating superscalar and out-of-order execution to optimize instruction throughput and resource utilization.
Core count: 8 high-performance cores
Clock speed: >5 GHz (max 5.2 GHz)
Technology node: 7 nm EUV
L2 cache: 32 MB per core (private)
L3 cache: 256 MB (virtual, shared)
L4 cache: 2 GB (virtual)
Transistors: ~22 billion
Die size: 530 mm²
Cooling support: Air or water
Instruction set: z/Architecture (superscalar, out-of-order)

Development

Announcement and release

IBM announced the Telum processor on August 23, 2021, during a presentation at the Hot Chips 33 conference. This event marked the public unveiling of Telum as the core component of the next-generation mainframe systems, with IBM emphasizing its role in enabling real-time AI inferencing and enhanced security to support hybrid cloud environments. Telum evolved from the preceding z15 processors introduced in 2019, representing the first chip in IBM's Z family to integrate on-chip AI acceleration directly into the processor design. The processor was positioned to address enterprise demands for low-latency AI in mission-critical workloads, building on the z15's focus on data privacy and performance while introducing embedded AI capabilities. IBM formally introduced the z16 mainframe series, powered by Telum, on April 5, 2022, highlighting its advancements in AI-driven insights and quantum-safe cryptography. The z16 systems became generally available on May 31, 2022, marking Telum's commercial debut in production environments. Telum in z16 mainframes targets primarily the financial sector for applications such as high-throughput transaction processing and real-time fraud detection, supporting billions of daily operations.

Design and manufacturing

The development of the IBM Telum processor began around 2015, with focused efforts on AI integration starting around 2019 through the establishment of IBM's AI Hardware Center. This timeline aligned with the need to evolve mainframe capabilities for enterprise workloads, culminating in the processor's unveiling in August 2021 and deployment in z16 systems by mid-2022. Central to the design goals was achieving clock frequencies exceeding 5 GHz across eight high-performance cores while embedding on-chip AI accelerators to enable low-latency inferencing directly within transactional processing, all without undermining the stringent reliability standards of mainframe environments. This approach aimed to support real-time AI insights at scales handling up to 100,000 transactions per second with sub-millisecond response times. Telum was fabricated using a 7 nm extreme ultraviolet (EUV) process node at Samsung Foundry, incorporating custom intellectual property blocks for z/Architecture compatibility; the resulting dual-chip module spans 530 square millimeters and integrates 22.5 billion transistors across 19 miles of wiring and 17 metal layers. Key engineering challenges included managing power and heat in the densely packed eight-core configuration to sustain high frequencies, which was addressed through optimized thermal design and efficient accelerator integration. To ensure reliability, the team implemented mechanisms such as L2 cache SRAM wipe-out correction and an eight-channel memory interface capable of tolerating DIMM or channel failures. An innovative cache hierarchy was devised to reduce access latencies, targeting around 3.8 ns for L2 and 12 ns for L3, by allocating 32 MB of private L2 cache per core and forming larger virtual shared structures for improved data throughput. IBM's in-house organization played a pivotal role, with the design team leveraging expertise from the AI Hardware Center and Systems teams, including custom electronic design automation (EDA) tools tailored to optimize z/Architecture performance and streamline the integration of AI hardware. This collaborative effort within IBM enabled rapid iteration on custom IP, from core design to accelerator embedding, while maintaining enterprise-grade resilience.

Architecture

Processor cores and pipeline

The Telum processor features eight symmetric cores per chip, each implementing a superscalar pipeline with out-of-order execution capable of issuing up to 10 instructions per cycle. This design is tailored to the demands of mainframe workloads, emphasizing high single-thread performance and efficient handling of complex instruction streams under heavy load. The cores support advanced prefetching and low-latency fetch mechanisms to minimize stalls in data-intensive enterprise environments. The pipeline employs a deep out-of-order structure optimized for branch prediction accuracy and efficient load/store operations, enabling sustained throughput in transaction-heavy scenarios. It decodes up to six instructions per cycle and can initiate up to 12, with a focus on reducing latency through a flatter cache topology that integrates seamlessly with the core's execution flow. This architecture draws from generations of processor advancements, prioritizing reliability and predictability for mission-critical computing. Execution units within each core include multiple fixed-point units (FXUs) for integer operations, binary floating-point units (BFUs) and decimal floating-point units (DFUs) for scalar computations, and vector units supporting instruction classes such as vector fixed-point (VFX), vector string (VXS), vector permute (VXP), and vector multiply (VXM). The floating-point capabilities extend to 16-bit (FP16) precision via the vector floating-point unit (VFU), facilitating AI-related tasks alongside traditional workloads, with two decimal floating-point accelerators per core for enhanced precision handling. These units operate on 32 vector registers, each 128 bits wide, allowing SIMD operation on data types from 8-bit integers to full 128-bit operands (see the sketch below). Simultaneous multithreading (SMT) is supported with up to two threads per core, dynamically sharing execution resources to boost utilization in mixed enterprise workloads and yielding an average 25% throughput improvement. Clock frequency reaches 5.2 GHz, with dynamic adjustments managed by the Intelligent Resource Director and Workload Manager to optimize power and performance without traditional throttling.
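
The width of those 128-bit vector registers can be pictured with a short sketch. The Python/numpy fragment below is purely illustrative: arrays stand in for registers to show how one register holds 8 FP16 lanes or 16 byte lanes; it does not use the actual z/Architecture vector facility.

```python
import numpy as np

# One 128-bit vector register viewed as 8 half-precision (FP16) lanes.
a = np.arange(8, dtype=np.float16)        # lane values 0..7
b = np.full(8, 2.0, dtype=np.float16)
c = np.full(8, 0.5, dtype=np.float16)

fma = a * b + c                           # one vector multiply-add: 8 results at once
print(fma)                                # [ 0.5  2.5  4.5 ... 14.5]

# The same 128 bits reinterpreted as 16 byte-wide lanes.
bytes_view = fma.view(np.uint8)           # 8 x FP16 = 16 bytes
print(bytes_view.size)                    # 16
```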

Cache hierarchy

The IBM Telum processor features a private L2 cache of 32 MB per core, implemented using high-speed SRAM to provide low-latency access for critical data and instructions. This design supports a 19-cycle load-use latency of approximately 3.8 ns, including TLB access, and incorporates four pipelines to handle overlapping fetch, store, and snoop traffic efficiently. Additionally, the L2 includes SRAM wipe-out error correction and sparing mechanisms to enhance reliability in enterprise environments. Telum's L3 and L4 caches are implemented as virtual structures in a distributed, coherent system that leverages excess capacity across multiple L2 caches, eliminating the need for traditional on-chip L3 or off-chip L4 hardware. The virtual L3 totals 256 MB per chip, formed by dynamically tagging and sharing underutilized portions of the eight per-core L2 caches via cooperative allocation among cores. Evicted lines from one core's L2 can be stored in another core's L2 as tagged L3 lines, maintaining coherence through retagging and dynamic sharing. Similarly, the virtual L4 provides 2 GB of capacity across up to eight chips in a drawer, using spare virtual L3 space for spillover, which enables horizontal cache persistence and scalability without dedicated L4 structures. Coherence in this distributed hierarchy is managed by a custom protocol that ensures consistency across cores and chips, avoiding off-chip broadcasts until on-chip resolution is complete to minimize unnecessary traffic. The cores' L2 caches are interconnected via dual-direction rings supporting over 320 GB/s of bandwidth, facilitating efficient snoop and data traffic for the virtual L3. For multi-chip L4 access, a flat multi-chip fabric further reduces latency compared to prior generations like z15. This virtual cache design delivers approximately 1.5 times more effective cache capacity per core than predecessors while improving latencies for most workloads by eliminating frequent off-chip accesses. Average virtual L3 access latency is around 12 ns, providing consistent gains for I/O-intensive mainframe tasks. The high internal bandwidth and coherent distribution optimize effective access speeds, contributing to a more than 40% per-socket performance uplift over z15 in enterprise scenarios.
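
The core idea, evicted private lines parked in a neighbor's underused L2 where they remain visible as virtual L3, can be sketched as a toy simulation. The Python model below is a minimal illustration of that policy, not IBM's actual coherence protocol; capacities, victim choice, and placement are simplified assumptions.

```python
from collections import OrderedDict

class Core:
    """Toy model of one core's private L2 (capacity counted in cache lines)."""
    def __init__(self):
        self.l2 = OrderedDict()   # addr -> tag ("L2" private, "VL3" donated); LRU order

class VirtualL3Chip:
    """Minimal sketch of the virtual-L3 idea: a line evicted from one core's
    L2 is re-tagged and parked in the least-loaded peer's L2, where any core
    can still hit it."""

    def __init__(self, n_cores=8, l2_lines=4):
        self.cores = [Core() for _ in range(n_cores)]
        self.capacity = l2_lines

    def access(self, core_id, addr):
        me = self.cores[core_id]
        if addr in me.l2:                              # private L2 hit
            me.l2.move_to_end(addr)
            return "L2 hit"
        for other in self.cores:                       # look for a donated VL3 copy
            if other is not me and other.l2.get(addr) == "VL3":
                del other.l2[addr]                     # migrate back as a private line
                self._install(me, addr)
                return "virtual-L3 hit"
        self._install(me, addr)                        # true miss: fill from memory
        return "miss"

    def _install(self, core, addr):
        if len(core.l2) >= self.capacity:              # evict this core's LRU line
            victim, tag = core.l2.popitem(last=False)
            if tag == "L2":                            # donate it to a peer as virtual L3
                peer = min((c for c in self.cores if c is not core),
                           key=lambda c: len(c.l2))
                if len(peer.l2) < self.capacity:
                    peer.l2[victim] = "VL3"
        core.l2[addr] = "L2"

chip = VirtualL3Chip()
for a in "ABCDE":                  # core 0 overflows its 4-line private L2
    chip.access(0, a)
print(chip.access(0, "A"))         # -> "virtual-L3 hit": A was parked in a peer's L2
```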

Integrated accelerators

The IBM Telum processor integrates several specialized hardware accelerators to offload common computational tasks from the general-purpose cores, enabling efficient processing in enterprise environments. These accelerators are embedded directly on the chip, connected via a high-speed on-chip fabric that links them to the cores and caches for low-latency access. Central to Telum's design is the dedicated AI inferencing unit, which provides on-chip acceleration for inference during real-time transactions. This accelerator, implemented as a Neural Network Processing Assist (NNPA) engine, supports memory-to-memory operations for deep-learning models and delivers over 6 teraflops (TFLOPs) of compute capacity per processor unit (PU) chip. It features 128 tiles for 8-way FP-16 fused multiply-add (FMA) SIMD operations and 32 tiles for mixed FP-16/FP-32 matrix multiplications and activations, with internal bandwidth exceeding 200 GB/s for reads/stores and over 600 GB/s between processing engines. Every core can dynamically access this shared accelerator via a ring interface to the L1 and L2 caches, allowing seamless integration of AI workloads without data-movement overhead. In a full 32-chip system, it scales to more than 200 TFLOPs, enabling low-latency inferencing for tasks like fraud detection. The compression accelerator, known as the Nest Accelerator Unit (NXU), handles data compression and decompression operations to optimize storage and I/O efficiency. Integrated as one unit per PU chip and tied to the L3 cache, it supports DEFLATE-compliant algorithms, CRC, and ZLIB Adler checksums, achieving up to 5% better compression ratios than previous external adapters for workloads like BSAM and VSAM. It operates in both synchronous and asynchronous modes with low latency and high bandwidth, serving all cores and logical partitions (LPARs) simultaneously, and replaces the need for dedicated PCIe-based zEDC Express adapters. Encryption offload is provided through the Central Processor Assist for Cryptographic Function (CPACF), a dedicated co-processor embedded in each core for hardware-accelerated cryptographic operations. CPACF supports symmetric ciphers such as AES-128/192/256, DES, and TDES, along with hashing algorithms including the SHA-1, SHA-2, and SHA-3 families and SHAKE, enabling pervasive encryption of data in transit and at rest. These operations integrate directly with the core's execution pipeline, minimizing overhead for high-volume transaction processing. Additional units include random-number generators embedded within CPACF, comprising a pseudo-random-number generator (PRNG) based on 3DES, a deterministic RNG (DRNG) using NIST SP-800-90A with SHA-512, and a true RNG (TRNG) for seeding cryptographic keys and nonces. For mainframe-specific tasks, Telum incorporates on-core data path accelerators and a Z Sort accelerator, one per core, to optimize sorting operations in utilities like DFSORT and Db2, reducing CPU cycles and elapsed time for in-memory sorts. The high-speed on-chip interconnect, including the M-Bus for intra-dual-chip-module (DCM) communication at 160 Gbps and a ring interface for accelerator access, ensures efficient data flow between these units, cores, and the multi-level cache (L1 to L4).
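
Because the NXU implements standard DEFLATE, application code can stay at the ordinary zlib API level. The fragment below is a plain-Python illustration of that interface; whether the calls actually run on the accelerator depends on the platform and on an accelerator-enabled zlib build (a deployment assumption, not something this code controls).

```python
import zlib

# A repetitive transactional record compresses well under DEFLATE.
record = b"transaction-id=42;amount=100.00;" * 1000

compressed = zlib.compress(record, level=6)   # DEFLATE compress
restored = zlib.decompress(compressed)        # and inflate back

assert restored == record
print(f"{len(record)} bytes -> {len(compressed)} bytes "
      f"({len(compressed) / len(record):.1%} of original)")
```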

Features

Artificial intelligence integration

The IBM Telum processor features on-chip AI acceleration via the Neural Network Processing Assist (NNPA), a set of architected instructions that enable embedded inferencing for transactional AI workloads, making it the first such processor to integrate this capability directly during transaction processing. This acceleration is provided by the Integrated Accelerator for AI (AIU), a dedicated unit shared across the processor's eight cores, which delivers over 6 TFLOPs of performance for neural-network operations. The design allows for real-time AI insights without the need to offload computations to external GPUs, supporting high-throughput enterprise environments like financial transaction processing. Telum's NNPA supports efficient inference using a 16-bit floating-point format (DLFLOAT16), which balances precision and performance for models such as those employed in fraud-detection networks, minimizing accuracy loss while optimizing resource use. Performance benchmarks demonstrate significant gains in inference throughput, depending on model complexity. Additionally, the accelerator achieves sub-1 ms response times at the 99.9th percentile, enabling seamless integration into latency-sensitive applications. In practical workloads, Telum facilitates real-time analytics, including anomaly detection in financial transactions, where AI inferences occur inline to prevent fraud proactively rather than reactively. The software ecosystem enhances this through integration with IBM Watson Machine Learning for z/OS, allowing deployment of hybrid AI models that leverage both on-chip acceleration and broader cloud-based training. This setup supports z/OS environments for end-to-end AI pipelines, from model development to in-transaction execution, fostering scalable enterprise AI adoption.
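
In practice, a scoring model is exported to a portable format and invoked inline with the transaction. The sketch below uses the generic onnxruntime package with a hypothetical model file and feature layout, purely to show the call pattern; on z16 the equivalent model would typically be compiled for the NNPA with IBM's tooling rather than run through this stock runtime.

```python
import numpy as np
import onnxruntime as ort

# Hypothetical fraud-scoring model and feature vector: the file name, input
# shape, and output meaning are illustrative, not from IBM's stack.
session = ort.InferenceSession("fraud_scorer.onnx")
input_name = session.get_inputs()[0].name

features = np.random.rand(1, 32).astype(np.float32)   # one in-flight transaction
outputs = session.run(None, {input_name: features})   # inline inference call

print("fraud score:", float(outputs[0].ravel()[0]))
```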

Security and cryptography

The IBM Telum processor integrates quantum-safe cryptography to address emerging threats from quantum computing, supporting post-quantum algorithms standardized by NIST, including CRYSTALS-Kyber for key encapsulation and CRYSTALS-Dilithium for digital signatures. These features are enabled through hardware-accelerated cryptographic functions in the Central Processor Assist for Cryptographic Functions (CPACF), allowing efficient implementation of lattice-based algorithms resistant to quantum attacks such as Shor's algorithm. This integration represents a pioneering advancement, as Telum powers the z16, the industry's first mainframe system with quantum-safe protections embedded at the silicon level across firmware and hardware layers. Telum incorporates on-chip hardware security modules via CPACF and associated secure enclaves, providing isolated environments for key generation, storage, and runtime operations. These enclaves, supported by Secure Execution technology, ensure that sensitive keys and data remain protected during processing, with keys never exposed in the clear outside the hardware boundary. Master keys are managed through tamper-resistant Hardware Security Modules (HSMs) like the Crypto Express8S, which facilitate secure key entry and distribution across up to 85 logical partitions. Pervasive encryption in Telum extends to data in use, with transparent memory encryption safeguarding all data as it moves from processor chips to main memory, minimizing exposure in transient states. Tamper-detection mechanisms, including hardware-based monitoring and response in CPACF and HSMs, trigger immediate key erasure upon detecting physical or logical intrusions, ensuring rapid recovery and data integrity. These protections enable secure, high-volume transaction processing in environments handling financial and personal data. Telum's design complies with FIPS 140 Level 4 standards for its cryptographic modules, the highest certification for commercial hardware, validating robust protections for key management and encryption in regulated sectors like banking and healthcare. This compliance supports seamless adherence to global regulations, allowing organizations to process encrypted workloads without performance degradation while mitigating risks from both classical and quantum adversaries.
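
CPACF is reached through ordinary cryptographic APIs rather than programmed directly. As an illustration, the snippet below performs standard AES-256-GCM with the Python cryptography package; on IBM Z, OpenSSL-based stacks can route such primitives to the CPACF instructions (a property of the platform's crypto libraries, not of this code).

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Standard AES-256-GCM authenticated encryption through a stock API.
key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)
nonce = os.urandom(12)                      # unique per message

ciphertext = aead.encrypt(nonce, b"card=4111...;amount=100.00", b"txn-header")
plaintext = aead.decrypt(nonce, ciphertext, b"txn-header")

assert plaintext.startswith(b"card=")       # round trip, integrity verified
```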

Performance and reliability

The IBM Telum processor achieves significant throughput improvements through architectural enhancements, including a 1.5 times increase in cache capacity per core compared to the z15 processor, which enables faster data access in high-volume environments. This expanded cache, 32 MB of L2 per core virtualized to form 256 MB of L3 and up to 2 GB of L4, reduces latency and boosts per-thread performance, supporting systems capable of handling up to 25 billion encrypted online transaction processing (OLTP) transactions per day on a fully configured z16 mainframe. These optimizations prioritize sustained operation in mission-critical workloads, such as financial transaction processing, where rapid response times are essential without compromising reliability. Reliability in the Telum processor is fortified by multiple fault-tolerant mechanisms designed for continuous operation in enterprise settings. Caches at L2, L3, and L4 levels incorporate symbol error-correcting code (ECC) with RAID-4 parity, providing robust protection against multi-bit errors and enhancing data resilience across the hierarchy; a toy model of this parity idea follows below. The design includes redundant execution paths via Redundant Array of Independent Memory (RAIM) technology, which uses an 8-channel Reed-Solomon configuration to tolerate full channel or DIMM failures with transparent recovery and reduced overhead compared to prior generations. Additionally, predictive failure analysis (PFA) features enable preemptive isolation of potential issues, such as DRAM marking and processor unit (PU) sparing with two spares per system, allowing nondisruptive maintenance and minimizing downtime. Power efficiency is a core aspect of Telum's design, supporting high availability through advanced management techniques that maintain 99.999% uptime in demanding configurations. The 7 nm process node facilitates lower power draw per transistor, complemented by dynamic voltage scaling and sleep states that adjust core activity based on workload demands, preventing thermal throttling during peak loads. N+1 redundancy in power supplies and cooling systems, including closed-loop water cooling options, ensures reliable operation without interruptions, aligning with mainframe standards for fault-tolerant computing. In benchmarks, Telum demonstrates superior performance in transaction-oriented workloads, with internal IBM measurements showing up to 11% uniprocessor improvement over the z15 in single-threaded tasks and enhanced throughput in cache-intensive scenarios via Large System Performance Reference (LSPR) metrics adapted for mainframe environments. These results highlight its edge in benchmarks like TPC-E equivalents, where the processor's cache and pipeline optimizations yield higher throughput under mixed loads. Telum's scalability supports deployment in large-scale systems through modular configurations, with each processor featuring eight cores that can interconnect across up to 32 chips per system, enabling configurations of up to 200 processing units for expanded capacity in multi-node setups. This modular approach, including dual-chip modules for 16 cores per unit, allows seamless scaling for growing transaction volumes while preserving coherence and performance across the fabric.
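
The flavor of that channel-level protection can be shown with a toy model. The sketch below uses simple XOR (RAID-4-style) parity across hypothetical data channels to reconstruct one failed channel; Telum's RAIM actually employs a stronger Reed-Solomon code over eight channels, which tolerates more failure patterns.

```python
from functools import reduce

def make_parity(channels):
    """RAID-4-style parity: XOR of all channels, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*channels))

def rebuild(channels, parity, lost):
    """Reconstruct the single lost channel from the survivors plus parity."""
    survivors = [c for i, c in enumerate(channels) if i != lost] + [parity]
    return make_parity(survivors)

# Seven hypothetical data channels of 4 bytes each, plus one parity channel.
data = [bytes([i * 16 + j for j in range(4)]) for i in range(7)]
parity = make_parity(data)

lost = 3                                   # pretend channel 3 (a DIMM) fails
assert rebuild(data, parity, lost) == data[lost]
print("channel", lost, "reconstructed transparently")
```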

Deployment

Integration in IBM Z systems

The IBM Telum processor serves as the core computing engine for the IBM z16 mainframe and the IBM LinuxONE Rockhopper 4 systems, enabling high-performance transaction processing in enterprise environments. These platforms support configurations with up to four dual-chip modules (DCMs) per central processing complex (CPC) drawer, where each DCM houses two Telum chips, allowing for scalable core counts up to 200 active processors across multi-drawer setups. This modular design facilitates efficient resource allocation and supports both multi-frame and single-frame deployments, including rack-mount options for space-constrained data centers. In terms of system architecture, Telum chips within each DCM are interconnected via a high-speed M-Bus interface, delivering approximately 166 GB/s of bandwidth between the two chips to ensure low-latency data sharing and cohesive multi-core operation. CPC drawers are further linked through redundant high-speed communications fabrics, such as PCIe-based interconnects and coupling facilities, enabling seamless scalability across up to four drawers in a full configuration while maintaining system reliability and fault tolerance. This multi-chip-module approach optimizes power efficiency and thermal management, aligning with IBM Z's emphasis on resilient, high-availability computing. The software stack surrounding Telum is tightly integrated with IBM's mainframe operating environments, including z/OS for mission-critical workloads, z/VM for virtualization and guest management, and certified Linux distributions on IBM Z. Optimization extends to AI model deployment, with tools like the IBM Z Deep Learning Compiler (DLC) enabling developers to compile and run ONNX models natively on Telum's on-chip accelerator without moving data off-platform. These integrations ensure that AI-enhanced applications can leverage the full ecosystem for secure, real-time processing. I/O integration in z16 systems enhances network and storage acceleration through PCIe Generation 3 I/O infrastructure, with the processor supporting PCIe Generation 4 interfaces and up to 12 I/O drawers, with features like RoCE Express3 for Ethernet (25 GbE/10 GbE) and FICON Express32S (up to 32 Gbps) for high-throughput storage access. On-chip accelerators, including the Integrated Accelerator for zEDC, offload compression and decompression tasks directly from Telum cores, reducing CPU overhead for data-intensive operations, while zHyperLink Express provides ultra-low-latency coupling to storage arrays. These elements collectively streamline data flows, minimizing latency in hybrid cloud and transactional environments. The upgrade path from prior generations emphasizes backward compatibility, with z16 fully supporting z15 workloads, instructions, and peripherals, allowing organizations to migrate applications seamlessly via nondisruptive upgrades or logical partitioning without code changes. This compatibility extends to I/O configurations and software binaries, preserving investments in existing application, middleware, and operating environments while introducing Telum's new capabilities incrementally.

Applications and use cases

Telum-powered systems have found significant application in financial services, where they enable real-time fraud detection and risk scoring during banking transactions. By integrating on-chip AI acceleration, Telum allows for the analysis of high-value transactions as they occur, supporting use cases such as anti-money laundering, clearing and settlement, and payment processing. This capability shifts fraud management from reactive detection to proactive prevention, enhancing risk models for faster credit approvals and improved compliance with regulatory requirements. In healthcare and insurance, Telum facilitates the secure processing of sensitive data through AI-driven analytics, particularly for fraud prevention in claims processing and risk evaluation. For instance, it supports real-time analysis of claims using ensemble AI techniques that combine neural networks with traditional models, ensuring data privacy while accelerating decision-making. Potential extensions include further healthcare applications, where low-latency inferencing handles complex datasets without compromising privacy. Government agencies and retail operations leverage Telum for high-volume transaction handling, including payment processing and records management. In the public sector, it powers mission-critical systems like tax processing, vehicle registrations, and benefits distribution, infusing AI directly into transactional workloads to generate real-time insights and boost operational efficiency. Retail environments benefit from its ability to manage peak transaction loads, such as during sales events, with on-chip AI optimizing inventory tracking and customer interactions at scale. Deployments by major banks demonstrate tangible benefits, with Telum enabling up to 40% performance improvements per socket compared to prior systems, resulting in faster query responses for AI-augmented transactions, often achieving sub-millisecond latencies for fraud scoring across billions of daily operations. For example, implementations like DXC Luxoft's UmbrellaFraud solution on Telum-based mainframes provide 100% transaction coverage for deep analysis, helping institutions save millions annually in potential losses. Broader impacts of Telum include enabling edge-to-cloud AI in hybrid environments, where it reduces latency for global enterprises by co-locating AI inferencing with data on IBM Z systems integrated into hybrid cloud architectures. This supports seamless scalability across on-premises, cloud, and edge deployments, driving efficiency in industries reliant on real-time analytics.

Successors

Telum II processor

The IBM Telum II processor serves as the direct successor to the original Telum, representing a significant advancement in mainframe architecture. Announced in August 2024 at the Hot Chips conference, it was developed to power next-generation IBM Z systems, including the z17 mainframe, which became generally available in June 2025. The processor emphasizes enhanced performance for mission-critical workloads, particularly those involving artificial intelligence, while maintaining compatibility with existing enterprise environments.

Fabricated on Samsung's 5 nm node, the Telum II features eight high-performance cores clocked at 5.5 GHz, each supported by 36 MB of L2 cache, for a total of ten 36 MB L2 caches across the chip. It includes a 40% expansion in on-chip cache capacity compared to its predecessor, with 360 MB of virtual L3 cache and 2.88 GB of virtual L4 cache to improve data access in large-scale transactions. The design incorporates approximately 43 billion transistors and utilizes a unique virtual caching strategy that dynamically allocates L2 resources to minimize latency, enabling up to 20% higher socket performance and 15% lower power consumption.

Key improvements in the Telum II include advanced branch-prediction mechanisms, rename registers expanded from 128 to 160 for better instruction handling, and a 50% increase in AI inferencing performance through an upgraded on-chip accelerator delivering 24 TOPS. Additionally, it integrates a low-latency data processing unit (DPU) with eight 5.5 GHz cores and a private 36 MB L2 cache, which accelerates I/O operations and reduces power usage by up to 70% for networking tasks. The overall architecture retains the z/Architecture instruction set but refines the execution pipeline for higher throughput in hybrid cloud environments. Employing a dual-chip module with 24 miles of interconnect wire, the Telum II supports scalable configurations up to 32 processors in a coherent symmetric multiprocessing (SMP) system, along with 192 PCIe Gen5 interfaces for expanded I/O bandwidth. It is deployed in the z17 and the corresponding LinuxONE 5 systems, both optimized for generative AI applications such as real-time fraud detection and inferencing directly on transactional data.

The Spyre Accelerator, announced in August 2024 and made commercially available in October 2025, represents a key evolution in mainframe AI capabilities extending from the Telum processor family. This PCIe Gen5-based system-on-a-chip features 32 AI accelerator cores, 128 GB of LPDDR5 memory, and operates within a 75 W power envelope, enabling support for multimodal large language models (LLMs) on z17 systems. Fabricated on a 5 nm process with 25.6 billion transistors, Spyre delivers over 300 TOPS of AI inference performance per card, allowing configurations of up to eight cards per I/O drawer for 1 TB of total memory and scalable processing of generative and agentic AI workloads. Spyre integrates with the Telum II processor by offloading complex AI models from the main CPU, complementing the on-chip AI accelerator in Telum II to handle larger-scale tasks such as multi-model serving for LLMs. This pairing enhances low-latency AI in enterprise environments, with systems scaling across 32 Telum II chips in a coherent SMP configuration. Unlike Telum's integrated AI approach, which focuses on real-time, transaction-embedded inferencing, Spyre serves as a dedicated external accelerator for high-throughput, flexible workloads, enabling enterprises to scale AI without overburdening core resources.
IBM's broader mainframe AI roadmap incorporates software ecosystems like watsonx for Z systems, which leverages hardware such as Spyre to accelerate application development, modernization, and operations. Tools within watsonx, including watsonx Code Assistant for Z, use generative AI to refactor legacy code, automate testing, and support agentic workflows on mainframes, integrating seamlessly with Telum-derived processors for end-to-end AI governance and deployment. These advancements enable scalable AI deployment in mission-critical mainframes, facilitating secure, low-latency processing for applications like fraud detection and compliance while bridging toward quantum-resistant computing through built-in support for NIST-standardized post-quantum cryptography in Telum II and compatible accelerators.
