IBM Telum
from Wikipedia
Telum

General information
Launched: 2021
Designed by: IBM
Common manufacturer: Samsung

Performance
Max. CPU clock rate: 5.2 GHz

Cache
L2 cache: 32 MB per core

Architecture and classification
Technology node: 7 nm
Instruction set: z/Architecture

Physical specifications
Cores: 8

History
Predecessor: z15
Successor: Telum II
Both sides of the Telum microprocessor

Telum is a microprocessor made by IBM for the IBM z16 series mainframe computers.[2][3] The processor was announced at the Hot Chips 2021 conference on 23 August 2021.[2] Telum is IBM's first processor to contain on-chip acceleration for artificial intelligence inferencing while a transaction is taking place.[4][5]

Description

The chip contains eight processor cores with a deep superscalar, out-of-order pipeline running at clock frequencies above 5 GHz, optimized for the demands of heterogeneous enterprise-class workloads (e.g. finance, security-sensitive applications, and applications requiring extreme reliability). The cache and chip-interconnect infrastructure provides 32 MB of cache per core and can scale to 32 Telum chips.[6][3][7] The cache design was described in 2021 as "revolutionary",[6] because it creates a system in which the L2 cache of one core can be used as virtual L3 and L4 cache by other cores.[3][1] The Telum processor can be either water cooled or air cooled, but water cooling is required when running more than a few Telum processors in a single IBM compute drawer.[8][9] Unlike other processors, the IBM Telum does not respond to thermal limits by reducing clock speed; instead it inserts sleep-state instructions.[8][9]

Telum adds a new 16-bit floating-point format, NNP-Data-Type-1, and several new instructions. The Neural Network Processing Assist (NNPA)[10] instruction performs a variety of tensor operations useful for neural networks.
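
Published descriptions give the format 1 sign bit, 6 exponent bits, and 9 fraction bits. The Python sketch below converts a float32 value into such a 16-bit layout; the bias, rounding, and saturation choices here are illustrative assumptions, not the architected NNP-Data-Type-1 semantics.

```python
import struct

def f32_to_nnp16(x: float) -> int:
    """Pack a float into an assumed 1/6/9-bit layout (sign/exponent/fraction).

    Assumptions for illustration: exponent bias 31, round-to-nearest-even,
    saturation on overflow, flush-to-zero on underflow.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = (bits >> 31) & 0x1
    exp32 = (bits >> 23) & 0xFF          # float32 exponent field, bias 127
    frac32 = bits & 0x7FFFFF             # 23-bit fraction

    exp16 = exp32 - 127 + 31             # re-bias for a 6-bit exponent
    if exp16 >= 0x3F:                    # too large: saturate (assumption)
        return (sign << 15) | 0x7FFF
    if exp16 <= 0:                       # too small: flush to zero (assumption)
        return sign << 15

    frac16 = frac32 >> 14                # keep the top 9 of 23 fraction bits
    rem = frac32 & 0x3FFF                # bits dropped by the shift
    if rem > 0x2000 or (rem == 0x2000 and (frac16 & 1)):
        frac16 += 1                      # round to nearest, ties to even
        if frac16 == 0x200:              # fraction carry bumps the exponent
            frac16 = 0
            exp16 += 1
            if exp16 >= 0x3F:
                return (sign << 15) | 0x7FFF
    return (sign << 15) | (exp16 << 9) | frac16

print(hex(f32_to_nnp16(1.0)))   # sign=0, exponent=31, fraction=0 -> 0x3e00
```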

from Grokipedia
The IBM Telum is a central processing unit (CPU) designed by IBM for its IBM Z mainframe computers and LinuxONE servers, first introduced in the IBM z16 system in 2022. Fabricated using a 7 nm process, it features eight high-performance cores operating above a 5 GHz clock speed, making it the first mainframe processor with integrated on-chip acceleration for artificial intelligence (AI) inferencing directly within high-volume transactional workloads. This innovation enables real-time applications such as fraud detection in sectors like banking and finance, processing up to 100,000 transactions per second with sub-millisecond latency.

Telum's design emphasizes security, reliability, and availability for enterprise environments, incorporating 32 MB of private Level-2 (L2) cache per core for a total of 256 MB, a 256 MB virtual Level-3 (L3) cache, and a 2 GB Level-4 (L4) cache, representing 1.5 times more cache per core than its predecessor, the z15 processor. The on-chip AI accelerator delivers over 6 teraflops (TFLOPs) of performance per chip, scaling to 200 TFLOPs across a full system, and supports deep-learning models for inference without offloading data from the mainframe. Additional features include transparent memory encryption, Secure Execution environments for confidential computing, and a redesigned eight-channel memory interface that enhances availability by tolerating failures in channels or dual in-line memory modules (DIMMs).

In 2024, IBM announced the Telum II, a successor processor built on a 5 nm node using Samsung Foundry technology, featuring eight cores at 5.5 GHz, an improved AI accelerator, and an integrated data processing unit (DPU) for low-latency I/O operations; it powers the IBM z17 mainframe released in June 2025. Telum II expands cache capacity to 360 MB L3 and 2.88 GB L4 per chip while maintaining the focus on AI-infused transactional processing, with further enhancements for generative AI workloads via the companion IBM Spyre Accelerator, a PCIe-based system-on-chip available since October 2025. These processors underscore IBM's commitment to embedding AI at the hardware level for mission-critical computing, prioritizing data privacy and performance in hybrid cloud environments.

Overview

Description

The IBM Telum processor is a 7 nm microprocessor designed by IBM for its z/Architecture-based mainframes, marking the company's first commercial chip to integrate on-chip acceleration for AI inferencing directly during transaction processing. Introduced in August 2021 and powering the IBM z16 mainframe series, which became generally available in 2022, Telum represents a significant advancement in enterprise computing hardware. Telum plays a central role in enabling real-time AI analytics within high-volume transactional environments, particularly in sectors like banking and finance where rapid data processing is essential for fraud detection and risk assessment. By embedding AI capabilities at the processor level, it allows organizations to derive actionable insights from vast datasets without interrupting core business operations, supporting hybrid cloud architectures and mission-critical workloads. Among its primary innovations, Telum incorporates AI acceleration to facilitate on-the-fly inferencing, quantum-safe cryptography to safeguard against emerging threats, and an optimized cache design to handle high-throughput enterprise tasks efficiently. These features build on predecessors like the z15 processor, enhancing security and performance for modern data-intensive applications.

Key specifications

The IBM Telum processor is a high-performance microprocessor designed for enterprise mainframe systems, featuring eight high-performance cores that enable robust processing capabilities for mission-critical workloads. Each core operates at clock speeds exceeding 5 GHz, with a maximum frequency of 5.2 GHz, allowing for efficient execution of complex transactions. Fabricated on a 7 nm process using extreme ultraviolet (EUV) lithography, the processor achieves a balance of density and performance suitable for large-scale computing environments. Key cache configurations include 32 MB of private L2 cache per core, a 256 MB virtual L3 cache shared across cores, and a 2 GB virtual L4 cache, providing enhanced data access speeds and reduced latency for data-intensive applications. The processor die measures 530 mm² and incorporates approximately 22 billion transistors, supporting its advanced computational demands. Telum supports both air and water cooling options within its system packaging, enabling flexible deployment in various infrastructures without compromising performance. It maintains operational efficiency through dynamic sleep states that prevent thermal throttling by inserting idle instructions rather than reducing clock speeds, ensuring consistent high-frequency operation under load. The processor provides full compatibility with the z/Architecture instruction set, incorporating superscalar and out-of-order execution to optimize instruction throughput and resource utilization.
Core count: 8 high-performance cores
Clock speed: >5 GHz (max 5.2 GHz)
Technology node: 7 nm EUV
L2 cache: 32 MB per core (private)
L3 cache: 256 MB (virtual, shared)
L4 cache: 2 GB (virtual)
Transistors: ~22 billion
Die size: 530 mm²
Cooling support: Air or water
Instruction set: z/Architecture (superscalar, out-of-order)

Development

Announcement and release

IBM announced the Telum processor on August 23, 2021, during a presentation at the Hot Chips 33 conference. This event marked the public unveiling of Telum as the core component of the next-generation mainframe systems, with IBM emphasizing its role in enabling real-time AI inferencing and enhanced security to support hybrid cloud environments. Telum evolved from the preceding z15 processors introduced in 2019, representing the first chip in IBM's Z family to integrate on-chip AI acceleration directly into the processor design. The processor was positioned to address enterprise demands for low-latency AI in mission-critical workloads, building on the z15's focus on data privacy and performance while introducing embedded AI capabilities. IBM formally introduced the z16 mainframe series, powered by Telum, on April 5, 2022, highlighting its advancements in AI-driven insights and quantum-safe cryptography. The z16 systems became generally available on May 31, 2022, marking Telum's commercial debut in production environments. Telum in z16 mainframes targets primarily the financial sector for applications such as high-throughput transaction processing and real-time fraud detection, supporting billions of daily operations.

Design and manufacturing

The development of the IBM Telum processor began around 2015, with focused efforts on AI integration starting around 2019 through the establishment of IBM's AI Hardware Center. This timeline aligned with the need to evolve mainframe capabilities for enterprise workloads, culminating in the processor's unveiling in August 2021 and deployment in z16 systems by mid-2022. Central to the design goals was achieving clock frequencies exceeding 5 GHz across eight high-performance cores while embedding on-chip AI accelerators to enable low-latency inferencing directly within transactional processing, all without undermining the stringent reliability standards of mainframe environments. This approach aimed to support real-time AI insights at scales handling up to 100,000 transactions per second with sub-millisecond response times. Telum was fabricated using a 7 nm extreme ultraviolet (EUV) process node at Samsung Foundry, incorporating custom intellectual property blocks for z/Architecture compatibility; the resulting dual-chip module spans 530 square millimeters and integrates 22.5 billion transistors across 19 miles of wiring and 17 metal layers. Key engineering challenges included managing power and heat in the densely packed eight-core configuration to sustain high frequencies, which was addressed through optimized thermal design and efficient accelerator integration. To ensure reliability, the team implemented mechanisms such as L2 cache SRAM wipe-out correction and an eight-channel memory interface capable of tolerating DIMM or channel failures. An innovative cache hierarchy was devised to reduce access latencies, targeting around 3.8 ns for L2 and 12 ns for L3, by allocating 32 MB of private L2 cache per core and forming larger virtual shared structures for improved data throughput. IBM's in-house organization played a pivotal role, with the design team leveraging expertise from the AI Hardware Center and Systems teams, including custom electronic design automation (EDA) tools tailored to optimize z/Architecture performance and streamline the integration of AI hardware. This collaborative effort within IBM enabled rapid iteration on custom IP, from core design to accelerator embedding, while maintaining enterprise-grade resilience.

Architecture

Processor cores and pipeline

The Telum processor features eight symmetric cores per chip, each implementing a superscalar pipeline with out-of-order execution capable of issuing up to 10 instructions per cycle. This design is tailored to the demands of mainframe workloads, emphasizing high single-thread performance and efficient handling of complex instruction streams under heavy load. The cores support advanced prefetching and low-latency fetch mechanisms to minimize stalls in data-intensive enterprise environments. The pipeline employs a deep out-of-order structure optimized for branch prediction accuracy and efficient load/store operations, enabling sustained throughput in transaction-heavy scenarios. It decodes up to six instructions per cycle and can initiate up to 12, with a focus on reducing latency through a flatter cache topology that integrates seamlessly with the core's execution flow. This architecture draws from generations of processor advancements, prioritizing reliability and predictability for mission-critical computing. Execution units within each core include multiple fixed-point units (FXUs) for integer operations, binary floating-point units (BFUs) and decimal floating-point units (DFUs) for scalar computations, and vector units supporting instruction classes such as vector fixed-point (VFX), vector string (VXS), vector permute (VXP), and vector multiply (VXM). The floating-point capabilities extend to 16-bit (FP16) precision via the vector floating-point unit (VFU), facilitating AI-related tasks alongside traditional workloads, with two decimal floating-point accelerators per core for enhanced precision handling. These units operate on 32 vector registers, each 128 bits wide, allowing SIMD operation on data types from 8-bit integers to full 128-bit operands (see the sketch below). Simultaneous multithreading (SMT) is supported with up to two threads per core, dynamically sharing execution resources to boost utilization in mixed enterprise workloads and yielding an average 25% throughput improvement. Clock frequency reaches 5.2 GHz, with dynamic adjustments managed by the Intelligent Resource Director and Workload Manager to optimize power and performance without traditional throttling.
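
The width of those 128-bit vector registers can be pictured with a short sketch. The Python/numpy fragment below is purely illustrative: arrays stand in for registers to show how one register holds 8 FP16 lanes or 16 byte lanes; it does not use the actual z/Architecture vector facility.

```python
import numpy as np

# One 128-bit vector register viewed as 8 half-precision (FP16) lanes.
a = np.arange(8, dtype=np.float16)        # lane values 0..7
b = np.full(8, 2.0, dtype=np.float16)
c = np.full(8, 0.5, dtype=np.float16)

fma = a * b + c                           # one vector multiply-add: 8 results at once
print(fma)                                # [ 0.5  2.5  4.5 ... 14.5]

# The same 128 bits reinterpreted as 16 byte-wide lanes.
bytes_view = fma.view(np.uint8)           # 8 x FP16 = 16 bytes
print(bytes_view.size)                    # 16
```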

Cache hierarchy

The IBM Telum processor features a private L2 cache of 32 MB per core, implemented using high-speed SRAM to provide low-latency access for critical data and instructions. This design supports a 19-cycle load-use latency of approximately 3.8 ns, including TLB access, and incorporates four pipelines to handle overlapping fetch, store, and snoop traffic efficiently. Additionally, the L2 includes SRAM wipe-out error correction and sparing mechanisms to enhance reliability in enterprise environments. Telum's L3 and L4 caches are implemented as virtual structures in a distributed, coherent system that leverages excess capacity across multiple L2 caches, eliminating the need for traditional on-chip L3 or off-chip L4 hardware. The virtual L3 totals 256 MB per chip, formed by dynamically tagging and sharing underutilized portions of the eight per-core L2 caches via cooperative allocation among cores. Evicted lines from one core's L2 can be stored in another core's L2 as tagged L3 lines, maintaining coherence through retagging and dynamic sharing. Similarly, the virtual L4 provides 2 GB of capacity across up to eight chips in a drawer, using spare virtual L3 space for spillover, which enables horizontal cache persistence and scalability without dedicated L4 structures. Coherence in this distributed hierarchy is managed by a custom protocol that ensures consistency across cores and chips, avoiding off-chip broadcasts until on-chip resolution is complete to minimize unnecessary traffic. The cores' L2 caches are interconnected via dual-direction rings supporting over 320 GB/s of bandwidth, facilitating efficient snoop and data traffic for the virtual L3. For multi-chip L4 access, a flat multi-chip fabric further reduces latency compared to prior generations like z15. This virtual cache design delivers approximately 1.5 times more effective cache capacity per core than predecessors while improving latencies for most workloads by eliminating frequent off-chip accesses. Average virtual L3 access latency is around 12 ns, providing consistent gains for I/O-intensive mainframe tasks. The high internal bandwidth and coherent distribution optimize effective access speeds, contributing to a more than 40% per-socket performance uplift over z15 in enterprise scenarios.
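
The core idea, evicted private lines parked in a neighbor's underused L2 where they remain visible as virtual L3, can be sketched as a toy simulation. The Python model below is a minimal illustration of that policy, not IBM's actual coherence protocol; capacities, victim choice, and placement are simplified assumptions.

```python
from collections import OrderedDict

class Core:
    """Toy model of one core's private L2 (capacity counted in cache lines)."""
    def __init__(self):
        self.l2 = OrderedDict()   # addr -> tag ("L2" private, "VL3" donated); LRU order

class VirtualL3Chip:
    """Minimal sketch of the virtual-L3 idea: a line evicted from one core's
    L2 is re-tagged and parked in the least-loaded peer's L2, where any core
    can still hit it."""

    def __init__(self, n_cores=8, l2_lines=4):
        self.cores = [Core() for _ in range(n_cores)]
        self.capacity = l2_lines

    def access(self, core_id, addr):
        me = self.cores[core_id]
        if addr in me.l2:                              # private L2 hit
            me.l2.move_to_end(addr)
            return "L2 hit"
        for other in self.cores:                       # look for a donated VL3 copy
            if other is not me and other.l2.get(addr) == "VL3":
                del other.l2[addr]                     # migrate back as a private line
                self._install(me, addr)
                return "virtual-L3 hit"
        self._install(me, addr)                        # true miss: fill from memory
        return "miss"

    def _install(self, core, addr):
        if len(core.l2) >= self.capacity:              # evict this core's LRU line
            victim, tag = core.l2.popitem(last=False)
            if tag == "L2":                            # donate it to a peer as virtual L3
                peer = min((c for c in self.cores if c is not core),
                           key=lambda c: len(c.l2))
                if len(peer.l2) < self.capacity:
                    peer.l2[victim] = "VL3"
        core.l2[addr] = "L2"

chip = VirtualL3Chip()
for a in "ABCDE":                  # core 0 overflows its 4-line private L2
    chip.access(0, a)
print(chip.access(0, "A"))         # -> "virtual-L3 hit": A was parked in a peer's L2
```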

Integrated accelerators

The IBM Telum processor integrates several specialized hardware accelerators to offload common computational tasks from the general-purpose cores, enabling efficient processing in enterprise environments. These accelerators are embedded directly on the chip, connected via a high-speed on-chip fabric that links them to the cores and caches for low-latency access. Central to Telum's design is the dedicated AI inferencing unit, which provides on-chip acceleration for inference during real-time transactions. This accelerator, implemented as a Neural Network Processing Assist (NNPA) engine, supports memory-to-memory operations for deep-learning models and delivers over 6 teraflops (TFLOPs) of compute capacity per processor unit (PU) chip. It features 128 tiles for 8-way FP-16 fused multiply-add (FMA) SIMD operations and 32 tiles for mixed FP-16/FP-32 matrix multiplications and activations, with internal bandwidth exceeding 200 GB/s for reads/stores and over 600 GB/s between processing engines. Every core can dynamically access this shared accelerator via a ring interface to the L1 and L2 caches, allowing seamless integration of AI workloads without data-movement overhead. In a full 32-chip system, it scales to more than 200 TFLOPs, enabling low-latency inferencing for tasks like fraud detection. The compression accelerator, known as the Nest Accelerator Unit (NXU), handles data compression and decompression operations to optimize storage and I/O efficiency. Integrated as one unit per PU chip and tied to the L3 cache, it supports DEFLATE-compliant algorithms, CRC, and ZLIB Adler checksums, achieving up to 5% better compression ratios than previous external adapters for workloads like BSAM and VSAM. It operates in both synchronous and asynchronous modes with low latency and high bandwidth, serving all cores and logical partitions (LPARs) simultaneously, and replaces the need for dedicated PCIe-based zEDC Express adapters. Encryption offload is provided through the Central Processor Assist for Cryptographic Function (CPACF), a dedicated co-processor embedded in each core for hardware-accelerated cryptographic operations. CPACF supports symmetric ciphers such as AES-128/192/256, DES, and TDES, along with hashing algorithms including the SHA-1, SHA-2, and SHA-3 families and SHAKE, enabling pervasive encryption of data in transit and at rest. These operations integrate directly with the core's execution pipeline, minimizing overhead for high-volume transaction processing. Additional units include random-number generators embedded within CPACF, comprising a pseudo-random-number generator (PRNG) based on 3DES, a deterministic RNG (DRNG) using NIST SP-800-90A with SHA-512, and a true RNG (TRNG) for seeding cryptographic keys and nonces. For mainframe-specific tasks, Telum incorporates on-core data path accelerators and a Z Sort accelerator, one per core, to optimize sorting operations in utilities like DFSORT and Db2, reducing CPU cycles and elapsed time for in-memory sorts. The high-speed on-chip interconnect, including the M-Bus for intra-dual-chip-module (DCM) communication at 160 Gbps and a ring interface for accelerator access, ensures efficient data flow between these units, cores, and the multi-level cache (L1 to L4).
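
Because the NXU implements standard DEFLATE, application code can stay at the ordinary zlib API level. The fragment below is a plain-Python illustration of that interface; whether the calls actually run on the accelerator depends on the platform and on an accelerator-enabled zlib build (a deployment assumption, not something this code controls).

```python
import zlib

# A repetitive transactional record compresses well under DEFLATE.
record = b"transaction-id=42;amount=100.00;" * 1000

compressed = zlib.compress(record, level=6)   # DEFLATE compress
restored = zlib.decompress(compressed)        # and inflate back

assert restored == record
print(f"{len(record)} bytes -> {len(compressed)} bytes "
      f"({len(compressed) / len(record):.1%} of original)")
```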

Features

Artificial intelligence integration

The IBM Telum processor features on-chip AI acceleration via the Neural Network Processing Assist (NNPA), a set of architected instructions that enable embedded inferencing for transactional AI workloads, making it the first such processor to integrate this capability directly during transaction processing. This acceleration is provided by the Integrated Accelerator for AI (AIU), a dedicated unit shared across the processor's eight cores, which delivers over 6 TFLOPs of performance for neural-network operations. The design allows for real-time AI insights without the need to offload computations to external GPUs, supporting high-throughput enterprise environments like financial transaction processing. Telum's NNPA supports efficient inference using a 16-bit floating-point format (DLFLOAT16), which balances precision and performance for models such as those employed in fraud-detection networks, minimizing accuracy loss while optimizing resource use. Performance benchmarks demonstrate significant gains in inference throughput, depending on model complexity. Additionally, the accelerator achieves sub-1 ms response times at the 99.9th percentile, enabling seamless integration into latency-sensitive applications. In practical workloads, Telum facilitates real-time analytics, including anomaly detection in financial transactions, where AI inferences occur inline to prevent fraud proactively rather than reactively. The software ecosystem enhances this through integration with IBM Watson Machine Learning for z/OS, allowing deployment of hybrid AI models that leverage both on-chip acceleration and broader cloud-based training. This setup supports z/OS environments for end-to-end AI pipelines, from model development to in-transaction execution, fostering scalable enterprise AI adoption.
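
In practice, a scoring model is exported to a portable format and invoked inline with the transaction. The sketch below uses the generic onnxruntime package with a hypothetical model file and feature layout, purely to show the call pattern; on z16 the equivalent model would typically be compiled for the NNPA with IBM's tooling rather than run through this stock runtime.

```python
import numpy as np
import onnxruntime as ort

# Hypothetical fraud-scoring model and feature vector: the file name, input
# shape, and output meaning are illustrative, not from IBM's stack.
session = ort.InferenceSession("fraud_scorer.onnx")
input_name = session.get_inputs()[0].name

features = np.random.rand(1, 32).astype(np.float32)   # one in-flight transaction
outputs = session.run(None, {input_name: features})   # inline inference call

print("fraud score:", float(outputs[0].ravel()[0]))
```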

Security and cryptography

The IBM Telum processor integrates quantum-safe cryptography to address emerging threats from quantum computing, supporting post-quantum algorithms standardized by NIST, including CRYSTALS-Kyber for key encapsulation and CRYSTALS-Dilithium for digital signatures. These features are enabled through hardware-accelerated cryptographic functions in the Central Processor Assist for Cryptographic Functions (CPACF), allowing efficient implementation of lattice-based algorithms resistant to quantum attacks such as Shor's algorithm. This integration represents a pioneering advancement, as Telum powers the z16, the industry's first mainframe system with quantum-safe protections embedded at the silicon level across firmware and hardware layers. Telum incorporates on-chip hardware security modules via CPACF and associated secure enclaves, providing isolated environments for key generation, storage, and runtime operations. These enclaves, supported by Secure Execution technology, ensure that sensitive keys and data remain protected during processing, with keys never exposed in the clear outside the hardware boundary. Master keys are managed through tamper-resistant Hardware Security Modules (HSMs) like the Crypto Express8S, which facilitate secure key entry and distribution across up to 85 logical partitions. Pervasive encryption in Telum extends to data in use, with transparent memory encryption safeguarding all data as it moves from processor chips to main memory, minimizing exposure in transient states. Tamper-detection mechanisms, including hardware-based monitoring and response in CPACF and HSMs, trigger immediate key erasure upon detecting physical or logical intrusions, ensuring rapid recovery and data integrity. These protections enable secure, high-volume transaction processing in environments handling financial and personal data. Telum's design complies with FIPS 140 Level 4 standards for its cryptographic modules, the highest certification for commercial hardware, validating robust protections for key management and encryption in regulated sectors like banking and healthcare. This compliance supports seamless adherence to global regulations, allowing organizations to process encrypted workloads without performance degradation while mitigating risks from both classical and quantum adversaries.
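
CPACF is reached through ordinary cryptographic APIs rather than programmed directly. As an illustration, the snippet below performs standard AES-256-GCM with the Python cryptography package; on IBM Z, OpenSSL-based stacks can route such primitives to the CPACF instructions (a property of the platform's crypto libraries, not of this code).

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Standard AES-256-GCM authenticated encryption through a stock API.
key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)
nonce = os.urandom(12)                      # unique per message

ciphertext = aead.encrypt(nonce, b"card=4111...;amount=100.00", b"txn-header")
plaintext = aead.decrypt(nonce, ciphertext, b"txn-header")

assert plaintext.startswith(b"card=")       # round trip, integrity verified
```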

Performance and reliability

The IBM Telum processor achieves significant throughput improvements through architectural enhancements, including a 1.5 times increase in cache capacity per core compared to the z15 processor, which enables faster data access in high-volume environments. This expanded cache, 32 MB of L2 per core virtualized to form 256 MB of L3 and up to 2 GB of L4, reduces latency and boosts per-thread performance, supporting systems capable of handling up to 25 billion encrypted online transaction processing (OLTP) transactions per day on a fully configured z16 mainframe. These optimizations prioritize sustained operation in mission-critical workloads, such as financial transaction processing, where rapid response times are essential without compromising reliability. Reliability in the Telum processor is fortified by multiple fault-tolerant mechanisms designed for continuous operation in enterprise settings. Caches at L2, L3, and L4 levels incorporate symbol error-correcting code (ECC) with RAID-4 parity, providing robust protection against multi-bit errors and enhancing data resilience across the hierarchy; a toy model of this parity idea follows below. The design includes redundant execution paths via Redundant Array of Independent Memory (RAIM) technology, which uses an 8-channel Reed-Solomon configuration to tolerate full channel or DIMM failures with transparent recovery and reduced overhead compared to prior generations. Additionally, predictive failure analysis (PFA) features enable preemptive isolation of potential issues, such as DRAM marking and processor unit (PU) sparing with two spares per system, allowing nondisruptive maintenance and minimizing downtime. Power efficiency is a core aspect of Telum's design, supporting high availability through advanced management techniques that maintain 99.999% uptime in demanding configurations. The 7 nm process node facilitates lower power draw per transistor, complemented by dynamic voltage scaling and sleep states that adjust core activity based on workload demands, preventing thermal throttling during peak loads. N+1 redundancy in power supplies and cooling systems, including closed-loop water cooling options, ensures reliable operation without interruptions, aligning with mainframe standards for fault-tolerant computing. In benchmarks, Telum demonstrates superior performance in transaction-oriented workloads, with internal IBM measurements showing up to 11% uniprocessor improvement over the z15 in single-threaded tasks and enhanced throughput in cache-intensive scenarios via Large System Performance Reference (LSPR) metrics adapted for mainframe environments. These results highlight its edge in benchmarks like TPC-E equivalents, where the processor's cache and pipeline optimizations yield higher throughput under mixed loads. Telum's scalability supports deployment in large-scale systems through modular configurations, with each processor featuring eight cores that can interconnect across up to 32 chips per system, enabling configurations of up to 200 processing units for expanded capacity in multi-node setups. This modular approach, including dual-chip modules for 16 cores per unit, allows seamless scaling for growing transaction volumes while preserving coherence and performance across the fabric.
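
The flavor of that channel-level protection can be shown with a toy model. The sketch below uses simple XOR (RAID-4-style) parity across hypothetical data channels to reconstruct one failed channel; Telum's RAIM actually employs a stronger Reed-Solomon code over eight channels, which tolerates more failure patterns.

```python
from functools import reduce

def make_parity(channels):
    """RAID-4-style parity: XOR of all channels, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*channels))

def rebuild(channels, parity, lost):
    """Reconstruct the single lost channel from the survivors plus parity."""
    survivors = [c for i, c in enumerate(channels) if i != lost] + [parity]
    return make_parity(survivors)

# Seven hypothetical data channels of 4 bytes each, plus one parity channel.
data = [bytes([i * 16 + j for j in range(4)]) for i in range(7)]
parity = make_parity(data)

lost = 3                                   # pretend channel 3 (a DIMM) fails
assert rebuild(data, parity, lost) == data[lost]
print("channel", lost, "reconstructed transparently")
```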

Deployment

Integration in IBM Z systems

The IBM Telum processor serves as the core computing engine for the IBM z16 mainframe and the IBM LinuxONE Rockhopper 4 systems, enabling high-performance transaction processing in enterprise environments. These platforms support configurations with up to four dual-chip modules (DCMs) per central processing complex (CPC) drawer, where each DCM houses two Telum chips, allowing for scalable core counts up to 200 active processors across multi-drawer setups. This modular design facilitates efficient resource allocation and supports both multi-frame and single-frame deployments, including rack-mount options for space-constrained data centers. In terms of system architecture, Telum chips within each DCM are interconnected via a high-speed M-Bus interface, delivering approximately 166 GB/s of bandwidth between the two chips to ensure low-latency data sharing and cohesive multi-core operation. CPC drawers are further linked through redundant high-speed communications fabrics, such as PCIe-based interconnects and coupling facilities, enabling seamless scalability across up to four drawers in a full configuration while maintaining system reliability and fault tolerance. This multi-chip-module approach optimizes power efficiency and thermal management, aligning with IBM Z's emphasis on resilient, high-availability computing. The software stack surrounding Telum is tightly integrated with IBM's mainframe operating environments, including z/OS for mission-critical workloads, z/VM for virtualization and guest management, and certified Linux distributions on IBM Z. Optimization extends to AI model deployment, with tools like the IBM Z Deep Learning Compiler (DLC) enabling developers to compile and run ONNX models natively on Telum's on-chip accelerator without moving data off-platform. These integrations ensure that AI-enhanced applications can leverage the full ecosystem for secure, real-time processing. I/O integration in z16 systems enhances network and storage acceleration through PCIe Generation 3 I/O infrastructure, with the processor supporting PCIe Generation 4 interfaces and up to 12 I/O drawers, with features like RoCE Express3 for Ethernet (25 GbE/10 GbE) and FICON Express32S (up to 32 Gbps) for high-throughput storage access. On-chip accelerators, including the Integrated Accelerator for zEDC, offload compression and decompression tasks directly from Telum cores, reducing CPU overhead for data-intensive operations, while zHyperLink Express provides ultra-low-latency coupling to storage arrays. These elements collectively streamline data flows, minimizing latency in hybrid cloud and transactional environments. The upgrade path from prior generations emphasizes backward compatibility, with z16 fully supporting z15 workloads, instructions, and peripherals, allowing organizations to migrate applications seamlessly via nondisruptive upgrades or logical partitioning without code changes. This compatibility extends to I/O configurations and software binaries, preserving investments in existing application, middleware, and operating environments while introducing Telum's new capabilities incrementally.

Applications and use cases

Telum-powered systems have found significant application in financial services, where they enable real-time fraud detection and risk scoring during banking transactions. By integrating on-chip AI acceleration, Telum allows for the analysis of high-value transactions as they occur, supporting use cases such as anti-money laundering, clearing and settlement, and payment processing. This capability shifts fraud management from reactive detection to proactive prevention, enhancing risk models for faster credit approvals and improved compliance with regulatory requirements. In healthcare and insurance, Telum facilitates the secure processing of sensitive data through AI-driven analytics, particularly for fraud prevention in claims processing and risk evaluation. For instance, it supports real-time analysis of claims using ensemble AI techniques that combine neural networks with traditional models, ensuring data privacy while accelerating decision-making. Potential extensions include further healthcare applications, where low-latency inferencing handles complex datasets without compromising privacy. Government agencies and retail operations leverage Telum for high-volume transaction handling, including payment processing and records management. In the public sector, it powers mission-critical systems like tax processing, vehicle registrations, and benefits distribution, infusing AI directly into transactional workloads to generate real-time insights and boost operational efficiency. Retail environments benefit from its ability to manage peak transaction loads, such as during sales events, with on-chip AI optimizing inventory tracking and customer interactions at scale. Deployments by major banks demonstrate tangible benefits, with Telum enabling up to 40% performance improvements per socket compared to prior systems, resulting in faster query responses for AI-augmented transactions, often achieving sub-millisecond latencies for fraud scoring across billions of daily operations. For example, implementations like DXC Luxoft's UmbrellaFraud solution on Telum-based mainframes provide 100% transaction coverage for deep analysis, helping institutions save millions annually in potential losses. Broader impacts of Telum include enabling edge-to-cloud AI in hybrid environments, where it reduces latency for global enterprises by co-locating AI inferencing with data on IBM Z systems integrated into hybrid cloud architectures. This supports seamless scalability across on-premises, cloud, and edge deployments, driving efficiency in industries reliant on real-time analytics.

Successors

Telum II processor

The IBM Telum II processor serves as the direct successor to the original Telum, representing a significant advancement in mainframe architecture. Announced in August 2024 at the Hot Chips conference, it was developed to power next-generation IBM Z systems, including the z17 mainframe, which became generally available in June 2025. The processor emphasizes enhanced performance for mission-critical workloads, particularly those involving artificial intelligence, while maintaining compatibility with existing enterprise environments.

Fabricated on Samsung's 5 nm node, the Telum II features eight high-performance cores clocked at 5.5 GHz, each supported by 36 MB of L2 cache, for a total of ten 36 MB L2 caches across the chip. It includes a 40% expansion in on-chip cache capacity compared to its predecessor, with 360 MB of virtual L3 cache and 2.88 GB of virtual L4 cache to improve data access in large-scale transactions. The design incorporates approximately 43 billion transistors and utilizes a unique virtual caching strategy that dynamically allocates L2 resources to minimize latency, enabling up to 20% higher socket performance and 15% lower power consumption.

Key improvements in the Telum II include advanced branch-prediction mechanisms, rename registers expanded from 128 to 160 for better instruction handling, and a 50% increase in AI inferencing performance through an upgraded on-chip accelerator delivering 24 TOPS. Additionally, it integrates a low-latency data processing unit (DPU) with eight 5.5 GHz cores and a private 36 MB L2 cache, which accelerates I/O operations and reduces power usage by up to 70% for networking tasks. The overall architecture retains the z/Architecture instruction set but refines the execution pipeline for higher throughput in hybrid cloud environments. Employing a dual-chip module with 24 miles of interconnect wire, the Telum II supports scalable configurations up to 32 processors in a coherent symmetric multiprocessing (SMP) system, along with 192 PCIe Gen5 interfaces for expanded I/O bandwidth. It is deployed in the z17 and the corresponding LinuxONE 5 systems, both optimized for generative AI applications such as real-time fraud detection and inferencing directly on transactional data.

The Spyre Accelerator, announced in August 2024 and made commercially available in October 2025, represents a key evolution in mainframe AI capabilities extending from the Telum processor family. This PCIe Gen5-based system-on-a-chip features 32 AI accelerator cores, 128 GB of LPDDR5 memory, and operates within a 75 W power envelope, enabling support for multimodal large language models (LLMs) on z17 systems. Fabricated on a 5 nm process with 25.6 billion transistors, Spyre delivers over 300 TOPS of AI inference performance per card, allowing configurations of up to eight cards per I/O drawer for 1 TB of total memory and scalable processing of generative and agentic AI workloads. Spyre integrates with the Telum II processor by offloading complex AI models from the main CPU, complementing the on-chip AI accelerator in Telum II to handle larger-scale tasks such as multi-model serving for LLMs. This pairing enhances low-latency AI in enterprise environments, with systems scaling across 32 Telum II chips in a coherent SMP configuration. Unlike Telum's integrated AI approach, which focuses on real-time, transaction-embedded inferencing, Spyre serves as a dedicated external accelerator for high-throughput, flexible workloads, enabling enterprises to scale AI without overburdening core resources.
IBM's broader mainframe AI roadmap incorporates software ecosystems like watsonx for Z systems, which leverages hardware such as Spyre to accelerate application development, modernization, and operations. Tools within watsonx, including watsonx Code Assistant for Z, use generative AI to refactor legacy code, automate testing, and support agentic workflows on mainframes, integrating seamlessly with Telum-derived processors for end-to-end AI governance and deployment. These advancements enable scalable AI deployment in mission-critical mainframes, facilitating secure, low-latency processing for applications like fraud detection and compliance while bridging toward quantum-resistant computing through built-in support for NIST-standardized post-quantum cryptography in Telum II and compatible accelerators.
