High Bandwidth Memory
from Wikipedia

High Bandwidth Memory (HBM) is a computer memory interface for 3D-stacked synchronous dynamic random-access memory (SDRAM) initially from Samsung, AMD and SK Hynix. It is used in conjunction with high-performance graphics accelerators, network devices, high-performance datacenter AI ASICs and FPGAs, as on-package cache in CPUs[1] and on-package RAM in upcoming CPUs, and in some supercomputers (such as the NEC SX-Aurora TSUBASA and Fujitsu A64FX).[2] The first HBM memory chip was produced by SK Hynix in 2013,[3] and the first devices to use HBM were the AMD Fiji GPUs in 2015.[4][5]

HBM was adopted by JEDEC as an industry standard in October 2013.[6] The second generation, HBM2, was accepted by JEDEC in January 2016.[7] JEDEC officially announced the HBM3 standard on January 27, 2022,[8] and the HBM4 standard in April 2025.[9][10]

Technology

| Type | Release | Max data rate per pin | Stack (channels × width) | Max capacity per stack | Max data rate per stack |
|---|---|---|---|---|---|
| HBM 1 | Oct 2013 | 1.0 Gb/s | 8×128 bit | 4 dies × 1 GB = 4 GB | 128 GB/s |
| HBM 2 | Jan 2016 | 2.4 Gb/s | 8×128 bit | 8 dies × 1 GB = 8 GB | 307 GB/s |
| HBM 2E | Aug 2019 | 3.6 Gb/s | 8×128 bit | 12 dies × 2 GB = 24 GB | 461 GB/s |
| HBM 3 | Jan 2022 | 6.4 Gb/s | 16×64 bit | 12 dies × 2 GB = 24 GB | 819 GB/s |
| HBM 3E | May 2023 | 9.8 Gb/s | 16×64 bit | 16 dies × 3 GB = 48 GB | 1229 GB/s |
| HBM 4 | April 2025 | 8 Gb/s | 32×64 bit | 16 dies × 4 GB = 64 GB | 2048 GB/s |

HBM achieves higher bandwidth than DDR4 or GDDR5 while using less power, and in a substantially smaller form factor.[11] This is achieved by stacking up to eight DRAM dies and an optional base die which can include buffer circuitry and test logic.[12] The stack is often connected to the memory controller on a GPU or CPU through a substrate, such as a silicon interposer.[13][14] Alternatively, the memory die could be stacked directly on the CPU or GPU chip. Within the stack the dies are vertically interconnected by through-silicon vias (TSVs) and microbumps. The HBM technology is similar in principle but incompatible with the Hybrid Memory Cube (HMC) interface developed by Micron Technology.[15]

HBM memory bus is very wide in comparison to other DRAM memories such as DDR4 or GDDR5. An HBM stack of four DRAM dies (4‑Hi) has two 128‑bit channels per die for a total of 8 channels and a width of 1024 bits in total. A graphics card/GPU with four 4‑Hi HBM stacks would therefore have a memory bus with a width of 4096 bits. In comparison, the bus width of GDDR memories is 32 bits, with 16 channels for a graphics card with a 512‑bit memory interface.[16] HBM supports up to 4 GB per package.
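The bus widths quoted above are simple products of channel count and channel width. A minimal Python sketch of that arithmetic follows; the function and variable names are illustrative assumptions, and only the numbers come from the text.

```python
# Illustrative arithmetic only: names are hypothetical, figures are those quoted above.

def bus_width_bits(channels: int, channel_width_bits: int) -> int:
    """Total data-bus width for a set of equally wide channels."""
    return channels * channel_width_bits

# A 4-Hi HBM stack: four dies with two 128-bit channels each.
hbm_stack_channels = 4 * 2
hbm_stack_width = bus_width_bits(hbm_stack_channels, 128)

# A GPU with four such stacks.
hbm_gpu_width = 4 * hbm_stack_width

# The GDDR comparison: 16 channels of 32 bits each.
gddr_gpu_width = bus_width_bits(16, 32)

print(hbm_stack_width, hbm_gpu_width, gddr_gpu_width)  # 1024 4096 512
```

The sketch covers only the 4-Hi case described here; it does not imply that taller stacks widen the interface.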

The larger number of connections to the memory, relative to DDR4 or GDDR5, required a new method of connecting the HBM memory to the GPU (or other processor).[17] AMD and Nvidia have both used purpose-built silicon chips, called interposers, to connect the memory and GPU. This interposer has the added advantage of requiring the memory and processor to be physically close, decreasing memory paths. However, as semiconductor device fabrication is significantly more expensive than printed circuit board manufacture, this adds cost to the final product.

Interface

Cut through a graphics card that uses High Bandwidth Memory. See through-silicon vias (TSV).

The HBM DRAM is tightly coupled to the host compute die with a distributed interface. The interface is divided into independent channels. The channels are completely independent of one another and are not necessarily synchronous to each other. The HBM DRAM uses a wide-interface architecture to achieve high-speed, low-power operation. The HBM DRAM uses a 500 MHz differential clock CK_t / CK_c (where the suffix "_t" denotes the "true", or "positive", component of the differential pair, and "_c" stands for the "complementary" one). Commands are registered at the rising edge of CK_t, CK_c. Each channel interface maintains a 128‑bit data bus operating at double data rate (DDR). HBM supports transfer rates of 1 GT/s per pin (transferring 1 bit), yielding an overall package bandwidth of 128 GB/s.[18]
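The step from a 500 MHz clock to 128 GB/s can be traced in a few lines. The following is a sketch under the figures stated above (double data rate, eight channels of 128 bits, one bit per pin per transfer); the function names are hypothetical and not part of any HBM specification.

```python
# Sketch of the clock-to-bandwidth arithmetic; helper names are illustrative assumptions.

def transfer_rate_gt_s(clock_mhz: float, transfers_per_clock: int = 2) -> float:
    """Per-pin transfer rate in GT/s; DDR moves data on both clock edges."""
    return clock_mhz * transfers_per_clock / 1000

def package_bandwidth_gb_s(clock_mhz: float, channels: int = 8,
                           channel_width_bits: int = 128) -> float:
    """Peak package bandwidth in GB/s, with one bit per data pin per transfer."""
    data_pins = channels * channel_width_bits
    return transfer_rate_gt_s(clock_mhz) * data_pins / 8  # 8 bits per byte

print(transfer_rate_gt_s(500))      # 1.0
print(package_bandwidth_gb_s(500))  # 128.0
```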

HBM2


The second generation of High Bandwidth Memory, HBM2, also specifies up to eight dies per stack and doubles pin transfer rates up to 2 GT/s. Retaining 1024‑bit wide access, HBM2 is able to reach 256 GB/s memory bandwidth per package. The HBM2 spec allows up to 8 GB per package. HBM2 is predicted to be especially useful for performance-sensitive consumer applications such as virtual reality.[19]

On January 19, 2016, Samsung announced early mass production of HBM2, at up to 8 GB per stack.[20][21] SK Hynix also announced availability of 4 GB stacks in August 2016.[22]

HBM2E


In late 2018, JEDEC announced an update to the HBM2 specification, providing for increased bandwidth and capacities.[23] Up to 307 GB/s per stack (2.5 Tbit/s effective data rate) is now supported in the official specification, though products operating at this speed had already been available. Additionally, the update added support for 12‑Hi stacks (12 dies) making capacities of up to 24 GB per stack possible.

On March 20, 2019, Samsung announced their Flashbolt HBM2E, featuring eight dies per stack, a transfer rate of 3.2 GT/s, providing a total of 16 GB and 410 GB/s per stack.[24]

On August 12, 2019, SK Hynix announced their HBM2E, featuring eight dies per stack, a transfer rate of 3.6 GT/s, providing a total of 16 GB and 460 GB/s per stack.[25][26] On July 2, 2020, SK Hynix announced that mass production had begun.[27]

In October 2019, Samsung announced their 12-layered HBM2E.[28]

HBM3


In late 2020, Micron revealed that the HBM2E standard would be updated and, alongside that, unveiled the next standard, known as HBMnext (later renamed HBM3). It was presented as a big generational leap from HBM2 and the replacement for HBM2E, and was expected to reach the market in Q4 2022. As the naming suggested, it was likely to introduce a new architecture.

While the architecture might be overhauled, leaks pointed to performance similar to the updated HBM2E standard. This RAM was likely to be used mostly in data center GPUs.[29][30][31][32]

In mid 2021, SK Hynix unveiled some specifications of the HBM3 standard, with 5.2 Gbit/s I/O speeds and bandwidth of 665 GB/s per package, as well as up to 16-high 2.5D and 3D solutions.[33][34]

On 20 October 2021, before the JEDEC standard for HBM3 was finalised, SK Hynix became the first memory vendor to announce that it had finished development of HBM3 memory devices. According to SK Hynix, the memory would run as fast as 6.4 Gbit/s/pin, double the data rate of JEDEC-standard HBM2E, which formally tops out at 3.2 Gbit/s/pin, or 78% faster than SK Hynix's own 3.6 Gbit/s/pin HBM2E. At 6.4 Gbit/s per pin, a single HBM3 stack may provide a bandwidth of up to 819 GB/s. The basic bus width for HBM3 remains unchanged, with a single stack of memory being 1024 bits wide. SK Hynix would offer its memory in two capacities: 16 GB and 24 GB, corresponding to 8-Hi and 12-Hi stacks respectively. The stacks consist of 8 or 12 16 Gb DRAM dies that are each 30 μm thick and interconnected using through-silicon vias (TSVs).[35][36][37]

According to Ryan Smith of AnandTech, the SK Hynix first-generation HBM3 memory has the same density as their latest-generation HBM2E memory, meaning that device vendors looking to increase their total memory capacities for their next-generation parts would need to use memory with 12 dies/layers, up from the 8-layer stacks they typically used until then.[35] According to Anton Shilov of Tom's Hardware, high-performance compute GPUs or FPGAs typically use four or six HBM stacks, so with SK Hynix's HBM3 24 GB stacks they would accordingly get 3.2 TB/s or 4.9 TB/s of memory bandwidth. He also noted that SK Hynix's HBM3 chips are square, not rectangular like HBM2 and HBM2E chips.[36] According to Chris Mellor of The Register, the fact that JEDEC had not yet developed its HBM3 standard might mean that SK Hynix would need to retrofit its design to a future and faster one.[37]
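The multi-stack figures can be reproduced from the per-pin rate and the 1024-bit stack interface. The sketch below is illustrative only: the function name is an assumption, and the pin rate is expressed in Mbit/s as an integer simply to keep the arithmetic exact.

```python
# Hypothetical helper: aggregate peak bandwidth of several identical HBM stacks.

def aggregate_bandwidth_gb_s(pin_rate_mbit_s: int, bus_width_bits: int, stacks: int) -> float:
    """Peak bandwidth in GB/s for `stacks` stacks of `bus_width_bits` data pins each."""
    total_mbit_s = pin_rate_mbit_s * bus_width_bits * stacks
    return total_mbit_s / 8 / 1000  # bits -> bytes, then MB/s -> GB/s

for stacks in (1, 4, 6):
    print(stacks, aggregate_bandwidth_gb_s(6400, 1024, stacks))
# 1 819.2
# 4 3276.8
# 6 4915.2
```

The four-stack and six-stack results are in line with the roughly 3.2 TB/s and 4.9 TB/s quoted above.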

JEDEC officially announced the HBM3 standard on January 27, 2022.[8] The number of memory channels was doubled from 8 channels of 128 bits with HBM2e to 16 channels of 64 bits with HBM3. Therefore, the total number of data pins of the interface is still 1024.[38]

In June 2022, SK Hynix announced that it had started mass production of the industry's first HBM3 memory, to be used with Nvidia's H100 GPU, which was expected to ship in Q3 2022. The memory would provide the H100 with "up to 819 GB/s" of memory bandwidth.[39]

In August 2022, Nvidia announced that its "Hopper" H100 GPU will ship with five active HBM3 sites (out of six on board) offering 80 GB of RAM and 3 TB/s of memory bandwidth (16 GB and 600 GB/s per site).[40]

HBM3E


On 30 May 2023, SK Hynix unveiled its HBM3E memory with 8 Gbit/s/pin data processing speed (25% faster than HBM3), which was to enter production in the first half of 2024.[41] At 8 GT/s with a 1024-bit bus, its bandwidth per stack is increased from 819.2 GB/s in HBM3 to 1 TB/s.

On 26 July 2023, Micron announced its HBM3E memory with 9.6 Gbit/s/pin data processing speed (50% faster than HBM3).[42] Micron HBM3E memory is a high-performance HBM that uses 1β DRAM process technology and advanced packaging to achieve the highest performance, capacity and power efficiency in the industry. It can store 24 GB per 8-high cube and allows data transfer at 1.2 TB/s. There will be a 12-high cube with 36 GB capacity in 2024.

In August 2023, Nvidia announced a new version of their GH200 Grace Hopper superchip that utilizes 141 GB (144 GiB physical) of HBM3e over a 6144-bit bus providing 50% higher memory bandwidth and 75% higher memory capacity over the HBM3 version.[43]

In May 2023, Samsung announced HBM3P with up to 7.2 Gbit/s which will be in production in 2024.[44]

On October 20, 2023, Samsung announced their HBM3E "Shinebolt" with up to 9.8 Gbit/s memory.[45]

On February 26, 2024, Micron announced the mass production of Micron's HBM3E memory.[46]

On March 18, 2024, Nvidia announced the Blackwell series of GPUs using HBM3E memory.[47]

On March 19, 2024, SK Hynix announced the mass production of SK Hynix's HBM3E memory.[48]

In September 2024, SK Hynix announced the mass production of its 12-layered HBM3E memory[49] and in November the 16-layered version.[50]

HBM-PIM


In February 2021, Samsung announced the development of HBM with processing-in-memory (PIM). This new memory brings AI computing capabilities inside the memory, to increase the large-scale processing of data. A DRAM-optimised AI engine is placed inside each memory bank to enable parallel processing and minimise data movement. Samsung claims this will deliver twice the system performance and reduce energy consumption by more than 70%, while not requiring any hardware or software changes to the rest of the system.[51]

HBM4


In July 2024, JEDEC announced its preliminary specifications for future HBM4.[52] It lowered data rate per pin back to 6.4 Gbit/s/pin (the level of HBM3) but since it now employs a 2048-bit interface per stack (doubling that of the previous generations), it still achieves greater (1.6TB/s)[53] data rate per stack than that of HBM3E. Additionally, it will allow 4GB layers (yielding 64GB in 16-layer configurations).

In April 2025, JEDEC released the official HBM4 specification.[9] According to Kunal Khullar of Tom's Hardware, it supports transfer speeds of up to 8 Gb/s across a 2048-bit interface, with total bandwidth of up to 2 TB/s, and stack heights of 4 to 16 dies, with DRAM die densities of 24 Gb or 32 Gb, allowing for capacities up to 64 GB. HBM4 is backwards compatible with HBM3 controllers. Samsung, Micron, and SK Hynix contributed to the standard's formulation.[10]
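The HBM4 capacity and bandwidth figures follow directly from stack height, die density, pin rate and interface width. A minimal sketch, assuming only the values quoted above; the helper names are hypothetical, and the 12 GB result for the smallest configuration is derived from the stated ranges rather than taken from the source.

```python
# Illustrative helpers; names are assumptions, inputs are the figures quoted in the text.

def stack_capacity_gb(stack_height: int, die_density_gbit: int) -> float:
    """Stack capacity in GB: number of dies times density per die, 8 bits per byte."""
    return stack_height * die_density_gbit / 8

def stack_bandwidth_gb_s(pin_rate_gbit_s: float, interface_bits: int) -> float:
    """Peak stack bandwidth in GB/s."""
    return pin_rate_gbit_s * interface_bits / 8

print(stack_capacity_gb(16, 32))         # 64.0  (largest configuration)
print(stack_capacity_gb(4, 24))          # 12.0  (smallest configuration, derived)
print(stack_bandwidth_gb_s(8, 2048))     # 2048.0 (final spec, about 2 TB/s)
print(stack_bandwidth_gb_s(6.4, 2048))   # 1638.4 (preliminary spec, about 1.6 TB/s)
```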

History


Background


Die-stacked memory was initially commercialized in the flash memory industry. Toshiba introduced a NAND flash memory chip with eight stacked dies in April 2007,[54] followed by Hynix Semiconductor introducing a NAND flash chip with 24 stacked dies in September 2007.[55]

3D-stacked random-access memory (RAM) using through-silicon via (TSV) technology was commercialized by Elpida Memory, which developed the first 8 GB DRAM chip (stacked with four DDR3 SDRAM dies) in September 2009, and released it in June 2011. In 2011, SK Hynix introduced 16 GB DDR3 memory (40 nm class) using TSV technology,[3] Samsung Electronics introduced 3D-stacked 32 GB DDR3 (30 nm class) based on TSV in September, and then Samsung and Micron Technology announced TSV-based Hybrid Memory Cube (HMC) technology in October.[56]

JEDEC first released the JESD229 standard for Wide IO memory,[57] the predecessor of HBM featuring four 128 bit channels with single data rate clocking, in December 2011 after several years of work. The first HBM standard JESD235 followed in October 2013.

Development

AMD Fiji, the first GPU to use HBM

The development of High Bandwidth Memory began at AMD in 2008 to solve the problem of ever-increasing power usage and form factor of computer memory. Over the next several years, AMD developed procedures to solve die-stacking problems with a team led by Senior AMD Fellow Bryan Black.[58] To help AMD realize their vision of HBM, they enlisted partners from the memory industry, particularly Korean company SK Hynix,[58] which had prior experience with 3D-stacked memory,[3][55] as well as partners from the interposer industry (Taiwanese company UMC) and packaging industry (Amkor Technology and ASE).[58]

The development of HBM was completed in 2013, when SK Hynix built the first HBM memory chip.[3] HBM was adopted as industry standard JESD235 by JEDEC in October 2013, following a proposal by AMD and SK Hynix in 2010.[6] High volume manufacturing began at a Hynix facility in Icheon, South Korea, in 2015.

The first GPU utilizing HBM was the AMD Fiji which was released in June 2015 powering the AMD Radeon R9 Fury X.[4][59][60]

In January 2016, Samsung Electronics began early mass production of HBM2.[20][21] The same month, HBM2 was accepted by JEDEC as standard JESD235a.[7] The first GPU chip utilizing HBM2 is the Nvidia Tesla P100 which was officially announced in April 2016.[61][62]

In June 2016, Intel released a family of Xeon Phi processors with 8 stacks of HCDRAM, Micron's version of HBM. At Hot Chips in August 2016, both Samsung and Hynix announced new-generation HBM memory technologies.[63][64] Both companies announced high-performance products expected to have increased density, increased bandwidth, and lower power consumption. Samsung also announced a lower-cost version of HBM under development targeting mass markets. Removing the buffer die and decreasing the number of TSVs lowers cost, though at the expense of a decreased overall bandwidth (200 GB/s).

On March 22, 2022, Nvidia announced the Hopper H100 GPU, the world's first GPU utilizing HBM3.[65]

from Grokipedia
High Bandwidth Memory (HBM) is a high-performance dynamic random-access memory (DRAM) technology that employs a 3D-stacked architecture with through-silicon vias (TSVs) to deliver exceptionally high bandwidth and low power consumption compared to traditional DRAM interfaces like DDR or GDDR.[1][2] Standardized by JEDEC under specifications such as JESD235 for HBM and JESD235A for HBM2, it features a wide-interface design with multiple independent channels (typically eight channels of 128 bits each, for a total 1024-bit bus) operating at double data rate (DDR) speeds to achieve bandwidths up to several terabytes per second per stack.[1][3]

Originating from a collaboration between AMD and SK Hynix, the first HBM prototypes were developed in 2013 to address memory bandwidth bottlenecks in graphics processing units (GPUs), with SK Hynix producing the initial chips that year.[4] JEDEC formally adopted the HBM standard in October 2013, and the technology debuted commercially in AMD's Fiji-series GPUs in 2015, marking the first widespread use of 3D-stacked memory in consumer hardware.[2]

Evolution continued with HBM2 in 2016, enhancing capacity and efficiency; HBM2E in 2020, offering up to 3.6 Gbps per pin and 460 GB/s bandwidth; HBM3 in 2022, with 6.4 Gbps speeds and on-die error correction for AI workloads; and HBM3E in 2023, extending speeds to 9.6 Gbps for over 1.2 TB/s bandwidth in AI systems. HBM4, finalized by JEDEC in April 2025, introduced architectural improvements for even higher bandwidth and power efficiency in next-generation systems, supporting 24 Gb or 32 Gb DRAM die densities in 4-high to 16-high stack configurations with a maximum capacity per stack of 64 GB (16-high with 32 Gb dies). In February 2026, Samsung began mass production and commercial shipments of HBM4, achieving up to 3.3 TB/s bandwidth per stack with pin speeds of 11.7 Gbps (scalable to 13 Gbps), with initial commercial products offering up to 48 GB per stack (likely 16-high with 24 Gb dies).[5][6] Concurrently, Micron reported high-volume production and customer shipments of HBM4 a quarter ahead of schedule, with yields on track, pin speeds exceeding 11 Gbps, and its entire 2026 supply sold out. Reports indicate that Samsung has experienced 1c DRAM yields around 60%, with lower effective yields after back-end processing, while SK Hynix, holding approximately 60% of Nvidia's HBM4 allocation, has faced difficulties achieving 11 Gbps-class performance amid capacity and yield constraints, which suggests comparatively stronger yield stability and production ramp-up at Micron.[7][8] HBM4E, an enhanced variant announced in late 2025 by manufacturers such as Samsung, SK Hynix, and Micron, offers speeds up to 13 Gbps per pin and bandwidths of up to 3.25 TB/s per stack for advanced AI applications. As of 2026, development of HBM5 is underway under JEDEC's JC-42.2 High Bandwidth Memory Subcommittee, with industry preparations for subsequent generations such as HBM6 including new wide TC bonders.[4][9][10][11][12][13][14]

HBM's defining advantages stem from its tightly coupled integration with host processors via silicon interposers or advanced packaging, enabling low-latency data transfer ideal for bandwidth-intensive applications.[3] It excels in GPUs for graphics rendering, high-performance computing (HPC) simulations, and artificial intelligence (AI) training and inference, where parallel processing demands massive data throughput, such as in NVIDIA's AI accelerators and supercomputers, while consuming less power per bit than alternatives like GDDR6.[2][15]

The expansion of AI data centers has caused explosive demand for HBM and other high-value memory products, with AI applications projected to consume approximately 20% of global DRAM wafer capacity by 2026 and HBM3E prioritized for hyperscalers, leading to production shortages and significant price surges in early 2026.[16] Memory manufacturers prioritize HBM production because of its higher profitability in AI and GPU applications, leading companies like Samsung, SK Hynix, and Micron to shift production capacity from standard DRAM to HBM, which limits output for consumer uses and contributes to price increases in conventional memory.[17][18][19][20]

On February 13, 2026, following news of Samsung beginning mass production and shipments of next-generation HBM4 memory chips, major memory chip stocks showed mixed performance: Micron Technology (MU) declined 1.79% (-$7.39) to $406.58 (intraday during market open at 10:49 AM EST; previous close $413.97), Samsung Electronics (005930.KS) rose 1.46% (+2,600 KRW) to 181,200 KRW (closed at 3:30 PM KST), and SK Hynix (000660.KS) fell 0.90% (-8,000 KRW) to 880,000 KRW (closed at 3:30 PM KST). These movements illustrate competitive market dynamics in the HBM sector.[21][22][23]

As AI and HPC demands surge, HBM's market is projected to expand significantly, driven by its role in enabling efficient handling of large datasets in multi-core environments.[4][24] The rising demand for HBM also benefits semiconductor packaging companies, a key midstream link in the supply chain, through surging orders for HBM packaging, capacity utilization exceeding 90%, full production lines, and growth in advanced packaging technologies like XDFOI for high-density integration, which together bring high demand elasticity, strong pricing power, and improved gross margins.[25][26][27]

Overview

Definition and Purpose

High Bandwidth Memory (HBM) is a high-speed memory interface standard for 3D-stacked synchronous dynamic random-access memory (SDRAM), designed to deliver exceptional data throughput in performance-critical systems.[28] Developed as a collaborative effort among industry leaders, HBM integrates multiple DRAM dies vertically using through-silicon vias (TSVs) to form compact stacks, enabling a wide interface that connects directly to processors via interposers.[2] This architecture was formalized by the JEDEC Solid State Technology Association in October 2013 through the JESD235 standard, aiming to overcome the bandwidth constraints of conventional memory technologies amid escalating demands from compute-intensive applications.[2]

The primary purpose of HBM is to alleviate the memory bandwidth bottleneck in traditional DRAM configurations, where narrow buses and longer signal paths limit data transfer rates for parallel processing tasks.[28] By providing ultra-high data rates, reaching up to terabytes per second, HBM supports workloads such as graphics rendering, machine learning inference, and scientific simulations that require massive parallel data access.[2] It is particularly suited for graphics processing units (GPUs) and specialized accelerators, where rapid data movement between memory and compute cores is essential for maintaining efficiency in high-performance computing environments.[28]

At its core, the 3D stacking approach in HBM minimizes latency by shortening interconnect distances between memory layers and the host die, while simultaneously boosting density to pack more capacity into a smaller footprint without increasing the overall system size.[2] This vertical integration contrasts with planar memory layouts, allowing for wider channels that enhance throughput without relying solely on transistor scaling. The 2013 JEDEC standardization was motivated by the need to extend bandwidth growth beyond the limitations of Moore's Law in traditional semiconductor scaling, fostering innovations in die-stacking to meet the evolving requirements of GPUs and accelerators in data-parallel applications.[2]

Key Features and Benefits

High Bandwidth Memory (HBM) employs a wide bus interface, typically 1024 bits per stack in earlier generations and up to 2048 bits in advanced variants, enabling significantly higher data throughput compared to narrower bus architectures like those in traditional DRAM.[29] This design is facilitated by through-silicon vias (TSVs), which provide high-density vertical interconnects between stacked DRAM dies, minimizing signal path lengths and supporting efficient 3D integration.[30] Additionally, HBM incorporates a base logic die that handles functions such as test logic and can integrate error correction mechanisms, enhancing reliability in high-performance environments.[30]

The primary benefits of HBM stem from its architecture, delivering up to 1-2 TB/s of bandwidth per stack, which represents 2-5 times the performance of GDDR6 in comparable GPU configurations.[29][2] This elevated bandwidth supports demanding applications like AI training and high-performance computing by reducing memory bottlenecks. Power efficiency is another key advantage, with energy consumption around 4-5 pJ/bit for transfers, lower than conventional graphics memories due to reduced capacitance and optimized signaling.[31] HBM's scalability allows for multi-stack configurations, enabling systems to aggregate bandwidth across up to eight stacks for total throughputs exceeding 10 TB/s while maintaining a compact footprint.[29][2]

Packaging efficiency in HBM is achieved through the use of silicon interposers in 2.5D assemblies, which facilitate direct, high-speed connections between the memory stack and logic dies, and emerging hybrid bonding techniques that enable bumpless, fine-pitch interconnections for improved density and thermal management.[30][29] However, HBM incurs a significantly higher cost per bit than standard DDR DRAM due to its complex manufacturing, though this premium is justified for bandwidth-intensive, premium applications where space and power savings outweigh the expense.[30][2]
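To give the energy-per-bit figure some scale, the sketch below estimates data-transfer power for one stack. It assumes a stack sustaining 1 TB/s (taken here as 1000 GB/s) at the quoted 4-5 pJ/bit; the function name is hypothetical, and the estimate ignores refresh, static power and any controller overhead.

```python
# Rough, illustrative estimate only; not a vendor power model.

def transfer_power_watts(bandwidth_gb_s: float, energy_pj_per_bit: float) -> float:
    """Power spent moving data: (bits per second) x (joules per bit)."""
    gbit_per_s = bandwidth_gb_s * 8
    # Gbit/s x pJ/bit = 1e9 x 1e-12 J/s, i.e. milliwatts; divide by 1000 for watts.
    return gbit_per_s * energy_pj_per_bit / 1000

for pj_per_bit in (4, 5):
    print(pj_per_bit, transfer_power_watts(1000, pj_per_bit))
# 4 32.0
# 5 40.0
```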

Architecture

Stacked Design and Components

High Bandwidth Memory (HBM) employs a vertical stacking architecture to integrate multiple dynamic random-access memory (DRAM) dies, ranging from 4 layers in early generations to up to 16 layers in HBM4, depending on the generation and capacity requirements, atop a base logic die within a compact 3D integrated circuit (IC) package.[32][33] These DRAM dies are interconnected using through-silicon vias (TSVs), which provide high-density vertical electrical pathways, with approximately 5,000 TSVs per layer handling signals, power, and ground distribution.[33] The base logic die, positioned at the bottom of the stack, serves as a buffer for data interfacing with the host processor and supports error-correcting code (ECC) functionality through dedicated parity bits, while optional integration of controller logic can be incorporated to manage memory operations.[33][30] The stacking relies on micro-bump connections, featuring arrays of up to 6,303 bumps with a 55 μm pitch, to ensure reliable interlayer bonding and signal integrity between dies.[33] For off-chip connectivity, the HBM stack mounts onto a silicon interposer in a 2.5D/3D IC packaging configuration, which routes high-speed signals to the processor while minimizing latency and enabling dense integration.[30][34]

This design achieves high memory density, with capacities scaling through increased die layers and larger per-die capacities. In HBM4, stacks support 4-high to 16-high configurations with DRAM die densities of 24 Gb or 32 Gb, enabling a maximum capacity of 64 GB per stack using a 16-high configuration with 32 Gb dies (16 × 32 Gbit = 512 Gbit = 64 GB). As of February 2026, initial commercial HBM4 products (e.g., from Samsung) offer up to 48 GB per stack, likely using 16-high stacks with 24 Gb dies. Capacity per stack is determined by die density and stack height, as no public details exist on row, column, or bank configurations for die-level capacity calculation. The approximate density scaling follows the relation $ D \approx N_{\text{dies}} \times C_{\text{die}} $, where $ D $ is total stack density, $ N_{\text{dies}} $ is the number of DRAM dies, and $ C_{\text{die}} $ is the capacity per die; however, thermal dissipation constraints currently limit $ N_{\text{dies}} $ to 12–16 to prevent overheating within the standard stack height of approximately 775 μm. Industry reports from March 2026 indicate that JEDEC is considering relaxing the HBM thickness standard to 825–900 μm to enable mass production of 20-layer stacked configurations in future generations such as HBM4E and HBM5.[35][32][36][5][37]

In TSV fabrication, dielectric liners isolate the copper-filled vias, with advanced processes incorporating high-k materials to reduce parasitic capacitance and improve electrical performance across the stack.[38] Thermal management is addressed through integrated heat spreaders and thermal vias or dummy bumps, which distribute heat evenly from the densely packed dies to the package lid, mitigating hotspots that could degrade reliability.[39][40] Yield challenges in stacking arise from defect propagation across layers, necessitating known good die (KGD) testing at interim stages to verify functionality before assembly, achieving yields above 98% in mature processes.[30][41] In HBM4, the base die can be customized for advanced features like integrated power management and UCIe interfaces, while hybrid bonding may replace micro-bumps for pitches below 10 μm in future implementations.[36]

Interface and Data Transfer

High Bandwidth Memory (HBM) employs a wide interface architecture standardized by JEDEC, featuring a data bus of 1024 bits in HBM1-HBM3 (divided into 8 channels of 128 bits or 16 channels of 64 bits) and 2048 bits in HBM4 (32 channels), with each channel supporting 128-bit or narrower sub-divisions depending on the generation. This design utilizes single-ended signaling augmented by a reference voltage (VREF) for pseudo-differential operation, which enhances noise rejection while minimizing pin count and power. Receivers incorporate PVT-tolerant techniques, such as adaptive equalization and voltage referencing, to maintain signal integrity across process variations, supply voltage fluctuations, and temperature extremes.[9]

The data transfer protocol in HBM separates the command and address buses, with dedicated row address (RA) and column address (CA) lines that allow simultaneous issuance of row activation and column access commands for improved efficiency. Burst length is 2 (BL2), meaning two data transfers per access, which moves 256 bits per 128-bit channel (or 128 bits per 64-bit channel in HBM3) in a single burst to optimize throughput for high-demand workloads. Refresh operations are tailored for the stacked die structure, supporting per-bank or targeted refresh modes that reduce overhead compared to all-bank refreshes in traditional DRAM, thereby preserving availability in multi-die configurations.[42][43]

Bandwidth in HBM is determined by the formula:
$$ \text{Bandwidth (GB/s)} = \frac{\text{data rate per pin (Gbps)} \times \text{total pins across channels}}{8} $$
This equation converts the aggregate bit rate to bytes per second, where the division by 8 accounts for 8 bits per byte; for instance, a 2 Gbps per pin rate across 1024 pins (HBM1-HBM3) yields 256 GB/s, or across 2048 pins (HBM4) yields 512 GB/s.[29]

To ensure signal integrity over the short, high-density interconnects, HBM implements on-die termination (ODT) with dynamic calibration, applying resistive termination at the receiver to match driver impedance and suppress reflections. Timing benefits from the direct die-to-die paths that the stacked design's through-silicon vias (TSVs) provide, with typical access latencies around 100 ns.[44]
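The burst sizes quoted in this section follow from channel width and burst length. The sketch below treats the burst length as the number of data transfers per access, an assumption that matches the 256-bit figure for a 128-bit channel; the function names are hypothetical, and the 2 Gbps pin rate is simply the example rate used above.

```python
# Illustrative burst arithmetic; helper names are assumptions, not JEDEC terminology.

def bits_per_burst(channel_width_bits: int, burst_length: int) -> int:
    """Data moved by one burst on one channel."""
    return channel_width_bits * burst_length

def burst_duration_ns(burst_length: int, pin_rate_gbit_s: float) -> float:
    """Time on the data bus for one burst: transfers divided by transfers per ns."""
    return burst_length / pin_rate_gbit_s

print(bits_per_burst(128, 2), bits_per_burst(128, 2) // 8)  # 256 bits, 32 bytes
print(bits_per_burst(64, 2), bits_per_burst(64, 2) // 8)    # 128 bits, 16 bytes
print(burst_duration_ns(2, 2))                              # 1.0 ns at 2 Gbps per pin
```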

Generations

HBM1

High Bandwidth Memory 1 (HBM1) represents the first generation of the HBM standard, formalized by the Joint Electron Device Engineering Council (JEDEC) under JESD235 in October 2013.[45] This specification introduced a high-performance DRAM architecture designed for applications requiring substantial data throughput, such as graphics processing units (GPUs). HBM1 stacks utilized through-silicon vias (TSVs) to interconnect multiple DRAM dies vertically, enabling a compact form factor with enhanced bandwidth compared to traditional planar DRAM configurations. The initial commercial production of HBM1 was achieved by SK Hynix in 2013, marking the debut of TSV-based stacking in mass-produced DRAM devices.[4] The core specifications of HBM1 include a maximum stack capacity of 1 GB, achieved through a 4-high configuration of 2 Gbit dies (each contributing 256 MB).[4] Each stack features eight independent 128-bit channels, supporting data transfer rates of up to 1 Gbps per pin. This results in a total bandwidth of approximately 128 GB/s per stack, calculated as 16 GB/s per channel across the eight channels (128 bits × 1 GT/s × 8 channels). The interface employs a wide I/O design with differential clocking to facilitate low-power, high-speed operation, while the 2-channel per die layout optimizes inter-die communication via TSVs. HBM1's integration was first demonstrated in AMD's Fiji GPU architecture, released in 2015, where four 1 GB stacks provided 512 GB/s aggregate bandwidth for high-end graphics workloads.[46][45] At the channel level, HBM1 employs eight pseudo-channels per stack to manage bank access and interleaving, allowing independent addressing within each 128-bit sub-channel for improved parallelism. Error handling is limited to basic on-die detection mechanisms for single-bit faults and post-package repair capabilities, without support for full error-correcting code (ECC) to maintain simplicity and cost efficiency in the initial design. 
This architecture prioritizes bandwidth density over extensive redundancy, relying on TSVs for vertical integration that reduces signal latency but introduces challenges in thermal management and alignment precision.[45] Despite its innovations, HBM1 faced limitations in density, capping at 1 GB per stack, which constrained scalability for emerging memory-intensive applications relative to subsequent generations. Bandwidth was also modest at 128 GB/s per stack, insufficient for the escalating demands of later high-performance computing scenarios. Manufacturing complexity arose from the novel TSV processes and 3D stacking, leading to initial yield issues due to defects in via alignment and die bonding, which elevated production costs and limited early adoption.[46][47]
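The HBM1 capacity and multi-stack figures quoted in this section follow from simple arithmetic, sketched below. The helper names `stack_capacity_gb` and `aggregate_bandwidth_gbs` are hypothetical and chosen only for illustration; the inputs are the values stated in the text.

```python
def stack_capacity_gb(dies: int, die_density_gbit: float) -> float:
    """Capacity of one stack in GB: number of dies times die density (Gbit), divided by 8."""
    return dies * die_density_gbit / 8


def aggregate_bandwidth_gbs(stacks: int, per_stack_gbs: float) -> float:
    """Aggregate bandwidth when several stacks are attached to one processor."""
    return stacks * per_stack_gbs


hbm1_capacity = stack_capacity_gb(dies=4, die_density_gbit=2)  # 4-high stack of 2 Gbit dies
fiji_capacity = 4 * hbm1_capacity                              # four stacks on the Fiji GPU
fiji_bandwidth = aggregate_bandwidth_gbs(stacks=4, per_stack_gbs=128.0)

print(hbm1_capacity, fiji_capacity, fiji_bandwidth)  # 1.0 4.0 512.0
```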

HBM2 and HBM2E

High Bandwidth Memory 2 (HBM2) represents the second generation of the HBM standard, standardized by JEDEC in January 2016 under JESD235A.[48] It builds on HBM1 by doubling the per-pin data rate to 2 Gbps while maintaining a 1024-bit wide interface divided into up to 8 independent 128-bit channels per stack.[48] This configuration supports stack heights of 2 to 8 DRAM dies, with die densities from 1 Gb to 8 Gb, enabling capacities up to 8 GB per stack in an 8-high configuration.[48] The resulting peak bandwidth reaches 256 GB/s per stack, calculated as the product of the pin speed, interface width, and channel count divided by 8 to convert bits to bytes.[48] In contrast to HBM1's 1 Gbps per pin and maximum 128 GB/s per stack, HBM2's formula for bandwidth scaling is:
$$ \text{BW}_{\text{HBM2}} = \frac{\text{pin\_speed} \times 128 \times \text{channels}}{8} $$
where pin_speed is in Gbps and channels range from 2 to 8, yielding up to twice the throughput of its predecessor for equivalent configurations.[48] HBM2 also introduces full error-correcting code (ECC) support per channel for improved data integrity in high-reliability applications.[49] Key enhancements in HBM2 focus on increased pin speeds achieved through advanced signaling techniques, such as pseudo-open drain I/O to reduce power consumption and improve signal integrity at higher rates.[50] It supports flexible channel configurations from 2 to 8, allowing scalability for diverse system needs, and operates at a core voltage of 1.2 V with I/O signaling optimized for efficiency, contributing to overall power gains over HBM1 despite the speed increase.[42] These improvements enable HBM2 to deliver higher performance in bandwidth-intensive workloads while maintaining low latency and energy efficiency.
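To illustrate how bandwidth scales with the channel configuration, the sketch below evaluates the per-stack figure for 2, 4, and 8 channels at 2 Gbps per pin. It assumes each channel is 128 bits wide, as stated above, so that eight channels make up the 1024-bit interface; the names `CHANNEL_WIDTH_BITS` and `hbm2_bandwidth_gbs` are illustrative only.

```python
CHANNEL_WIDTH_BITS = 128  # per-channel width stated for HBM2 (assumed constant here)


def hbm2_bandwidth_gbs(pin_speed_gbps: float, channels: int) -> float:
    """Per-stack bandwidth in GB/s for a given pin speed and number of active channels."""
    if not 2 <= channels <= 8:
        raise ValueError("the configurations described here use 2 to 8 channels")
    return pin_speed_gbps * CHANNEL_WIDTH_BITS * channels / 8


for channels in (2, 4, 8):
    print(channels, hbm2_bandwidth_gbs(2.0, channels))
# 2 64.0
# 4 128.0
# 8 256.0
```

With all eight channels active the result matches the 256 GB/s peak quoted for HBM2, twice the 128 GB/s of HBM1 at 1 Gbps.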
Samsung's 16GB HBM2E Flashbolt, the industry's first 3rd-generation HBM2E memory
HBM2E emerged as an evolutionary extension of HBM2 in 2019, driven by industry demand for greater capacity and speed without a full generational shift.[51] It raises per-pin data rates to as much as 3.6 Gbps through refined manufacturing and signaling, and supports up to 12-high stacks of 16 Gb dies (2 GB each) for capacities of up to 24 GB per stack.[52] Bandwidth scales accordingly to about 460 GB/s per stack at 3.6 Gbps, with higher rates possible in optimized implementations.[51] Notable deployments include the NVIDIA A100 GPU, which uses HBM2E for 40–80 GB of total memory and over 2 TB/s of aggregate bandwidth across multiple stacks, and the AMD Instinct MI250 accelerator, whose 128 GB of HBM2E delivers 3.2 TB/s.[53][54] HBM2E retains HBM2's ECC capabilities and channel flexibility, prioritizing seamless integration into existing HBM2 ecosystems for accelerated computing and AI systems.[55]

HBM3 and HBM3E

SK hynix 24GB HBM3 memory, showing the DRAM die and packaged stacks
High Bandwidth Memory 3 (HBM3) represents the third generation of the HBM standard, finalized by JEDEC in January 2022 to address escalating demands for bandwidth in high-performance computing and artificial intelligence applications.[56] This iteration doubles the channel count to 16 channels (each 64 bits wide) for a 1024-bit interface per stack while supporting densities up to 24 GB in a 12-high configuration using 16 Gb DRAM layers.[9] The base data rate operates at 6.4 Gbps per pin, delivering a peak bandwidth of up to 819 GB/s per stack, which significantly enhances data throughput for memory-intensive workloads.[57]
Micron HBM3E high-bandwidth memory package and die
HBM3E serves as an energy-efficient extension to the HBM3 specification, with initial rollouts occurring in 2023 and broader adoption in 2024, pushing per-pin speeds to 9.2–9.6 Gbps for improved performance without proportionally increasing power consumption.[29] This variant achieves up to 1.2 TB/s bandwidth per stack and supports capacities reaching 36 GB, leveraging higher-density DRAM dies in multi-layer stacks.[58] It has been integrated into advanced accelerators, such as NVIDIA's H200 GPU with 141 GB of HBM3E memory and AMD's Instinct MI325X with 256 GB capacity and 6 TB/s aggregate bandwidth, reflecting 2025 updates in AI hardware ecosystems.[59][60] Key enhancements in HBM3 and HBM3E include adaptive refresh mechanisms, which dynamically adjust refresh intervals to reduce power usage during low-activity periods, and on-die error correction code (ECC) for improved reliability by detecting and correcting single-bit errors directly within the DRAM layers.[61] Additionally, support for multi-stack daisy-chaining allows seamless interconnection of multiple HBM stacks, facilitating scalable configurations in large-scale systems without excessive signaling overhead.[62] In practical operation, the effective throughput of HBM3 and HBM3E accounts for protocol and timing overheads, typically expressed as:
$$ \text{Effective throughput} = \text{base\_BW} \times \text{efficiency\_factor}, \quad \text{efficiency\_factor} \in [0.9,\ 0.95] $$
where base_BW is the theoretical peak bandwidth and the efficiency factor reflects real-world utilization, often around 85–95% in optimized AI training scenarios.[63]
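The relationship above can be sketched in a few lines of Python. The function name `effective_throughput_gbs` is hypothetical, and the 819 GB/s base figure is the HBM3 per-stack peak quoted earlier in this section; the efficiency values are simply the ends of the range mentioned in the text, not measured results.

```python
def effective_throughput_gbs(base_bw_gbs: float, efficiency_factor: float) -> float:
    """Effective throughput = theoretical peak bandwidth times an efficiency factor in (0, 1]."""
    if not 0.0 < efficiency_factor <= 1.0:
        raise ValueError("efficiency_factor must be in (0, 1]")
    return base_bw_gbs * efficiency_factor


base_bw = 819.0  # GB/s, HBM3 per-stack peak quoted in the text
low = effective_throughput_gbs(base_bw, 0.85)
high = effective_throughput_gbs(base_bw, 0.95)
print(f"roughly {low:.0f} to {high:.0f} GB/s per stack")
```

The efficiency factor bundles together protocol and timing overheads, so in practice it would be measured for a given workload rather than assumed.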

Advanced Variants

High Bandwidth Memory (HBM) has seen innovative extensions through processing-in-memory (PIM) architectures, which integrate compute units directly into the memory stack to minimize data movement between processors and memory. Samsung developed HBM-PIM prototypes in 2023, embedding AI-dedicated processors within the HBM DRAM to offload operations like matrix multiplications, achieving up to 2x speedup in AI inference tasks such as GPT-J models.[64][65] SK Hynix has similarly advanced PIM technologies since 2022, focusing on domain-specific memory for AI clusters.[66] These variants reduce energy consumption by performing computations locally in memory; conceptually, the energy savings can be modeled as $ E_{\text{PIM}} = E_{\text{standard}} \times (1 - \text{compute locality}) $, where compute locality represents the fraction of operations executed in-memory, leading to reported reductions of up to 85% in data movement energy for transformer-based AI workloads. The next major advancement, HBM4, was standardized by JEDEC in April 2025 under JESD270-4, with development completed by major vendors such as SK Hynix in September 2025 and samples supplied to customers like NVIDIA; mass production began in 2026.[67][68][69] In Q3 2025, SK hynix held 53% of the HBM market, followed by Samsung with 35% and Micron with 11%. In late 2025, Samsung overtook Micron for second place in the HBM market. In February 2026, Samsung began mass production and customer shipments of HBM4, achieving up to 3.3 TB/s per stack with pin speeds of 11.7 Gbps (up to 13 Gbps).[5] By February 2026, the three major vendors displayed differing production progress. Micron reported HBM4 yields on track, enabling high-volume production and customer shipments a quarter ahead of schedule, with pin speeds exceeding 11 Gbps and its entire 2026 supply sold out.[7][70] These achievements reflect strong yield stability and a rapid production ramp-up. 
In comparison, Samsung's mass production has been constrained by 1c DRAM yields of approximately 60%, resulting in lower effective yields after back-end processing.[8] SK Hynix secured approximately 60% of NVIDIA's HBM4 allocation but has faced difficulties achieving consistent 11 Gbps-class performance in early reliability evaluations.[8] SK hynix is the primary supplier of HBM for NVIDIA's high-end AI GPUs, expected to hold approximately 60% of Nvidia's HBM4 supply in 2026, with Samsung projected to capture around 30%.[71] Analysts project Samsung to exceed 30% overall HBM share in 2026 with HBM4 ramp-up, while Micron is expected to maintain a low-20% share due to sold-out capacity and high demand.[72] It supports stack configurations up to 16-high using 24 Gb or 32 Gb DRAM dies for capacities reaching 64 GB per stack.[67][69] It delivers over 2 TB/s bandwidth per stack via a 2048-bit interface at 8 Gbps per pin, with vendors like SK Hynix targeting over 10 Gbps for enhanced AI and high-performance computing applications; vendor implementations such as Samsung's exceed the base specification, achieving higher pin rates and bandwidths.[73][74] HBM4 incorporates hybrid bonding for finer interconnect pitches, enabling tighter integration with compute dies and reduced latency compared to prior generations.[75] HBM4E represents an enhanced variant of HBM4, developed by manufacturers including Samsung, Micron, and SK Hynix to meet the escalating demands of AI and high-performance computing systems.[11][76][77] It achieves per-pin data rates up to 13 Gbps, delivering bandwidths of up to 3.25 TB/s per stack, which is approximately 2.5 times higher than HBM3E. 
Notably, some vendor implementations of HBM4, such as Samsung's, have achieved similar high-end specifications of up to 3.3 TB/s per stack.[11][78] These advancements leverage hybrid bonding and other refined interconnect technologies for improved efficiency and integration.[79] Mass production of HBM4E is anticipated starting in 2027, with samples already being supplied to partners like NVIDIA for next-generation accelerators.[78][77] Emerging variants extend HBM's utility in disaggregated systems through integration with Compute Express Link (CXL), allowing pooled HBM resources across servers for flexible memory allocation in AI clusters, as demonstrated in Samsung's 2023 prototypes combining HBM-PIM with CXL for up to 1.1 TB/s bandwidth and 512 GB capacity.[65] Additionally, evolutions in 2.5D packaging, including advanced silicon interposers and hybrid bonding, support higher-density HBM stacks with improved thermal management and signal integrity for next-generation AI accelerators.[80][81] Development of subsequent generations is underway, with HBM5 currently in development under JEDEC's JC-42.2 High Bandwidth Memory Subcommittee. Industry reports also indicate that work on HBM6 has begun, with potential targeting of platforms post-2029.[13][14] According to a March 2026 industry report, JEDEC is reportedly considering relaxing the HBM thickness standard from 775 μm (as in HBM4) to 825–900 μm to support mass production of 20-layer stacked HBM4E and HBM5. This proposed change is influenced by TSMC's SoIC 3D packaging technology and practical manufacturing constraints that limit extreme die thinning for higher stacks. 
If implemented, it could accelerate production timelines to as early as late 2027 or early 2028, reduce costs through higher yields and more mature bonding processes, and delay the widespread adoption of hybrid bonding.[82] In February 2026, Hanmi Semiconductor announced at Semicon Korea 2026 the readiness of its wide TC bonders for the mass production of HBM5 and HBM6. These bonders are presented as an alternative to hybrid bonding amid reported technical difficulties with the latter, aiming to increase production yields, enhance quality through advanced precision and fluxless bonding technologies, and enable thinner HBM stacks.[14][83]
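The conceptual processing-in-memory energy model given earlier in this section, E_PIM = E_standard × (1 − compute locality), can be explored with a short sketch. This is only an illustration of that simplified model: the function name `pim_energy` and the baseline of 100 energy units are hypothetical, and real savings depend on the workload and hardware.

```python
def pim_energy(e_standard: float, compute_locality: float) -> float:
    """E_PIM = E_standard * (1 - compute_locality), with compute_locality in [0, 1]."""
    if not 0.0 <= compute_locality <= 1.0:
        raise ValueError("compute_locality must be between 0 and 1")
    return e_standard * (1.0 - compute_locality)


baseline = 100.0  # arbitrary energy units (hypothetical)
for locality in (0.0, 0.5, 0.85):
    saved = baseline - pim_energy(baseline, locality)
    print(f"locality={locality:.2f} energy saved={saved:.0f} units")
```

Under this model, a reduction of 85% in data movement energy corresponds to 85% of operations being executed in memory.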

Recent Developments

In 2026, Micron reported that its HBM output for the year was fully committed under long-term binding contracts, with high-volume HBM4 shipments beginning ahead of schedule and yields on track. This contributed to record financial performance in fiscal Q2 2026, including gross margins of roughly 70% or more, driven by premium HBM pricing and tight supply. Industry-wide, HBM shortages persisted: AI data centers consumed a significant portion of output, and new capacity was not expected to materially ease constraints until 2027 or later because of technical challenges in stacking and yields.

Historical Development

Origins and Background

The development of High Bandwidth Memory (HBM) originated in the 2000s from research on three-dimensional integrated circuits (3D ICs), spearheaded by initiatives from the Defense Advanced Research Projects Agency (DARPA) and academic institutions, aimed at overcoming the "memory wall" in von Neumann architectures. This memory wall, first articulated by Wulf and McKee, describes the widening gap where processor computational speeds have outpaced memory access latencies and bandwidth improvements by factors of 50 to 100, creating a bottleneck in data-intensive applications.[84][85] 3D IC research focused on vertically stacking components to shorten interconnects, reduce latency, and enhance bandwidth density, with early explorations dating back to DARPA-funded programs on heterogeneous integration in the early 2000s. Key early concepts for HBM's stacked architecture emerged from academic and industry papers in the mid-2000s, including IEEE publications proposing vertical interconnections for chip stacks to enable wider data paths and higher throughput in memory systems. For instance, a 2004 IEEE paper detailed process integration techniques for 3D chip stacks using through-silicon vias (TSVs) to facilitate dense vertical signaling, laying foundational ideas for memory-logic integration. Initial prototypes of stacked DRAM with wide interfaces, such as Samsung's Wide-I/O mobile DRAM, were demonstrated around 2011, building on these concepts to achieve preliminary high-bandwidth performance in lab settings.[86][87][2] Driving this evolution were the escalating memory demands of GPU advancements post-2010, as NVIDIA and AMD pushed architectures like Fermi and subsequent generations that amplified parallel compute but strained traditional GDDR memory's bandwidth limits in high-end graphics and emerging compute workloads. 
Power efficiency constraints in data centers further necessitated innovations like 3D stacking, as conventional memory interfaces consumed excessive energy for scaling bandwidth beyond 10 GB/s per channel. Precursor standards, such as the Wide I/O interface developed under JEDEC with input from the MIPI Alliance, provided early frameworks for low-power, wide-channel 3D memory suitable for mobile and high-performance applications.[88][89][90] In response to GDDR's limitations in power and scalability for ultra-high-end graphics, AMD collaborated closely with SK Hynix starting in 2013 to pioneer HBM as a next-generation solution, emphasizing 3D stacking to deliver terabit-per-second bandwidth while maintaining compact form factors. SK Hynix has strategically focused on leadership in AI memory technology through its development of advanced HBM variants, essential for AI GPU accelerators.[91] This industry partnership addressed the need for memory that could keep pace with GPU compute scaling without exacerbating data center energy demands. Samsung later contributed to HBM evolution through JEDEC standardization and HBM2 production.[92][93][94]

Standardization and Milestones

The standardization of High Bandwidth Memory (HBM) was spearheaded by the Joint Electron Device Engineering Council (JEDEC), which published the initial JESD235 specification in October 2013 to define the architecture and interface for HBM1.[95] Key semiconductor manufacturers, including Samsung, SK Hynix, and Micron, contributed significantly to the development of this standard through their participation in JEDEC committees, ensuring compatibility across industry ecosystems.[96][67] In January 2016, JEDEC released the updated JESD235A specification for HBM2, which enhanced data rates and capacity while maintaining backward compatibility with the original framework.[48] The JESD238 standard for HBM3 followed in January 2022, introducing pin speeds of up to 6.4 Gbps and support for up to 16 channels to meet escalating bandwidth demands in high-performance computing.[97][57]

A major milestone in HBM's adoption occurred in June 2015 with the launch of the AMD Radeon R9 Fury X graphics card, the first commercial product to integrate HBM1, delivering 512 GB/s of bandwidth from 4 GB of memory spread across four 1 GB stacks.[98] NVIDIA advanced this trajectory in 2017 by incorporating HBM2 into its Tesla V100 accelerator based on the Volta architecture, enabling 900 GB/s of bandwidth for data center applications.[99] In 2019, vendors like Samsung and SK Hynix introduced HBM2E as a non-JEDEC extension, boosting per-pin speeds to 3.6 Gbps and capacities to as much as 24 GB per stack to bridge the gap until full HBM3 ratification.[55] HBM3E sampling began in 2023, with SK Hynix unveiling 8 Gbps/pin modules in May and Micron following with 24 GB 8-high stacks for NVIDIA's H200 GPUs.[100][58]

The AI boom from 2023 to 2025 propelled HBM's market growth, with the total addressable market expanding from approximately $4 billion in 2023 to an estimated $35 billion in 2025, according to Micron's forecasts.[101] This surge led to supply shortages in 2024 and 2025, as demand outpaced production; for instance, SK Hynix reported 
its HBM supply nearly sold out for 2025 due to NVIDIA's procurement needs.[102] By 2025, HBM integration reached over 70% of top AI GPUs, driven by partnerships such as TSMC's CoWoS advanced packaging technology, which facilitates efficient stacking of HBM with GPUs from NVIDIA and AMD. In September 2025, SK Hynix completed development of the world's first HBM4, preparing for mass production to support next-generation AI systems.[103][104][105]

Applications

Graphics and Gaming

High Bandwidth Memory (HBM) has seen early adoption in graphics processing units (GPUs) primarily for high-end gaming and professional visualization applications, where its stacked architecture provides superior bandwidth compared to traditional GDDR memory. AMD integrated HBM2 with its Radeon RX Vega series in 2017 to deliver up to 483 GB/s of memory bandwidth, which supported enhanced performance in demanding rendering tasks.[106][107] This was followed by the Radeon VII in 2019, featuring 16 GB of HBM2 across a 4096-bit interface for 1 TB/s bandwidth, enabling smooth 4K and 8K video playback and gaming at high frame rates in titles requiring intensive graphical computations.[108] In gaming scenarios, HBM's sustained high bandwidth excels at rapid texture loading and processing complex shaders, minimizing latency in real-time rendering pipelines. This is particularly beneficial for ray tracing workloads, where HBM facilitates quicker access to large datasets for light simulation and reflection calculations, resulting in more realistic visuals without frame drops. For virtual reality (VR) and augmented reality (AR) applications, HBM reduces memory bottlenecks during high-fidelity environment rendering, supporting immersive experiences with minimal stuttering in dynamic scenes.[109][110] NVIDIA has also leveraged HBM in professional graphics cards, such as the Quadro GP100 released in 2017, which utilized 16 GB of HBM2 for bandwidth-intensive tasks like 3D modeling and simulation in gaming development workflows.[111] Although consumer gaming GPUs have largely stuck to GDDR variants due to cost, HBM's power efficiency—achieving high throughput at lower voltages—has influenced designs akin to gaming consoles. Despite these advantages, HBM's higher manufacturing costs restrict its use to premium GPUs, primarily in flagship models for enthusiasts and professionals. 
This premium positioning ensures HBM targets scenarios where bandwidth demands outweigh affordability concerns, such as ultra-high-resolution gaming and content creation.

AI and High-Performance Computing

High Bandwidth Memory (HBM) plays a pivotal role in artificial intelligence (AI) accelerators such as GPUs and TPUs, where its high bandwidth and capacity enable efficient handling of large-scale data for training and inference workloads. SK Hynix maintains leadership in HBM technology for AI applications, having dominated production of advanced versions such as HBM3E and contributing to the development of HBM4. As of February 2026, SK Hynix and Samsung are positioned to supply HBM4 for NVIDIA's AI accelerators, including those based on the Vera Rubin architecture, with Samsung achieving the first shipment of HBM4 to NVIDIA, while Micron has reportedly been excluded from HBM4 supply plans for failing to meet performance requirements.[17][112][113][114][115][71]

In NVIDIA's Hopper architecture GPUs, such as the H100 introduced in 2022 and the H200 in 2024, HBM3 and HBM3e provide up to 141 GB of memory per GPU, supporting the processing of large language models (LLMs) exceeding 100 billion parameters without extensive model sharding.[116][117] This configuration delivers up to 4.8 TB/s of bandwidth, facilitating faster matrix multiplications critical for transformer-based architectures in LLM training.[118] Compared to prior generations using HBM2e, such as the A100, the H100 and H200 achieve 3x to 4x improvements in training throughput for LLMs due to enhanced memory access speeds and tensor core optimizations.[119]

In high-performance computing (HPC), HBM integration in GPU-accelerated nodes supports exascale simulations requiring rapid data throughput for complex scientific computations. 
The Frontier supercomputer, deployed in 2022 at Oak Ridge National Laboratory, leverages AMD EPYC processors paired with Instinct MI250X GPUs equipped with 128 GB of HBM2e per accelerator, enabling peak performance of over 1.1 exaFLOPS for double-precision workloads.[120] This setup has powered advanced climate modeling, including the SCREAM (Simple Cloud-Resolving E3SM Atmosphere Model) simulation, which resolved global cloud processes at kilometer-scale resolution in under a day, advancing predictions of extreme weather patterns and their U.S. impacts.[121]

By 2025, HBM adoption extends to tensor processing units (TPUs) and custom application-specific integrated circuits (ASICs), addressing the demands of distributed AI paradigms like federated learning. Google's Trillium (TPU v6e), previewed in 2024 and scaling into production, doubles HBM capacity to 32 GB per chip with 1.64 TB/s of bandwidth, enhancing efficiency for privacy-preserving federated training across edge devices and data centers.[122] Custom ASICs from vendors like Broadcom, integrated with HBM3e stacks, enable multi-terabyte memory pools in hyperscale clusters, reducing latency in collaborative model updates for federated scenarios.[123][124] HBM's proximity to compute logic minimizes data movement overhead in AI pipelines, lowering energy costs for memory-bound operations and enabling sustainable scaling to exaFLOPS-level performance (10^18 FLOPS).[32]

The expansion of AI data centers has caused explosive demand for HBM and other high-value memory products. This strains production resources, because manufacturing 1 GB of HBM requires approximately three times the silicon wafer capacity of standard DRAM, and it contributes to shortages and price increases in conventional RAM supplies. 
In 2026, AI data centers are projected to consume approximately 70% of global HBM production, further exacerbating shortages and providing suppliers with significant pricing power.[125][126][127] In HPC and AI systems, this architecture supports the bandwidth needs of trillion-parameter models, ensuring efficient resource utilization as compute clusters expand toward zettascale ambitions.[24]

Comparisons and Future Outlook

Versus Other Memory Technologies

High Bandwidth Memory (HBM) offers substantial advantages in bandwidth over GDDR6 and GDDR6X, primarily due to its wide interface and stacked architecture, enabling a single HBM3E stack to achieve up to 1.2 TB/s, compared to approximately 1 TB/s total bandwidth in high-end GDDR6X implementations like NVIDIA's RTX 4090 GPU. This results in 3-5x higher effective bandwidth for bandwidth-intensive workloads, though GDDR6X remains preferable for cost-sensitive gaming applications where its lower price point—about 3-5x less per GB than HBM—offsets slightly reduced peak throughput. HBM also incurs 2-3x higher latency in low-load scenarios due to its lower per-pin clock speeds, but its proximity to the processor via 2.5D integration mitigates this under sustained high utilization. In contrast to DDR5 and LPDDR5, HBM's vertical stacking yields roughly 10x greater bandwidth density, packing terabytes per second into a compact footprint that suits space-constrained high-performance systems, though it requires approximately three times more silicon wafer area per gigabyte than DDR5 due to stacking complexities and larger dies, while a typical DDR5 DIMM delivers only about 76.8 GB/s at 9.6 GT/s. DDR5 and LPDDR5, however, provide superior capacity scalability, with modules reaching up to 128 GB, and benefit from widespread adoption in consumer and server platforms for their lower cost and simpler integration. HBM's premium pricing, often 5x higher per GB, limits its use to specialized domains where bandwidth trumps volume.[128] In the context of AI-driven demand, NAND flash focuses on backend storage for AI servers (e.g., high-capacity SSDs) and enterprise/consumer applications with moderate demand growth and mid-single-digit to low double-digit price rises, whereas HBM targets explosive AI training and inference needs with significantly stronger price surges and margin expansion.[129]
| Metric | HBM3E (per stack) | GDDR6X (high-end GPU total) | DDR5 (per module) |
| --- | --- | --- | --- |
| Bandwidth | 1.2 TB/s | 1 TB/s | 76.8 GB/s |
| Power consumption | ~30 W | ~35-50 W (total for 24 chips) | ~10 W |
| Cost ($/GB) | $10-20 | $5-15 | $5-10 |
Modern GPU architectures frequently employ hybrid memory configurations, utilizing HBM as a high-speed L2 cache for compute-critical tasks while relying on GDDR as the primary main memory for larger, less bandwidth-demanding storage needs, balancing performance and economics in designs from NVIDIA and AMD. In these hierarchies, on-chip SRAM serves as low-level caches (L1-L3) for ultra-low latency access, but its high cost (upwards of $5,000 per GB due to dense transistor requirements) makes it impractical for large capacities. HBM, costing 3x or more per GB than standard DDR5 but far less than SRAM, provides a cost-effective high-bandwidth solution for tens of GB in AI and GPU applications, enabling balanced speed, capacity, and economics.[130][131]

Market Reporting Metrics

In industry reports (e.g., from TrendForce), HBM market size is often described using two related but distinct measures:
  • Total shipments: measured in million Gb (gigabits), this represents the raw bit volume produced and shipped by suppliers (SK hynix, Samsung, Micron). It reflects fab output, wafer starts, TSV processing, and stacking yields. For example, 2025 HBM shipments were reported as approximately 23,700 million Gb (23.7 billion gigabits).
  • Total capacity shipped: converted to GB (gigabytes), PB (petabytes), or EB (exabytes), this indicates the usable memory capacity delivered to customers. Since 1 byte = 8 bits, capacity in GB ≈ shipments in Gb ÷ 8. Using the 2025 figure: 23,700 million Gb ÷ 8 ≈ 2.9625 billion GB (roughly 2.96 EB or 2,960 PB).
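The conversion between the two reporting metrics can be sketched as follows. The function name `shipments_to_capacity` is hypothetical, and decimal prefixes are assumed (1 PB = 10^6 GB, 1 EB = 10^9 GB); reports that use binary prefixes would give slightly different figures.

```python
def shipments_to_capacity(shipments_million_gbit: float) -> dict:
    """Convert shipments in million Gb (gigabits) to capacity in GB, PB, and EB (decimal prefixes)."""
    gigabits = shipments_million_gbit * 1e6
    gigabytes = gigabits / 8  # 8 bits per byte
    return {
        "GB": gigabytes,
        "PB": gigabytes / 1e6,
        "EB": gigabytes / 1e9,
    }


capacity_2025 = shipments_to_capacity(23_700)  # the 2025 figure quoted in the text
print(f"{capacity_2025['GB']:.4e} GB")
print(f"{capacity_2025['PB']:.1f} PB")
print(f"{capacity_2025['EB']:.2f} EB")
```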
This distinction arises because memory production focuses on bit density and wafer efficiency, while end-users and system designers care about byte-addressable capacity for populating GPUs/AI accelerators. The bit-based metric better captures manufacturing scale, whereas byte-based figures align with per-GPU HBM specs (e.g., 192 GB on Blackwell B200). Minor variations may occur due to yields, binary/decimal prefixes, or report-specific adjustments. The High Bandwidth Memory (HBM) market is poised for substantial expansion, with projections estimating a value of tens to over $100 billion by 2030, fueled predominantly by artificial intelligence workloads that are expected to drive over 55% of demand through high-bandwidth requirements exceeding 500 GB/s. AI data center expansion is causing explosive demand for high-bandwidth memory (HBM) and other high-value memory products, contributing to this growth and exacerbating supply chain constraints.[132][133] This growth reflects a compound annual rate of approximately 30% for AI-focused HBM through the decade, as major hyperscalers and chipmakers prioritize memory solutions for training large language models and inference tasks.[134] HBM4 advancements are central to this trajectory, enabling 2.5D and 3D system-in-package integrations that support denser, more efficient multi-die architectures for next-generation accelerators.[29] Market forecasts indicate that in 2026, HBM3E is expected to dominate the HBM market, accounting for approximately two-thirds of total HBM shipments, while HBM4 ramps up gradually with mass production starting in early 2026 (e.g., February for some suppliers) and much of 2026 capacity already sold out. HBM4 is projected to contribute more meaningfully in 2027, with further growth in variants like HBM4E potentially in 2027-2028. 
The overall HBM market continues to experience strong growth driven by AI demand, with major suppliers SK hynix (leading), Samsung, and Micron anticipating significant revenue increases; for example, Micron forecasts 164% HBM revenue growth in 2026 and 40% in 2027.[135][136][137] Key challenges in HBM development include supply chain constraints, where through-silicon via (TSV) yields for high-stack HBM4 prototypes have improved to nearly 80% as of late 2025 (from around 65% in mid-2025), though scalable production remains limited.[138][139] Thermal throttling in dense stacks exacerbates these issues, as vertical integration increases power density and heat dissipation demands, necessitating advanced cooling like liquid systems to maintain performance without speed reductions.[140] Standardization efforts for HBM4, finalized by JEDEC in April 2025, have seen vendor-specific delays due to yield and validation hurdles, pushing mass production timelines into 2026 for leading vendors, with some like Micron delayed to 2027.[141] In March 2026, industry reports indicated that JEDEC is planning to relax the HBM thickness standard from 775 μm to a range of 825–900 μm to enable mass production of 20-layer stacked HBM4E and HBM5. This proposed change is influenced by practical manufacturing limits and TSMC's SoIC 3D packaging technology, which reallocates thickness budgets in stacked dies. 
The adjustment could ease the challenges of extreme die thinning and TSV yields, accelerate development timelines, reduce costs through the use of mature thermal compression bonding, and potentially delay the adoption of hybrid bonding techniques.[82]

Future directions for HBM emphasize hybrid integrations to overcome bandwidth walls, including emerging optical interconnects that could improve AI system scalability by reducing memory access latency; prototypes have demonstrated feasibility for data center deployment in the late 2020s.[142] Processing-in-memory (PIM) capabilities, which embed compute logic directly in memory to mitigate von Neumann bottlenecks, are gaining traction in HBM designs for AI chips and are projected to grow at a 35% CAGR through 2033.[143]

Samsung Electronics is pursuing glass substrate technology for next-generation HBM through in-house efforts led by Samsung Electro-Mechanics, combined with strategic partnerships and investments. These include a memorandum of understanding, signed on November 5, 2025, to establish a joint venture with Sumitomo Chemical for producing glass core materials used in advanced package substrates, aimed at improving thermal and mechanical properties for high-performance computing applications.[144] Samsung has also partnered with Jungwoo M-Tech (JWMT) for glass substrate processing equipment, acquiring a stake through its venture investment arm to support pilot line development at its Sejong plant, with mass production targeted for late 2026.[145] Investments in Extol for metal plating technology and ongoing discussions with Chemtronics for glass substrate supply are also part of this effort to improve packaging efficiency for HBM4 and beyond.[146][147]

While Hybrid Memory Cube (HMC) offers an alternative for niche high-performance computing, HBM's broader ecosystem adoption positions it as the dominant technology, with HMC retaining only a supplementary role in specialized networking applications.[148]

The rising demand for HBM, driven by AI applications, has significantly benefited semiconductor packaging companies, a key midstream link in the supply chain. Orders for packaging services related to DRAM, NAND flash, and HBM have surged, with capacity utilization rates exceeding 90% in 2025 and continuing into 2026, resulting in full production lines and rapid expansion efforts.[149] AI-driven needs have also spurred growth in advanced packaging technologies such as XDFOI (X Dimension Fan-Out Integration), which enables high-density integration for AI and high-performance computing applications including GPUs.[150] As midstream providers, packaging companies enjoy high demand elasticity and strong pricing power, allowing them to pass upstream price increases (such as the 50–55% rise in DRAM and HBM prices in early 2026) downstream, thereby improving gross margins amid ongoing shortages.[17][26][151]

Market shares have fluctuated with AI demand. As of Q3 2025, SK Hynix held 53%, Samsung 35%, and Micron 11%. Data from late 2025 show SK Hynix maintaining dominance at 50–70% (for example, 57–62% in some quarters), with Samsung and Micron trailing. In the broader DRAM market (including HBM contributions), Samsung led Q4 2025 sales at about 36%, followed by SK Hynix at about 32% and Micron at about 22%.

Emerging KV cache compression techniques such as Google Research's TurboQuant (announced March 2026) may temper HBM demand growth by reducing inference memory needs, though analysts view the impact as limited and potentially offset by increased AI adoption.
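
To illustrate why KV cache compression bears on memory demand, the following minimal sketch estimates the size of a transformer KV cache at two storage precisions. The function name `kv_cache_bytes` and every model dimension used here are hypothetical values chosen for illustration, and the calculation models a plain reduction in bits per stored value; it is not a description of TurboQuant's method and ignores any per-block metadata a real compression scheme would add.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bits_per_value: int) -> float:
    """Estimate KV cache size in bytes for one inference batch.

    The factor 2 accounts for one key tensor and one value tensor per layer.
    All dimensions are illustrative assumptions, not figures from the article.
    """
    stored_values = 2 * layers * kv_heads * head_dim * seq_len * batch
    return stored_values * bits_per_value / 8


# Hypothetical model: 32 layers, 8 KV heads of dimension 128, 8192-token context.
full = kv_cache_bytes(32, 8, 128, 8192, 1, bits_per_value=16)
compressed = kv_cache_bytes(32, 8, 128, 8192, 1, bits_per_value=4)

print(f"16-bit cache: {full / 2**30:.2f} GiB")       # 16-bit cache: 1.00 GiB
print(f" 4-bit cache: {compressed / 2**30:.2f} GiB")  #  4-bit cache: 0.25 GiB
```

Under these assumed dimensions, reducing the stored precision from 16 to 4 bits cuts the cache to a quarter of its size per sequence, which is the kind of saving that could lower the HBM capacity needed per inference request.
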
In February 2026, reports continued to emphasize strong AI-driven demand for HBM, with AI data centers projected to consume around 70% of global HBM production that year, intensifying shortages and supporting pricing power for manufacturers including Samsung, SK Hynix, and Micron.[127] Reports also indicated that Micron was excluded from Nvidia's HBM4 supplier list, potentially because of difficulty meeting specifications such as 11 Gbps per pin, leaving Samsung and SK Hynix as the primary suppliers for Nvidia's next-generation platforms.[152] Nvidia and AMD maintained leading positions in AI GPUs, with positive long-term outlooks tied to sustained AI infrastructure investments.

Following Samsung's announcement that it had begun mass production and customer shipments of HBM4 memory chips, stocks of major memory chip producers showed mixed performance on February 13, 2026: Micron Technology (MU) declined 1.79% (-$7.39) to $406.58 (intraday at 10:49 AM EST; previous close $413.97); Samsung Electronics (005930.KS) rose 1.46% (+2,600 KRW) to 181,200 KRW (close at 3:30 PM KST); and SK Hynix (000660.KS) fell 0.90% (-8,000 KRW) to 880,000 KRW (close at 3:30 PM KST). These movements illustrate the intense competition, supply chain pressures, and investor sensitivity to advances in HBM technology and to relative positioning among key suppliers.[153][21][154]

This concentrated market remains vulnerable to disruptions. AI demand surges in 2024 and 2025 triggered severe pricing volatility and shortages, with HBM and DRAM prices rising more than 100% year over year as of late 2025 amid allocations sold out through 2026. Major DRAM vendors prioritize HBM production over standard DRAM because it yields higher profit margins (approximately 60% for HBM vs. 40% for standard DRAM) and meets surging demand from AI chipmakers like Nvidia, consuming advanced fabrication capacity that could otherwise support standard DDR4/DDR5 modules.[155][17]

Driven by the profitability of HBM for AI and GPU applications, Samsung, SK Hynix, and Micron have shifted production capacity from standard DRAM to HBM, often on a three-to-one basis, where producing one bit of HBM forgoes three bits of conventional memory. This limits output for consumer and standard uses and has caused significant price increases in those markets. For instance, Micron has discontinued portions of its consumer PC memory business to redirect supply toward AI chips and servers. As a result, HBM production is sold out for all of 2026, and DRAM prices surged 50–55% in early 2026 compared with the previous quarter. The shift has boosted profitability, with Micron's net income nearly tripling in its most recent quarter and its stock rising 247% over the past year.[17]

However, despite strong HBM bookings, manufacturers may reallocate production capacity from HBM back to DDR5 modules as competition in the HBM market intensifies and shortage-driven DDR5 price surges inflate margins, making DDR5 more profitable per wafer or production line; for example, Samsung has shifted focus from HBM to DDR5 production.[156][157][158] The situation is exacerbated by the resource-intensive nature of HBM production: 1 GB of HBM requires up to four times the silicon wafer capacity of standard DRAM, which constrains overall DRAM production and contributes to shortages of conventional RAM for PCs, servers, and other applications.[159][160][16]
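
The capacity trade-off described above can be sketched numerically. The following minimal example assumes a fixed wafer budget and a single `wafer_ratio` parameter expressing how many times more wafer capacity a gigabyte of HBM consumes than a gigabyte of standard DRAM (3 for the three-to-one basis, 4 for the upper figure cited). The function name `split_wafer_budget`, the wafer count, and the per-wafer output are hypothetical values for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Output:
    hbm_gb: float
    standard_gb: float


def split_wafer_budget(total_wafers: float, hbm_fraction: float,
                       standard_gb_per_wafer: float,
                       wafer_ratio: float = 3.0) -> Output:
    """Split a fixed wafer budget between HBM and standard DRAM.

    wafer_ratio is the assumed wafer capacity consumed per GB of HBM
    relative to standard DRAM, so one wafer yields 1/wafer_ratio as
    many gigabytes when used for HBM.
    """
    if not 0.0 <= hbm_fraction <= 1.0:
        raise ValueError("hbm_fraction must be between 0 and 1")
    if wafer_ratio <= 0:
        raise ValueError("wafer_ratio must be positive")
    hbm_wafers = total_wafers * hbm_fraction
    standard_wafers = total_wafers - hbm_wafers
    return Output(
        hbm_gb=hbm_wafers * standard_gb_per_wafer / wafer_ratio,
        standard_gb=standard_wafers * standard_gb_per_wafer,
    )


# Hypothetical fab: 100 wafers, 1000 GB of standard DRAM per wafer.
baseline = split_wafer_budget(100, 0.0, 1000.0)
shifted = split_wafer_budget(100, 0.25, 1000.0, wafer_ratio=4.0)
forgone = baseline.standard_gb - shifted.standard_gb

print(shifted.hbm_gb, forgone)  # 6250.0 25000.0
```

In this assumed scenario, moving a quarter of the wafers to HBM yields 6,250 GB of HBM at the cost of 25,000 GB of standard DRAM, which is why even a modest reallocation can tighten conventional memory supply.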

References
