High Bandwidth Memory
High Bandwidth Memory (HBM) is a computer memory interface for 3D-stacked synchronous dynamic random-access memory (SDRAM), initially from Samsung, AMD and SK Hynix. It is used in conjunction with high-performance graphics accelerators, network devices, and high-performance datacenter AI ASICs; as on-package cache in CPUs[1] and on-package RAM in upcoming CPUs; and in FPGAs and some supercomputers (such as the NEC SX-Aurora TSUBASA and Fujitsu A64FX).[2] The first HBM memory chip was produced by SK Hynix in 2013,[3] and the first devices to use HBM were the AMD Fiji GPUs in 2015.[4][5]
HBM was adopted by JEDEC as an industry standard in October 2013.[6] The second generation, HBM2, was accepted by JEDEC in January 2016.[7] JEDEC officially announced the HBM3 standard on January 27, 2022,[8] and the HBM4 standard in April 2025.[9][10]
Technology
| Type | Release | Max data rate per pin | Channels per stack | Max capacity per stack | Max bandwidth per stack |
|---|---|---|---|---|---|
| HBM 1 | Oct 2013 | 1.0 Gb/s | 8×128 bit | 4 dies × 1 GB = 4 GB | 128 GB/s |
| HBM 2 | Jan 2016 | 2.4 Gb/s | 8×128 bit | 8 dies × 1 GB = 8 GB | 307 GB/s |
| HBM 2E | Aug 2019 | 3.6 Gb/s | 8×128 bit | 12 dies × 2 GB = 24 GB | 461 GB/s |
| HBM 3 | Jan 2022 | 6.4 Gb/s | 16×64 bit | 12 dies × 2 GB = 24 GB | 819 GB/s |
| HBM 3E | May 2023 | 9.8 Gb/s | 16×64 bit | 16 dies × 3 GB = 48 GB | 1229 GB/s |
| HBM 4 | April 2025 | 8 Gb/s | 32×64 bit | 16 dies × 4 GB = 64 GB | 2048 GB/s |
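The per-stack bandwidth figures above follow from the pin data rate multiplied by the stack's total bus width (channels × bits per channel), divided by 8 bits per byte. A minimal Python sketch of that arithmetic, using the rounded rates quoted in the table (illustrative only; note that the published 1229 GB/s HBM3E figure corresponds to a 9.6 Gb/s pin rate rather than the 9.8 Gb/s peak):

```python
# Recompute "max bandwidth per stack" from pin rate and bus width.
# Values are the rounded figures quoted in the table above, not a specification.
GENERATIONS = {
    # name: (pin rate in Gbit/s, channels, bits per channel)
    "HBM1":  (1.0, 8, 128),
    "HBM2":  (2.4, 8, 128),
    "HBM2E": (3.6, 8, 128),
    "HBM3":  (6.4, 16, 64),
    "HBM3E": (9.8, 16, 64),
    "HBM4":  (8.0, 32, 64),
}

def stack_bandwidth_gb_s(pin_rate_gbps, channels, bits_per_channel):
    """Per-stack bandwidth in GB/s: pin rate x total bus width / 8 bits per byte."""
    return pin_rate_gbps * channels * bits_per_channel / 8

for name, (rate, ch, width) in GENERATIONS.items():
    print(f"{name}: {ch * width}-bit bus, {stack_bandwidth_gb_s(rate, ch, width):.1f} GB/s per stack")
```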
HBM achieves higher bandwidth than DDR4 or GDDR5 while using less power, and in a substantially smaller form factor.[11] This is achieved by stacking up to eight DRAM dies and an optional base die which can include buffer circuitry and test logic.[12] The stack is often connected to the memory controller on a GPU or CPU through a substrate, such as a silicon interposer.[13][14] Alternatively, the memory die could be stacked directly on the CPU or GPU chip. Within the stack the dies are vertically interconnected by through-silicon vias (TSVs) and microbumps. The HBM technology is similar in principle but incompatible with the Hybrid Memory Cube (HMC) interface developed by Micron Technology.[15]
The HBM memory bus is very wide in comparison to other DRAM memories such as DDR4 or GDDR5. An HBM stack of four DRAM dies (4‑Hi) has two 128‑bit channels per die, for a total of 8 channels and a width of 1024 bits. A graphics card/GPU with four 4‑Hi HBM stacks would therefore have a memory bus 4096 bits wide. In comparison, the bus width of GDDR memories is 32 bits per channel, with 16 channels for a graphics card with a 512‑bit memory interface.[16] HBM supports up to 4 GB per package.
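As a worked restatement of the widths quoted in this paragraph:

$$4\ \text{stacks} \times (8 \times 128\ \text{bit}) = 4096\ \text{bit} \qquad \text{versus} \qquad 16 \times 32\ \text{bit} = 512\ \text{bit for GDDR5.}$$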
The larger number of connections to the memory, relative to DDR4 or GDDR5, required a new method of connecting the HBM memory to the GPU (or other processor).[17] AMD and Nvidia have both used purpose-built silicon chips, called interposers, to connect the memory and GPU. The interposer has the added advantage of requiring the memory and processor to be physically close, shortening memory paths. However, as semiconductor device fabrication is significantly more expensive than printed circuit board manufacture, this adds cost to the final product.
Gallery: HBM DRAM die; HBM controller die; HBM memory on an AMD Radeon R9 Nano graphics card's GPU package.
Interface
The HBM DRAM is tightly coupled to the host compute die with a distributed interface. The interface is divided into independent channels. The channels are completely independent of one another and are not necessarily synchronous to each other. The HBM DRAM uses a wide-interface architecture to achieve high-speed, low-power operation. The HBM DRAM uses a 500 MHz differential clock CK_t / CK_c (where the suffix "_t" denotes the "true", or "positive", component of the differential pair, and "_c" stands for the "complementary" one). Commands are registered at the rising edge of CK_t, CK_c. Each channel interface maintains a 128‑bit data bus operating at double data rate (DDR). HBM supports transfer rates of 1 GT/s per pin (transferring 1 bit), yielding an overall package bandwidth of 128 GB/s.[18]
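A worked example of how the 128 GB/s package figure follows from the numbers above (500 MHz clock, double data rate, eight 128-bit channels):

$$500\ \text{MHz} \times 2 = 1\ \text{GT/s per pin}, \qquad \frac{1\ \text{Gbit/s} \times 8 \times 128}{8\ \text{bit/byte}} = 128\ \text{GB/s}.$$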
HBM2
The second generation of High Bandwidth Memory, HBM2, also specifies up to eight dies per stack and doubles pin transfer rates up to 2 GT/s. Retaining 1024‑bit wide access, HBM2 is able to reach 256 GB/s memory bandwidth per package. The HBM2 spec allows up to 8 GB per package. HBM2 was predicted to be especially useful for performance-sensitive consumer applications such as virtual reality.[19]
On January 19, 2016, Samsung announced early mass production of HBM2, at up to 8 GB per stack.[20][21] SK Hynix also announced availability of 4 GB stacks in August 2016.[22]
Gallery: HBM2 DRAM die; HBM2 controller die; the HBM2 interposer of a Radeon RX Vega 64 GPU with the HBM dies removed and the GPU still in place.
HBM2E
In late 2018, JEDEC announced an update to the HBM2 specification, providing for increased bandwidth and capacities.[23] Up to 307 GB/s per stack (2.5 Tbit/s effective data rate) is now supported in the official specification, though products operating at this speed had already been available. Additionally, the update added support for 12‑Hi stacks (12 dies) making capacities of up to 24 GB per stack possible.
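The two headline figures are the same rate expressed in different units:

$$307\ \text{GB/s} \times 8\ \text{bit/byte} \approx 2.5\ \text{Tbit/s}.$$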
On March 20, 2019, Samsung announced their Flashbolt HBM2E, featuring eight dies per stack and a transfer rate of 3.2 GT/s, providing a total of 16 GB and 410 GB/s per stack.[24]
On August 12, 2019, SK Hynix announced their HBM2E, featuring eight dies per stack and a transfer rate of 3.6 GT/s, providing a total of 16 GB and 460 GB/s per stack.[25][26] On July 2, 2020, SK Hynix announced that mass production had begun.[27]
In October 2019, Samsung announced their 12-layered HBM2E.[28]
HBM3
In late 2020, Micron revealed that the HBM2E standard would be updated, and alongside it unveiled the next standard, known as HBMnext (later renamed HBM3). Intended as a large generational leap from HBM2 and the replacement for HBM2E, this new memory was expected to come to market in Q4 2022 and, as the naming suggests, to introduce a new architecture.
While the architecture might be overhauled, leaks pointed to performance similar to that of the updated HBM2E standard. This memory was expected to be used mostly in data center GPUs.[29][30][31][32]
In mid 2021, SK Hynix unveiled some specifications of the HBM3 standard, with 5.2 Gbit/s I/O speeds and bandwidth of 665 GB/s per package, as well as up to 16-high 2.5D and 3D solutions.[33][34]
On 20 October 2021, before the JEDEC standard for HBM3 was finalised, SK Hynix was the first memory vendor to announce that it has finished development of HBM3 memory devices. According to SK Hynix, the memory would run as fast as 6.4 Gbit/s/pin, double the data rate of JEDEC-standard HBM2E, which formally tops out at 3.2 Gbit/s/pin, or 78% faster than SK Hynix's own 3.6 Gbit/s/pin HBM2E. The devices support a data transfer rate of 6.4 Gbit/s and therefore a single HBM3 stack may provide a bandwidth of up to 819 GB/s. The basic bus widths for HBM3 remain unchanged, with a single stack of memory being 1024-bits wide. SK Hynix would offer their memory in two capacities: 16 GB and 24 GB, aligning with 8-Hi and 12-Hi stacks respectively. The stacks consist of 8 or 12 16 Gb DRAMs that are each 30 μm thick and interconnected using Through Silicon Vias (TSVs).[35][36][37]
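The two capacity points follow directly from the die count and the 16 Gb die density quoted above:

$$8 \times 16\ \text{Gbit} = 128\ \text{Gbit} = 16\ \text{GB}, \qquad 12 \times 16\ \text{Gbit} = 192\ \text{Gbit} = 24\ \text{GB}.$$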
According to Ryan Smith of AnandTech, the SK Hynix first generation HBM3 memory has the same density as their latest-generation HBM2E memory, meaning that device vendors looking to increase their total memory capacities for their next-generation parts would need to use memory with 12 dies/layers, up from the 8 layer stacks they typically used until then.[35] According to Anton Shilov of Tom's Hardware, high-performance compute GPUs or FPGAs typically use four or six HBM stacks, so with SK Hynix's HBM3 24 GB stacks they would accordingly get 3.2 TB/s or 4.9 TB/s of memory bandwidth. He also noted that SK Hynix's HBM3 chips are square, not rectangular like HBM2 and HBM2E chips.[36] According to Chris Mellor of The Register, the fact that JEDEC had not yet finalized its HBM3 standard might mean that SK Hynix would need to retrofit its design to a future, faster specification.[37]
JEDEC officially announced the HBM3 standard on January 27, 2022.[8] The number of memory channels was doubled from 8 channels of 128 bits with HBM2e to 16 channels of 64 bits with HBM3. Therefore, the total number of data pins of the interface is still 1024.[38]
In June 2022, SK Hynix announced it had started mass production of the industry's first HBM3 memory, to be used with Nvidia's H100 GPU expected to ship in Q3 2022. The memory would provide the H100 with "up to 819 GB/s" of memory bandwidth.[39]
In August 2022, Nvidia announced that its "Hopper" H100 GPU would ship with five active HBM3 sites (out of six on board), offering 80 GB of RAM and 3 TB/s of memory bandwidth (16 GB and 600 GB/s per site).[40]
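The package totals are simply the per-site figures multiplied by the five active sites:

$$5 \times 16\ \text{GB} = 80\ \text{GB}, \qquad 5 \times 600\ \text{GB/s} = 3\ \text{TB/s}.$$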
HBM3E
On 30 May 2023, SK Hynix unveiled its HBM3E memory with an 8 Gbit/s/pin data rate (25% faster than HBM3), which was to enter production in the first half of 2024.[41] At 8 GT/s with a 1024-bit bus, bandwidth per stack increases from HBM3's 819.2 GB/s to 1 TB/s.
On 26 July 2023, Micron announced its HBM3E memory with a 9.6 Gbit/s/pin data rate (50% faster than HBM3).[42] According to Micron, its HBM3E uses 1β DRAM process technology and advanced packaging to achieve the highest performance, capacity and power efficiency in the industry. It can store 24 GB per 8-high cube and allows data transfer at 1.2 TB/s. A 12-high cube with 36 GB capacity was planned for 2024.
In August 2023, Nvidia announced a new version of their GH200 Grace Hopper superchip that utilizes 141 GB (144 GiB physical) of HBM3e over a 6144-bit bus providing 50% higher memory bandwidth and 75% higher memory capacity over the HBM3 version.[43]
In May 2023, Samsung announced HBM3P with up to 7.2 Gbit/s, which was to enter production in 2024.[44]
On October 20, 2023, Samsung announced their HBM3E "Shinebolt" with up to 9.8 Gbit/s memory.[45]
On February 26, 2024, Micron announced the mass production of Micron's HBM3E memory.[46]
On March 18, 2024, Nvidia announced the Blackwell series of GPUs using HBM3E memory.[47]
On March 19, 2024, SK Hynix announced the mass production of SK Hynix's HBM3E memory.[48]
In September 2024, SK Hynix announced the mass production of its 12-layered HBM3E memory[49] and in November the 16-layered version.[50]
HBM-PIM
In February 2021, Samsung announced the development of HBM with processing-in-memory (PIM). This new memory brings AI computing capabilities inside the memory itself, to accelerate large-scale data processing. A DRAM-optimised AI engine is placed inside each memory bank to enable parallel processing and minimise data movement. Samsung claims this will deliver twice the system performance and reduce energy consumption by more than 70%, while not requiring any hardware or software changes to the rest of the system.[51]
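As a purely conceptual sketch of why moving compute into the memory banks reduces data movement (this is not Samsung's HBM-PIM programming model; all names and numbers below are invented for illustration):

```python
# Toy comparison: bytes crossing the memory interface when a dot product is
# computed on the host versus inside a hypothetical per-bank engine.
# Conceptual illustration only, not an actual HBM-PIM API.

def host_side_dot(bank_a, bank_b):
    """Host reads both operand vectors over the interface, then computes."""
    bytes_moved = (len(bank_a) + len(bank_b)) * 4  # 4 bytes per float32 element
    result = sum(a * b for a, b in zip(bank_a, bank_b))
    return result, bytes_moved

def pim_side_dot(bank_a, bank_b):
    """A per-bank engine computes next to the data; only the scalar result moves."""
    result = sum(a * b for a, b in zip(bank_a, bank_b))
    return result, 4  # one float32 result crosses the interface

a, b = [1.0] * 1024, [2.0] * 1024
_, host_bytes = host_side_dot(a, b)
_, pim_bytes = pim_side_dot(a, b)
print(f"host path: {host_bytes} bytes moved; PIM path: {pim_bytes} bytes moved")
```

In this toy case only a 4-byte result leaves the bank instead of 8 KB of operands, which is the kind of traffic reduction behind the large energy savings Samsung claims.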
HBM4
In July 2024, JEDEC announced its preliminary specifications for the future HBM4.[52] The data rate per pin dropped back to 6.4 Gbit/s (the level of HBM3), but because HBM4 employs a 2048-bit interface per stack (double that of previous generations), it still achieves a higher per-stack data rate (1.6 TB/s)[53] than HBM3E. Additionally, it allows 4 GB layers, yielding 64 GB in 16-layer configurations.
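As the accompanying note[53] spells out, both headline figures are straightforward products of the wider interface:

$$\frac{6.4\ \text{Gbit/s} \times 2048}{8\ \text{bit/byte}} \approx 1.6\ \text{TB/s}, \qquad 16 \times 4\ \text{GB} = 64\ \text{GB}.$$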
In April 2025, JEDEC released the official HBM4 specification.[9] According to Kunal Khullar of Tom's Hardware, it supports transfer speeds of up to 8 Gb/s across a 2048-bit interface, for a total bandwidth of up to 2 TB/s, with stack heights of 4 to 16 dies and DRAM die densities of 24 Gb or 32 Gb, allowing capacities of up to 64 GB. HBM4 is backward compatible with HBM3 controllers. Samsung, Micron, and SK Hynix contributed to the standard's formulation.[10]
History
Background
Die-stacked memory was initially commercialized in the flash memory industry. Toshiba introduced a NAND flash memory chip with eight stacked dies in April 2007,[54] followed by Hynix Semiconductor introducing a NAND flash chip with 24 stacked dies in September 2007.[55]
3D-stacked random-access memory (RAM) using through-silicon via (TSV) technology was commercialized by Elpida Memory, which developed the first 8 GB DRAM chip (stacked with four DDR3 SDRAM dies) in September 2009, and released it in June 2011. In 2011, SK Hynix introduced 16 GB DDR3 memory (40 nm class) using TSV technology,[3] Samsung Electronics introduced 3D-stacked 32 GB DDR3 (30 nm class) based on TSV in September, and then Samsung and Micron Technology announced TSV-based Hybrid Memory Cube (HMC) technology in October.[56]
JEDEC first released the JESD229 standard for Wide IO memory,[57] the predecessor of HBM, featuring four 128-bit channels with single-data-rate clocking, in December 2011 after several years of work. The first HBM standard, JESD235, followed in October 2013.
Development
The development of High Bandwidth Memory began at AMD in 2008 to solve the problem of ever-increasing power usage and form factor of computer memory. Over the next several years, AMD developed procedures to solve die-stacking problems with a team led by Senior AMD Fellow Bryan Black.[58] To help AMD realize their vision of HBM, they enlisted partners from the memory industry, particularly Korean company SK Hynix,[58] which had prior experience with 3D-stacked memory,[3][55] as well as partners from the interposer industry (Taiwanese company UMC) and packaging industry (Amkor Technology and ASE).[58]
The development of HBM was completed in 2013, when SK Hynix built the first HBM memory chip.[3] HBM was adopted as industry standard JESD235 by JEDEC in October 2013, following a proposal by AMD and SK Hynix in 2010.[6] High volume manufacturing began at a Hynix facility in Icheon, South Korea, in 2015.
The first GPU utilizing HBM was the AMD Fiji, released in June 2015 and powering the AMD Radeon R9 Fury X.[4][59][60]
In January 2016, Samsung Electronics began early mass production of HBM2.[20][21] The same month, HBM2 was accepted by JEDEC as standard JESD235a.[7] The first GPU chip utilizing HBM2 was the Nvidia Tesla P100, officially announced in April 2016.[61][62]
In June 2016, Intel released a family of Xeon Phi processors with 8 stacks of MCDRAM, Micron's version of HBM. At Hot Chips in August 2016, both Samsung and Hynix announced new-generation HBM memory technologies.[63][64] Both companies announced high-performance products expected to have increased density, increased bandwidth, and lower power consumption. Samsung also announced a lower-cost version of HBM under development targeting mass markets. Removing the buffer die and decreasing the number of TSVs lowers cost, though at the expense of decreased overall bandwidth (200 GB/s).
Nvidia announced the Hopper H100 GPU, the world's first GPU utilizing HBM3, on March 22, 2022.[65]
See also
- Stacked DRAM
- eDRAM
- Chip stack multi-chip module
- Hybrid Memory Cube (HMC): stacked memory standard from Micron Technology (2011)
References
- ^ Shilov, Anton (December 30, 2020). "Intel Confirms On-Package HBM Memory Support for Sapphire Rapids". Tom's Hardware. Retrieved January 1, 2021.
- ^ ISSCC 2014 Trends Archived 2015-02-06 at the Wayback Machine page 118 "High-Bandwidth DRAM"
- ^ a b c d "History: 2010s". SK Hynix. Retrieved 7 March 2023.
- ^ a b Smith, Ryan (2 July 2015). "The AMD Radeon R9 Fury X Review". Anandtech. Archived from the original on July 3, 2015. Retrieved 1 August 2016.
- ^ Morgan, Timothy Prickett (March 25, 2014). "Future Nvidia 'Pascal' GPUs Pack 3D Memory, Homegrown Interconnect". EnterpriseTech. Retrieved 26 August 2014.
Nvidia will be adopting the High Bandwidth Memory (HBM) variant of stacked DRAM that was developed by AMD and Hynix
- ^ a b High Bandwidth Memory (HBM) DRAM (JESD235), JEDEC, October 2013
- ^ a b "JESD235a: High Bandwidth Memory 2". 2016-01-12.
- ^ a b "JEDEC Publishes HBM3 Update to High Bandwidth Memory (HBM) Standard". JEDEC (Press release). Arlington, VA. January 27, 2022. Retrieved December 11, 2022.
- ^ a b "High Bandwidth Memory (HBM4) DRAM | JEDEC". www.jedec.org. Retrieved 2025-08-17.
- ^ a b Kunal Khullar (2025-04-17). "JEDEC finalizes HBM4 memory standard with major bandwidth and efficiency upgrades". Tom's Hardware. Retrieved 2025-08-17.
- ^ HBM: Memory Solution for Bandwidth-Hungry Processors Archived 2015-04-24 at the Wayback Machine, Joonyoung Kim and Younsu Kim, SK Hynix // Hot Chips 26, August 2014
- ^ Sohn et al. (Samsung) (January 2017). "A 1.2 V 20 nm 307 GB/s HBM DRAM With At-Speed Wafer-Level IO Test Scheme and Adaptive Refresh Considering Temperature Distribution". IEEE Journal of Solid-State Circuits. 52 (1): 250–260. Bibcode:2017IJSSC..52..250S. doi:10.1109/JSSC.2016.2602221. S2CID 207783774.
- ^ "What's Next for High Bandwidth Memory". 17 December 2019.
- ^ "Interposers".
- ^ Where Are DRAM Interfaces Headed? Archived 2018-06-15 at the Wayback Machine // EETimes, 4/18/2014 "The Hybrid Memory Cube (HMC) and a competing technology called High-Bandwidth Memory (HBM) are aimed at computing and networking applications. These approaches stack multiple DRAM chips atop a logic chip."
- ^ Highlights of the HighBandwidth Memory (HBM) Standard Archived 2014-12-13 at the Wayback Machine. Mike O'Connor, Sr. Research Scientist, NVidia // The Memory Forum – June 14, 2014
- ^ Smith, Ryan (19 May 2015). "AMD Dives Deep On High Bandwidth Memory – What Will HBM Bring to AMD?". Anandtech. Archived from the original on August 28, 2015. Retrieved 12 May 2017.
- ^ "High-Bandwidth Memory (HBM)" (PDF). AMD. 2015-01-01. Retrieved 2016-08-10.
- ^ Valich, Theo (2015-11-16). "NVIDIA Unveils Pascal GPU: 16GB of memory, 1TB/s Bandwidth". VR World. Archived from the original on 2019-07-14. Retrieved 2016-01-24.
- ^ a b "Samsung Begins Mass Producing World's Fastest DRAM – Based on Newest High Bandwidth Memory (HBM) Interface". news.samsung.com.
- ^ a b "Samsung announces mass production of next-generation HBM2 memory – ExtremeTech". 19 January 2016.
- ^ Shilov, Anton (1 August 2016). "SK Hynix Adds HBM2 to Catalog". Anandtech. Archived from the original on August 2, 2016. Retrieved 1 August 2016.
- ^ "JEDEC Updates Groundbreaking High Bandwidth Memory (HBM) Standard" (Press release). JEDEC. 2018-12-17. Retrieved 2018-12-18.
- ^ "Samsung Electronics Introduces New High Bandwidth Memory Technology Tailored to Data Centers, Graphic Applications, and AI | Samsung Semiconductor Global Website". www.samsung.com. Retrieved 2019-08-22.
- ^ "SK Hynix Develops World's Fastest High Bandwidth Memory, HBM2E". www.skhynix.com. August 12, 2019. Archived from the original on 2019-12-03. Retrieved 2019-08-22.
- ^ "SK Hynix Announces its HBM2E Memory Products, 460 GB/S and 16GB per Stack". 12 August 2019.
- ^ "SK hynix Starts Mass-Production of High-Speed DRAM, "HBM2E"". 2 July 2020.
- ^ "Samsung 3D Stacks 12-Layer HBM2E Chips, Now 24GB of Capacity". 7 October 2019.
- ^ "Micron reveals HBMnext, a successor to HBM2e". VideoCardz. August 14, 2020. Retrieved December 11, 2022.
- ^ Hill, Brandon (August 14, 2020). "Micron Announces HBMnext as Eventual Replacement for HBM2e in High-End GPUs". HotHardware. Retrieved December 11, 2022.
- ^ Hruska, Joel (August 14, 2020). "Micron Introduces HBMnext, GDDR6X, Confirms RTX 3090". ExtremeTech. Retrieved December 11, 2022.
- ^ Garreffa, Anthony (August 14, 2020). "Micron unveils HBMnext, the successor to HBM2e for next-next-gen GPUs". TweakTown. Retrieved December 11, 2022.
- ^ "SK Hynix expects HBM3 memory with 665 GB/s bandwidth".
- ^ Shilov, Anton (June 9, 2021). "HBM3 to Top 665 GBPS Bandwidth per Chip, SK Hynix Says". Tom's Hardware. Retrieved December 11, 2022.
- ^ a b Smith, Ryan (October 20, 2021). "SK Hynix Announces Its First HBM3 Memory: 24GB Stacks, Clocked at up to 6.4Gbps". AnandTech. Archived from the original on October 20, 2021. Retrieved October 22, 2021.
- ^ a b Shilov, Anton (October 20, 2021). "SK Hynix Develops HBM3 DRAMs: 24GB at 6.4 GT/s over a 1024-Bit Bus". Tom's Hardware. Retrieved October 22, 2021.
- ^ a b Mellor, Chris (October 20, 2021). "SK hynix rolls out 819GB/s HBM3 DRAM". The Register. Retrieved October 24, 2021.
- ^ Prickett Morgan, Timothy (April 6, 2022). "The HBM3 roadmap is just getting started". The Next Platform. Retrieved May 4, 2022.
- ^ "SK hynix to Supply Industry's First HBM3 DRAM to NVIDIA". SK Hynix. June 8, 2022. Retrieved December 11, 2022.
- ^ Robinson, Cliff (August 22, 2022). "NVIDIA H100 Hopper Details at HC34 as it Waits for Next-Gen CPUs". ServeTheHome. Retrieved December 11, 2022.
- ^ "SK hynix Enters Industry's First Compatibility Validation Process for 1bnm DDR5 Server DRAM". 30 May 2023.
- ^ "HBM3 Memory HBM3 Gen2". 26 July 2023.
- ^ Bonshor, Gavin (8 August 2023). "NVIDIA Unveils Updated GH200 'Grace Hopper' Superchip with HBM3e Memory, Shipping in Q2'2024". AnandTech. Archived from the original on August 8, 2023. Retrieved 9 August 2023.
- ^ "Samsung To Launch HBM3P Memory, Codenamed "Snowbolt" With Up To 5 TB/s Bandwith [sic] Per Stack". Wccftech. 4 May 2023. Retrieved 21 August 2023.
- ^ "Samsung Electronics Holds Memory Tech Day 2023 Unveiling New Innovations to Lead the Hyperscale AI Era".
- ^ "Micron Commences Volume Production of Industry-Leading HBM3E Solution to Accelerate the Growth of AI". investors.micron.com. 2024-02-26. Retrieved 2024-06-07.
- ^ Jarred Walton (2024-03-18). "Nvidia's next-gen AI GPU is 4X faster than Hopper: Blackwell B200 GPU delivers up to 20 petaflops of compute and other massive improvements". Tom's Hardware. Retrieved 2024-03-19.
- ^ "SK hynix Begins Volume Production of Industry's First HBM3E". SK Hynix Newsroom. https://news.skhynix.com/sk-hynix-begins-volume-production-of-industry-first-hbm3e/
- ^ "SK hynix Begins Volume Production of the World's First 12-Layer HBM3E". SK Hynix Newsroom. https://news.skhynix.com/sk-hynix-begins-volume-production-of-the-world-first-12-layer-hbm3e/
- ^ https://pulse.mk.co.kr/m/news/english/11159678
- ^ "Samsung Develops Industry's First High Bandwidth Memory with AI Processing Power".
- ^ "JEDEC Approaches Finalization of HBM4 Standard, Eyes Future Innovations" (Press release). JEDEC. https://www.jedec.org/news/pressreleases/jedec-approaches-finalization-hbm4-standard-eyes-future-innovations
- ^ Multiplying 6.4 gigatransfers per second by the number of bits 2048 then dividing by 8 to convert from bits to bytes
- ^ "TOSHIBA COMMERCIALIZES INDUSTRY'S HIGHEST CAPACITY EMBEDDED NAND FLASH MEMORY FOR MOBILE CONSUMER PRODUCTS". Toshiba. April 17, 2007. Archived from the original on November 23, 2010. Retrieved 23 November 2010.
- ^ a b "Hynix Surprises NAND Chip Industry". Korea Times. 5 September 2007. Retrieved 8 July 2019.
- ^ Kada, Morihiro (2015). "Research and Development History of Three-Dimensional Integration Technology". Three-Dimensional Integration of Semiconductors: Processing, Materials, and Applications. Springer. pp. 15–8. ISBN 9783319186757.
- ^ "WIDE I/O SINGLE DATA RATE (WIDE I/O SDR) standard JESD229" (PDF).
- ^ a b c High-Bandwidth Memory (HBM) from AMD: Making Beautiful Memory, AMD
- ^ Smith, Ryan (19 May 2015). "AMD HBM Deep Dive". Anandtech. Archived from the original on August 28, 2015. Retrieved 1 August 2016.
- ^ AMD Ushers in a New Era of PC Gaming including World's First Graphics Family with Revolutionary HBM Technology (press release), AMD
- ^ Smith, Ryan (5 April 2016). "Nvidia announces Tesla P100 Accelerator". Anandtech. Archived from the original on April 6, 2016. Retrieved 1 August 2016.
- ^ "NVIDIA Tesla P100: The Most Advanced Data Center GPU Ever Built". www.nvidia.com.
- ^ Smith, Ryan (23 August 2016). "Hot Chips 2016: Memory Vendors Discuss Ideas for Future Memory Tech – DDR5, Cheap HBM & More". Anandtech. Archived from the original on August 24, 2016. Retrieved 23 August 2016.
- ^ Walton, Mark (23 August 2016). "HBM3: Cheaper, up to 64GB on-package, and terabytes-per-second bandwidth". Ars Technica. Retrieved 23 August 2016.
- ^ "NVIDIA Announces Hopper Architecture, the Next Generation of Accelerated Computing".
External links
- High Bandwidth Memory (HBM) DRAM (JESD235), JEDEC, October 2013
- Lee, Dong Uk; Kim, Kyung Whan; Kim, Kwan Weon; Kim, Hongjung; Kim, Ju Young; et al. (9–13 Feb 2014). "25.2 a 1.2V 8 Gb 8-channel 128 GB/S high-bandwidth memory (HBM) stacked DRAM with effective microbump I/O test methods using 29nm process and TSV". 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). IEEE (published 6 March 2014). pp. 432–433. doi:10.1109/ISSCC.2014.6757501. ISBN 978-1-4799-0920-9. S2CID 40185587.
- HBM vs HBM2 vs GDDR5 vs GDDR5X Memory Comparison
High Bandwidth Memory
Overview
Definition and Purpose
High Bandwidth Memory (HBM) is a high-speed memory interface standard for 3D-stacked synchronous dynamic random-access memory (SDRAM), designed to deliver exceptional data throughput in performance-critical systems.[19] Developed as a collaborative effort among industry leaders, HBM integrates multiple DRAM dies vertically using through-silicon vias (TSVs) to form compact stacks, enabling a wide interface that connects directly to processors via interposers.[2] This architecture was formalized by the JEDEC Solid State Technology Association in October 2013 through the JESD235 standard, aiming to overcome the bandwidth constraints of conventional memory technologies amid escalating demands from compute-intensive applications.[2]

The primary purpose of HBM is to alleviate the memory bandwidth bottleneck in traditional DRAM configurations, where narrow buses and longer signal paths limit data transfer rates for parallel processing tasks.[19] By providing ultra-high data rates—reaching up to terabytes per second—HBM supports workloads such as graphics rendering, machine learning inference, and scientific simulations that require massive parallel data access.[2] It is particularly suited for graphics processing units (GPUs) and specialized accelerators, where rapid data movement between memory and compute cores is essential for maintaining efficiency in high-performance computing environments.[19]

At its core, the 3D stacking approach in HBM minimizes latency by shortening interconnect distances between memory layers and the host die, while simultaneously boosting density to pack more capacity into a smaller footprint without increasing the overall system size.[2] This vertical integration contrasts with planar memory layouts, allowing for wider channels that enhance throughput without relying solely on transistor scaling. The 2013 JEDEC standardization was motivated by the need to extend bandwidth growth beyond the limitations of Moore's Law in traditional semiconductor scaling, fostering innovations in die-stacking to meet the evolving requirements of GPUs and accelerators in data-parallel applications.[2]

Key Features and Benefits
High Bandwidth Memory (HBM) employs a wide bus interface, typically featuring a 1024-bit interface in earlier generations and up to 2048 bits in advanced variants, enabling significantly higher data throughput compared to narrower bus architectures like those in traditional DRAM.[20] This design is facilitated by through-silicon vias (TSVs), which provide high-density vertical interconnects between stacked DRAM dies, minimizing signal path lengths and supporting efficient 3D integration.[21] Additionally, HBM incorporates a base logic die that handles functions such as test logic and can integrate error correction mechanisms, enhancing reliability in high-performance environments.[21]

The primary benefits of HBM stem from its architecture, delivering up to 1-2 TB/s of bandwidth per stack, which represents 2-5 times the performance of GDDR6 in comparable GPU configurations.[20][2] This elevated bandwidth supports demanding applications like AI training and high-performance computing by reducing memory bottlenecks. Power efficiency is another key advantage, with energy consumption around 4-5 pJ/bit for transfers, lower than conventional graphics memories due to reduced capacitance and optimized signaling.[22] HBM's scalability allows for multi-stack configurations, enabling systems to aggregate bandwidth across up to eight stacks for total throughputs exceeding 10 TB/s while maintaining a compact footprint.[20][2]

Packaging efficiency in HBM is achieved through the use of silicon interposers in 2.5D assemblies, which facilitate direct, high-speed connections between the memory stack and logic dies, and emerging hybrid bonding techniques that enable bumpless, fine-pitch interconnections for improved density and thermal management.[21][20] However, HBM incurs a significantly higher cost per bit than standard DDR DRAM due to its complex manufacturing, though this premium is justified for bandwidth-intensive, premium applications where space and power savings outweigh the expense.[21][2]

Architecture
Stacked Design and Components
High Bandwidth Memory (HBM) employs a vertical stacking architecture to integrate multiple dynamic random-access memory (DRAM) dies, ranging from 4 layers in early generations to up to 16 layers in HBM4, depending on the generation and capacity requirements, atop a base logic die within a compact 3D integrated circuit (IC) package.[23][24] These DRAM dies are interconnected using through-silicon vias (TSVs), which provide high-density vertical electrical pathways, with approximately 5,000 TSVs per layer handling signals, power, and ground distribution.[24] The base logic die, positioned at the bottom of the stack, serves as a buffer for data interfacing with the host processor and supports error-correcting code (ECC) functionality through dedicated parity bits, while optional integration of controller logic can be incorporated to manage memory operations.[24][21] The stacking relies on micro-bump connections, featuring arrays of up to 6,303 bumps with a 55 μm pitch, to ensure reliable interlayer bonding and signal integrity between dies.[24]

For off-chip connectivity, the HBM stack mounts onto a silicon interposer in a 2.5D/3D IC packaging configuration, which routes high-speed signals to the processor while minimizing latency and enabling dense integration.[21][25] This design achieves high memory density, with capacities scaling up to 64 GB per stack in HBM4 (as of 2025) through increased die layers and larger per-die capacities. The approximate density scaling follows the relation $D = N \times C$, where $D$ is total stack density, $N$ is the number of DRAM dies, and $C$ is the capacity per die; however, thermal dissipation constraints limit $N$ to 12–16 to prevent overheating within the fixed stack height of around 720–775 μm.[26][23]

In TSV fabrication, dielectric liners isolate the copper-filled vias, with advanced processes incorporating high-k materials to reduce parasitic capacitance and improve electrical performance across the stack.[27] Thermal management is addressed through integrated heat spreaders and thermal vias or dummy bumps, which distribute heat evenly from the densely packed dies to the package lid, mitigating hotspots that could degrade reliability.[28][29] Yield challenges in stacking arise from defect propagation across layers, necessitating known good die (KGD) testing at interim stages to verify functionality before assembly, achieving yields above 98% in mature processes.[21][30] In HBM4, the base die can be customized for advanced features like integrated power management and UCIe interfaces, while hybrid bonding may replace micro-bumps for pitches below 10 μm in future implementations.[31]

Interface and Data Transfer
High Bandwidth Memory (HBM) employs a wide interface architecture standardized by JEDEC, featuring a data bus of 1024 bits in HBM1-HBM3 (divided into 8 channels of 128 bits or 16 channels of 64 bits) and 2048 bits in HBM4 (32 channels), with each channel supporting 128-bit or narrower sub-divisions depending on the generation. This design utilizes single-ended signaling augmented by a reference voltage (VREF) for pseudo-differential operation, which enhances noise rejection while minimizing pin count and power. Receivers incorporate PVT-tolerant techniques, such as adaptive equalization and voltage referencing, to maintain signal integrity across process variations, supply voltage fluctuations, and temperature extremes.[5]

The data transfer protocol in HBM separates the command and address buses, with dedicated row address (RA) and column address (CA) lines that allow simultaneous issuance of row activation and column access commands for improved efficiency. Burst length is 2 clock cycles (BL2), transferring 256 bits per 128-bit channel (or 128 bits per 64-bit channel in HBM3) in a single burst to optimize throughput for high-demand workloads. Refresh operations are tailored for the stacked die structure, supporting per-bank or targeted refresh modes that reduce overhead compared to all-bank refreshes in traditional DRAM, thereby preserving availability in multi-die configurations.[32][33]

Bandwidth in HBM is determined by the formula:

$$\text{Bandwidth (GB/s)} = \frac{\text{data rate per pin (Gbit/s)} \times \text{number of pins}}{8}$$

This equation converts the aggregate bit-rate to bytes per second, where the division by 8 accounts for 8 bits per byte; for instance, a 2 Gbps per pin rate across 1024 pins (HBM1-HBM3) yields 256 GB/s, or across 2048 pins (HBM4) yields 512 GB/s.[20]

To ensure signal integrity over the short, high-density interconnects, HBM implements on-die termination (ODT) with dynamic calibration, applying resistive termination at the receiver to match driver impedance and suppress reflections. Timing benefits from the direct die-to-die paths provided by through-silicon vias (TSVs), enabling low-latency intra-stack operations with typical access latencies around 100 ns.[34]

Generations
HBM1
High Bandwidth Memory 1 (HBM1) represents the first generation of the HBM standard, formalized by the Joint Electron Device Engineering Council (JEDEC) under JESD235 in October 2013.[35] This specification introduced a high-performance DRAM architecture designed for applications requiring substantial data throughput, such as graphics processing units (GPUs). HBM1 stacks utilized through-silicon vias (TSVs) to interconnect multiple DRAM dies vertically, enabling a compact form factor with enhanced bandwidth compared to traditional planar DRAM configurations. The initial commercial production of HBM1 was achieved by SK Hynix in 2013, marking the debut of TSV-based stacking in mass-produced DRAM devices.[4]

The core specifications of HBM1 include a maximum stack capacity of 1 GB, achieved through a 4-high configuration of 2 Gbit dies (each contributing 256 MB).[4] Each stack features eight independent 128-bit channels, supporting data transfer rates of up to 1 Gbps per pin. This results in a total bandwidth of approximately 128 GB/s per stack, calculated as 16 GB/s per channel across the eight channels (128 bits × 1 GT/s × 8 channels). The interface employs a wide I/O design with differential clocking to facilitate low-power, high-speed operation, while the 2-channel per die layout optimizes inter-die communication via TSVs. HBM1's integration was first demonstrated in AMD's Fiji GPU architecture, released in 2015, where four 1 GB stacks provided 512 GB/s aggregate bandwidth for high-end graphics workloads.[36][35]

At the channel level, HBM1 employs eight pseudo-channels per stack to manage bank access and interleaving, allowing independent addressing within each 128-bit sub-channel for improved parallelism. Error handling is limited to basic on-die detection mechanisms for single-bit faults and post-package repair capabilities, without support for full error-correcting code (ECC) to maintain simplicity and cost efficiency in the initial design. This architecture prioritizes bandwidth density over extensive redundancy, relying on TSVs for vertical integration that reduces signal latency but introduces challenges in thermal management and alignment precision.[35]

Despite its innovations, HBM1 faced limitations in density, capping at 1 GB per stack, which constrained scalability for emerging memory-intensive applications relative to subsequent generations. Bandwidth was also modest at 128 GB/s per stack, insufficient for the escalating demands of later high-performance computing scenarios. Manufacturing complexity arose from the novel TSV processes and 3D stacking, leading to initial yield issues due to defects in via alignment and die bonding, which elevated production costs and limited early adoption.[36][37]

HBM2 and HBM2E
High Bandwidth Memory 2 (HBM2) represents the second generation of the HBM standard, standardized by JEDEC in January 2016 under JESD235A.[38] It builds on HBM1 by doubling the per-pin data rate to 2 Gbps while maintaining a 1024-bit wide interface divided into up to 8 independent 128-bit channels per stack.[38] This configuration supports stack heights of 2 to 8 DRAM dies, with die densities from 1 Gb to 8 Gb, enabling capacities up to 8 GB per stack in an 8-high configuration.[38] The resulting peak bandwidth reaches 256 GB/s per stack, calculated as the product of the pin speed, interface width, and channel count divided by 8 to convert bits to bytes.[38]

In contrast to HBM1's 1 Gbps per pin and maximum 128 GB/s per stack, HBM2's formula for bandwidth scaling is:

$$\text{Bandwidth (GB/s)} = \frac{\text{pin\_speed} \times 128 \times \text{channels}}{8}$$

where pin_speed is in Gbps and channels range from 2 to 8, yielding up to twice the throughput of its predecessor for equivalent configurations.[38] HBM2 also introduces full error-correcting code (ECC) support per channel for improved data integrity in high-reliability applications.[39]

Key enhancements in HBM2 focus on increased pin speeds achieved through advanced signaling techniques, such as pseudo-open drain I/O to reduce power consumption and improve signal integrity at higher rates.[40] It supports flexible channel configurations from 2 to 8, allowing scalability for diverse system needs, and operates at a core voltage of 1.2 V with I/O signaling optimized for efficiency, contributing to overall power gains over HBM1 despite the speed increase.[32] These improvements enable HBM2 to deliver higher performance in bandwidth-intensive workloads while maintaining low latency and energy efficiency.

HBM2E emerged as an evolutionary extension of HBM2 in 2019, driven by industry demands for greater capacity and speed without a full generational shift.[41] It boosts per-pin data rates to 3.6–6.4 Gbps through refined manufacturing and signaling, supporting up to 12-high stacks with up to 16 Gb dies (2 GB each) for capacities reaching 24 GB per stack.[42] Bandwidth scales accordingly to up to 460 GB/s per stack at 3.6 Gbps, with higher rates possible in optimized implementations.[41] Notable deployments include the NVIDIA A100 GPU, which utilizes HBM2E for 40–80 GB total memory and over 2 TB/s aggregate bandwidth across multiple stacks, and the AMD Instinct MI250 accelerator with 128 GB HBM2E delivering 3.2 TB/s.[43][44] HBM2E retains HBM2's ECC capabilities and channel flexibility, prioritizing seamless integration into existing HBM2 ecosystems for accelerated computing and AI systems.[45]

HBM3 and HBM3E
High Bandwidth Memory 3 (HBM3) represents the third generation of the HBM standard, finalized by JEDEC in January 2022 to address escalating demands for bandwidth in high-performance computing and artificial intelligence applications.[46] This iteration doubles the channel count to 16 channels (each 64 bits wide) for a 1024-bit interface per stack while supporting densities up to 24 GB in a 12-high configuration using 16 Gb DRAM layers.[5] The base data rate operates at 6.4 Gbps per pin, delivering a peak bandwidth of up to 819 GB/s per stack, which significantly enhances data throughput for memory-intensive workloads.[47]

HBM3E serves as an energy-efficient extension to the HBM3 specification, with initial rollouts occurring in 2023 and broader adoption in 2024, pushing per-pin speeds to 9.2–9.6 Gbps for improved performance without proportionally increasing power consumption.[20] This variant achieves up to 1.2 TB/s bandwidth per stack and supports capacities reaching 36 GB, leveraging higher-density DRAM dies in multi-layer stacks.[48] It has been integrated into advanced accelerators, such as NVIDIA's H200 GPU with 141 GB of HBM3E memory and AMD's Instinct MI325X with 256 GB capacity and 6 TB/s aggregate bandwidth, reflecting 2025 updates in AI hardware ecosystems.[49][50]

Key enhancements in HBM3 and HBM3E include adaptive refresh mechanisms, which dynamically adjust refresh intervals to reduce power usage during low-activity periods, and on-die error correction code (ECC) for improved reliability by detecting and correcting single-bit errors directly within the DRAM layers.[51] Additionally, support for multi-stack daisy-chaining allows seamless interconnection of multiple HBM stacks, facilitating scalable configurations in large-scale systems without excessive signaling overhead.[52]

In practical operation, the effective throughput of HBM3 and HBM3E accounts for protocol and timing overheads, typically expressed as:

$$\text{Effective throughput} = \text{base bandwidth} \times \text{efficiency factor}$$

where the base bandwidth is the theoretical peak and the efficiency factor reflects real-world utilization, often around 85–95% in optimized AI training scenarios.[53]

Advanced Variants
High Bandwidth Memory (HBM) has seen innovative extensions through processing-in-memory (PIM) architectures, which integrate compute units directly into the memory stack to minimize data movement between processors and memory. Samsung developed HBM-PIM prototypes in 2023, embedding AI-dedicated processors within the HBM DRAM to offload operations like matrix multiplications, achieving up to 2x speedup in AI inference tasks such as GPT-J models.[54][55] SK Hynix has similarly advanced PIM technologies since 2022, focusing on domain-specific memory for AI clusters.[56] These variants reduce energy consumption by performing computations locally in memory; conceptually, the energy savings scale with compute locality, the fraction of operations executed in-memory, with reported reductions of up to 85% in data movement energy for transformer-based AI workloads.

The next major advancement, HBM4, was standardized by JEDEC in April 2025 under JESD270-4, with development completed by major vendors such as SK Hynix in September 2025 and samples supplied to customers like NVIDIA; mass production is anticipated in 2026.[57][58][59] The standard supports stack configurations up to 16-high using 24 Gb or 32 Gb DRAM dies for capacities reaching 64 GB per stack.[57][59] It delivers over 2 TB/s bandwidth per stack via a 2048-bit interface at 8 Gbps per pin, with vendors like SK Hynix targeting over 10 Gbps for enhanced AI and high-performance computing applications.[62][63] HBM4 incorporates hybrid bonding for finer interconnect pitches, enabling tighter integration with compute dies and reduced latency compared to prior generations.[64] SK Hynix is the primary supplier of HBM for NVIDIA's high-end AI GPUs and is expected to hold approximately 70% market share for HBM4 in 2026; Micron serves as a significant secondary supplier with around 11% overall HBM market share in Q3 2025, while Samsung, holding about 35% share in the same period, is competing for larger shares of NVIDIA's HBM4 contracts.[60][61]

HBM4E represents an enhanced variant of HBM4, developed by manufacturers including Samsung, Micron, and SK Hynix to meet the escalating demands of AI and high-performance computing systems.[7][65][66] It achieves per-pin data rates up to 13 Gbps, delivering bandwidths of up to 3.25 TB/s per stack, approximately 2.5 times higher than HBM3E.[7][67] These advancements leverage hybrid bonding and other refined interconnect technologies for improved efficiency and integration.[68] Mass production of HBM4E is anticipated starting in 2027, with samples already being supplied to partners like NVIDIA for next-generation accelerators.[67][66]

Emerging variants extend HBM's utility in disaggregated systems through integration with Compute Express Link (CXL), allowing pooled HBM resources across servers for flexible memory allocation in AI clusters, as demonstrated in Samsung's 2023 prototypes combining HBM-PIM with CXL for up to 1.1 TB/s bandwidth and 512 GB capacity.[55] Additionally, evolutions in 2.5D packaging, including advanced silicon interposers and hybrid bonding, support higher-density HBM stacks with improved thermal management and signal integrity for next-generation AI accelerators.[69][70]

Historical Development
Origins and Background
The development of High Bandwidth Memory (HBM) originated in the 2000s from research on three-dimensional integrated circuits (3D ICs), spearheaded by initiatives from the Defense Advanced Research Projects Agency (DARPA) and academic institutions, aimed at overcoming the "memory wall" in von Neumann architectures. This memory wall, first articulated by Wulf and McKee, describes the widening gap where processor computational speeds have outpaced memory access latencies and bandwidth improvements by factors of 50 to 100, creating a bottleneck in data-intensive applications.[71][72] 3D IC research focused on vertically stacking components to shorten interconnects, reduce latency, and enhance bandwidth density, with early explorations dating back to DARPA-funded programs on heterogeneous integration in the early 2000s.

Key early concepts for HBM's stacked architecture emerged from academic and industry papers in the mid-2000s, including IEEE publications proposing vertical interconnections for chip stacks to enable wider data paths and higher throughput in memory systems. For instance, a 2004 IEEE paper detailed process integration techniques for 3D chip stacks using through-silicon vias (TSVs) to facilitate dense vertical signaling, laying foundational ideas for memory-logic integration. Initial prototypes of stacked DRAM with wide interfaces, such as Samsung's Wide-I/O mobile DRAM, were demonstrated around 2011, building on these concepts to achieve preliminary high-bandwidth performance in lab settings.[73][74][2]

Driving this evolution were the escalating memory demands of GPU advancements post-2010, as NVIDIA and AMD pushed architectures like Fermi and subsequent generations that amplified parallel compute but strained traditional GDDR memory's bandwidth limits in high-end graphics and emerging compute workloads. Power efficiency constraints in data centers further necessitated innovations like 3D stacking, as conventional memory interfaces consumed excessive energy for scaling bandwidth beyond 10 GB/s per channel. Precursor standards, such as the Wide I/O interface developed under JEDEC with input from the MIPI Alliance, provided early frameworks for low-power, wide-channel 3D memory suitable for mobile and high-performance applications.[75][76][77]

In response to GDDR's limitations in power and scalability for ultra-high-end graphics, AMD collaborated closely with SK Hynix starting in 2013 to pioneer HBM as a next-generation solution, emphasizing 3D stacking to deliver terabit-per-second bandwidth while maintaining compact form factors. SK Hynix has strategically focused on leadership in AI memory technology through its development of advanced HBM variants, essential for AI GPU accelerators.[78] This industry partnership addressed the need for memory that could keep pace with GPU compute scaling without exacerbating data center energy demands. Samsung later contributed to HBM evolution through JEDEC standardization and HBM2 production.[79][80][81]

Standardization and Milestones
The standardization of High Bandwidth Memory (HBM) was spearheaded by the Joint Electron Device Engineering Council (JEDEC), which published the initial JESD235 specification in October 2013 to define the architecture and interface for HBM1.[82] Key semiconductor manufacturers, including Samsung, SK Hynix, and Micron, contributed significantly to the development of this standard through their participation in JEDEC committees, ensuring compatibility across industry ecosystems.[83][57] In January 2016, JEDEC released the updated JESD235A specification for HBM2, which enhanced data rates and capacity while maintaining backward compatibility with the original framework.[38] The JESD238 standard for HBM3 followed in January 2022, introducing higher pin speeds up to 6.4 Gbps and support for up to 16 channels to meet escalating bandwidth demands in high-performance computing.[84][47]

A major milestone in HBM's adoption occurred in June 2015 with the launch of the AMD Radeon R9 Fury X graphics card, the first commercial product to integrate HBM1, delivering 512 GB/s of bandwidth in a 4 GB stack.[85] NVIDIA advanced this trajectory in 2017 by incorporating HBM2 into its Tesla V100 accelerator based on the Volta architecture, enabling 900 GB/s bandwidth for data center applications.[86] In 2019, vendors like Samsung and SK Hynix introduced HBM2E as a non-JEDEC extension, boosting per-pin speeds to 3.6 Gbps and capacities up to 24 GB per stack to bridge gaps until full HBM3 ratification.[45] HBM3E sampling began in 2023, with SK Hynix unveiling 8 Gbps/pin modules in May and Micron following with 24 GB 8-high stacks for NVIDIA's H200 GPUs.[87][48]

The AI boom from 2023 to 2025 propelled HBM's market growth, with the total addressable market expanding from approximately $4 billion in 2023 to an estimated $35 billion in 2025, according to Micron's forecasts.[88] This surge led to supply shortages in 2024 and 2025, as demand outpaced production; for instance, SK Hynix reported its HBM supply nearly sold out for 2025 due to NVIDIA's procurement needs.[89] By 2025, HBM integration reached over 70% of top AI GPUs, driven by partnerships such as TSMC's CoWoS advanced packaging technology, which facilitates efficient stacking of HBM with GPUs from NVIDIA and AMD. In September 2025, SK Hynix completed development of the world's first HBM4, preparing for mass production to support next-generation AI systems.[90][91][92]

Applications
Graphics and Gaming
High Bandwidth Memory (HBM) has seen early adoption in graphics processing units (GPUs) primarily for high-end gaming and professional visualization applications, where its stacked architecture provides superior bandwidth compared to traditional GDDR memory. AMD integrated HBM2 with its Radeon RX Vega series in 2017 to deliver up to 483 GB/s of memory bandwidth, which supported enhanced performance in demanding rendering tasks.[93][94] This was followed by the Radeon VII in 2019, featuring 16 GB of HBM2 across a 4096-bit interface for 1 TB/s bandwidth, enabling smooth 4K and 8K video playback and gaming at high frame rates in titles requiring intensive graphical computations.[95]

In gaming scenarios, HBM's sustained high bandwidth excels at rapid texture loading and processing complex shaders, minimizing latency in real-time rendering pipelines. This is particularly beneficial for ray tracing workloads, where HBM facilitates quicker access to large datasets for light simulation and reflection calculations, resulting in more realistic visuals without frame drops. For virtual reality (VR) and augmented reality (AR) applications, HBM reduces memory bottlenecks during high-fidelity environment rendering, supporting immersive experiences with minimal stuttering in dynamic scenes.[96][97]

NVIDIA has also leveraged HBM in professional graphics cards, such as the Quadro GP100 released in 2017, which utilized 16 GB of HBM2 for bandwidth-intensive tasks like 3D modeling and simulation in gaming development workflows.[98] Although consumer gaming GPUs have largely stuck to GDDR variants due to cost, HBM's power efficiency—achieving high throughput at lower voltages—has influenced designs akin to gaming consoles. Despite these advantages, HBM's higher manufacturing costs restrict its use to premium GPUs, primarily in flagship models for enthusiasts and professionals. This premium positioning ensures HBM targets scenarios where bandwidth demands outweigh affordability concerns, such as ultra-high-resolution gaming and content creation.

AI and High-Performance Computing
High Bandwidth Memory (HBM) plays a pivotal role in artificial intelligence (AI) accelerators such as GPUs and TPUs, where its high bandwidth and capacity enable efficient handling of large-scale data for training and inference workloads. SK Hynix has established leadership in HBM technology for AI applications, dominating production of advanced versions such as HBM3E and contributing to the development of HBM4. For Nvidia's AI accelerators, major HBM suppliers include SK Hynix as the primary provider, Micron as a significant secondary supplier, and Samsung, which is in close discussions to supply larger volumes for next-generation HBM4.[11][61][99][100]

In NVIDIA's Hopper architecture GPUs, such as the H100 introduced in 2023 and the H200 in 2024, HBM3 and HBM3e provide up to 141 GB of memory per GPU, supporting the processing of massive large language models (LLMs) like those exceeding 100 billion parameters without extensive model sharding.[101][102] This configuration delivers up to 4.8 TB/s of bandwidth, facilitating faster matrix multiplications critical for transformer-based architectures in LLM training.[103] Compared to prior generations using HBM2e, such as the A100, the H100 and H200 achieve 3x to 4x improvements in training throughput for LLMs due to enhanced memory access speeds and tensor core optimizations.[104]

In high-performance computing (HPC), HBM integration in GPU-accelerated nodes supports exascale simulations requiring rapid data throughput for complex scientific computations. The Frontier supercomputer, deployed in 2022 at Oak Ridge National Laboratory, leverages AMD EPYC processors paired with Instinct MI250X GPUs equipped with 128 GB of HBM2e per accelerator, enabling peak performance of over 1.1 exaFLOPS for double-precision workloads.[105] This setup has powered advanced climate modeling, including the SCREAM (Simple Cloud-Resolving E3SM Atmosphere Model) simulation, which resolved global cloud processes at kilometer-scale resolution in under a day—advancing predictions of extreme weather patterns and their U.S. impacts.[106]

By 2025, HBM adoption extends to tensor processing units (TPUs) and custom application-specific integrated circuits (ASICs), addressing the demands of distributed AI paradigms like federated learning.
Google's Trillium (TPU v6e), previewed in 2024 and scaling into production, doubles HBM capacity to 32 GB per chip with 1.64 TB/s bandwidth, enhancing efficiency for privacy-preserving federated training across edge devices and data centers.[107] Custom ASICs from vendors like Broadcom, integrated with HBM3e stacks, enable multi-terabyte memory pools in hyperscale clusters, reducing latency in collaborative model updates for federated scenarios.[108][109]

HBM's proximity to compute logic minimizes data movement overhead in AI pipelines, lowering energy costs for memory-bound operations and enabling sustainable scaling to exaFLOPS-level performance (10^18 FLOPS).[23] The expansion of AI data centers has caused explosive demand for HBM and other high-value memory products, straining production resources: manufacturing 1 GB of HBM requires approximately three times the silicon wafer capacity of standard DRAM, contributing to shortages and price increases in conventional RAM supplies.[110][111] In HPC and AI systems, this architecture supports the bandwidth needs of trillion-parameter models, ensuring efficient resource utilization as compute clusters expand toward zettascale ambitions.[15]

Comparisons and Future Outlook
Versus Other Memory Technologies
High Bandwidth Memory (HBM) offers substantial advantages in bandwidth over GDDR6 and GDDR6X, primarily due to its wide interface and stacked architecture, enabling a single HBM3E stack to achieve up to 1.2 TB/s, compared to approximately 1 TB/s total bandwidth in high-end GDDR6X implementations like NVIDIA's RTX 4090 GPU. This results in 3-5x higher effective bandwidth for bandwidth-intensive workloads, though GDDR6X remains preferable for cost-sensitive gaming applications where its lower price point—about 3-5x less per GB than HBM—offsets slightly reduced peak throughput. HBM also incurs 2-3x higher latency in low-load scenarios due to its lower per-pin clock speeds, but its proximity to the processor via 2.5D integration mitigates this under sustained high utilization.

In contrast to DDR5 and LPDDR5, HBM's vertical stacking yields roughly 10x greater bandwidth density, packing terabytes per second into a compact footprint that suits space-constrained high-performance systems, though it requires approximately three times more silicon wafer area per gigabyte than DDR5 due to stacking complexities and larger dies, while a typical DDR5 DIMM delivers only about 76.8 GB/s at 9.6 GT/s. DDR5 and LPDDR5, however, provide superior capacity scalability, with modules reaching up to 128 GB, and benefit from widespread adoption in consumer and server platforms for their lower cost and simpler integration. HBM's premium pricing, often 5x higher per GB, limits its use to specialized domains where bandwidth trumps volume.[112]

In the context of AI-driven demand, NAND flash focuses on backend storage for AI servers (e.g., high-capacity SSDs) and enterprise/consumer applications with moderate demand growth and mid-single-digit to low double-digit price rises, whereas HBM targets explosive AI training and inference needs with significantly stronger price surges and margin expansion.[113]

| Metric | HBM3E (per stack) | GDDR6X (high-end GPU total) | DDR5 (per module) |
|---|---|---|---|
| Bandwidth | 1.2 TB/s | 1 TB/s | 76.8 GB/s |
| Power Consumption | ~30 W | ~35-50 W (total for 24 chips) | ~10 W |
| Cost ($/GB) | $10-20 | $5-15 | $5-10 |
