Hardware for artificial intelligence
from Wikipedia

Specialized computer hardware is often used to execute artificial intelligence (AI) programs faster and with less energy; examples include Lisp machines, neuromorphic engineering, event cameras, and physical neural networks. Since 2017, several consumer-grade CPUs and SoCs have included on-die NPUs. As of 2023, the market for AI hardware is dominated by GPUs.[1]

As of the 2020s, AI computation is dominated by graphics processing units (GPUs) and newer domain-specific accelerators such as Google’s Tensor Processing Units (TPUs), AMD’s Instinct MI300 series, and various on-device neural-processing units (NPUs) found in consumer hardware.[2][3]

Scope


For the purposes of this article, AI hardware refers to computing components and systems specifically designed or optimized to accelerate artificial-intelligence workloads such as machine-learning training or inference. This includes general-purpose accelerators used for AI (for example, GPUs) and domain-specific accelerators (for example, TPUs, NPUs, and other AI ASICs).[4]

Event-based cameras are sometimes discussed in the context of neuromorphic computing, but they are input sensors rather than AI compute devices. Conversely, components such as memristors are basic circuit elements rather than specialized AI hardware when considered alone.[5][6]

Lisp machines


Lisp machines were developed in the late 1970s and early 1980s to make artificial intelligence programs written in the programming language Lisp run faster.

Dataflow architecture


Dataflow architecture processors used for AI serve various purposes with varied implementations like the polymorphic dataflow[7] Convolution Engine[8] by Kinara (formerly Deep Vision), structure-driven dataflow by Hailo,[9] and dataflow scheduling by Cerebras.[10]

Component hardware


AI accelerators


Since the 2010s, advances in computer hardware have led to more efficient methods for training deep neural networks that contain many layers of non-linear hidden units and a very large output layer.[11] By 2019, graphics processing units (GPUs), often with AI-specific enhancements, had displaced central processing units (CPUs) as the dominant means of training large-scale commercial cloud AI.[12] OpenAI estimated the hardware compute used in the largest deep-learning projects from AlexNet (2012) to AlphaZero (2017) and found a 300,000-fold increase in the amount of compute needed, with a doubling-time trend of 3.4 months.[13][14]
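The two figures quoted above are mutually consistent, which a short back-of-envelope check makes visible: a 300,000-fold growth at a 3.4-month doubling time implies roughly five years of sustained doubling, matching the 2012–2017 span.

```python
import math

# How many months of 3.4-month doublings does a 300,000x increase imply?
factor = 300_000
doubling_time_months = 3.4
months = math.log2(factor) * doubling_time_months
print(round(months, 1))  # ~61.9 months, i.e. roughly the 2012-2017 span
```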

General-purpose GPUs for AI


Since the 2010s, graphics processing units (GPUs) have been widely used to train and deploy deep learning models because of their highly parallel architecture and high memory bandwidth. Modern data-center GPUs include dedicated tensor or matrix-math units that accelerate neural-network operations.

In 2022, NVIDIA introduced the Hopper-generation H100 GPU, adding FP8 precision support and faster interconnects for large-scale model training.[15] AMD and other vendors have also developed GPUs and accelerators aimed at AI and high-performance computing workloads.[16]

Domain-specific accelerators (ASICs / NPUs)


Beyond general-purpose GPUs, several companies have developed application-specific integrated circuits (ASICs) and neural processing units (NPUs) tailored for AI workloads. Google introduced the Tensor Processing Unit (TPU) in 2016 for deep-learning inference, with later generations supporting large-scale training through dense systolic-array designs and optical interconnects.[17] Other vendors have released similar devices—such as Apple’s Neural Engine and various on-device NPUs—that emphasize energy-efficient inference in mobile or edge computing environments.[18]
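The dense systolic-array designs mentioned above compute matrix products by streaming operands through a grid of multiply-accumulate elements, so each weight is fetched once rather than repeatedly from memory. The following is a toy functional model of that dataflow (a sketch, not any vendor's implementation): each "cycle" propagates one wavefront of partial products.

```python
import numpy as np

def systolic_matmul(A, B):
    """Toy model of a weight-stationary systolic array: each processing
    element conceptually holds one weight of B and accumulates partial
    products as rows of A stream past, so C = A @ B emerges from purely
    local multiply-accumulate steps."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for step in range(k):              # one streamed wavefront per "cycle"
        for i in range(n):
            for j in range(m):
                C[i, j] += A[i, step] * B[step, j]   # local MAC at a PE
    return C

A = np.arange(6, dtype=float).reshape(2, 3)
B = np.arange(12, dtype=float).reshape(3, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)
print(systolic_matmul(A, B))
```

Real arrays pipeline many wavefronts simultaneously; the point of the sketch is that no element of B ever needs to be re-read from external memory.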

Memory and interconnects


AI accelerators rely on fast memory and inter-chip links to manage the large data volumes of training and inference. High-bandwidth memory (HBM) stacks, standardized as HBM3 in 2023, provide terabytes-per-second throughput on modern GPUs and ASICs.[19] These accelerators are often connected through dedicated fabrics such as NVIDIA’s NVLink and NVSwitch or optical interconnects used in TPU systems to scale performance across thousands of chips.[20]
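Why memory bandwidth matters so much can be seen from a roofline-style estimate: a kernel saturates the compute units only if it performs enough arithmetic per byte moved. The numbers below are illustrative placeholders, not vendor specifications.

```python
# Roofline back-of-envelope: compare a kernel's arithmetic intensity
# (FLOPs per byte moved) against the hardware's compute/bandwidth ratio.
peak_tflops = 1000          # hypothetical accelerator: 1000 TFLOPS peak
hbm_tb_per_s = 3            # hypothetical HBM bandwidth: 3 TB/s
ridge = peak_tflops / hbm_tb_per_s   # FLOPs/byte needed to saturate compute

def intensity(n, bytes_per_elem=2):
    """Arithmetic intensity of an n x n FP16 matmul: ~2*n^3 FLOPs over
    ~3*n^2 matrices moved."""
    return (2 * n**3) / (3 * n**2 * bytes_per_elem)

print(f"ridge point:      {ridge:.0f} FLOPs/byte")
print(f"n=256 matmul:  {intensity(256):.0f} FLOPs/byte (memory-bound)")
print(f"n=4096 matmul: {intensity(4096):.0f} FLOPs/byte (compute-bound)")
```

Small matrix multiplications fall below the ridge point and are limited by HBM bandwidth, which is why accelerators pair wide compute arrays with stacked memory.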

Sources

  1. ^ "Nvidia: The chip maker that became an AI superpower". BBC News. 25 May 2023. Retrieved 18 June 2023.
  2. ^ "NVIDIA H100 Tensor Core GPU Architecture Whitepaper". NVIDIA. 2022. Retrieved 4 November 2025.
  3. ^ "Google Cloud TPU v5 Announcement". Google Cloud Blog. 2023. Retrieved 4 November 2025.
  4. ^ Sze, Vivienne; Chen, Yu-Hsin; Yang, Tien-Ju; Emer, Joel (2017). "Efficient Processing of Deep Neural Networks: A Tutorial and Survey". Proceedings of the IEEE. 105 (12): 2295–2329. doi:10.1109/JPROC.2017.2761740. Retrieved 4 November 2025.
  5. ^ Gallego, Guillermo (2022). "Event-based Vision: A Survey" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/TPAMI.2020.3008413. Retrieved 4 November 2025.
  6. ^ Strukov, D. B.; Snider, G. S.; Stewart, D. R.; Williams, R. S. (2008). "The Missing Memristor Found". Nature. 453: 80–83. doi:10.1038/nature06932. Retrieved 4 November 2025.
  7. ^ Maxfield, Max (24 December 2020). "Say Hello to Deep Vision's Polymorphic Dataflow Architecture". Electronic Engineering Journal. Techfocus media.
  8. ^ "Kinara (formerly Deep Vision)". Kinara. 2022. Retrieved 2022-12-11.
  9. ^ "Hailo". Hailo. Retrieved 2022-12-11.
  10. ^ Lie, Sean (29 August 2022). Cerebras Architecture Deep Dive: First Look Inside the HW/SW Co-Design for Deep Learning. Cerebras (Report). Archived from the original on 15 March 2024. Retrieved 13 December 2022.
  11. ^ Research, AI (23 October 2015). "Deep Neural Networks for Acoustic Modeling in Speech Recognition". AIresearch.com. Retrieved 23 October 2015.
  12. ^ Kobielus, James (27 November 2019). "GPUs Continue to Dominate the AI Accelerator Market for Now". InformationWeek. Retrieved 11 June 2020.
  13. ^ Tiernan, Ray (2019). "AI is changing the entire nature of compute". ZDNet. Retrieved 11 June 2020.
  14. ^ "AI and Compute". OpenAI. 16 May 2018. Retrieved 11 June 2020.
  15. ^ "NVIDIA H100 Tensor Core GPU Architecture". NVIDIA. 2022. Retrieved 4 November 2025.
  16. ^ "AMD Instinct MI300X Accelerator". AMD. 2024. Retrieved 4 November 2025.
  17. ^ "Introducing Cloud TPU v5p and the AI Hypercomputer". Google Cloud Blog. 6 December 2023. Retrieved 4 November 2025.
  18. ^ "Apple Neural Engine". Apple Machine Learning Research. Retrieved 4 November 2025.
  19. ^ "JESD238A: High Bandwidth Memory (HBM3) Standard". JEDEC. January 2023. Retrieved 4 November 2025.
  20. ^ "NVIDIA Hopper Architecture In-Depth". NVIDIA Developer Blog. 22 March 2022. Retrieved 4 November 2025.
from Grokipedia
Hardware for artificial intelligence refers to specialized computing components and architectures designed to accelerate the computationally intensive operations required for developing, training, and deploying AI models, particularly deep neural networks, by leveraging parallelism, optimized memory systems, and energy-efficient processing to outperform general-purpose processors like CPUs. AI computing power encompasses this infrastructure, including hardware optimized for both training and inference phases. Its importance stems from the evolution of large AI models toward multimodal and agent-based systems, which propels explosive demand for computing resources, with optical modules and other hardware emerging as critical bottlenecks. These systems address the limitations of traditional von Neumann architectures, which suffer from bottlenecks in data movement and processing, enabling faster execution of tasks such as the matrix multiplications and convolutions central to machine-learning workloads. Key types of AI hardware include graphics processing units (GPUs), which excel at parallel computations for training models, as exemplified by NVIDIA's Tesla P100 and Blackwell B200 series that support high-throughput floating-point operations; tensor processing units (TPUs), Google's custom ASICs featuring systolic arrays for efficient tensor operations, such as the TPU v7 released in 2025 with 8-bit integer support and 7.37 TB/s bandwidth as of April 2025; and field-programmable gate arrays (FPGAs), reconfigurable devices like the ALINX AX7Z020 suited for real-time, adaptable AI inference. Additional categories encompass application-specific integrated circuits (ASICs), such as the DianNao family optimized for neural-network layers with integrated on-chip memory for reduced latency, and neuromorphic hardware, brain-inspired chips like Intel's Loihi that mimic spiking neural networks for low-power inference at the edge. These accelerators also incorporate advanced memory solutions, including high-bandwidth memory (HBM) and non-volatile options like SSDs, to handle the massive datasets involved in AI.
The evolution of AI hardware traces back to the 2012 AlexNet breakthrough, which popularized GPUs for convolutional neural networks due to their parallel processing capabilities, marking a shift from CPU-dominated computing to specialized accelerators amid the rise of large-scale models like GPT-3. Over the past decade, performance has improved by approximately 2–3 orders of magnitude, driven by trends toward lower-precision formats (e.g., INT8 for inference and emerging FP8 for training) and fabrication advances; power consumption has scaled accordingly, with NVIDIA's H100 drawing up to 700 W at peak throughput. Global AI compute capacity is doubling every 7 months, outpacing Moore's law, with NVIDIA maintaining approximately 85% market share in the AI GPU/accelerator market despite competition from AMD (Instinct series), Intel (Gaudi series and Jaguar Shores), Google/Alphabet (TPUs such as Ironwood), AWS (Trainium and Inferentia), Microsoft (Maia), Meta (MTIA), Qualcomm (edge and cloud AI), Cerebras (wafer-scale engines), Tenstorrent, and Huawei. Revenue capture in the expanding AI hardware market is influenced by a company's baseline revenue size; the evolution of market share (semiconductor firms have historically captured 20–30% of value in technology stacks but could reach 40–50% in AI oligopolies); product diversification across AI-specific and broader portfolios in compute, memory, and networking; and competitive positioning via technical leadership in digital signal processors (DSPs), switches, and co-packaged optics (CPO). External constraints, including geopolitical risks from U.S.-China rivalry, export controls, and tech-sovereignty fragmentation, also play a significant role. Small pure-play firms benefit from higher revenue elasticity owing to their low baselines and concentrated exposure to AI opportunities in niche ASICs and specialized hardware.
This hardware is crucial for applications ranging from autonomous vehicles and generative AI to edge devices, offering benefits like reduced energy use, which is critical as AI's computational demands drive rising global electricity consumption, and enhanced accuracy in specialized domains. Despite these advances, challenges persist, including programming complexity for reconfigurable devices like FPGAs, inflexibility in ASICs that limits adaptability to evolving models, and scaling issues for large language models. Growth in the AI infrastructure sector is driven by sustained demand for memory bandwidth and compute power, advances in software enablement, increasing enterprise adoption, sovereign AI initiatives, and the expansion of edge-inference capabilities. Future directions emphasize heterogeneous integration of accelerators, domain-specific optimizations, in-memory computing with non-volatile memories like ReRAM, and emerging paradigms such as photonic and memristor-based systems to further improve efficiency and accessibility.

Historical Developments

Lisp Machines

Lisp machines were general-purpose computers specifically designed to execute Lisp programs efficiently, emerging as early specialized hardware for artificial intelligence and symbolic computation in the 1970s and 1980s. The concept originated in a 1973 proposal at the MIT Artificial Intelligence Laboratory, where Lisp had been developed in the late 1950s and early 1960s, and the Lisp Machine Project was initiated by Richard Greenblatt in 1974. This project produced prototypes like the CONS machine in 1975, followed by the influential CADR machine around 1977–1980, which offered significant performance improvements over general-purpose hardware like the PDP-10 and served as the foundation for commercial efforts. By the early 1980s, former MIT researchers had founded companies such as Symbolics in 1980 and Lisp Machines Incorporated (LMI) in 1979, commercializing these designs to support AI research and development. Key architectural features of Lisp machines were tailored to Lisp's demands for dynamic memory management and list processing, including tagged memory architectures that embedded type information directly in memory words for rapid type checking and dispatch. Hardware implementations often incorporated microcode to accelerate Lisp primitives such as cons, car, and cdr, while dedicated support for garbage collection, via methods like reference counting or mark-and-sweep, minimized pauses in symbolic computation. Virtual memory systems were optimized for handling the large, fragmented data structures common in AI applications, and many models featured high-resolution bitmapped displays with early graphical user interfaces to aid interactive programming. These optimizations, building on earlier influences like the SECD machine model from the 1960s, enabled Lisp execution speeds significantly faster than on general-purpose hardware of the era, such as the PDP-10.
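The tagged-memory idea above can be sketched in a few lines: the type tag travels with every datum, so primitives like car and cdr can check it before operating, which the Lisp machines did in hardware rather than software. The function and tag names below are purely illustrative.

```python
# Toy model of tagged cons cells: each value carries its type tag, so
# list primitives can type-check on access (Lisp machines did this
# check in hardware, trapping on a tag mismatch).
def cons(a, b):
    return ("CONS", a, b)           # tag stored alongside the datum

def car(cell):
    tag, head, _ = cell
    assert tag == "CONS", "hardware type trap"
    return head

def cdr(cell):
    tag, _, tail = cell
    assert tag == "CONS", "hardware type trap"
    return tail

lst = cons(1, cons(2, ("NIL",)))    # the list (1 2)
print(car(lst), car(cdr(lst)))      # 1 2
```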
Notable models included the MIT CADR, which influenced subsequent commercial machines like Symbolics' LM-2 (1980) and the more advanced Symbolics 3600 series introduced in 1983, capable of handling complex AI workloads with 4 MB of memory expandable to 32 MB. LMI's Lambda machine (1983) and Texas Instruments' Explorer series (starting 1983) also drew from CADR designs, offering similar performance for around $70,000–$125,000 per unit and finding adoption in research labs for tasks like symbolic manipulation. Symbolics machines, in particular, peaked with over $100 million in revenue by 1986, underscoring their role in equipping AI researchers with powerful tools. The decline of Lisp machines began in the late 1980s with the commoditization of high-performance workstations, such as Sun Microsystems' models, which offered comparable or superior capability for Lisp via software implementations at a fraction of the cost: around $14,000 versus $100,000 for a Lisp machine. The AI winter that followed reduced funding, such as the end of DARPA's Strategic Computing Initiative, further eroding demand as expert systems and symbolic AI shifted toward more portable implementations on general-purpose hardware. By the early 1990s, companies like Symbolics faced bankruptcy, marking the end of dedicated Lisp hardware production. Despite their short commercial lifespan, Lisp machines profoundly impacted AI by enabling efficient development of symbolic systems during the 1970s and 1980s, including expert systems and early frame-based reasoning tools. They facilitated rapid prototyping at institutions like MIT, where they supported vision, robotics, and natural-language research, and they influenced the standardization of Common Lisp in 1984. This hardware specialization accelerated advances in the declarative and functional programming paradigms central to early AI, even as numerical approaches later came to dominate.

Dataflow Architectures

Dataflow architectures represent a departure from traditional von Neumann models: execution is driven by the availability of data operands rather than by a sequential instruction stream dictated by a program counter. In this model, computations are represented as directed graphs in which nodes denote operations and edges indicate data dependencies; an operation fires only when all required inputs are present, enabling inherent parallelism without explicit synchronization. The concept originated in the work of Jack Dennis at MIT in the early 1970s, with foundational ideas outlined in a 1975 paper proposing a basic data-flow processor that emphasized demand-driven evaluation to exploit concurrency in scientific and symbolic computations. Key implementations in the 1980s demonstrated both static and dynamic variants of the dataflow model. Static dataflow architectures, as pioneered by Dennis's group, restrict each data arc to a single token at a time, simplifying hardware but limiting concurrency for recursive or iterative tasks. In contrast, dynamic dataflow models, such as MIT's Tagged Token Dataflow Architecture (TTDA) and the Manchester Dataflow Machine, use unique tags on tokens to allow multiple instances per arc, supporting higher parallelism at the cost of increased overhead for tag management. The TTDA, developed by Arvind and colleagues, employed a multiprocessor design with actors executing functional code on tagged tokens, while the Manchester machine featured a pipelined ring design with dynamic tagging for general-purpose parallel processing, operational since 1981. At the hardware level, dataflow machines incorporate specialized components like token-matching units, often content-addressable memories (CAMs), that store incoming tokens and pair operands with matching destination and iteration tags before dispatching them to execution units.
Communication occurs via packet-switched networks that route tokens asynchronously between processing elements, eliminating the need for a global clock and allowing fine-grained parallelism without von Neumann bottlenecks. These designs, typically comprising arrays of simple processors connected through switching fabrics, prioritize data movement to enable massive concurrency in graph-based computations. In early AI applications, dataflow architectures facilitated parallel evaluation of logic and functional languages, particularly for nondeterministic search and theorem proving. The Manchester machine, for instance, supported a dataflow implementation of Prolog-like logic programming, in which resolution and unification were modeled as token flows, accelerating AI tasks like expert systems by distributing search spaces across nodes. Similarly, TTDA's support for functional languages enabled parallel reduction of lambda expressions, aiding symbolic AI computations. These systems demonstrated potential for AI workloads with irregular parallelism, though adoption remained limited to research prototypes. Despite their theoretical appeal, dataflow architectures faced scalability challenges due to overhead in token matching and network contention, which degraded performance as the number of processing elements grew beyond a few dozen. Tag resolution and storage demands also imposed memory penalties, making large-scale implementations inefficient compared with emerging alternatives. Consequently, while direct successors waned by the late 1980s, dataflow principles influenced modern designs like systolic arrays, which adopt structured data flows for matrix operations in AI accelerators, retaining operand-driven execution but with fixed topologies to mitigate overhead.
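The token-matching rule described above (an operation fires only once all its operands have arrived, regardless of arrival order) can be sketched as a tiny interpreter. The graph, node names, and port numbering are illustrative, not any historical machine's instruction set.

```python
from collections import defaultdict

# node -> (operation, arity, downstream (target, port) list)
graph = {
    "add": (lambda a, b: a + b, 2, [("mul", 0)]),
    "mul": (lambda a, b: a * b, 2, []),
}
inbox = defaultdict(dict)   # node -> {input port: waiting token}
results = {}

def send(node, port, value):
    """Deliver a token; the node fires only when all operands are present."""
    inbox[node][port] = value
    op, arity, targets = graph[node]
    if len(inbox[node]) == arity:            # token match complete: fire
        operands = inbox.pop(node)
        out = op(*(operands[p] for p in range(arity)))
        results[node] = out
        for tgt, tgt_port in targets:        # route result tokens onward
            send(tgt, tgt_port, out)

send("add", 0, 3)      # tokens may arrive in any order...
send("mul", 1, 10)
send("add", 1, 4)      # ...the last operand triggers add, which feeds mul
print(results["mul"])  # (3 + 4) * 10 = 70
```

No program counter sequences the work: execution order falls out of operand availability alone, which is exactly the property the token-matching hardware enforced.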

General-Purpose Hardware

Central Processing Units

Central processing units (CPUs) form the foundational general-purpose hardware for artificial intelligence (AI) workloads, evolving from single-core designs in the 1990s to multi-core architectures that support parallel processing through software optimizations and hardware extensions tailored for vectorized and matrix-based computations. In the 1990s, x86 processors like Intel's Pentium series operated primarily as single-core systems focused on sequential scalar instructions, which limited their ability to handle the emerging parallel demands of early AI algorithms such as neural-network training. In the 2000s, the shift to multi-core designs, exemplified by Intel's Core series introduced in 2006, enabled concurrent execution of AI tasks, allowing libraries and frameworks to distribute workloads across cores for small-scale model training and inference. ARM-based processors, which gained traction for AI in the 2010s due to their energy efficiency, further extended CPU applicability to edge devices, where power constraints are critical. Key features enhancing CPU suitability for AI include single-instruction, multiple-data (SIMD) instruction sets, which operate on multiple data elements simultaneously. Early SIMD extensions such as Streaming SIMD Extensions (SSE) debuted in Intel processors in 1999, followed by Advanced Vector Extensions (AVX) in 2008, culminating in AVX-512 in 2016, which introduced 512-bit registers capable of processing up to 64 floating-point operations per cycle per core at AI-relevant precisions like FP16. These extensions, particularly AVX-512's Vector Neural Network Instructions (VNNI), accelerate deep-learning primitives such as convolutions and matrix multiplications by reducing instruction overhead. CPU cache hierarchies have also been refined with larger L3 caches and prefetching mechanisms to minimize data-movement latency during matrix operations, a common bottleneck in AI.
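The 64-FLOPs-per-cycle figure quoted above follows directly from the register width, assuming FP16 lanes and a single fused multiply-add (FMA) unit per core (cores with two FMA units would double it):

```python
# Back-of-envelope for the per-core SIMD figure: a 512-bit register
# holds 32 FP16 lanes, and a fused multiply-add counts as 2 FLOPs.
register_bits = 512
fp16_bits = 16
lanes = register_bits // fp16_bits      # 32 elements processed in parallel
flops_per_cycle = lanes * 2             # FMA = one multiply + one add
print(flops_per_cycle)                  # 64
```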
Complementing these hardware advances, software libraries implementing the Basic Linear Algebra Subprograms (BLAS) interface exploit multi-core parallelism and SIMD to execute AI building blocks such as general matrix multiplication (GEMM) on CPUs. CPUs play a vital role in AI inference on resource-constrained edge devices, where their low-latency sequential processing suits real-time applications, and in training compact models that do not require massive parallelism; they often operate in hybrid configurations, handling data preprocessing and control flow alongside specialized accelerators such as GPUs. High-core-count CPUs with large caches, such as those featuring AMD's 3D V-Cache technology, improve data preparation and AI simulations by speeding cache-heavy operations and multitasking. In GPU-accelerated AI workloads, CPUs primarily manage data loading, preprocessing, orchestration, and system tasks; a weak CPU can create minor bottlenecks in data pipelines but typically does not drastically limit high-end GPU performance in most consumer or local AI setups. For running smaller AI models locally, a powerful CPU on a standard laptop suffices, though performance is significantly slower than on a GPU. Performance figures illustrate this niche: a single modern CPU core with matrix extensions can deliver up to 2 TFLOPS in FP16 for AI workloads, scaling to tens of TFLOPS across multi-core systems, as demonstrated in benchmarks for inference tasks like image classification. Despite these capabilities, CPUs face limits to AI scalability, offering far lower parallelism than GPUs (typically 10–100x fewer cores optimized for independent threads) and higher power consumption per operation for dense tensor computations, often exceeding 10–20 pJ per operation compared with GPU efficiencies. GPUs complement CPUs by handling high-throughput parallel tasks in training pipelines.
Contemporary examples underscore CPU adaptations for AI: AMD's EPYC processors, such as the EPYC 9005 series, target server-side inference with up to 192 cores, 12-channel DDR5 memory support, and AVX-512, delivering up to 37% higher AI throughput per generation across diverse model sizes. Similarly, Apple's M-series chips, starting with the M1 in 2020, integrate ARM-based multi-core CPUs with a dedicated Neural Engine co-processor on a unified system-on-chip, enabling efficient on-device AI inference, such as in image processing, while the CPU manages general tasks, achieving up to 11 TOPS for the system with low power draw.

Graphics Processing Units

Graphics processing units (GPUs) were originally designed for rendering complex graphics in video games and simulations, but their architecture proved highly suitable for accelerating artificial-intelligence workloads because of its emphasis on parallel processing. NVIDIA's GeForce 256, released in 1999, was the first GPU with dedicated hardware for 3D transformation and lighting, laying the groundwork for parallel computation beyond graphics. This evolved significantly with the introduction of CUDA in 2006, a parallel computing platform that enabled general-purpose computing on GPUs (GPGPU), allowing developers to apply GPU power to non-graphics tasks such as scientific simulation and, later, AI model training. At their core, modern GPUs feature thousands of smaller cores organized in single-instruction, multiple-data (SIMD) arrays, enabling massive parallelism for operations such as the matrix multiplications central to neural networks. NVIDIA's Volta architecture, launched in 2017, introduced tensor cores: specialized hardware units optimized for mixed-precision computation in deep learning, accelerating operations like matrix multiply-accumulate in formats such as FP16 and INT8. Building on this, the Ampere architecture in 2020 incorporated unified memory models, allowing seamless data sharing between CPU and GPU without explicit transfers, which reduces latency in AI pipelines. In AI applications, GPUs have become dominant for training deep-learning models, powering frameworks like TensorFlow and PyTorch that abstract GPU kernels for efficient parallel execution of backpropagation and convolutions.
For running local AI models, older GPU generations remain accessible options for inference, particularly with quantized models that run efficiently on limited hardware, while single-board computers provide platforms for edge inference. GPUs lead inference workloads due to their parallel processing capabilities: as of Q2 2025, NVIDIA held approximately 85% of the overall AI accelerator market (including GPUs and other accelerator types), about 92% of the discrete GPU market, and over 80% of the broader AI hardware/accelerator market, while competitors such as AMD (around 2% of AI accelerators) and Intel held significantly smaller shares, and custom ASICs (e.g., Broadcom at around 10%) competed in specific inference segments. For local inference, an NVIDIA GPU with at least 8–16 GB of VRAM is recommended for smaller to medium-sized models, though models with 70B+ parameters typically require 24 GB+ of VRAM or multiple GPUs at full precision. High-end GPUs deliver peak performance exceeding 300 TFLOPS in FP16 precision, enabling far faster training of large models than CPUs. This parallelism is particularly effective for the matrix-heavy computations in neural networks, with GPUs often paired with CPUs that handle sequential tasks in hybrid systems. Key advancements have further tailored GPUs for AI scalability in data centers. NVIDIA's A100 GPU, released in 2020, combines tensor cores with high-bandwidth memory (HBM2e) to support multi-terabyte-scale models, achieving up to 624 TOPS in INT8 for inference workloads (dense tensor operations). Additionally, multi-instance GPU (MIG) technology, introduced in the same architecture, partitions a single GPU into isolated instances for concurrent workloads, improving resource utilization in cloud environments.
Subsequent architectures, such as Hopper with the H100 GPU released in 2022, deliver up to 989 TFLOPS of FP16 tensor performance, while the Blackwell architecture, launched in 2024 with the B200 GPU, achieves up to 20 petaFLOPS of AI performance, improving efficiency for large-scale training and inference as of 2025. Despite these strengths, GPUs face challenges in AI hardware, including memory-bandwidth limitations that can bottleneck data movement for very large models, necessitating techniques like model parallelism. Programming complexity also persists, as developers must write custom kernels to optimize performance, which requires expertise in low-level parallel programming.
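The VRAM guidance for local models follows from simple arithmetic on parameter counts: weights alone occupy parameters times bytes per parameter, before any activation or KV-cache overhead. A sketch:

```python
# Rough VRAM footprint of model weights only (activations and KV cache
# add more on top of this).
def weights_gb(params_billion, bytes_per_param):
    return params_billion * 1e9 * bytes_per_param / 1e9

print(weights_gb(7, 2))    # 7B model in FP16 (2 bytes/param):  14.0 GB
print(weights_gb(7, 0.5))  # 7B model quantized to 4-bit:        3.5 GB
print(weights_gb(70, 2))   # 70B model in FP16:                140.0 GB
```

The 70B FP16 case exceeds any single consumer GPU's VRAM, which is why full-precision models of that size require multiple GPUs while 4-bit quantization brings mid-sized models within reach of 8–16 GB cards.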

Specialized AI Accelerators

Tensor Processing Units

Tensor processing units (TPUs) are custom application-specific integrated circuits (ASICs) developed by Google to accelerate machine-learning workloads, particularly those dominated by tensor operations such as the matrix multiplications in neural networks. Optimized for high throughput and energy efficiency, TPUs integrate with frameworks like TensorFlow via the XLA compiler, which translates high-level computations into low-level instructions for the hardware. This specialization enables low-latency, scalable inference and training, surpassing general-purpose processors on AI-specific tasks. Google began deploying TPUs internally in 2015 to handle the growing demands of its AI services, such as image recognition and machine translation. The first generation focused on inference, with public announcement in 2017 and availability through Google Cloud in early 2018. Subsequent generations added training capabilities, and TPUs numbered over 100,000 units across Google's data centers by the late 2010s. At the core of the TPU architecture is a systolic-array design, which performs dense matrix multiplications efficiently by streaming data through a grid of processing elements, minimizing memory accesses and power consumption. For instance, the TPU v1 features a 256×256 systolic array capable of 92 tera-operations per second (TOPS) in 8-bit integer precision. Later versions build on this with enhancements like liquid cooling, optical interconnects, sparsity support, 3D-stacked high-bandwidth memory (HBM3), and up to 4x higher interconnect bandwidth (9,216 Gb/s) to handle larger models and pods scaling to 8,960 chips. As of November 2025, recent generations include v5e (efficiency-focused, 197 TFLOPS BF16 per chip), v5p (performance variant, 459 TFLOPS BF16), v6e/Trillium (optimized for large language models, 926 TFLOPS BF16), and v7 (inference-optimized with FP8 support, approximately 4,614 TFLOPS FP8 per chip). Key TPU versions include:
Version | Release year | Key features
TPU v1 | 2015 (internal), 2017 (announced) | Inference-focused; 92 TOPS (INT8); systolic array for matrix ops.
TPU v2 | 2017 | Added training support; 180 teraFLOPS (BF16); first pods with 256 chips.
TPU v3 | 2018 | Pod scale with liquid cooling; 420 teraFLOPS (BF16); 8x faster than v2.
TPU v4 | 2021 | Enhanced sparsity and optical switches; 275 teraFLOPS (BF16/INT8); 32 GiB HBM2 memory.
TPU v5e | 2023 | Efficiency variant; 197 teraFLOPS (BF16), 393 TOPS (INT8); 16 GiB HBM; improved perf/watt.
TPU v5p | 2024 | Performance variant; 459 teraFLOPS (BF16); HBM3 memory; 2x v5e throughput.
TPU v6e (Trillium) | 2024 | LLM-optimized; 926 teraFLOPS (BF16); advanced sparsity; pods up to 8,960 chips.
TPU v7 (Ironwood) | 2025 (announced) | Inference-focused; ~4,614 teraFLOPS (FP8); 192 GiB HBM; real-time AI support.
Edge TPU | 2019 | Mobile/edge inference; 4 TOPS (INT8); integrated in Coral devices for IoT.
TPUs deliver high throughput in formats like BF16 and INT8, with v4 achieving 275 teraFLOPS per chip and low-latency inference through dedicated hardware for activations and vector operations. The XLA compiler optimizes code for these characteristics, enabling deterministic execution without caches or branching overhead. TPUs have significantly affected AI by accelerating Transformer-based models in Google services, such as neural machine translation and ranking in Search. For example, TPU v2 enabled training a large-scale model in an afternoon on one-eighth of a pod, compared with a full day on 32 high-end GPUs, demonstrating 2–3x better efficiency for similar training tasks in practice. Overall, early TPUs provided 15–30x higher performance and 30–80x better performance per watt than contemporary CPUs and GPUs for inference workloads. TPUs are available via Google Cloud for scalable deployments, supporting pods of up to thousands of chips for large-scale training and inference. For edge applications, the Coral platform with Edge TPUs has been offered since 2019, enabling efficient on-device AI in IoT and mobile devices like the Pixel Neural Core. As predecessors to TPUs, graphics processing units (GPUs) laid the groundwork for parallel compute in AI but lack the same degree of tensor-specific optimization.
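The BF16 format favored by TPUs keeps float32's 8 exponent bits but truncates the mantissa to 7 bits, so it can be modeled by zeroing the low 16 bits of a float32: wide dynamic range, coarse precision, which suits neural-network training. A sketch:

```python
import struct

def to_bf16(x):
    """Round a float toward zero to bfloat16 precision by truncating the
    low 16 bits of its float32 representation."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (truncated,) = struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))
    return truncated

print(to_bf16(3.141592653589793))   # 3.140625: only ~2-3 decimal digits kept
print(to_bf16(1e30))                # very large values remain representable
```

Unlike FP16, whose 5-bit exponent overflows near 65,504, BF16 covers the same range as float32, which is why gradients rarely need rescaling when training in BF16.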

Field-Programmable Gate Arrays

Field-programmable gate arrays (FPGAs) are integrated circuits composed of an array of programmable logic blocks, such as configurable logic blocks containing look-up tables and flip-flops, interconnected via a reconfigurable routing network that enables users to implement custom digital circuits post-manufacturing. This allows for field reconfiguration without altering the hardware, distinguishing FPGAs from fixed-function chips. The first commercial FPGA, the XC2064 from Xilinx, was introduced in 1985, marking the inception of the technology with approximately 1,000 logic gates. Modern FPGAs, such as Intel's Stratix 10 series, integrate advanced features like high-bandwidth memory interfaces and hardened DSP blocks, supporting densities exceeding millions of logic elements for complex applications. In AI contexts, FPGAs have been adapted through overlay frameworks that abstract hardware details for neural network deployment, enabling efficient acceleration of inference tasks. Xilinx's Vitis AI, released in 2019, provides tools for quantizing models to custom precisions like INT8, optimizing for low-latency inference on FPGA resources while supporting frameworks such as TensorFlow and PyTorch. These overlays map convolutional layers and other operations onto FPGA logic and DSP slices, reducing compilation times to minutes and facilitating rapid prototyping of AI pipelines. FPGAs offer key advantages in AI hardware through their reconfigurability, allowing designs to evolve with changing model architectures without new fabrication, which is ideal for research prototyping and iterative development. In edge AI scenarios, they achieve superior power efficiency compared to GPUs, as demonstrated by Xilinx's Versal adaptive compute acceleration platform (ACAP), announced in 2018 with general availability in 2019, which incorporates dedicated AI engines for scalar, vector, and tensor processing at low power.
Recent advancements as of 2025 include AMD's Versal AI Edge Series Gen 2 (2024, up to 228 TOPS INT8 for edge inference) and Intel's Agilex 9 FPGAs (2024, up to 1,400 TOPS INT8 with integrated AI tensor accelerators and 40G Ethernet). These engines deliver high throughput per watt, enabling sustained performance in thermally constrained environments. Common use cases for FPGAs in AI include real-time video analytics, where low-latency inference detects objects in streams from cameras, and 5G infrastructure, where on-device inference in base stations minimizes data transmission delays. For instance, the Alveo U280 accelerator, released in 2018, achieves approximately 21 TOPS of INT8 performance for such workloads, supporting high-bandwidth memory for efficient data handling in pipelines. More advanced devices like Intel's Stratix 10 NX FPGA reach up to 143 TOPS INT8, illustrating scalable performance for demanding edge applications. Despite these benefits, FPGAs incur higher development costs due to the expertise required for hardware description languages like Verilog or VHDL, and their peak throughput remains lower than that of application-specific integrated circuits (ASICs) for volume production, as FPGAs' general-purpose fabric introduces overhead in clock speeds and resource utilization. ASICs can be viewed as hardened implementations of optimized FPGA designs for fixed, high-volume AI deployments.
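The INT8 quantization that toolchains of this kind perform can be illustrated with a generic symmetric-quantization sketch (this is not Vitis AI's actual algorithm, and the helper names are invented for illustration):

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map floats in
    [-max|w|, +max|w|] onto integer codes in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]

weights = [0.52, -1.30, 0.07, 0.91]
codes, scale = quantize_int8(weights)
approx = dequantize(codes, scale)
print(codes)   # [51, -127, 7, 89]
print(max(abs(a - b) for a, b in zip(weights, approx)) < scale)  # True
```

The integer codes are what the FPGA's DSP slices multiply; the single floating-point scale is applied once per tensor, which is why INT8 datapaths are so much cheaper than FP32 ones.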

Application-Specific Integrated Circuits

Application-specific integrated circuits (ASICs) are custom-designed integrated circuits optimized for particular applications, such as accelerating AI workloads, rather than serving general-purpose computing needs. Unlike programmable hardware, ASICs are fabricated with fixed functionality tailored to specific tasks like inference or training, enabling superior performance and energy efficiency for those operations. The development process for AI ASICs involves several stages, including architectural design, verification, synthesis, and fabrication, typically spanning 12-18 months due to the complexity of custom silicon and testing. Prominent examples of AI ASICs include Apple's Neural Engine, integrated into the A11 Bionic system-on-chip released in 2017 for the iPhone X, which features a dual-core neural processing unit capable of up to 600 billion operations per second for real-time tasks like image recognition. More recent iterations, such as the A18 in the iPhone 16 (2024), deliver 35 tera-operations per second (TOPS). Huawei's Ascend series, starting with the Ascend 310 neural processing unit announced in 2018, targets AI inference with a focus on high-throughput tensor operations suitable for edge and cloud deployments; the Ascend 910C (2024) achieves 480 teraFLOPS FP16. Graphcore's Intelligence Processing Unit (IPU), first introduced in 2016, employs a multiple-instruction, multiple-data (MIMD) architecture to handle graph-based AI models efficiently, allowing fine-grained parallelism across thousands of independent processing threads; the Bow IPU (2023) uses a 4-chiplet design with 350 TOPS (INT8). Another notable design is Cerebras Systems' Wafer-Scale Engine (WSE), unveiled in 2019, which integrates 400,000 AI-optimized cores on a single massive chip spanning 46,225 square millimeters, enabling unprecedented scale for deep learning; the WSE-3 (2024) features 900,000 cores and 125 petaFLOPS of AI performance. Major hyperscalers have developed custom AI ASICs as key competitors to NVIDIA in the AI chip market.
Google's Tensor Processing Units (TPUs) are specialized ASICs for tensor operations, with ongoing iterations enhancing performance for both training and inference. Amazon's Inferentia chips are optimized for deep learning inference, with the second-generation Inferentia2 delivering up to 190 teraFLOPS of FP16 performance and supporting data types like FP32, TF32, and configurable FP8, powering EC2 Inf2 instances for cost-efficient generative AI applications. Amazon's Trainium series focuses on training, with the Trainium3, AWS's first 3nm AI chip released in 2025, providing 2.52 petaFLOPS of FP8 compute and 144 GB of HBM3e memory for advanced workloads like large language models. Microsoft's Azure Maia 100, introduced in 2023, is a custom AI accelerator built on a 5nm process for large-scale AI training and inference in Azure, featuring high-bandwidth Ethernet networking at 4.8 terabits per accelerator; the next-generation Maia 200 is slated for mass production in 2026. Meta's Meta Training and Inference Accelerator (MTIA) targets recommendation models and generative AI, with the second-generation MTIA (2024) achieving 354 teraFLOPS of FP16/BF16 performance on a 5nm process and supporting sparsity for efficient computations, deployed at scale across Meta's data centers. Key design elements in AI ASICs emphasize domain-specific optimizations to address the demands of neural networks. These include custom instruction sets for operations like matrix multiplications and convolutions, which reduce overhead compared to general-purpose instructions. On-chip memory hierarchies, often using high-bandwidth static RAM, are prioritized to enhance data locality and minimize latency from external DRAM accesses, crucial for handling large model weights. 
Support for model sparsity (exploiting zero-valued parameters in pruned networks) is increasingly incorporated through dedicated hardware units that skip unnecessary computations, boosting throughput without proportional power increases. In terms of performance, AI ASICs achieve high efficiency metrics, such as 10-20 times better performance per watt than graphics processing units for inference tasks, making them ideal for power-constrained environments. For instance, early such chips delivered 30-80 times higher energy efficiency in tensor operations compared to contemporary GPUs like the NVIDIA K80. Such metrics enable deployment in resource-limited settings, including smartphones for on-device AI and autonomous vehicles for real-time processing. Tensor Processing Units (TPUs) represent a prominent subclass of AI ASICs focused on tensor operations, further illustrating this efficiency paradigm. Emerging trends in the 2020s involve adopting chiplet-based designs for AI ASICs to improve scalability and yield, allowing modular integration of smaller dies into larger systems while reducing manufacturing risks associated with monolithic wafers. This approach facilitates higher core counts and interconnect bandwidth for massive AI models, with market projections indicating rapid growth in chiplet adoption for AI accelerators.
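The zero-skipping idea behind hardware sparsity support can be sketched in software (a toy model, not any vendor's implementation; the multiply-accumulate counter stands in for the work a sparsity-aware datapath avoids issuing):

```python
def dense_dot(weights, activations):
    """Baseline: every multiply-accumulate (MAC) is performed."""
    return sum(w * a for w, a in zip(weights, activations)), len(weights)

def sparse_dot(weights, activations):
    """Zero-skipping: MACs are issued only for nonzero weights,
    mimicking sparsity support in AI ASICs."""
    acc, macs = 0.0, 0
    for w, a in zip(weights, activations):
        if w != 0.0:
            acc += w * a
            macs += 1
    return acc, macs

# A pruned weight vector that is 75% zeros.
weights = [0.5, 0.0, 0.0, -1.0, 0.0, 0.25, 0.0, 0.0]
acts = [1.0] * 8
dense_val, dense_macs = dense_dot(weights, acts)
sparse_val, sparse_macs = sparse_dot(weights, acts)
print(dense_val == sparse_val)   # True: same result
print(dense_macs, sparse_macs)   # 8 3: fewer MACs issued
```

In silicon, the skipped operations translate directly into saved cycles and energy, which is why throughput rises without a proportional power increase.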

Neuromorphic and Emerging Hardware

Spiking Neural Network Processors

Spiking neural network (SNN) processors represent a class of neuromorphic hardware designed to emulate the brain's event-driven computation by using discrete temporal spikes rather than continuous activation values, drawing inspiration from biological neurons. Unlike traditional artificial neural networks (ANNs), SNNs process information through asynchronous spike events that propagate only when thresholds are met, enabling sparse and temporally dynamic representations suitable for low-power, real-time processing. This paradigm often employs models like the leaky integrate-and-fire (LIF) neuron, in which the membrane potential accumulates incoming spikes and leaks over time until firing, mimicking biological neuron behavior. Key implementations of SNN processors include IBM's TrueNorth, introduced in 2014, which integrates 1 million neurons across 4096 cores in a 65 mW asynchronous chip fabricated on a 28 nm process, emphasizing scalability and defect tolerance for large-scale neuromorphic systems. Intel's Loihi, released in 2017 and detailed in a 2018 IEEE Micro article, features a 60 mm² die in 14 nm technology with on-chip learning capabilities, supporting up to 128 neuromorphic cores and enabling adaptive spike-timing-dependent plasticity for learning and inference tasks; its successor, Loihi 2, released in 2021, expands to over 1 million neurons per chip with improved performance. The 2024 Hala Point system scales Loihi 2 to 1,152 chips, achieving 1.15 billion neurons for research into sustainable AI. BrainChip's Akida, announced in 2018, employs a fully digital, event-based design with 80 neural processing units connected via an AXI mesh, optimized for temporal pattern processing in edge devices and achieving orders-of-magnitude efficiency gains through sparse spike routing. These processors typically adopt asynchronous, hybrid analog-digital circuits to handle spike-based communication, reducing power consumption to levels like TrueNorth's 65 mW during operation by avoiding constant clock-driven computations.
The LIF model is central: the membrane potential V(t) evolves as τ dV/dt = -V + I(t), where τ is the membrane time constant, and a spike fires when V exceeds a threshold, followed by a reset; this is implemented efficiently in hardware via integrate-and-fire circuits. In AI applications, SNN processors excel in edge sensing and robotics, where their event-driven nature supports sparse, real-time tasks such as visual object recognition or motor control with latencies under milliseconds and power efficiencies up to 10 times better than ANNs for similar accuracy on benchmarks like gesture recognition. For instance, Loihi has demonstrated event-driven vision and adaptive control for UAVs by processing sensory spikes in real time with minimal energy overhead. Despite these advantages, SNN processors face challenges including an immature software ecosystem, with limited frameworks for training large models compared to ANNs, and difficulty integrating billions of neurons without excessive interconnect latency or power spikes. Hardware realizations also struggle with analog variability in subthreshold circuits, hindering scalability for deep network deployments.
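The discrete-time form of the LIF dynamics above can be simulated in a few lines (an illustrative Euler-integration sketch with arbitrary parameters, not code for any particular neuromorphic chip):

```python
def simulate_lif(inputs, tau=10.0, threshold=1.0, dt=1.0):
    """Discrete-time leaky integrate-and-fire neuron:
    tau * dV/dt = -V + I(t); fire and reset when V crosses threshold."""
    v = 0.0
    spikes = []
    for t, i_t in enumerate(inputs):
        v += dt / tau * (-v + i_t)   # Euler step of the LIF equation
        if v >= threshold:
            spikes.append(t)         # emit a spike event...
            v = 0.0                  # ...and reset the membrane potential
    return spikes

# Constant input current: the neuron charges, fires, resets, repeats,
# converting an analog input level into a spike rate.
spike_times = simulate_lif([2.0] * 50)
print(spike_times)   # [6, 13, 20, 27, 34, 41, 48]
```

The regular spacing of the output spikes shows rate coding: a stronger input current would shorten the charging time and raise the firing rate.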

Optical and Photonic Processors

Optical and photonic processors represent an emerging class of hardware that leverages photons for computation, offering potential solutions to the energy and speed limitations of electronic systems in AI workloads. These processors utilize light waves to perform operations such as matrix multiplications, which are fundamental to neural networks, by encoding data into optical signals and processing them through integrated photonic circuits. Key principles include the use of Mach-Zehnder interferometers (MZIs) to manipulate light phases for linear transformations and wavelength-division multiplexing (WDM) to enable parallel processing across multiple optical channels, thereby bypassing electronic bottlenecks like resistive losses and delays in traditional interconnects. This approach exploits the inherent properties of photons, such as their ability to travel at the speed of light with minimal interference, to achieve high parallelism in computations essential for AI. Significant developments in photonic processors for AI include Lightmatter's Envise platform, introduced in 2021, which integrates photonic tensor cores capable of performing matrix-vector multiplications optically at speeds up to three times faster than comparable electronic systems while maintaining similar power efficiency. Similarly, Optalysys's FTalpha system, launched in 2020, employs optical Fourier transforms to accelerate convolutional neural networks (CNNs) by performing convolutions via fast Fourier transforms (FFTs) in the optical domain, reducing computation time for image processing tasks. More recent advances include MIT's all-optical photonic processor (2024), which performs full deep neural network computations with latencies below 0.5 nanoseconds and over 92% accuracy on classification tasks, and Lightmatter's Passage M1000 superchip (2025), providing 114 Tbps bandwidth for scalable AI interconnects.
These prototypes demonstrate the feasibility of hybrid photonic-electronic architectures, where photonic elements handle compute-intensive linear operations and electronics manage control and nonlinear functions. The primary advantages of photonic processors lie in their superior bandwidth and energy efficiency for AI-specific tasks. Optical interconnects can achieve petabit-per-second data rates, far exceeding electronic limits, enabling low-latency execution of the linear algebra operations central to deep learning models. Prototypes have shown energy savings of up to 100 times compared to electronic counterparts for inference and training tasks, primarily due to the absence of electrical conversion overheads and lower heat dissipation in optical processing. For instance, integrated photonic systems have demonstrated processing latencies below 0.5 nanoseconds for neural network inferences, with accuracies over 92% on benchmarks. In AI applications, photonic processors excel at accelerating transformer models through efficient attention mechanisms, which rely on large-scale matrix operations, and CNNs via optical convolutions for tasks like image recognition. Integration with silicon photonics platforms, such as those developed by Ayar Labs in the 2020s, further enhances AI systems by providing optical I/O chiplets that deliver 5-10 times higher bandwidth and 3-5 times better power efficiency than electrical interconnects, supporting scalable AI fabrics for trillion-parameter models. These advancements position photonic hardware as a complement to electronic accelerators, particularly for data-center-scale AI training and inference. Despite these benefits, photonic processors face notable hurdles, including high fabrication costs stemming from the need for precision processes to integrate optical components on silicon chips. Noise in analog optical systems, arising from misalignment, fabrication variations, and signal crosstalk, can degrade accuracy in multi-layer networks.
Additionally, current designs are largely limited to linear operations, as implementing nonlinear activations optically remains challenging without introducing significant power penalties or complexity. Addressing these issues will be crucial for broader adoption in AI hardware ecosystems.
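The linear transformations that MZI meshes implement can be illustrated with the real-valued 2x2 rotation at the heart of an idealized lossless interferometer (a simplified sketch that ignores complex phase and loss; function names are invented):

```python
import math

def mzi_rotation(theta, vector):
    """Idealized 2x2 rotation: the real-valued core of the unitary
    transform a lossless Mach-Zehnder interferometer applies to
    two optical modes."""
    x, y = vector
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def power(vector):
    """Total optical power: sum of squared mode amplitudes."""
    return sum(v * v for v in vector)

signal = (3.0, 4.0)
out = mzi_rotation(math.pi / 6, signal)
# A lossless interferometer redistributes but conserves total power:
print(round(power(signal), 6), round(power(out), 6))  # 25.0 25.0
```

Meshes of such 2x2 elements compose into arbitrary unitary matrices, which is how photonic tensor cores realize the linear layers of a network; the nonlinear activations between layers are what still typically require electronics.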

Key Components and Considerations

Growth in the AI infrastructure sector is driven by sustained demand for memory bandwidth and compute power, as well as software enablement, including ramping enterprise adoption, sovereign AI builds, and edge inference. Revenue growth capture for companies in this expanding market depends on several factors: baseline revenue size; evolution of market share (e.g., value capture increasing from 20-30% in traditional technology stacks to 40-50% for AI oligopolies); product diversification, with AI-focused portfolios enabling higher growth than broader semiconductor offerings; competitive positioning through technical leadership in areas such as digital signal processors (DSPs), switches, and co-packaged optics (CPO); and external constraints, including geopolitical risks such as U.S.-China export controls and tech-sovereignty initiatives. Small pure-play firms often achieve higher revenue elasticity due to their low baseline and concentrated AI exposure. Notably, global AI compute capacity has been doubling roughly every seven months, outpacing Moore's Law, with NVIDIA holding approximately 90% market share in AI accelerators despite competition from Google TPUs, Amazon Trainium, AMD, and Huawei. These factors highlight the rapid evolution of components such as memory systems and interconnects to meet the needs of scalable AI deployments.
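The comparison between the cited seven-month doubling time and Moore's Law can be made concrete with a simple doubling-time calculation (an illustrative sketch; the helper name is invented):

```python
def capacity_multiplier(months, doubling_months=7):
    """Growth factor implied by a fixed doubling time."""
    return 2 ** (months / doubling_months)

# Two years at the 7-month doubling time cited for AI compute:
print(round(capacity_multiplier(24), 1))       # 10.8
# Two years at a classic ~24-month Moore's Law doubling:
print(round(capacity_multiplier(24, 24), 1))   # 2.0
```

The roughly fivefold gap after only two years is why memory systems and interconnects, not just compute dies, are under pressure to scale.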

Memory Systems

Memory systems in AI hardware are designed to handle the massive data requirements of training and inference for large-scale models, prioritizing high bandwidth and low latency to minimize bottlenecks. Traditional architectures suffer from the von Neumann bottleneck, where frequent data shuttling between compute units and memory consumes significant energy and time; in-memory architectures address this by integrating processing directly within memory arrays, reducing data movement overhead. Key types include high-bandwidth memory (HBM), Graphics Double Data Rate synchronous DRAM (GDDR SDRAM), and on-chip static RAM (SRAM) paired with dynamic RAM (DRAM) hierarchies in accelerators. These systems enable the storage and rapid access of parameters in billion-scale models, with capacities scaling to hundreds of gigabytes and bandwidths exceeding several terabytes per second (TB/s). HBM, a 3D-stacked DRAM technology, provides exceptional bandwidth for AI workloads; for instance, the NVIDIA H100 GPU introduced HBM3 in 2022, delivering 3 TB/s of bandwidth with up to 80 GB capacity, supporting efficient training of large language models. GDDR, optimized for GPUs, offers a cost-effective alternative with high throughput; GDDR6X variants achieve bandwidths up to 1 TB/s per module, making them suitable for AI inference in consumer and mid-range servers where HBM's premium cost is prohibitive. In AI accelerators, SRAM serves as fast on-chip cache for immediate data access during computations like matrix multiplications, while DRAM provides larger off-chip storage; this hierarchy ensures low-latency access for tensor operations, with SRAM densities enabling up to several megabytes per accelerator die. AI-specific optimizations focus on mitigating data movement, which can account for up to 90% of energy consumption in training large models due to repeated parameter loading.
Processing-in-Memory (PIM) integrates compute logic into memory chips, as exemplified by Samsung's Aquabolt-XL HBM2-PIM announced in 2021, which embeds accelerators in DRAM stacks to perform operations like vector additions in situ, improving energy efficiency by 2-3x for bandwidth-bound AI tasks. In-memory computing further reduces the von Neumann bottleneck by executing multiply-accumulate operations within memory cells, potentially cutting data transfer energy by orders of magnitude. 3D-stacked memory architectures enhance these efforts by vertically integrating logic and DRAM layers, boosting density and bandwidth; recent advancements, such as Micron's HBM3E 12-high stacks, deliver over 1.2 TB/s bandwidth with 36 GB capacity, enabling seamless handling of trillion-parameter models in AI servers. Challenges persist in scaling capacity for ever-larger models; for example, a 70-billion-parameter model in FP16 precision requires around 140 GB of memory, pushing systems toward multi-TB configurations. Non-volatile options like Intel's Optane persistent memory, which offered byte-addressable storage for maintaining AI model states across power cycles, influenced designs before its discontinuation in 2022 due to market challenges, with last shipments in late 2023. Micron's HBM integrations in AI servers, such as those powering NVIDIA's accelerator platforms, demonstrate practical scaling, with HBM3E providing 1.5x higher capacity than prior generations to support inference on models exceeding 100 billion parameters. Overall, these memory advancements, often linked via high-speed interconnects for multi-chip systems, are crucial for sustaining AI hardware performance amid exponential data growth.
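The weight-memory arithmetic behind the 140 GB figure above is straightforward to reproduce (an illustrative helper that counts only the weights, ignoring activations, optimizer state, and KV caches, all of which add substantially more):

```python
def model_memory_gb(params_billion, bytes_per_param):
    """GB needed just to hold a model's weights."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# A 70-billion-parameter model in FP16 (2 bytes per parameter):
print(model_memory_gb(70, 2))   # 140.0
# The same model quantized to 1 byte per parameter (INT8 or FP8):
print(model_memory_gb(70, 1))   # 70.0
```

Since a single H100 carries 80 GB of HBM, even the quantized model barely fits on one device, which is why multi-chip memory pooling and high-speed interconnects dominate system design.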

Interconnects and Networking

Interconnects and networking form the backbone of scalable AI hardware systems, facilitating high-speed data transfer between processing units, memory, and nodes to support the massive parallelism required in AI workloads such as distributed training and inference. These technologies span from on-chip networks that enable efficient communication within multi-core AI accelerators to data-center-scale fabrics that connect thousands of GPUs or TPUs, minimizing bottlenecks in data movement that can otherwise limit overall system performance. In AI contexts, low-latency and high-bandwidth interconnects are critical for operations like collective communications, where delays in gradient exchange across nodes can significantly extend training times for large models. On-chip interconnects, such as Network-on-Chip (NoC) architectures, manage intra-chip data flows in multi-core AI processors by routing traffic between cores, caches, and accelerators via packet-switched networks, reducing contention and improving throughput compared to traditional bus-based designs. A prominent example is NVIDIA's NVLink, introduced in 2016 with the Pascal architecture, which provides up to 160 GB/s bidirectional bandwidth in multi-GPU configurations, enabling direct GPU-to-GPU communication that bypasses slower system buses for faster model parallelism. This high-speed linking supports efficient scaling within a single node, such as in DGX systems, where NVLink aggregates bandwidth across multiple GPUs to handle tensor operations with minimal overhead. At the chip-to-chip level, standards like PCIe 5.0 (released in 2021) and PCIe 6.0 (finalized in 2022) deliver aggregate bidirectional bandwidths of up to 128 GB/s and 256 GB/s for x16 configurations, respectively, using advanced signaling to connect AI accelerators to host CPUs and storage in clustered setups.
Complementing these, Compute Express Link (CXL), announced in 2019, enables cache-coherent memory pooling across devices in AI clusters, allowing dynamic allocation of memory resources to reduce duplication and support disaggregated computing for large-scale training. These interconnects integrate with memory systems to route data efficiently, ensuring accelerators access pooled resources without coherence stalls that could degrade training efficiency. Data-center-scale networking relies on fabrics like InfiniBand and Ethernet with remote direct memory access (RDMA) to interconnect nodes for distributed AI training. NVIDIA's Quantum-2 InfiniBand platform, launched in 2023, achieves 400 Gb/s per port, supporting in-network computing primitives that offload collective operations to the network, thereby accelerating multi-node workflows in hyperscale environments. Similarly, RDMA over Converged Ethernet (RoCE) enables low-overhead data transfers in Ethernet-based clusters, as deployed by Meta for scaling AI training across thousands of GPUs with reduced CPU involvement and near-linear performance gains. In AI applications, these technologies reduce latency in all-reduce operations, which are essential for gradient synchronization in distributed training, by up to 50% compared to standard Ethernet, allowing models with trillions of parameters to train in hours rather than days. They also enhance power efficiency in hyperscale setups by minimizing idle times and optimizing data paths, potentially cutting energy use by 20-30% in large clusters through reduced retransmissions and lower protocol overhead. Emerging optical interconnects, particularly silicon photonics, promise to address bandwidth and power limitations in exascale AI systems by transmitting data via light over waveguides, achieving terabit-per-second speeds with lower attenuation than electrical links. Prototypes in the 2020s, such as co-packaged optics, have demonstrated 800 Gb/s ports with up to 70% lower power consumption compared to traditional pluggable optics.
These advancements are pivotal for sustaining Moore's Law-like scaling in AI hardware, where electrical interconnects increasingly bottleneck performance at exascale.
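The communication cost of the all-reduce operations discussed above can be estimated with the standard ring all-reduce formula (an illustrative sketch; the helper name is invented and link rates are idealized):

```python
def ring_allreduce_bytes(gradient_bytes, num_nodes):
    """Per-node traffic for a bandwidth-optimal ring all-reduce:
    each node sends 2 * (N - 1) / N times the gradient buffer size
    (a reduce-scatter phase plus an all-gather phase)."""
    return 2 * (num_nodes - 1) / num_nodes * gradient_bytes

# Synchronizing 1 GiB of gradients across 8 accelerators:
per_node = ring_allreduce_bytes(2**30, 8)
print(per_node / 2**30)           # 1.75 (GiB sent per node)
# Ideal transfer time at a 400 Gb/s (= 50 GB/s) port:
print(round(per_node / 50e9, 4))  # 0.0376 (seconds)
```

Because the per-node traffic approaches twice the buffer size regardless of cluster scale, link bandwidth, not node count, sets the floor on synchronization time, which is what in-network reduction offloads aim to lower further.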

References
