Neural processing unit
A neural processing unit (NPU), also known as an AI accelerator or deep learning processor, is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and computer vision.
Their purpose is either to efficiently execute already trained AI models (inference) or to train AI models. Their applications include algorithms for robotics, Internet of things, and data-intensive or sensor-driven tasks. They are often manycore or spatial designs and focus on low-precision arithmetic, novel dataflow architectures, or in-memory computing capability. As of 2024, a widely used datacenter-grade AI integrated circuit chip, the Nvidia H100 GPU, contains tens of billions of MOSFETs.
AI accelerators are used in mobile devices such as Apple iPhones, Huawei devices, and Google Pixel smartphones, and as AMD AI engines in Versal SoCs; NPUs appear in many Apple silicon, Qualcomm, Samsung, and Google Tensor smartphone processors.
Vision processing units are accelerators specialized for machine vision algorithms such as CNN (convolutional neural networks) and SIFT (scale-invariant feature transform). They are used in devices that need to keep track of objects visually such as AR headsets and drones.
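The core workload such vision accelerators speed up is convolution, which slides a small filter over an image to produce a feature map. A minimal sketch of that operation (the toy image and the 3x3 vertical-edge filter are illustrative, not from any particular framework):

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid (no-padding) 2D convolution, the building block of CNN layers."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float32)
    for i in range(oh):
        for j in range(ow):
            # Multiply-accumulate over one kernel-sized window --
            # the operation NPUs implement in massively parallel hardware.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=np.float32).reshape(5, 5)
edge = np.array([[1, 0, -1]] * 3, dtype=np.float32)  # vertical-edge filter
print(conv2d(image, edge))  # 3x3 feature map
```

Each output element is a dot product of a window with the filter; an accelerator lays these multiply-accumulate operations out spatially instead of looping over them as this sketch does.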
NPUs were more recently (circa 2022) added to mainstream computer processors from Intel, AMD, and Apple silicon. All models of Intel Meteor Lake processors have a built-in versatile processing unit (VPU) for accelerating inference for computer vision and deep learning.
Some LLM inference chips (from SambaNova, D-Matrix, Etched, and Taalas) may be targeted at consumers, with different trade-offs between speed (tokens/sec) and programmability.
On consumer devices, the NPU is intended to be small, power-efficient, but reasonably fast when used to run small models. To do this they are designed to support low-bitwidth operations using data types such as INT4, INT8, FP8, and FP16. A common metric is trillions of operations per second (TOPS). Although TOPS does not explicitly specify the kind of operations, it is typically INT8 additions and multiplications.
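The low-bitwidth data types above come from quantization: mapping float weights onto small integers plus a scale factor, so the hardware can do cheap INT8 multiply-adds. A minimal sketch of symmetric per-tensor INT8 quantization (the function names and toy tensor are illustrative, not any framework's API):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map float32 values to INT8 using a single per-tensor scale."""
    # Scale so the largest magnitude maps to 127; guard against all-zero input.
    scale = max(float(np.abs(x).max()) / 127.0, 1e-12)
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from INT8 codes."""
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, s = quantize_int8(weights)
print(q)                   # small integers an NPU can multiply cheaply
print(dequantize(q, s))    # close to the original floats
```

The rounding error per element is at most half the scale, which is why small models tolerate INT8 (or even INT4) well; a TOPS figure then counts how many of these integer multiply-adds the unit sustains per second.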
Accelerators are used in cloud computing servers: e.g., tensor processing units (TPU) for Google Cloud Platform, and Trainium and Inferentia chips for Amazon Web Services. Many vendor-specific terms exist for devices in this category, and it is an emerging technology without a dominant design.