Data processing unit
from Wikipedia
(Image: SolidRun's SolidNet OCP-8K SmartNIC)

A data processing unit (DPU) is a programmable computer processor that tightly integrates a general-purpose CPU with network interface hardware.[1] Sometimes they are called "IPUs" (for "infrastructure processing unit") or "SmartNICs".[2] They can be used in place of traditional NICs to relieve the main CPU of complex networking responsibilities and other "infrastructural" duties; although their features vary, they may be used to perform encryption/decryption, serve as a firewall, handle TCP/IP, process HTTP requests, or even function as a hypervisor or storage controller.[1][3] These devices can be attractive to cloud computing providers whose servers might otherwise spend a significant amount of CPU time on these tasks, cutting into the cycles they can provide to guests.[1]

Large-scale AI data centers are an emerging use case for DPUs. In these environments, massive amounts of data must be moved rapidly among CPUs, GPUs, and storage systems to handle complex AI workloads. By offloading tasks such as packet processing, encryption, and traffic management, DPUs help reduce latency and improve energy efficiency, enabling these data centers to maintain the high throughput and scalability needed for advanced machine learning operations.[4]

Alongside their role in accelerating network and storage functions, some vendors and cloud providers describe DPUs as a "third pillar of computing," reflecting their growing role in modern data-center architectures. Unlike traditional processors, a DPU typically resides on a network interface card, allowing data to be processed at the network’s line rate before it reaches the CPU. This approach offloads critical but lower-level system duties—such as security, load balancing, and data routing—from the central processor, thus freeing CPUs and GPUs to focus on application logic and AI-specific computations.[5]

Terminology

The term DPU was first coined by Nvidia upon announcing the BlueField-2 DPU,[6] the successor to the Mellanox BlueField SmartNIC, which Nvidia obtained through its acquisition of Mellanox.[7]

Examples of DPUs

Azure Boost DPU

In 2024, Microsoft introduced the Azure Boost DPU, a custom-designed data processing unit aimed at optimizing network and infrastructure efficiency across its Azure cloud platform. This DPU offloads network-related tasks such as packet processing, security enforcement, and traffic management from central CPUs, enabling better performance for application workloads.[8][9]

Key Features

  • Network Optimization: The Azure Boost DPU enhances network throughput and reduces latency by processing data packets and offloading these tasks from traditional CPUs.[10]
  • Security Capabilities: It integrates advanced isolation techniques to secure multi-tenant environments, protecting sensitive workloads.[9]
  • Hyperscale Adaptability: Designed for large-scale data centers, the DPU supports Azure’s hyperscale infrastructure, ensuring scalability for modern cloud applications.[8]

Industry Context

The Azure Boost DPU aligns with the trend of custom silicon development in hyperscale cloud environments. Similar to AWS’s Nitro System and NVIDIA’s BlueField DPUs, Microsoft’s DPU focuses on enhancing cloud efficiency while addressing rising energy and security demands.[9]

Impact on Cloud Computing

The introduction of DPUs reflects a broader shift in the cloud computing industry toward offloading specific functions from general-purpose processors to specialized hardware.[8][9]

from Grokipedia
A data processing unit (DPU) is a specialized, programmable processor designed to offload and accelerate data-centric workloads, including networking, security, and storage tasks, from central processing units (CPUs) and graphics processing units (GPUs) in modern computing environments such as data centers and cloud infrastructures. DPUs typically integrate a multi-core CPU, often based on Arm architecture, with a high-speed network interface card (NIC), onboard memory, and hardware accelerators to handle high-throughput, low-latency data movement directly at the network edge. This architecture enables efficient packet parsing, remote direct memory access (RDMA), and encryption without burdening host processors, thereby improving overall system performance and resource utilization. Emerging over the past decade alongside the growth of hyperscale data centers, DPUs represent a third pillar of computing alongside CPUs for general-purpose tasks and GPUs for accelerated computing, with early implementations focusing on smart NICs to address bottlenecks in host-centric networking. Notable examples include NVIDIA's BlueField series, which support frameworks like DOCA for developing applications in areas such as AI and cybersecurity. In practice, DPUs enhance security and energy efficiency by providing hardware-based isolation and reducing power consumption through task offloading, making them essential for high-performance computing (HPC), cloud, and edge workloads. They also facilitate secure, programmable data flows in cloud-native environments, supporting features such as bridging and software-defined storage.

Definition and Historical Development

Core Definition

A data processing unit (DPU) is a programmable system-on-chip (SoC) that integrates a general-purpose central processing unit (CPU), often based on Arm architecture, with specialized hardware accelerators tailored for network interfaces and data handling. This design enables the DPU to efficiently offload data-centric tasks from host server CPUs, such as packet processing and data movement, thereby optimizing resource utilization in data centers. The primary role of a DPU is to manage networking, storage, and security workloads independently, allowing server CPUs to focus on application-level processing and improving overall system performance. By handling these I/O-intensive operations at line rate, DPUs reduce latency and improve scalability in cloud and enterprise environments. Key attributes of DPUs include high programmability through standard programming models, integration of dedicated accelerators for tasks such as encryption and traffic steering, and an emphasis on energy-efficient operation. Unlike general-purpose processors, which excel in compute-bound tasks, DPUs are specifically optimized for I/O-bound operations involving data transfer and protocol processing. Evolving from earlier smart network interface cards (SmartNICs), DPUs represent a more versatile platform for infrastructure acceleration.

Origins and Evolution

The roots of data processing units (DPUs) trace back to the early 2010s, when network interface cards (NICs) began incorporating offload technologies to alleviate CPU burdens in data centers. Technologies such as TCP/IP acceleration via TCP Offload Engines (TOE) in 10G Ethernet NICs emerged to handle protocol processing, reducing latency and CPU utilization for high-speed networking. Similarly, storage protocols such as iSCSI and Fibre Channel over Ethernet (FCoE) saw hardware offloads in NICs, with converged NICs supporting HBA functions by 2010 to enable efficient block storage over Ethernet without heavy host processing. The NVMe-over-Fabrics (NVMe-oF) specification, released in June 2016, further advanced this by extending NVMe's low-latency access over networks like RDMA and TCP, prompting NIC vendors to integrate acceleration for remote storage workloads. DPUs emerged as a distinct category between 2018 and 2020, propelled by the explosive growth of data in cloud environments, where traditional CPUs struggled with the scale of networking, storage, and security tasks. This period marked a shift toward dedicated processors for data-centric offloading, as hyperscale data centers required more efficient infrastructure amid rising demands from virtualization and software-defined infrastructure. Mellanox announced the BlueField SoC and SmartNICs in January 2018, introducing programmable ARM-based offload capabilities that laid the groundwork for broader DPU adoption. Nvidia, following its 2020 acquisition of Mellanox, formally coined the term "DPU" and unveiled the BlueField-2 family in October 2020, positioning it as a new pillar of computing alongside CPUs and GPUs. Hyperscalers accelerated adoption, with AWS deploying its Nitro system (functionally akin to a DPU) for instance offloading starting in 2017, and Microsoft integrating DPUs via acquisitions like Fungible in 2023 to enhance Azure infrastructure. Several factors drove this evolution, including the rise of disaggregated infrastructure that separated compute, storage, and networking for greater flexibility; the proliferation of high-speed networks demanding ultra-low-latency data handling; and the surge in AI workloads requiring optimized data movement. Traditional server architectures, with CPUs increasingly bottlenecked by I/O operations, could no longer scale efficiently for these demands, necessitating specialized units. Over time, DPUs evolved from single-function offload cards to fully programmable platforms with multi-core processors capable of running full operating systems like Linux, enabling custom applications for security, orchestration, and acceleration directly on the DPU. Subsequent developments include Microsoft's introduction of the Azure Boost DPU in 2024 and NVIDIA's announcement of the BlueField-4 in October 2025, further advancing DPU capabilities for AI and cloud workloads.

Technical Architecture

Hardware Components

A data processing unit (DPU) typically features a central processing element consisting of multi-core CPUs based on Arm architecture, such as Cortex-A72 or Cortex-A78 cores, operating at clock speeds of 2.5 to 3.0 GHz, with core counts ranging from 8 to 64 for handling general-purpose tasks like control-plane operations and exception handling. High-end examples as of 2025, such as NVIDIA's BlueField-4, incorporate 64 Arm Neoverse V2 cores via integrated Grace CPU technology for enhanced AI workloads. Specialized accelerators form a core part of DPU hardware, integrating fixed-function and programmable engines for domain-specific functions; these include networking components for packet parsing and remote direct memory access (RDMA), as well as storage controllers like NVMe-oF interfaces and security engines for TLS/SSL encryption, compression, and regular-expression matching. Advanced models further embed AI/ML accelerators, such as inline inference engines, providing up to 100x performance gains over software-based processing in edge and data-center environments. Memory subsystems in DPUs utilize high-bandwidth options like 16 GB or more of DDR4 or DDR5, often shared between CPU cores and accelerators to support efficient data handling and features like connection tracking. Interconnects emphasize low-latency connectivity, with PCIe Gen4 or Gen5 interfaces for host integration, alongside high-speed network ports supporting Ethernet or InfiniBand at speeds up to 800 Gbps as of 2025, and SerDes lanes up to 112 Gbps PAM4 for flexible deployment. DPUs are commonly deployed in PCIe card form factors, such as half-height or full-height add-in cards, or as integrated SoCs within servers and networking appliances, with power consumption typically ranging from 15 W to 75 W, though higher-end models may draw up to 150 W with auxiliary power connectors to support intensive operations.

Integrated Software and Programmability

Data processing units (DPUs) integrate a comprehensive software stack that enables flexible deployment and management of data-centric workloads, distinct from traditional CPU-centric systems. This stack typically includes lightweight operating systems optimized for low-overhead execution, allowing DPUs to operate in bare-metal configurations or support containerized environments for efficient resource isolation. For instance, NVIDIA's BlueField DPUs ship with Ubuntu 22.04 as the default OS on their Arm-based execution environment, facilitating seamless integration with host systems while minimizing resource consumption. Similarly, the Data Plane Development Kit (DPDK) provides user-space libraries for high-performance packet processing on DPUs, compatible with common Linux distributions, enabling run-to-completion models that bypass kernel overhead for bare-metal-like performance in containerized setups. Marvell's OCTEON DPUs further extend this with a unified SDK incorporating DPDK poll and event mode drivers, supporting kernel hooks for efficient OS-level operations.

Programmability in DPUs is enhanced through domain-specific languages and APIs that allow developers to customize data plane behaviors without hardware redesign. The P4 language is widely used for programmable packet processing, enabling DPUs to define flexible forwarding rules; on platforms like NVIDIA BlueField, P4 programs leverage the DPU's data path accelerators for reconfigurable networking. eBPF extends this capability for kernel-level extensions, permitting safe, dynamic program loading on DPUs to offload tasks such as network function virtualization; for example, Marvell OCTEON 10 DPUs support eBPF offloading for the Cilium CNI, achieving transparent acceleration of eBPF-based stacks. Developer access is streamlined via APIs like NVIDIA's DOCA framework, which provides libraries for software-defined services on BlueField DPUs, including telemetry and security offloads. Intel's oneAPI offers a unified model for heterogeneous programming across IPUs and related accelerators, supporting data analytics libraries for optimized workloads.

Virtualization features in DPUs ensure compatibility with multi-tenant environments by integrating with standard virtualization technologies. Single Root I/O Virtualization (SR-IOV) allows a DPU to appear as multiple virtual functions to the host, enabling direct assignment to virtual machines for low-latency I/O; BlueField implements asymmetric SR-IOV for per-function control over VF allocation. VFIO complements this by facilitating PCI passthrough, securing device access in virtualized setups without hypervisor mediation. Device lifecycle management incorporates secure boot mechanisms to verify boot components, as seen in BlueField's Secure Boot, which halts execution on verification failure to prevent tampering. Over-the-air updates are supported through alternate boot partitions for safe firmware upgrades, while orchestration tools integrate with Kubernetes via NVIDIA's DOCA Platform Framework (DPF) for provisioning and scaling DPU resources in clusters.

Customization capabilities allow DPUs to deploy tailored data flow pipelines, optimizing for specific workloads and reducing latency in real-time applications. DOCA Flow enables programmable packet steering with custom matching and action rules, directing flows to accelerators or services dynamically to minimize delays. This extensibility supports workload-specific behaviors, such as integrating with DPDK for user-defined pipelines on Marvell DPUs, enhancing adaptability without CPU intervention.
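As a concrete illustration of the DPDK run-to-completion model described above, the following minimal sketch polls a port from user space and forwards packets back out. It is a generic DPDK example rather than vendor code; the port number 0 is an assumption, and device configuration and error handling are abbreviated.

```c
/* Minimal DPDK run-to-completion loop (illustrative sketch, not vendor code).
 * Assumes DPDK is installed and a NIC/DPU port is bound to a DPDK driver;
 * port/queue configuration and error handling are abbreviated for clarity. */
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_debug.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

int main(int argc, char **argv)
{
    /* Initialize the Environment Abstraction Layer: cores, hugepages, devices. */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL initialization failed\n");

    uint16_t port = 0; /* assumption: first DPDK-managed port */
    struct rte_mbuf *bufs[BURST_SIZE];

    /* Poll the port directly from user space, bypassing the kernel stack. */
    for (;;) {
        uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);
        if (nb_rx == 0)
            continue;

        /* Inspect, rewrite, or steer packets here, then send them back out. */
        uint16_t nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_rx);

        /* Free any packets the transmit queue could not accept. */
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
    return 0;
}
```

The same polling loop runs unchanged on a host CPU or on a DPU's embedded Arm cores, which is what lets infrastructure services migrate onto the device.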

Key Functionalities

Data Networking Offload

Data processing units (DPUs) offload critical networking tasks from host CPUs, enabling efficient data movement in data centers by handling protocol processing directly in hardware. Key functions include TCP/UDP processing through stateless offloads and connection tracking, IPsec and VPN termination for secure communications, and load balancing via hierarchical quality-of-service (QoS) mechanisms. Additionally, DPUs support overlay networks such as VXLAN and Geneve, which encapsulate traffic for virtualized environments without burdening the host processor.

DPUs achieve high performance in networking offload, supporting line-rate processing at speeds up to 400 Gbps for Ethernet and InfiniBand connections while maintaining microsecond-level latencies through hardware timestamping, with upcoming models like BlueField-4 supporting up to 800 Gbps as announced in 2025. In practical deployments, such as with VXLAN and Geneve offloads, DPUs enable 100 Gbps throughput without CPU bottlenecks, reducing host CPU utilization by up to 70%, or as much as 3x in containerized environments like Kubernetes. These gains stem from offloading encapsulation and encryption tasks, allowing CPUs to focus on application logic rather than I/O overhead.

Supported protocols extend to high-performance interconnects like RDMA over Converged Ethernet (RoCE) with zero-touch configurations and InfiniBand at NDR speeds, facilitating low-overhead data transfers for distributed systems. In disaggregated architectures, DPUs enable remote storage and memory access via RDMA without host intervention, delivering performance comparable to local access with minimal CPU involvement.

Energy efficiency is enhanced by hardware-accelerated packet classification and forwarding, which minimize power consumption per bit processed. Offloading networking tasks to DPUs can reduce overall server power draw by up to 34%, or 247 watts per server, particularly under high-utilization workloads where CPU savings compound. This hardware-centric approach ensures scalable, low-power operations in dense environments.
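To show what the RDMA path above looks like from software, here is a hedged sketch using the standard libibverbs API: it opens the first RDMA-capable device and registers a buffer so the adapter (or DPU) can move data into it without host-CPU copies. The device choice, buffer size, and omitted queue-pair and connection setup are illustrative assumptions.

```c
/* Illustrative libibverbs sketch: register memory for RDMA transfers.
 * Assumes an RDMA-capable NIC or DPU is present; queue-pair creation and
 * connection setup are omitted for brevity. Link with -libverbs. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]); /* first device */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);              /* protection domain */

    /* Register a buffer so the adapter can DMA into it directly. */
    size_t len = 1 << 20;                               /* 1 MiB, illustrative */
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    printf("registered MR: lkey=0x%x rkey=0x%x\n",
           (unsigned)mr->lkey, (unsigned)mr->rkey);

    /* A remote peer holding this rkey can now read or write the buffer
     * without involving this host's CPU in the data path. */
    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

The `rkey` printed here is the token a remote peer presents to perform one-sided reads and writes, which is why RDMA transfers proceed with essentially no host-CPU involvement.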

Storage and Security Acceleration

Data processing units (DPUs) accelerate storage operations by offloading tasks such as NVMe over Fabrics (NVMe-oF) processing, erasure coding, and deduplication from host CPUs, enabling efficient handling of distributed file systems like Ceph and Lustre. NVMe-oF support allows direct remote access to NVMe storage devices over networks, reducing the involvement of server resources in data transfers. Erasure coding implements RAID-like redundancy in software-defined storage, distributing data across nodes while minimizing reconstruction overhead. Deduplication identifies and eliminates redundant data blocks, optimizing storage capacity in large-scale environments.

Security acceleration in DPUs encompasses inline encryption and decryption using AES-GCM algorithms, alongside firewalling, DDoS mitigation, and zero-trust enforcement at the network edge. AES-GCM provides authenticated encryption for data in transit or at rest, offloading cryptographic computations to dedicated hardware engines. Firewall capabilities include distributed next-generation firewalls with connection tracking, filtering malicious traffic at line rate. DDoS mitigation detects and blocks volumetric attacks through hardware-accelerated packet inspection, preventing resource exhaustion. Zero-trust models are enforced via micro-segmentation and functional isolation, ensuring workloads remain segregated even in compromised environments.

DPUs integrate with storage protocols such as iSCSI for block-level access over TCP/IP and hardware root-of-trust mechanisms for secure boot. Boot offload enables remote booting and efficient data transfer without host CPU intervention. While the primary focus is on Ethernet and InfiniBand, some DPU architectures accommodate Fibre Channel via extensions like FCoE for legacy SAN compatibility. The hardware root-of-trust initiates a secure boot chain, validating firmware and establishing isolated execution environments for sensitive computations.

These accelerations yield performance gains, including reduced storage latency through optimized NVMe-oF paths and encryption handling at wire speed without consuming host CPU cycles. NVMe-oF offload lowers end-to-end latency by streamlining data paths, achieving sub-millisecond access in disaggregated setups compared to traditional CPU-mediated transfers. Wire-speed encryption ensures cryptographic operations match network throughput, such as 400 Gb/s, maintaining full line-rate performance for secure data flows.

Data integrity is maintained via end-to-end checksumming and secure key-handling features embedded in DPU hardware. Checksumming verifies data consistency across storage protocols, detecting corruption during transfers or at rest using ECC-protected memory and protocol-level validation. Key handling leverages public-key accelerators for RSA, ECC, and Diffie-Hellman operations, alongside true random number generators for key generation, ensuring robust protection in confidential computing environments.
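The AES-GCM operation that DPU security engines execute inline can be expressed in software with OpenSSL's EVP interface, as in the sketch below; the zeroed key and nonce are placeholders, and a production system would source them from real key management.

```c
/* Illustrative AES-256-GCM encryption with OpenSSL's EVP API: the same
 * authenticated-encryption operation that DPU crypto engines run inline.
 * The zeroed key and nonce are placeholders for real key management. */
#include <stdio.h>
#include <openssl/evp.h>

int main(void)
{
    unsigned char key[32] = {0};         /* placeholder 256-bit key */
    unsigned char iv[12]  = {0};         /* placeholder 96-bit GCM nonce */
    unsigned char pt[]    = "packet payload";
    unsigned char ct[sizeof(pt)], tag[16];
    int len, ct_len;

    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv);

    /* Encrypt the payload; GCM authenticates it at the same time. */
    EVP_EncryptUpdate(ctx, ct, &len, pt, sizeof(pt));
    ct_len = len;
    EVP_EncryptFinal_ex(ctx, ct + len, &len);
    ct_len += len;

    /* Retrieve the 16-byte authentication tag used for integrity checking. */
    EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, sizeof(tag), tag);
    EVP_CIPHER_CTX_free(ctx);

    printf("ciphertext bytes: %d\n", ct_len);
    return 0;
}
```

A DPU performs exactly this transform in dedicated hardware on every packet or block, which is why it can sustain wire-speed encryption that would otherwise consume substantial host CPU cycles.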

Commercial Examples

NVIDIA BlueField

NVIDIA BlueField represents a prominent series of data processing units (DPUs) developed by Nvidia to offload and accelerate critical data center infrastructure tasks, including networking, storage, and security, from host CPUs. The platform integrates high-performance ARM-based processors with advanced networking interfaces, enabling programmable acceleration for software-defined environments. BlueField DPUs are engineered for seamless integration with Nvidia's GPU ecosystem, facilitating efficient data movement and processing in AI-driven infrastructures.

The product lineup began with the BlueField-2 DPU, introduced in 2019, which features 8 Arm cores and supports up to 200 Gbps Ethernet or HDR InfiniBand connectivity, providing foundational acceleration for software-defined storage, networking, and security services. This model was designed to free up to 125 CPU cores per DPU by offloading common infrastructure functions, marking an early advancement in infrastructure composability. Building on this, the BlueField-3 DPU, launched in 2022, doubles the power with up to 16 Armv8.2+ Cortex-A78 cores, 16 GB of DDR5 memory, and 400 Gbps Ethernet or NDR InfiniBand support, while incorporating dedicated AI accelerators for enhanced in-network computing. These specifications enable line-rate performance for tasks like NVMe-oF storage and IPsec encryption, with PCIe Gen5 interfaces for high-bandwidth host connectivity. In October 2025, NVIDIA introduced the BlueField-4 DPU, offering 800 Gb/s networking speeds and 6x the compute power of its predecessor, with integrated accelerations for networking, storage, and cybersecurity, and support for gigascale AI factories through secure multi-tenant environments and real-time threat detection.

A key enabler of BlueField's capabilities is the DOCA framework, which provides a unified SDK for developing and deploying applications on the DPU, including GPU-DPU integration for optimized data pipelines and support for standards like DPDK for high-performance packet processing. This framework allows developers to create custom services for networking, security, and storage, ensuring compatibility across BlueField generations. The platform targets AI data pipelines and high-performance computing (HPC) environments, with direct integration into Nvidia's DGX systems and SuperPOD architectures to accelerate GPU-to-GPU communications and workload orchestration in AI factories.

BlueField DPUs have seen widespread adoption in hyperscale data centers by 2025, powering infrastructure for major cloud providers and AI deployments, with implementations contributing to power efficiency gains of up to 30% through offloaded networking and security tasks. A notable variant is the integrated BlueField-3 SuperNIC, the first DPU-embedded network accelerator optimized for hyperscale AI, delivering 400 Gbps RDMA over Converged Ethernet (RoCE) with secure multi-tenancy and deterministic GPU offload to reduce latency in large-scale training. This design enhances scalability in AI and HPC clusters by isolating tenants while maintaining high throughput.

Microsoft Azure Boost DPU

The Azure Boost DPU is Microsoft's first in-house data processing unit, announced on November 19, 2024, at Microsoft Ignite as part of its custom silicon initiatives to enhance cloud infrastructure efficiency. This hardware-software co-design is optimized for data-centric workloads, featuring a lightweight data-flow operating system that integrates high-speed Ethernet and PCIe interfaces, along with dedicated network and storage engines, data accelerators, and cryptography engines for security. Built on custom silicon, it incorporates ARM cores paired with specialized accelerators for tasks like encryption and packet processing, enabling seamless offloading of compute-intensive operations from host CPUs.

A core aspect of the Azure Boost DPU is its support for confidential computing, providing hardware-isolated enclaves that separate control and data planes for virtual machines, thereby protecting sensitive data in use through trusted execution environments. This isolation aligns with Azure's zero-trust security model, facilitating secure multi-tenant cloud operations for services such as Azure SQL Database and Kubernetes-based workloads. The DPU integrates with Azure's broader ecosystem, including tools like Azure Arc for hybrid and multi-cloud management, allowing consistent governance and deployment across on-premises and cloud environments.

In the context of multi-tenant cloud services, the Azure Boost DPU addresses key challenges in performance and security by offloading networking, storage, and security tasks directly at the infrastructure layer, reducing latency and enhancing resource utilization. For instance, it supports zero-trust principles by embedding cryptographic accelerations that verify and protect data flows without host intervention, making it particularly suited for high-stakes applications in finance, healthcare, and AI-driven services within Azure.

Performance-wise, the DPU delivers significant efficiency gains, running workloads at up to 4x the performance while consuming 3x less power than traditional CPU-based servers. This offload capability allows Azure data centers to achieve greater VM density, supporting denser deployments of instances without compromising security or throughput.

AWS Nitro and Others

The AWS Nitro System, introduced in 2017 alongside the C5 EC2 instance type, represents a foundational implementation of data processing offload using custom silicon to enhance virtualization and performance in Amazon EC2 environments. By 2020, its architecture had evolved to encompass broader DPU-like capabilities, including dedicated hardware for networking, storage, and security tasks, thereby reducing the host CPU's involvement in infrastructure operations. Key components include the Elastic Network Adapter (ENA) for high-performance networking and Nitro Enclaves, which provide isolated compute environments for sensitive workloads through hardware-rooted security. This system integrates seamlessly with AWS Graviton processors, enabling Arm-based instances to leverage offloaded I/O while maintaining up to 100 Gbps throughput per network interface.

Beyond AWS, other DPU offerings emphasize integration within hyperscaler and enterprise ecosystems, focusing on specialized offloads for storage, policy enforcement, and security. Fungible's DPU, launched in August 2020, targets storage-intensive applications with its F1 chip, which incorporates on-chip processing for tasks like compression, encryption, and erasure coding to optimize storage efficiency. Pensando, acquired by AMD in 2022, introduced its DPU platform around 2021, featuring programmable ASICs paired with Arm cores to enable policy-based traffic processing, telemetry, and firewall functions in cloud and AI workloads. Similarly, Intel's Infrastructure Processing Unit (IPU) roadmap, unveiled in May 2022 and including models like the E2100 series launched in 2024, is designed for enterprise and edge deployments, offloading infrastructure tasks to support scalable networking up to 200 Gbps. These solutions share a common emphasis on reducing host overhead in large-scale environments, often through composable architectures that align with cloud-native programmability models.

Applications and Industry Impact

Role in Cloud and Data Centers

In cloud and data center environments, data processing units (DPUs) are deployed through flexible models to enhance efficiency and resource utilization. These include standalone PCIe cards that attach to servers for dedicated offload processing, integration directly into network interface cards (NICs) to combine connectivity with processing, and configuration within composable infrastructure frameworks across server racks, enabling dynamic pooling and allocation of compute, storage, and networking resources.

DPUs support critical use cases that drive modern cloud operations, such as facilitating elastic infrastructure by handling provisioning and scaling tasks independently of host CPUs, forming virtualized storage pools for software-defined storage in hyperconverged setups, and enabling efficient edge-to-cloud data flows to process and route information across distributed networks. Integration with orchestration platforms allows DPUs to bolster network functions virtualization (NFV) and software-defined networking (SDN), streamlining deployments in ecosystems like Kubernetes and VMware. In Kubernetes environments, DPUs enable off-path processing with mechanisms such as OVN for virtual networking, while in VMware setups they accelerate NSX-based SDN and security services directly on the DPU hardware.

DPUs provide the scalability needed for hyperscale operations, managing petabyte-scale data movement for large cloud providers, including Google Cloud and Amazon Web Services, where custom DPU designs optimize high-volume traffic handling. As of 2025, the DPU market is valued at $11.87 billion, with projections to reach $21.89 billion by 2033 (a CAGR of 10.74%), driven by applications in AI and IoT that require processing massive data volumes across hyperscale and edge facilities.

Performance Benefits and Challenges

Data processing units (DPUs) offer substantial performance advantages by offloading infrastructure tasks such as networking, storage, and security from host CPUs, enabling more efficient resource utilization in data centers. In practical deployments, this offloading can reduce CPU utilization by up to 70% without compromising network throughput, as demonstrated in tests using NVIDIA BlueField-2 DPUs for virtual network functions. For AI workloads, DPUs accelerate data ingestion and access, optimizing throughput to GPUs and reducing bottlenecks in training pipelines, with integrations like NVIDIA BlueField-3 enabling direct data transfers that enhance overall efficiency. Power consumption also benefits significantly, with DPU offloads achieving up to 34% savings (equivalent to 247 watts per server in high-load scenarios like IPsec encryption), translating to millions in operational cost reductions for large-scale environments.

From a cost perspective, DPUs facilitate hardware consolidation by freeing CPU cycles, leading to reduced total cost of ownership (TCO) through fewer servers and lower energy demands; for instance, deploying DPUs across 10,000 nodes can yield a 15% TCO reduction over three years, including $13.1 million in power savings and $6.6 million in cooling. Return on investment (ROI) is typically realized within 6-12 months for large deployments, particularly in hyperscale environments where integration streamlines operations and cuts deployment times. Initial hardware costs for models like the BlueField-2 were around $1,500; current prices for new units start at approximately $2,000.

Despite these gains, DPUs present challenges in programming and portability due to their diverse architectures, which often require vendor-specific software development kits (SDKs) and languages like P4, complicating code portability and increasing development time. Interoperability issues arise from a lack of standardized frameworks across vendors, leading to potential lock-in and inconsistent performance when integrating components from different providers. Case studies highlight benefits in storage protocols, where NVMe-oF setups can achieve lower latency compared to CPU-based alternatives, improving performance in disaggregated environments.

Looking ahead, DPUs are evolving toward AI-enhanced designs that incorporate accelerators for adaptive workload management, with projections indicating widespread adoption in AI factories by 2030 to handle escalating data demands. Recent DPUs like the BlueField-4, released in October 2025, integrate quantum-secure gateways via partnerships such as one with Qrypt, ensuring robust protection for data in transit without compromising performance.
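As a rough illustration of how per-server watt savings roll up to the fleet-level dollar figures quoted above, the following back-of-the-envelope sketch assumes a 10,000-server fleet, three years of continuous operation, and an electricity rate of $0.20/kWh (an assumption, not a figure from the cited studies):

```c
/* Back-of-the-envelope fleet power savings under stated assumptions:
 * 247 W saved per server, 10,000 servers, 3 years, $0.20/kWh. */
#include <stdio.h>

int main(void)
{
    double watts_per_server = 247.0;  /* per-server saving quoted above */
    double servers = 10000.0;         /* assumed fleet size */
    double hours = 3.0 * 365 * 24;    /* three years of continuous operation */
    double price_per_kwh = 0.20;      /* assumed electricity rate (USD) */

    double kwh = watts_per_server * servers / 1000.0 * hours;
    printf("energy saved: %.1f GWh\n", kwh / 1e6);          /* ~64.9 GWh */
    printf("cost saved:  $%.1f million\n", kwh * price_per_kwh / 1e6);
    return 0;
}
```

Under these assumptions the saving comes to roughly $13 million, in line with the quoted figure; actual results vary with utilization and local energy prices.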

Comparisons

With CPUs and GPUs

Data processing units (DPUs) complement central processing units (CPUs) by offloading input/output (I/O) intensive tasks such as networking and storage operations from general-purpose CPUs based on x86 or Arm architectures, enabling the CPUs to concentrate on application execution and processing. CPUs handle general tasks as sequential, general-purpose processors with fewer cores optimized for complex, single-threaded operations and branch-heavy workloads. In contrast, DPUs are designed for parallel data flow management, incorporating specialized accelerators for tasks like packet processing and encryption. This division allows DPUs to process data streams at line rates, such as 100 gigabits per second, without burdening CPU resources.

In contrast to graphics processing units (GPUs), which are tailored for high-throughput floating-point computations in areas like graphics rendering, machine learning, and tensor operations through massive parallelism with thousands of smaller cores, DPUs focus on orchestrating data pipelines to supply clean, optimized data to GPUs, thereby alleviating bottlenecks in data movement. GPUs accelerate parallel workloads such as AI training and video rendering, but lack the integrated high-speed network interfaces and storage controllers inherent in DPUs, which typically feature fewer than 100 processing cores combined with hardware accelerators rather than the expansive parallel arrays found in GPUs. As a result, DPUs do not compete directly with GPUs but enhance their efficiency by managing ingress and egress of large datasets. DPUs manage data infrastructure tasks like networking and security, offloading these from CPUs and GPUs to improve overall system performance.

The concept of the XPU represents a broader framework for heterogeneous computing, incorporating various specialized accelerators such as CPUs, GPUs, and DPUs under a unified software model to enable optimal performance across diverse workloads. Modern systems often combine CPUs, GPUs, DPUs, and other XPUs for enhanced efficiency in applications including artificial intelligence, cloud computing, and high-performance computing.

Synergies between DPUs, CPUs, and GPUs are evident in integrated systems, such as NVIDIA's BlueField DPUs, where DPUs preprocess and ingest data for GPU-based training of deep neural networks, reducing CPU involvement and improving overall throughput by up to 17.5% in distributed training scenarios. In these setups, DPUs handle data-centric tasks like formatting and integrity checks before forwarding data directly to GPUs via technologies like GPUDirect, bypassing traditional CPU-mediated paths. This collaborative architecture maps workloads distinctly: CPUs manage the control plane for orchestration and decision-making, GPUs accelerate compute-intensive operations, and DPUs oversee the data plane for efficient transfer and processing.

Regarding efficiency, DPUs demonstrate superior performance in storage operations, achieving up to 14.8 times better efficiency compared to CPU-based systems, with improvements in I/O operations per second (IOPS) per CPU socket ranging from 3 to 5 times. This stems from their specialized hardware offloads, which minimize power draw for I/O tasks that would otherwise consume significant CPU cycles and energy.

With Smart NICs and Infrastructure Processing Units

Data processing units (DPUs) represent an advancement over smart network interface cards (smart NICs), which primarily handle basic networking offloads such as TCP offload engines (TOEs) and remote direct memory access (RDMA) to reduce host CPU involvement in packet processing. For example, Mellanox's ConnectX series smart NICs, now under Nvidia, focus on accelerating Ethernet and InfiniBand connectivity with hardware-specific functions like stateless offloads and RDMA support, but lack extensive general-purpose programmability. In contrast, DPUs extend this foundation by incorporating programmable Arm-based cores and software ecosystems that enable running full applications, including containers for tasks like security and storage management directly on the device. This allows DPUs to support multi-workload orchestration, such as deploying Kubernetes pods or virtual network functions, offloading them from the host server to improve efficiency in data centers.

Compared to Intel's Infrastructure Processing Units (IPUs), DPUs offer a broader scope of capabilities beyond pure infrastructure tasks. Intel IPUs, such as the E2100 series, emphasize offloading networking primitives like virtual switches (vSwitches) and load balancing to free host CPU resources for applications, using an ASIC architecture with 16 Arm Neoverse N1 cores for telco and cloud environments; earlier IPUs often combined FPGA accelerators with Xeon processors. While IPUs excel in infrastructure-specific acceleration, such as vSwitch offload and basic packet inspection, DPUs integrate additional domains like NVMe storage protocols and advanced security, supported by higher core counts, typically 16 cores in NVIDIA's BlueField-3. This results in DPUs delivering greater compute capacity for diverse workloads compared to traditional smart NICs.

The evolution from smart NICs to DPUs reflects a post-2020 maturation in programmable I/O hardware, with many vendors transitioning their offerings to include DPU features for greater flexibility. For instance, NVIDIA's BlueField series built upon the ConnectX smart NIC lineage by adding multi-core processors and SDKs for custom software, enabling independent operation as infrastructure endpoints. Intel's IPUs have seen growing adoption in data centers and edge environments; as of 2025, IPU revenue is reported to have doubled from 2024 levels. DPUs have captured a significant portion of the advanced I/O market, driven by demand for disaggregated infrastructure in cloud environments, as they supplant legacy NICs in high-performance deployments. This shift underscores DPUs' role in enabling scalable, secure processing at the network edge.
