Nvidia BlueField
from Wikipedia

Nvidia BlueField is a line of data processing units (DPUs) designed and produced by Nvidia. The BlueField IP was initially developed by Mellanox Technologies and passed to Nvidia through its US$6.9 billion acquisition of Mellanox, announced in March 2019.[1] The first Nvidia-produced BlueField cards, named BlueField-2, were shipped for review shortly after their announcement at VMworld 2019, and were officially launched at GTC 2020.[2] Also launched at GTC 2020 was the Nvidia BlueField-2X, a BlueField card with an Ampere-generation graphics processing unit (GPU) integrated onto the same card.[2] BlueField-3 and BlueField-4 DPUs were first announced at GTC 2021, with tentative launch dates of 2022 and 2024 respectively.[3]

Nvidia BlueField cards are targeted for use in datacenters and high performance computing, where latency and bandwidth are important for efficient computation.[4]

BlueField cards differ from network interface controllers in that they offload functions that would normally be reserved for the CPU, and in that they include their own CPU cores (typically ARM or MIPS based) and memory (typically DDR4, though BlueField-3 brought support for other memory types such as HBM and DDR5). BlueField cards also run an operating system completely independent of the host system: this is designed to reduce software overhead, as each DPU can function independently of the others and of the head unit.[5] It also means that BlueField cards can provide remote management for systems that may not otherwise support it. BlueField cards can additionally configure their PCIe interface to function as a host rather than a device, which lets a BlueField card connect over a PCIe bridge to another card, such as a compute accelerator, to provide completely network-based, high-bandwidth control of a GPU.[6]

The BlueField-X cards are DPU-GPU hybrid cards with a 100-class Nvidia datacenter GPU integrated on the same PCB as the BlueField DPU. These cards are intended for high-power GPU clusters, allowing high-bandwidth communication without crossing the PCIe bus and loading the host CPU, whose capacity may be better spent on other types of processing. The increase in total external connectivity available to a system in this configuration allows datasets to be used across multiple nodes when they are too large for any single system to hold in memory.[citation needed]

Models

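| Model | Announcement date | Release date | Networking port options | Bandwidth capacity | Cores | Core type | PCIe generation | Memory capacity | Memory type | GPU accelerator | SPECint (2k17-rate)[7] | TOPS[7] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BlueField-2 | October 5, 2020 | Q2 2021 | Dual QSFP56 10/25/50/100 Gb; Single QSFP56 200 Gb | 200 Gbit/s | 8 | ARM A72 | 4.0 | 16/32 GB | DDR4 | N/A | 9 | 0.7 |
| BlueField-2X | October 5, 2020 | Q4 2021 | Dual QSFP56 10/25/50/100 Gb; Single QSFP56 200 Gb | 200 Gbit/s | 8 | ARM A72 | 4.0 | 16/32 GB | DDR4 | Nvidia A100 | 9 | 60 |
| BlueField-3 | April 12, 2021 | Q1 2022 | Quad/Dual/Single QSFP56 | 400 Gbit/s | 16 | ARM A78 | 5.0 | 64 GB | DDR5 | N/A | 42 | 1.5 |
| BlueField-3X | April 12, 2021 | N/A | Quad/Dual/Single QSFP56 | 400 Gbit/s | 16 | ARM A78 | 5.0 | 64 GB | DDR5 | Nvidia A100 | 42 | 75 |
| BlueField-4 | 2024 | Q4 2025 | OSFP112 | 800 Gbit/s | 64 | ARM Neoverse V2 | 6.0 | 128 GB | DDR5 | N/A | TBD | TBD |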
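
As a rough illustration of what the bandwidth column means in practice, the sketch below converts the listed line rates to bytes per second and estimates the best-case time to move a dataset. The 1 TB dataset size and the function names are assumptions made for illustration, and protocol overhead is ignored, so real transfers would take longer.

```python
# Line rates taken from the bandwidth column of the table, in Gbit/s.
LINE_RATES_GBIT = {"BlueField-2": 200, "BlueField-3": 400, "BlueField-4": 800}


def gbit_to_gbyte_per_s(gbit_per_s: float) -> float:
    """Convert Gbit/s to GB/s (decimal units, 8 bits per byte)."""
    return gbit_per_s / 8


def ideal_transfer_seconds(size_bytes: float, gbit_per_s: float) -> float:
    """Best-case time to move size_bytes at line rate, ignoring all overhead."""
    return size_bytes * 8 / (gbit_per_s * 1e9)


if __name__ == "__main__":
    dataset_bytes = 1e12  # hypothetical 1 TB dataset, chosen for illustration
    for model, rate in LINE_RATES_GBIT.items():
        print(
            f"{model}: {gbit_to_gbyte_per_s(rate):.0f} GB/s, "
            f"{ideal_transfer_seconds(dataset_bytes, rate):.0f} s per TB"
        )
```

Each doubling of line rate halves the ideal transfer time, which is the practical effect of moving from one generation to the next.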

H100 CNX & A100 EGX


The H100 CNX and the A100 EGX are NIC/GPU hybrid cards. While visually similar to a BlueField-X card, they are completely distinct and do not integrate the BlueField system on a chip; they are instead equipped with a generic ConnectX network interface controller.[8][9]

from Grokipedia
BlueField is a family of data processing units (DPUs) originally developed by Mellanox Technologies and now produced by Nvidia following its acquisition of the company, completed in 2020. It serves as an infrastructure-on-a-chip platform that integrates networking controllers, Arm cores, and PCIe switches to offload and accelerate critical workloads, including networking, storage, and cybersecurity, from host CPUs, thereby releasing up to 30% of host CPU resources. The BlueField lineup has evolved through multiple generations, beginning with the BlueField-1 and BlueField-2 models introduced around 2018–2020, which integrated Arm-based processors with high-speed ConnectX networking interfaces for enterprise, high-performance computing (HPC), and cloud environments. Subsequent iterations, such as the BlueField-3, feature 400 Gb/s Ethernet or InfiniBand connectivity, up to 16 Arm Cortex-A78 cores, and enhanced support for software-defined storage (e.g., NVMe-oF) and security functions, enabling line-rate processing in AI and hyperscale data centers. In October 2025, Nvidia announced the BlueField-4, a significant advancement with 800 Gb/s speeds, six times the compute performance of its predecessor, and a Grace CPU integrated alongside ConnectX-9 networking to power gigascale AI factories, with general availability expected in 2026. This progression underscores BlueField's role in enabling zero-trust security models, real-time threat detection via tools like DOCA Argus, and optimized data paths for GPU-accelerated workflows such as GPUDirect Storage. BlueField DPUs are deployed across diverse applications, from accelerating AI training in cloud environments through RDMA over Converged Ethernet (RoCE) to enhancing cybersecurity in partnerships with firms like Check Point and CrowdStrike, reducing host CPU overhead and improving overall data center efficiency.

History and Development

Origins and Early Development

The origins of BlueField trace back to Mellanox Technologies' strategic acquisition of EZchip Technologies in 2015 for $811 million, which enabled the integration of EZchip's programmable processor intellectual property into Mellanox's networking silicon designs. This move built on EZchip's prior acquisition of Tilera in 2014, incorporating multi-core architectures and interconnect technologies to enhance processing capabilities. In June 2016, Mellanox announced the BlueField family of programmable system-on-chip (SoC) processors, the first commercial embodiment of this integrated technology for offload applications. BlueField combined Arm cores with networking accelerators in a multi-core SoC, aiming to offload infrastructure tasks such as networking, storage, and security from host CPUs, thereby reducing latency and improving overall efficiency in software-defined environments. Early BlueField specifications featured up to 8 Arm Cortex-A72 cores, 16 lanes of PCIe Gen3 connectivity, and support for 100 Gb/s Ethernet and InfiniBand protocols, all integrated with ConnectX-5 network adapters to facilitate high-performance data flows. Designed for software-defined infrastructure, the initial products began shipping in 2017 as add-in cards or mezzanine modules, targeting cloud and scalable storage deployments.

Acquisition by Nvidia and Evolution

In March 2019, Nvidia announced its acquisition of Mellanox Technologies for $6.9 billion in cash, a deal that was completed in April 2020. This move brought Mellanox's networking expertise, including the BlueField data processing unit (DPU), into Nvidia's data center portfolio, enabling closer synergy between high-performance GPUs, accelerated networking, and infrastructure acceleration. Following the acquisition, Nvidia expanded the BlueField roadmap, unveiling BlueField-2 in October 2020 at GTC with enhanced Arm-based processing and options for GPU integration to support accelerated workloads. At GTC 2021, the company announced BlueField-3, targeting a 2023 launch, and an early vision for BlueField-4 aimed at 2026, positioning the platform as a cornerstone for evolving data center architectures. Key milestones included the full launch of BlueField-2 in October 2020, general availability of BlueField-3 in March 2023, and the announcement of BlueField-4 on October 28, 2025, at GTC, with early deployment planned for 2026 as part of Nvidia's AI platforms. The acquisition facilitated a strategic shift toward AI-centric infrastructure, evolving BlueField from a general-purpose DPU into an AI-optimized processor tailored for gigascale AI training in "AI factories." This progression gave BlueField access to Nvidia's ecosystem for software-defined acceleration and interconnects, fostering tighter GPU-DPU integration to improve data movement and computational efficiency in large-scale AI deployments.

Architecture and Design

Core Components

The Nvidia BlueField Data Processing Unit (DPU) features a system-on-chip (SoC) design that integrates networking controllers, Arm-based CPU cores, and PCIe switches with specialized accelerators to handle data-intensive workloads efficiently. The Arm cores, such as Cortex-A72 in earlier generations and Cortex-A78 in later ones, provide general-purpose computing capabilities while running standard Linux distributions and open-source tools. The BlueField-4 generation integrates a 64-core NVIDIA Grace CPU for enhanced compute performance. The SoC also incorporates programmable data path accelerators, including the Datapath Accelerator (DPA), which enable customizable packet processing and offloading of network functions. Additionally, hardware offload engines support Remote Direct Memory Access (RDMA), encryption, and compression tasks, reducing latency and freeing host CPUs for higher-level computations. This integration enables software-defined networking, storage, and security acceleration, offloading tasks to release up to 30% of host CPU resources.
Networking interfaces in BlueField DPUs are built around high-speed ConnectX adapters, supporting both Ethernet and InfiniBand protocols for scalable data center interconnects. These interfaces support RDMA over Converged Ethernet (RoCEv2) for low-latency data transfers and GPUDirect RDMA for efficient GPU communication, minimizing overhead in AI and HPC environments. Memory subsystems feature on-board DDR memory, providing up to 32 GB in typical setups. Input/output connectivity includes PCIe Gen5 x16 interfaces for host integration and optional attachments for accelerated data movement between DPUs and GPUs. The BlueField-3 DPU exposes two x16 PCIe interfaces with an internal PCIe switch architecture.
Security is embedded at the hardware level through a root of trust mechanism, ensuring secure boot and firmware integrity across operations. BlueField provides isolation features that protect sensitive workloads via hardware-enforced memory encryption and secure enclaves. Crypto accelerators handle standards like AES and SHA as well as TLS offloads, enabling efficient secure data processing without compromising performance.
Power consumption for BlueField DPUs typically ranges from 75 W to 150 W (TDP), balancing performance with energy efficiency in dense deployments. Form factors include PCIe add-in cards in full-height half-length or half-height half-length variants, OCulink modules for compact systems, and integrated SuperNIC configurations for optimized networking appliances.

Key Features and Capabilities

The Nvidia BlueField family of Data Processing Units (DPUs) offloads infrastructure tasks from host CPUs, accelerating TCP/IP networking, storage protocols such as NVMe-oF and iSER, and security functions including firewalls and intrusion detection systems. This hardware-based acceleration frees CPU cycles for application workloads, reducing overall system overhead and improving efficiency in data centers. BlueField DPUs deliver line-rate processing at 400-800 Gb/s over Ethernet or InfiniBand connectivity, with microsecond-level latency for real-time operations. Storage performance gains come from hardware acceleration for protocols like NVMe/TCP. These capabilities support scalable disaggregated storage, zero-trust security models with multi-tenant isolation, and deployments in edge environments, allowing hybrid deployments to expand. In AI-focused applications, the BlueField-4 DPU introduces a 6x increase in compute power over its predecessor, optimizing AI telemetry, model serving, and data pipeline efficiency at large scale. It integrates with GPUs in platforms like DGX and HGX to accelerate end-to-end AI workflows. Energy efficiency features, including thermal throttling and power capping, further limit power consumption in high-density setups.
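
To give a sense of what line-rate processing at 400-800 Gb/s demands, the following sketch computes the maximum Ethernet frame rate and the per-frame time budget for a given link speed. It assumes standard Ethernet framing (20 bytes of preamble, start-of-frame delimiter, and inter-frame gap per frame); the function names are illustrative and the figures are theoretical upper bounds, not measured BlueField results.

```python
# Bytes each Ethernet frame occupies on the wire beyond the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_gbit_per_s: float, frame_bytes: int) -> float:
    """Maximum frames per second a link can carry at the given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbit_per_s * 1e9 / bits_on_wire


def ns_per_packet(link_gbit_per_s: float, frame_bytes: int) -> float:
    """Time budget per frame, in nanoseconds, when running at line rate."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return bits_on_wire / link_gbit_per_s  # bits / (Gbit/s) == nanoseconds


if __name__ == "__main__":
    for rate in (400, 800):
        for size in (64, 1518):  # minimum and maximum standard frame sizes
            mpps = packets_per_second(rate, size) / 1e6
            print(
                f"{rate} Gb/s, {size}-byte frames: {mpps:.1f} Mpps, "
                f"{ns_per_packet(rate, size):.2f} ns per frame"
            )
```

With minimum-size frames the per-frame budget falls below two nanoseconds at 400 Gb/s, which is why such rates are handled by dedicated hardware pipelines rather than general-purpose cores.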

Models and Specifications

BlueField and BlueField-2

The original BlueField, introduced in 2017, was the first generation of data processing units (DPUs) designed to offload and accelerate infrastructure tasks from host CPUs in data centers. It featured 8 Arm Cortex-A72 cores operating at 800 MHz, providing programmable processing for network and storage functions. The device included 16 GB of DDR4 memory and supported 100 Gb/s Ethernet or InfiniBand connectivity via dual ports, integrated with a ConnectX-5 network controller. Host connectivity was handled through PCIe Gen3/4 with up to 16 lanes, enabling basic offloads for software-defined networking (SDN) and storage protocols such as NVMe over Fabrics. Targeted primarily at scale-out servers in enterprise environments, the original BlueField emphasized efficient data movement and basic security without advanced acceleration engines.
The BlueField-2, released in 2020, built upon this design with significant enhancements in performance and versatility, establishing it as a cornerstone of modern data center acceleration. It retained 8 Arm Cortex-A72 cores but raised clock speeds to up to 2.5 GHz, while expanding memory options to 16 GB or 32 GB of DDR4 with ECC support. Connectivity doubled to 200 Gb/s Ethernet or InfiniBand through single or dual ports using the ConnectX-6 controller, paired with PCIe Gen4 supporting 8 or 16 lanes and an integrated switch for bifurcation into up to 8 downstream ports. Key upgrades included enhanced cryptographic engines for IPsec, TLS, AES-XTS (256/512-bit), SHA-256, RSA, and ECC acceleration, alongside support for the Data Plane Development Kit (DPDK) and Single Root I/O Virtualization (SR-IOV) to enable efficient multi-tenant environments. Initial compatibility with the DOCA framework allowed developers to create custom applications for networking, storage, and security offloads directly on the DPU. BlueField-2 variants catered to diverse deployment needs, including SuperNIC configurations like the MBF2H352A-ConnectX-6 Dx for high-density servers, optimized for 200 Gb/s throughput in compact form factors.
Power consumption ranged from 75 W to 100 W TDP across models, balancing performance with thermal efficiency in rack-scale environments. A notable variant, the BlueField-2X, integrated an Nvidia GPU accelerator for edge AI inferencing, enabling on-DPU processing of machine learning workloads alongside networking tasks. By 2021, BlueField-2 had seen widespread adoption in hyperscale data centers for 5G infrastructure and cloud acceleration, offloading up to 30x more CPU cycles compared to software-only solutions.
| Feature | Original BlueField (2017) | BlueField-2 (2020) |
|---|---|---|
| Arm Cores | 8 Cortex-A72 @ 800 MHz | 8 Cortex-A72 @ up to 2.5 GHz |
| Memory | 16 GB DDR4 | 16-32 GB DDR4 (ECC) |
| Connectivity | 100 Gb/s Ethernet/InfiniBand (dual ports) | 200 Gb/s Ethernet/InfiniBand (single/dual ports) |
| PCIe Interface | Gen3/4 (up to 16 lanes) | Gen4 (8-16 lanes, with switch bifurcation) |
| Power TDP | ~75 W | 75-100 W |
| Key Focus | SDN and storage offload | Enhanced crypto, DPDK/SR-IOV, DOCA apps, edge AI variant |
This evolution in the first two generations laid the groundwork for subsequent advancements in DPU technology.

BlueField-3

The BlueField-3 is the third generation of the company's data processing unit (DPU) lineup, achieving general availability in 2023. The E-series variant incorporates 8 Armv8.2+ A78 cores operating at up to 2.0 GHz, while the P-series features 16 cores at up to 2.133 GHz. The E-series includes 16 GB of DDR5 memory with 64-bit ECC, and the P-series has 32 GB. Both support 40 GB eMMC and a 128 GB SSD for boot and storage. The platform delivers 400 Gb/s connectivity for Ethernet or NDR InfiniBand through QSFP112 ports, enabling high-throughput operations. Key interfaces include PCIe Gen5 x16 for host connectivity and an NVLink 4.0 bridge option to facilitate direct GPU integration, such as in H100 systems. Available variants encompass the B3120 and B3220 SuperNICs (E-series), along with P-series models like the B3140H, all designed in a single-slot PCIe form factor with a 125 W TDP. Compared to its predecessor, the BlueField-2, which was limited to PCIe Gen4, the BlueField-3 offers significant upgrades, including up to 4x faster cryptographic acceleration. It also supports future configurations with 800G OSFP connectivity via validated breakout cables and modules, while providing enhanced offloads tailored for efficient AI data ingestion.

BlueField-4

Nvidia announced the BlueField-4 data processing unit (DPU) on October 28, 2025, during its GTC event in Washington, D.C., positioning it as a key component for powering the operating systems of AI factories. Designed for early availability in 2026 as part of Nvidia's AI platforms, the BlueField-4 aims to enable gigascale AI infrastructure by offloading networking, security, and storage tasks from host CPUs. It builds on the BlueField-3 by integrating advanced processing capabilities for larger-scale deployments. The BlueField-4 features a 64-core NVIDIA Grace CPU based on the Arm Neoverse V2 architecture, 128 GB of LPDDR5 memory, and a 512 GB SSD, enabling efficient data handling. It supports 800 Gb/s networking throughput via NVIDIA ConnectX-9 SuperNICs, doubling the bandwidth of the previous generation to facilitate rapid data movement in exascale AI clusters. This configuration delivers six times the compute performance of the BlueField-3, with optimizations for NVLink 5.0 interconnects to enhance inter-node communication in massive AI systems. Key innovations include support for AI factory-scale telemetry to monitor and manage vast workloads in real time, alongside secure multi-tenancy features that provide zero-trust isolation between tenants. The DPU is available in form factors such as a PCIe Gen6 card or OCulink module. As a precursor to a future BlueField-5, the BlueField-4 emphasizes scalable infrastructure for gigascale AI, focusing on accelerated networking and security to support the next wave of industrial AI.

Software and Ecosystem

DOCA Framework

The NVIDIA DOCA (Data Center Infrastructure-on-a-Chip Architecture) framework, launched in October 2020, serves as the primary software development kit (SDK) for the BlueField networking platform, offering APIs and tools to accelerate applications in networking, storage, and security services. It enables developers to use the programmable capabilities of BlueField data processing units (DPUs) through a unified runtime that integrates with industry-standard protocols. At its core, DOCA includes specialized libraries for high-performance operations such as Remote Direct Memory Access (RDMA), GPUDirect for direct GPU-DPU communication, and telemetry for real-time monitoring and data collection. The framework also features a runtime environment that supports container orchestration, facilitating deployment of services across BlueField-enabled infrastructures. Development is streamlined through the SDK, which exposes C/C++ APIs alongside Python bindings, and includes support for Kubernetes orchestration and Docker containerization directly on the Arm-based cores of BlueField DPUs. The latest version, 3.1.0, released in July 2025 with updates through September 2025, adds enhancements for AI cloud environments and maintains long-term support (LTS) branches for stability. DOCA emphasizes the creation of custom DPU applications for offloading compute-intensive tasks from host CPUs. One example is the DOCA Argus microservice, which delivers real-time threat detection for AI runtimes and was integrated with Trend Micro's Vision One platform in October 2025 for use in AI factories. DOCA is compatible with BlueField-2 and subsequent models and is often deployed via the BlueField Software Bundle (BF-Bundle), which provides a complete Linux-based operating system installation including the full SDK and drivers.

Operating Systems and Tools

The BlueField data processing unit (DPU) supports Arm64-based Linux distributions, with Ubuntu Server 22.04 serving as the default operating system in its reference implementation. Other compatible distributions can be installed much as on standard Arm64 servers, using driver disks and GRUB for booting. Nvidia's BlueField OS (BFOS) acts as a reference distribution, built with the Yocto-based board support package (BSP) and incorporating the DOCA framework for foundational software acceleration.
The boot process for BlueField DPUs relies on UEFI firmware, enabling secure and flexible initialization of the Arm cores. During boot, Linux fsck performs filesystem integrity checks to ensure reliable startup, particularly when using persistent storage options like eMMC or SSD. The system supports NVMe-oF for networked storage access, alongside local eMMC and SSD booting, with EFI entries configurable to prioritize SSD devices post-installation. Firmware versions 4.0 and later introduce power capping to limit energy draw and thermal throttling to prevent overheating, improving reliability in dense data center environments.
Key management tools for BlueField include the MLNX-OFED drivers, now part of the DOCA-OFED stack, which provide essential networking and storage offloads. The mstflint utility handles firmware updates and queries, allowing administrators to verify and upgrade components like the ConnectX adapter. Additionally, the DOCA runtime facilitates provisioning and orchestration of services across data centers, supporting containerized deployments on BlueField hardware. The BSP version 4.12.0, released in 2025, extends support for high-speed port configurations, enabling network ports to operate in Ethernet-only or InfiniBand-only modes. This version also includes tools for virtual function (VF) configuration and Single Root I/O Virtualization (SR-IOV), allowing per-ECPF and per-PF control over VF allocation to improve resource partitioning.
Security features in BlueField integrate a hardware root of trust anchored in unmodifiable ROM code, which initiates the secure boot chain by authenticating the initial firmware using an off-chip public key verified against on-chip E-FUSE hashes. This process extends a cryptographic chain of trust to subsequent boot elements, halting execution if any verification fails to prevent unauthorized code from running. Device attestation is supported through SPDM protocols via the DPU BMC, enabling remote verification of firmware integrity and identity.
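
The chain-of-trust idea described above can be sketched in a few lines. This is a deliberately simplified model, not BlueField firmware: the public key is checked against a fused hash as in the text, but the signature checks used in real secure boot are replaced here by plain SHA-256 digest comparisons, and all names (`BootStage`, `link_stages`, `secure_boot`) are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import List, Optional


class BootHalted(Exception):
    """Raised when any link in the chain fails verification."""


@dataclass
class BootStage:
    name: str
    image: bytes
    next_digest: Optional[bytes] = None  # digest this stage vouches for


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def link_stages(stages: List[BootStage]) -> bytes:
    """Record in each stage the digest of its successor; return the first digest."""
    for current, following in zip(stages, stages[1:]):
        current.next_digest = sha256(following.image)
    return sha256(stages[0].image)


def secure_boot(fused_key_hash: bytes, public_key: bytes,
                first_digest: bytes, stages: List[BootStage]) -> List[str]:
    """Verify the key against the fused hash, then each stage in order."""
    # Step 1: the off-chip public key must hash to the value burned into fuses.
    if not hmac.compare_digest(sha256(public_key), fused_key_hash):
        raise BootHalted("public key does not match fused hash")
    # Step 2: walk the chain; stand-in for signature checks (assumption).
    expected: Optional[bytes] = first_digest
    booted: List[str] = []
    for stage in stages:
        if expected is None or not hmac.compare_digest(sha256(stage.image), expected):
            raise BootHalted(f"{stage.name} failed verification")
        booted.append(stage.name)
        expected = stage.next_digest
    return booted
```

Changing a single byte in any stage image, or supplying a different public key, causes `secure_boot` to raise `BootHalted` before the affected stage runs, mirroring the halt-on-failure behavior the text describes.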

Applications and Integrations

Data Center and Networking Use Cases

NVIDIA BlueField DPUs accelerate networking functions in data centers by offloading software-defined networking (SDN), load balancing, and RDMA over Converged Ethernet (RoCE) from host CPUs, enabling hyperscale cloud environments to achieve higher throughput and lower latency. In telecommunications, particularly 5G core networks, this offloading accelerates user-plane functions (UPF), doubling packet throughput and reducing data path latency by 40% while freeing CPU resources for revenue-generating applications.
For storage optimization, BlueField provides hardware acceleration for NVMe over Fabrics (NVMe-oF) and distributed file systems like Ceph, supporting disaggregated storage architectures where compute and storage resources are scaled independently. This enables all-flash arrays and scale-out storage to deliver significantly higher input/output operations per second (IOPS); for instance, BlueField-2 achieved 41.5 million IOPS, more than four times the prior benchmark.
Security services are enhanced through BlueField's support for inline encryption, distributed denial-of-service (DDoS) mitigation, and zero-trust isolation, all processed at line rate to protect data flows without impacting performance. In edge computing scenarios, such as IoT gateways, these features enable real-time threat detection and secure data aggregation for operational technology (OT) and cyber-physical systems.
At the edge, low-power BlueField variants support deployments in 5G radio access networks (RAN), where they handle high-bandwidth data processing for applications like precision manufacturing and urban automation. For example, in 2024 BlueField-3 was integrated with a third-party data platform to optimize storage clusters for high-performance AI workloads, providing networking and acceleration in distributed edge setups.
In 2025, integrations such as Aviz Networks' subscriber-aware load balancing using BlueField-3 and Arrcus' scalable networking for enterprise AI further demonstrate ongoing advancements in telecom and AI efficiency. Overall, BlueField deployments yield efficiency gains in multi-tenant data centers through DPU-based isolation, reducing total cost of ownership (TCO) by streamlining operations and lowering power consumption, by up to 29% in server workloads via offloaded functions. These benefits extend to AI by offloading infrastructure tasks, allowing GPUs to focus on compute-intensive work.
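
As a back-of-the-envelope illustration of the power figure cited above, the sketch below applies the up-to-29% reduction to a hypothetical fleet. The fleet size, per-server wattage, and function names are assumptions for illustration only, and because 29% is an upper bound the results are best-case estimates.

```python
HOURS_PER_YEAR = 8760  # 365 days of continuous operation


def power_saved_kw(servers: int, watts_per_server: float,
                   reduction: float = 0.29) -> float:
    """Best-case power saving in kW, using the up-to-29% figure from the text."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction between 0 and 1")
    return servers * watts_per_server * reduction / 1000


def energy_saved_kwh_per_year(servers: int, watts_per_server: float,
                              reduction: float = 0.29) -> float:
    """Best-case annual energy saving in kWh for an always-on fleet."""
    return power_saved_kw(servers, watts_per_server, reduction) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical fleet: 1,000 servers drawing 500 W each.
    print(f"{power_saved_kw(1000, 500):.0f} kW saved")
    print(f"{energy_saved_kwh_per_year(1000, 500):,.0f} kWh saved per year")
```

Passing a smaller `reduction` value gives a more conservative estimate for workloads that offload less.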

AI and HPC System Integrations

NVIDIA's H100 CNX, introduced in 2022, integrates the H100 Tensor Core GPU with networking capabilities, while the DGX H100 system incorporates two BlueField-3 DPUs to offload infrastructure tasks, enabling accelerated AI training alongside a fourth-generation NVLink interconnect providing 900 GB/s of bidirectional bandwidth per GPU. This setup supports multinode AI training in DGX H100 systems, where eight H100 GPUs deliver a combined 32 petaFLOPS of FP8 performance, facilitating large-scale model training. The BlueField-3 DPUs handle data movement, freeing GPU resources for compute-intensive workloads.
In 2021, the A100 EGX platform combined the A100 GPU with a BlueField-2 DPU on a single PCIe card, targeting edge AI servers for real-time processing. This integration supports AI inference and multi-node training, allowing direct GPU-to-network data paths and Multi-Instance GPU partitioning into up to seven isolated instances. By offloading networking and storage from the host CPU, BlueField-2 benefits latency-sensitive edge applications, such as telco RAN processing.
BlueField DPUs play a key role in broader high-performance computing (HPC) environments, including exascale supercomputers, where they offload data movement to optimize scientific simulations and AI workloads. In DGX H100 systems, whose eight H100 GPUs total 640 billion transistors, BlueField-3 improves overall system efficiency by handling networking, storage, and security tasks independently. Looking ahead, BlueField-4 is slated for integration into NVIDIA platforms starting in 2026, serving as a core component for AI factories through DOCA and enabling secure scaling in gigascale clusters. It supports multi-tenant isolation and zero-trust security, allowing efficient resource orchestration across thousands of GPUs for generative AI and scientific computing.
These integrations yield significant benefits, including up to 1.5x GPU-to-GPU bandwidth in 256-GPU configurations via NVLink and NVSwitch, which BlueField complements through high-speed offloads. Additionally, BlueField helps secure AI pipelines by providing hardware-accelerated protection for sensitive data and models, ensuring integrity on Hopper and subsequent architectures.
