NVLink

NVLink is a wire-based, serial, multi-lane, near-range communications link developed by Nvidia. Unlike PCI Express, a device can have multiple NVLinks, and devices can communicate over a mesh network instead of through a central hub or switch. The protocol was first announced in March 2014 and uses a proprietary high-speed signaling interconnect (NVHS).

For small numbers of GPUs, the NVLink lanes on a single device are sufficient for all-to-all mesh connectivity. To accommodate higher GPU counts, NVLink has used a packet-switched architecture since 2018, in which a central switch can serve up to 32 two-lane ports. The NVSwitch for NVLink 4.0 can perform some simple computations of its own (e.g. sum, broadcast) thanks to the "SHARP" accelerator, which reduces the need for communication.
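As a rough illustration of why a direct mesh only works for small GPU counts, the sketch below checks how many links each GPU could dedicate to every peer in an all-to-all mesh. It assumes six links per GPU (the V100 figure given in this article) and an even split of links among peers. Both the even split and the function names are assumptions made for illustration, and real systems may distribute links unevenly.

```python
def links_per_peer(links_per_gpu: int, gpu_count: int) -> int:
    """Links each GPU can dedicate to every other GPU in a full mesh.

    Returns 0 when there are more peers than links, i.e. a direct
    all-to-all mesh is not possible and a switch would be needed.
    Assumes links are split evenly among peers (illustrative only).
    """
    if gpu_count < 2:
        raise ValueError("a mesh needs at least two GPUs")
    peers = gpu_count - 1
    return links_per_gpu // peers


def mesh_connections(gpu_count: int) -> int:
    """Number of distinct GPU pairs in an all-to-all mesh."""
    return gpu_count * (gpu_count - 1) // 2


if __name__ == "__main__":
    LINKS_PER_GPU = 6  # assumption: V100-class device with six links
    for n in (2, 4, 7, 8):
        k = links_per_peer(LINKS_PER_GPU, n)
        status = f"{k} link(s) per peer" if k else "not possible without a switch"
        print(f"{n} GPUs, {mesh_connections(n)} pairs: {status}")
```

Under these assumptions, the number of peers quickly exceeds the number of links per device, which is the situation a central switch is meant to address.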

NVLink is developed by Nvidia for data and control code transfers in processor systems, both between CPUs and GPUs and solely between GPUs. NVLink specifies a point-to-point connection with data rates of 20, 25 and 50 Gbit/s (v1.0, v2.0 and v3.0+ respectively) per differential pair. For NVLink 1.0 and 2.0, eight differential pairs form a "sub-link", and two sub-links, one for each direction, form a "link". Starting with NVLink 3.0, only four differential pairs form a sub-link. For NVLink 2.0 and higher, the total data rate is 25 GB/s for a sub-link and 50 GB/s for a link.

Each V100 GPU supports up to six links, so each GPU is capable of up to 300 GB/s of total bidirectional bandwidth. NVLink products introduced to date focus on the high-performance application space. Announced on May 14, 2020, NVLink 3.0 increases the data rate per differential pair from 25 Gbit/s to 50 Gbit/s while halving the number of pairs per sub-link from 8 to 4. With 12 links on an Ampere-based A100 GPU, this brings the total bandwidth to 600 GB/s. Hopper has 18 NVLink 4.0 links, enabling a total bandwidth of 900 GB/s. Thus NVLink 2.0, 3.0 and 4.0 all provide 50 GB/s per bidirectional link, with 6, 12 and 18 links respectively.
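The bandwidth figures above follow from simple arithmetic: pairs per sub-link times the rate per pair gives the sub-link rate in Gbit/s, dividing by 8 converts to GB/s, and a link is two sub-links. The following is a minimal sketch of that calculation using only the numbers quoted in this article; the class and function names are hypothetical, and encoding or protocol overhead is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NVLinkGeneration:
    """Per-generation signalling parameters as quoted in the text."""
    name: str
    gbit_per_pair: int      # data rate per differential pair, Gbit/s
    pairs_per_sublink: int  # differential pairs per direction

    def sublink_gbyte_per_s(self) -> float:
        # Gbit/s across all pairs, converted to GB/s (8 bits per byte).
        return self.gbit_per_pair * self.pairs_per_sublink / 8

    def link_gbyte_per_s(self) -> float:
        # One link is two sub-links, one per direction.
        return 2 * self.sublink_gbyte_per_s()


def total_bandwidth(gen: NVLinkGeneration, links: int) -> float:
    """Aggregate bidirectional bandwidth of a device in GB/s."""
    return links * gen.link_gbyte_per_s()


NVLINK_1 = NVLinkGeneration("NVLink 1.0", gbit_per_pair=20, pairs_per_sublink=8)
NVLINK_2 = NVLinkGeneration("NVLink 2.0", gbit_per_pair=25, pairs_per_sublink=8)
NVLINK_3 = NVLinkGeneration("NVLink 3.0", gbit_per_pair=50, pairs_per_sublink=4)

if __name__ == "__main__":
    for gen, links in ((NVLINK_2, 6), (NVLINK_3, 12)):
        print(f"{gen.name}: {gen.link_gbyte_per_s():.0f} GB/s per link, "
              f"{total_bandwidth(gen, links):.0f} GB/s with {links} links")
```

Doubling the per-pair rate while halving the pair count leaves the per-link figure unchanged, so the step from 300 GB/s to 600 GB/s comes entirely from the larger number of links.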

The following table shows a basic metrics comparison based upon standard specifications:

The following table shows a comparison of relevant bus parameters for real world semiconductors that all offer NVLink as one of their options:

Real-world performance can be estimated by applying the various encapsulation overheads as well as the usage rate. These come from various sources:[citation needed]

These physical limitations usually reduce the data rate to between 90 and 95% of the transfer rate.[citation needed] NVLink benchmarks show an achievable transfer rate of about 35.3 Gbit/s[contradictory] (host to device) for a 40 Gbit/s (2 sub-lanes uplink) NVLink connection to a P100 GPU in a system driven by a set of IBM POWER8 CPUs.
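The relationship between the quoted benchmark and the 90 to 95% range can be checked with a one-line ratio. The sketch below uses the two figures exactly as stated above; the function name is hypothetical, and the units are taken at face value from the text.

```python
def link_efficiency(achieved: float, nominal: float) -> float:
    """Fraction of the nominal transfer rate that was actually achieved."""
    if nominal <= 0:
        raise ValueError("nominal rate must be positive")
    return achieved / nominal


# Figures as quoted in the text (same unit for both values).
efficiency = link_efficiency(achieved=35.3, nominal=40.0)
within_expected_range = 0.90 <= efficiency <= 0.95

if __name__ == "__main__":
    print(f"efficiency: {efficiency:.1%}, "
          f"within 90-95% range: {within_expected_range}")
```

The ratio comes out at roughly 88%, slightly below the stated range, which may be part of why the benchmark figure is flagged in the text.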

For the various versions of plug-in boards that expose extra connectors for joining them into an NVLink group (so far only a small number of high-end gaming and professional graphics GPU boards have this feature), a similar number of slightly varying, relatively compact, PCB-based interconnection plugs exists. Typically only boards of the same type will mate together, due to their physical and logical design. Some setups require two identical plugs to achieve the full data rate. At present the typical plug is U-shaped, with a fine-pitch edge connector on each end stroke of the shape, facing away from the viewer. The width of the plug determines how far apart the plug-in cards must be seated on the main board of the host computer system, so card placement is commonly dictated by the matching plug (known plug widths are 3 to 5 slots and also depend on board type). The interconnect is often referred to as Scalable Link Interface (SLI), a name dating from 2004, because of its structural design and appearance, even though the modern NVLink-based design is technically quite different, with different features at its basic levels compared to the former design. Reported real-world devices are:
