PCI Express
from Wikipedia

PCI Express
Peripheral Component Interconnect Express
Year created: 2003
Width in bits: 1 per lane, up to 16 lanes[1]
No. of devices: 1 on each endpoint of each connection[a]
Speed: Dual simplex, up to 242 GB/s
Style: Serial
Hotplugging interface: Optional (supported with ExpressCard, OCuLink, CFexpress or U.2)
External interface: Optional (supported with OCuLink or other forms of PCI Express External Cabling, tunneled over USB4 and Thunderbolt)
Website: pcisig.com
A PCIe 3.0 ×8 host bus adapter
Various slots on a computer motherboard, from top to bottom:
  • PCI Express ×4
  • PCI Express ×16
  • PCI Express ×1
  • PCI Express ×16
  • Conventional PCI (32-bit, 5 V)

PCI Express (Peripheral Component Interconnect Express), officially abbreviated as PCIe,[2] is a high-speed standard used to connect hardware components inside computers. It is designed to replace older expansion bus standards such as PCI, PCI-X and AGP. Developed and maintained by the PCI-SIG (PCI Special Interest Group), PCIe is commonly used to connect graphics cards, sound cards, Wi-Fi and Ethernet adapters, and storage devices such as solid-state drives and hard disk drives.[3]

Compared to earlier standards, PCIe supports faster data transfer, uses fewer pins, takes up less space, and allows devices to be added or removed while the computer is running (hot swapping). It also includes better error detection and supports newer features like I/O virtualization for advanced computing needs.[4]

PCIe connections are made through "lanes," which are pairs of conductors that send and receive data. Devices can use one or more lanes depending on how much data they need to transfer.[5] PCIe technology is also used in laptop expansion cards (like ExpressCard) and in storage connectors such as M.2, U.2, and SATA Express.

Architecture

Example of the PCI Express topology:
white "junction boxes" represent PCI Express device downstream ports. The gray ones represent upstream ports.[6]: 7 
PCI Express ×1 card containing a PCI Express switch (covered by a small heat sink), which creates multiple endpoints out of one endpoint and lets multiple devices share it
The PCIe slots on a motherboard are often labeled with the number of PCIe lanes they have. Sometimes what may seem like a large slot may only have a few lanes. For instance, a ×16 slot with only 4 PCIe lanes (bottom slot) is quite common.[7]

Conceptually, the PCI Express bus is a high-speed serial replacement of the older PCI/PCI-X bus.[8] One of the key differences between the PCI Express bus and the older PCI is the bus topology; PCI uses a shared parallel bus architecture, in which the PCI host and all devices share a common set of address, data, and control lines. In contrast, PCI Express is based on point-to-point topology, with separate serial links connecting every device to the root complex (host). Because of its shared bus topology, access to the older PCI bus is arbitrated (in the case of multiple masters), and limited to one master at a time, in a single direction. Furthermore, the older PCI clocking scheme limits the bus clock to the slowest peripheral on the bus (regardless of the devices involved in the bus transaction). In contrast, a PCI Express bus link supports full-duplex communication between any two endpoints, with no inherent limitation on concurrent access across multiple endpoints.

In terms of bus protocol, PCI Express communication is encapsulated in packets. The work of packetizing and de-packetizing data and status-message traffic is handled by the transaction layer of the PCI Express port (described later). Radical differences in electrical signaling and bus protocol require the use of a different mechanical form factor and expansion connectors (and thus, new motherboards and new adapter boards); PCI slots and PCI Express slots are not interchangeable. At the software level, PCI Express preserves backward compatibility with PCI; legacy PCI system software can detect and configure newer PCI Express devices without explicit support for the PCI Express standard, though new PCI Express features are inaccessible.

The PCI Express link between two devices can vary in size from one to 16 lanes. In a multi-lane link, the packet data is striped across lanes, and peak data throughput scales with the overall link width. The lane count is automatically negotiated during device initialization and can be restricted by either endpoint. For example, a single-lane PCI Express (×1) card can be inserted into a multi-lane slot (×4, ×8, etc.), and the initialization cycle auto-negotiates the highest mutually supported lane count. The link can dynamically down-configure itself to use fewer lanes, providing fault tolerance in case bad or unreliable lanes are present. The PCI Express standard defines link widths of ×1, ×2, ×4, ×8, and ×16. Up to and including PCIe 5.0, ×12 and ×32 links were also defined but were almost never used.[9] This range of widths allows the PCI Express bus to serve both cost-sensitive applications where high throughput is not needed, and performance-critical applications such as 3D graphics, networking (10 Gigabit Ethernet or multiport Gigabit Ethernet), and enterprise storage (SAS or Fibre Channel). Slots and connectors are only defined for a subset of these widths, with link widths in between using the next larger physical slot size.
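The width negotiation described above can be illustrated with a minimal sketch. It is not the actual link-training state machine: the function names are hypothetical, each port is reduced to a set of lane counts it can run at, and down-configuration ignores which physical lanes have failed.

```python
# Link widths named in the text (x12 and x32 were defined only up to PCIe 5.0).
STANDARD_WIDTHS = (1, 2, 4, 8, 16)


def negotiate_width(port_a_widths, port_b_widths):
    """Return the widest lane count that both ports support.

    Each argument is an iterable of lane counts a port can operate at.
    Raises ValueError if the two ports share no width.
    """
    common = set(port_a_widths) & set(port_b_widths)
    if not common:
        raise ValueError("ports share no link width")
    return max(common)


def down_configure(current_width, usable_lanes, supported_widths=STANDARD_WIDTHS):
    """Pick the widest supported width that fits in the lanes still usable.

    Simplification (assumption): only the number of good lanes matters,
    not their position within the link.
    """
    limit = min(current_width, usable_lanes)
    candidates = [w for w in supported_widths if w <= limit]
    if not candidates:
        raise ValueError("no usable lanes left")
    return max(candidates)


print(negotiate_width((1,), (1, 2, 4, 8)))       # x1 card in a x8 slot -> 1
print(negotiate_width((1, 4, 8, 16), (1, 4)))    # x16 card, x4 wiring  -> 4
print(down_configure(8, usable_lanes=5))         # x8 link, 5 good lanes -> 4
```

The second call corresponds to a ×16 card placed in a slot that is wired for only four lanes: both sides support ×4, so the link trains at ×4.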

As a point of reference, a PCI-X (133 MHz 64-bit) device and a PCI Express 1.0 device using four lanes (×4) have roughly the same peak single-direction transfer rate of 1064 MB/s. The PCI Express bus has the potential to perform better than the PCI-X bus in cases where multiple devices are transferring data simultaneously, or if communication with the PCI Express peripheral is bidirectional.
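The comparison above can be checked with simple arithmetic. The sketch below assumes that PCI-X moves one 64-bit word per clock and that PCIe 1.0 uses 8b/10b encoding at 2.5 GT/s per lane; the function names are illustrative only.

```python
def parallel_bus_peak_mb_s(clock_mhz, bus_width_bits):
    """Peak rate of a parallel bus moving one word per clock, in MB/s."""
    return clock_mhz * bus_width_bits / 8


def pcie_peak_mb_s(transfer_rate_gt_s, lanes, payload_bits=8, encoded_bits=10):
    """Peak one-direction rate of a PCIe link in MB/s, after line-code overhead."""
    usable_gbit_s = transfer_rate_gt_s * payload_bits / encoded_bits * lanes
    return usable_gbit_s * 1000 / 8


print(parallel_bus_peak_mb_s(133, 64))  # PCI-X 133 MHz, 64-bit: 1064.0
print(pcie_peak_mb_s(2.5, 4))           # PCIe 1.0 x4, per direction: 1000.0
```

The two figures differ by about 6%, which is why the rates are described as roughly the same. The PCIe figure applies to each direction separately, whereas the PCI-X figure is shared by all traffic on the bus.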

Interconnect

A PCI Express link between two devices consists of one or more lanes, which are dual simplex channels using two differential signaling pairs.[6]: 3 

PCI Express devices communicate via a logical connection called an interconnect[10] or link. A link is a point-to-point communication channel between two PCI Express ports allowing both of them to send and receive ordinary PCI requests (configuration, I/O or memory read/write) and interrupts (INTx, MSI or MSI-X). At the physical level, a link is composed of one or more lanes.[10] Low-speed peripherals (such as an 802.11 Wi-Fi card) use a single-lane (×1) link, while a graphics adapter typically uses a much wider and therefore faster 16-lane (×16) link.

Lane

A lane is composed of two differential signaling pairs, with one pair for receiving data and the other for transmitting. Thus, each lane is composed of four wires or signal traces. Conceptually, each lane is used as a full-duplex byte stream, transporting data packets in eight-bit "byte" format simultaneously in both directions between endpoints of a link.[11] Physical PCI Express links may contain 1, 2, 4, 8 or 16 lanes.[12][6]: 4, 5 [10] Lane counts are written with a "×" prefix (for example, "×8" represents an eight-lane card or slot), with ×16 being the largest size in common use.[13] Lane sizes are also referred to via the terms "width" or "by"; for example, an eight-lane slot could be referred to as a "by 8" or as "8 lanes wide".

For mechanical card sizes, see below.

Serial bus

The bonded serial bus architecture was chosen over the traditional parallel bus because of the inherent limitations of the latter, including half-duplex operation, excess signal count, and inherently lower bandwidth due to timing skew. Timing skew results from separate electrical signals within a parallel interface traveling through conductors of different lengths, on potentially different printed circuit board (PCB) layers, and at possibly different signal velocities. Despite being transmitted simultaneously as a single word, signals on a parallel interface have different travel duration and arrive at their destinations at different times. When the interface clock period is shorter than the largest time difference between signal arrivals, recovery of the transmitted word is no longer possible. Since timing skew over a parallel bus can amount to a few nanoseconds, the resulting bandwidth limitation is in the range of hundreds of megahertz.
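The relationship between skew and clock rate can be made concrete with a rough calculation. It assumes the simplest possible criterion, namely that the clock period must be at least as long as the worst-case skew; real designs need additional margin for setup and hold times, so the limits below are optimistic.

```python
def max_clock_mhz(worst_case_skew_ns):
    """Highest clock rate whose period still covers the worst-case skew.

    period >= skew  implies  frequency <= 1 / skew.
    1 / (1 ns) is 1 GHz, so the result is scaled by 1000 to give MHz.
    """
    return 1000.0 / worst_case_skew_ns


for skew_ns in (1.0, 2.0, 5.0):
    print(f"{skew_ns} ns skew -> at most {max_clock_mhz(skew_ns):.0f} MHz")
```

With skews of 1 ns, 2 ns and 5 ns the bound is 1000 MHz, 500 MHz and 200 MHz respectively, consistent with the range of hundreds of megahertz given above.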

Highly simplified topologies of the Legacy PCI Shared (Parallel) Interface and the PCIe Serial Point-to-Point Interface[14]

A serial interface does not exhibit timing skew because there is only one differential signal in each direction within each lane, and there is no external clock signal since clocking information is embedded within the serial signal itself. As such, typical bandwidth limitations on serial signals are in the multi-gigahertz range. PCI Express is one example of the general trend toward replacing parallel buses with serial interconnects; other examples include Serial ATA (SATA), USB, Serial Attached SCSI (SAS), FireWire (IEEE 1394), and RapidIO. In digital video, examples in common use are DVI, HDMI, and DisplayPort, but they were replacements for analog VGA, not for a parallel bus.

Multichannel serial design increases flexibility with its ability to allocate fewer lanes for slower devices.

Form factors

PCI Express add-in card

Intel P3608 NVMe flash SSD, PCIe add-in card

A PCI Express add-in card fits into a slot of its physical size or larger (with ×16 as the largest used), but may not fit into a smaller PCI Express slot; for example, a ×16 card may not fit into a ×4 or ×8 slot. Some slots use open-ended sockets to permit physically longer cards and negotiate the best available electrical and logical connection.

The number of lanes actually connected to a slot may also be fewer than the number supported by the physical slot size. An example is a ×16 slot that runs at ×4, which accepts any ×1, ×2, ×4, ×8 or ×16 card, but provides only four lanes. Its specification may read as "×16 (×4 mode)" or "×16 (×4 signal)", while "mechanical @ electrical" notation (e.g. "×16 @ ×4") is also common. The advantage is that such slots can accommodate a larger range of PCI Express cards without requiring motherboard hardware to support the full transfer rate. Standard mechanical sizes are ×1, ×4, ×8, and ×16. Cards using a number of lanes other than the standard mechanical sizes need to physically fit the next larger mechanical size (e.g. a ×2 card uses the ×4 size, or a ×12 card uses the ×16 size).
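The distinction between mechanical and electrical width can be sketched as follows. The helper names are hypothetical, and the returned lane count is only an upper bound on the link width, since the actual width is still negotiated between the two ports.

```python
MECHANICAL_SIZES = (1, 4, 8, 16)  # standard connector sizes listed in the text


def mechanical_size(lanes):
    """Smallest standard connector size that can carry the given lane count."""
    for size in MECHANICAL_SIZES:
        if lanes <= size:
            return size
    raise ValueError(f"no standard connector for x{lanes}")


def plug_in(card_lanes, slot_mechanical, slot_electrical, open_ended=False):
    """Return the lanes available to a card in a slot, or None if it does not fit.

    slot_mechanical is the physical slot size, slot_electrical the number of
    lanes actually wired ("mechanical @ electrical", e.g. x16 @ x4).
    """
    if mechanical_size(card_lanes) > slot_mechanical and not open_ended:
        return None
    return min(card_lanes, slot_electrical)


print(mechanical_size(2))                     # 4
print(mechanical_size(12))                    # 16
print(plug_in(16, 16, 4))                     # x16 card in x16 @ x4 slot -> 4
print(plug_in(16, 4, 4))                      # closed x4 slot -> None
print(plug_in(16, 4, 4, open_ended=True))     # open-ended x4 slot -> 4
```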

The cards themselves are designed and manufactured in various sizes. For example, solid-state drives (SSDs) that come in the form of PCI Express cards often use HHHL (half height, half length) and FHHL (full height, half length) to describe the physical dimensions of the card. The concepts of "full" and "half" heights and lengths are inherited from Conventional PCI.[15][16]

PCI Express add-in card dimensions[17]
PCI card type                 Dimensions                  Notes
                              (mm)      (in)
Height  Standard (Full)       111.15    4.376             Fits 3U chassis.
        Low profile (Half)     68.90    2.731             Fits 2U chassis.
Length  Full                  312.00    4.376 × 12.283    Enough space for ×16.
        Three-quarter         254.00    4.376 × 10.00     Enough space for ×16.
        Half                  167.65    4.376 × 6.60      Enough space for ×16.
Width   Single slot            18.71    0.737             Fits 1U chassis if rotated.
        Dual slot              39.04    1.537             Fits 1U chassis if rotated.
        Triple slot            59.36    2.337             Fits 2U chassis if rotated.

The length classes other than full are not defined by the PCIe standard; they are only a convention among manufacturers. Half length provides sufficient space for a ×16 connector. Shorter cards need to use narrower data connectors.

These dimensions can be freely mixed and matched, but larger dimensions tend to co-occur.

There is a fixed distance of 57.15 millimetres (2.250 in) between the connector's key notch (the middle ridge dividing data and power) and the end of the card, which may be covered by an end plate with a screw-hole for installing onto the computer case. This fixed length ensures that cards do not protrude out of the chassis.

The slot spacing is exactly 0.8 inches (20.32 mm) on ATX motherboards.

For further specifications of the slot, see #Physical layer below.

Non-standard video card form factors

Modern (since c. 2012[18]) gaming video cards usually exceed the height as well as thickness specified in the PCI Express standard, due to the need for more capable and quieter cooling fans, as gaming video cards often emit hundreds of watts of heat.[19] Modern computer cases are often wider to accommodate these taller cards, but not always. Since full-length cards (312 mm) are uncommon, modern cases sometimes cannot accommodate them. The thickness of these cards also typically occupies the space of 2 to 5[20] PCIe slots. In fact, even the methodology of how to measure the cards varies between vendors, with some including the metal bracket size in dimensions and others not.

For instance, comparing three high-end video cards released in 2020: a Sapphire Radeon RX 5700 XT card measures 135 mm in height (excluding the metal bracket), which exceeds the PCIe standard height by 28 mm,[21] another Radeon RX 5700 XT card by XFX measures 55 mm thick (i.e. 2.7 PCI slots at 20.32 mm), taking up 3 PCIe slots,[22] while an Asus GeForce RTX 3080 video card takes up two slots and measures 140.1 mm × 318.5 mm × 57.8 mm, exceeding PCI Express's maximum height, length, and thickness respectively.[23]

Pinout

The following table identifies the conductors on each side of the edge connector on a PCI Express card. The solder side of the printed circuit board (PCB) is the A-side, and the component side is the B-side.[24] PRSNT1# and PRSNT2# pins must be slightly shorter than the rest, to ensure that a hot-plugged card is fully inserted. The WAKE# pin uses full voltage to wake the computer, but must be pulled high from the standby power to indicate that the card is wake capable.[25]

Power

The main 12 V power supply for the PCIe slot is pins B2, B3 (side B) and pins A2, A3 (side A). Power standby 3.3 V is pin B10 and A10. PCIe ×1 cards can draw up to 25 W and ×16 graphics cards can draw up to 75 W, combined.[29]
Slot power

All PCI Express cards may consume up to 3 A at +3.3 V (9.9 W). The amount of +12 V and total power they may consume depends on the form factor and the role of the card:[30]: 35–36 [31][32]

  • ×1 cards are limited to 0.5 A at +12 V (6 W) and 10 W combined.
  • ×4 and wider cards are limited to 2.1 A at +12 V (25 W) and 25 W combined.
  • A full-sized ×1 card may draw up to the 25 W limits after initialization and software configuration as a high-power device.
  • A full-sized ×16 graphics card may draw up to 5.5 A at +12 V (66 W) and 75 W combined after initialization and software configuration as a high-power device.[25]: 38–39 
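The limits listed above can be expressed as a small budget check. This is a sketch only: the class names are made up for illustration, and the +12 V limit for a full-sized ×1 card configured as a high-power device is assumed to match the ×4 figure, since the text gives only its 25 W total.

```python
RAIL_3V3_MAX_A = 3.0  # every card: up to 3 A at +3.3 V (9.9 W)

# (maximum +12 V current in A, maximum combined power in W), from the list above
SLOT_LIMITS = {
    "x1": (0.5, 10.0),
    "x4_or_wider": (2.1, 25.0),
    "x1_full_size_high_power": (2.1, 25.0),   # +12 V figure is an assumption
    "x16_graphics_high_power": (5.5, 75.0),
}


def within_slot_budget(card_class, amps_12v, amps_3v3):
    """Check a card's rail currents against the per-rail and combined limits."""
    max_12v_a, max_total_w = SLOT_LIMITS[card_class]
    total_w = amps_12v * 12.0 + amps_3v3 * 3.3
    return (
        amps_12v <= max_12v_a
        and amps_3v3 <= RAIL_3V3_MAX_A
        and total_w <= max_total_w
    )


print(within_slot_budget("x1", 0.5, 1.0))                       # True (9.3 W)
print(within_slot_budget("x16_graphics_high_power", 5.5, 3.0))  # False (75.9 W)
```

The second call shows that the per-rail maxima cannot both be used at once: 66 W on +12 V plus 9.9 W on +3.3 V exceeds the 75 W combined limit.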
6- and 8-pin power connectors
8-pin (left) and 6-pin (right) power connectors used on PCI Express cards

Optional connectors add 75 W (6-pin) or 150 W (8-pin) of +12 V power for up to 300 W total (2 @ 75 W + 1 @ 150 W).

  • Sense0 pin is connected to ground by the cable or power supply, or floats on the board if no cable is connected.
  • Sense1 pin is connected to ground by the cable or power supply, or floats on the board if no cable is connected.

Some cards use two 8-pin connectors, allowing 375 W total (1 @ 75 W + 2 @ 150 W). This was newly standardized in PCI Express 4.0 CEM of 2018, though it was already in use before then.[17] The 8-pin PCI Express connector should not be confused with the EPS12V connector, which is mainly used for powering SMP and multi-core systems. The power connectors are variants of the Molex Mini-Fit Jr. series connectors.[33]
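The totals quoted above follow from adding the 75 W slot allowance to the rating of each auxiliary connector. A minimal sketch, with an illustrative function name:

```python
SLOT_W = 75                                   # x16 graphics card slot allowance
CONNECTOR_W = {"6-pin": 75, "8-pin": 150}     # auxiliary connector ratings


def max_board_power_w(connectors):
    """Slot power plus the rating of each auxiliary connector on the card."""
    return SLOT_W + sum(CONNECTOR_W[name] for name in connectors)


print(max_board_power_w(["6-pin", "8-pin"]))  # 300
print(max_board_power_w(["8-pin", "8-pin"]))  # 375
```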

Molex Mini-Fit Jr. part numbers[33]
Pins     Female/receptacle on PS cable    Male/right-angle header on PCB
6-pin    45559-0002                       45558-0003
8-pin    45587-0004                       45586-0005, 45586-0006
6-pin power connector (75 W)[34]
6 pin power connector pin map
Pin    Description
1      +12 V
2      Not connected (usually +12 V as well)
3      +12 V
4      Ground
5      Sense
6      Ground

8-pin power connector (150 W)[35][36][37]
8 pin power connector pin map
Pin    Description
1      +12 V
2      +12 V
3      +12 V
4      Sense 1 (8-pin connected[A])
5      Ground
6      Sense 0 (6-pin or 8-pin connected)
7      Ground
8      Ground
  A. ^ When a 6-pin connector is plugged into an 8-pin receptacle the card is notified by a missing Sense1 that it may only use up to 75 W.
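The role of the two sense pins on an 8-pin receptacle can be summarised as a small decoding function. This is an illustrative sketch: the function name is hypothetical, and the handling of a floating Sense0 (treated as no cable, regardless of Sense1) is an assumption rather than something stated above.

```python
def aux_power_limit_w(sense0_grounded, sense1_grounded):
    """Power a card may draw through an 8-pin receptacle, given its sense pins.

    A sense pin reads as grounded when the cable or power supply ties it to
    ground, and floats when nothing is connected to it.
    """
    if sense0_grounded and sense1_grounded:
        return 150   # 8-pin plug: both sense pins grounded
    if sense0_grounded:
        return 75    # 6-pin plug in an 8-pin receptacle: Sense1 missing
    return 0         # Sense0 floating: assumed to mean no cable attached


print(aux_power_limit_w(True, True))    # 150
print(aux_power_limit_w(True, False))   # 75
print(aux_power_limit_w(False, False))  # 0
```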
12VHPWR connector
16-pin 12VHPWR connector
12VHPWR and 12V-2x6 connector pinout

The 16-pin 12VHPWR connector is a standard for connecting graphics processing units (GPUs) to computer power supplies for up to 600 W power delivery. It was introduced by Nvidia in 2022 to supersede the previous 6- and 8-pin power connectors for GPUs. The stated aim was to cater to the increasing power requirements of Nvidia GPUs. The connector was formally adopted as part of PCI Express 5.[38]

The connector was replaced by a minor revision called 12V-2x6, introduced in 2023 with PCIe CEM 5.1 and PCIe ECN 6.0,[39][40] which changed the GPU- and PSU-side sockets to ensure that the sense pins only make contact if the power pins are seated properly. The cables and their plugs remained unchanged.[41] The change is intended to prevent melting due to partial contact, but melting continued to be reported for GPUs with this new socket.[42] There is a significant change in power negotiation with a new sense pin added.[43]

12VHPWR connectors are marked with an "H+" symbol whereas 12V-2x6 connectors are marked with an "H++" symbol.[44]
48VHPWR connector

In 2021, the PCIe Card Electromechanical (CEM, pronounced like "chem" in "chemistry") Specification introduced a connector for 48 V with two current-carrying contacts and four sense pins. It was retained into PCIe-CEM 5.1 of 2023.[45] The contacts are rated for 15 A continuous current, so the 48VHPWR connector can carry 720 W (48 V × 15 A).

48VHPWR pinout, plug side
P1                 P2
+48 V (15 A)       Ground (15 A)
S1                 S2                S3        S4
CARD_PWR_STABLE    CARD_CBL_PRES#    SENSE0    SENSE1

Later, it was removed and an incompatible 48V 1×2 connector was introduced, in which Sense0 and Sense1 are located farthest from each other.

Power excursion

Power excursion refers to short peaks of power draw exceeding the rated maximum (sustained) power level. Since an add-on Engineering Change Notice (ECN) to PCIe-CEM 5.0, the additional power connectors need to be able to handle 100-microsecond power draw at 3× the maximum sustained power, reducing to 1× at the 1-second window level following a logarithmic line. Since PCIe-CEM 5.1, slot power has a similar excursion allowance of 2.5× over 100 μs. In CEM 5.1, the added excursion limit is only provided after software configuration, specifically the Set_Slot_Power_Limit message. The ECN is part of ATX 3.0 and PCIe CEM 5.1 is part of ATX 3.1.[46]
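One way to read the "logarithmic line" is as a straight line on a logarithmic time axis between the two stated end points. The sketch below assumes exactly that shape, and assumes the 2.5× slot allowance follows the same curve; the specification's actual curve may differ in detail, and the function name is illustrative.

```python
import math


def excursion_multiplier(window_s, peak=3.0, short_window_s=100e-6, long_window_s=1.0):
    """Allowed power, as a multiple of the sustained maximum, for a time window.

    Assumption: the multiplier falls linearly in log10(time) from `peak` at
    the short window to 1.0 at the long window.
    """
    if window_s <= short_window_s:
        return peak
    if window_s >= long_window_s:
        return 1.0
    span = math.log10(long_window_s / short_window_s)
    fraction = math.log10(window_s / short_window_s) / span
    return peak - (peak - 1.0) * fraction


for window in (100e-6, 1e-3, 1e-2, 1e-1, 1.0):
    print(f"{window:g} s: {excursion_multiplier(window):.2f}x")
```

Under this reading, a connector with a 600 W sustained rating would have to tolerate 1800 W averaged over 100 μs and about 1200 W averaged over 10 ms.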

PCI Express Mini Card

A WLAN PCI Express Mini Card and its connector
MiniPCI and MiniPCI Express cards in comparison

PCI Express Mini Card (also known as Mini PCI Express, Mini PCIe, Mini PCI-E, mPCIe, and PEM), based on PCI Express, is a replacement for the Mini PCI form factor. It is developed by the PCI-SIG. The host device supports both PCI Express and USB 2.0 connectivity, and each card may use either standard. Most laptop computers built after 2005 use PCI Express for expansion cards; however, as of 2015, many vendors are moving toward using the newer M.2 form factor for this purpose.[47]

Due to different dimensions, PCI Express Mini Cards are not physically compatible with standard full-size PCI Express slots; however, passive adapters exist that let them be used in full-size slots.[48]

Physical dimensions

Dimensions of PCI Express Mini Cards are 30 mm × 50.95 mm (width × length) for a Full Mini Card. There is a 52-pin edge connector, consisting of two staggered rows on a 0.8 mm pitch. Each row has eight contacts, a gap equivalent to four contacts, then a further 18 contacts. Boards have a thickness of 1.0 mm, excluding the components. A "Half Mini Card" (sometimes abbreviated as HMC) is also specified, with approximately half the physical length, at 26.8 mm. Half-size Mini PCIe cards measuring 30 mm × 31.90 mm also exist, which is about half the length of a full-size Mini PCIe card.[49][50]

Electrical interface

PCI Express Mini Card edge connectors provide multiple connections and buses:

  • PCI Express ×1 (with SMBus)
  • USB 2.0
  • Wires to diagnostics LEDs for wireless network (i.e., Wi-Fi) status on computer's chassis
  • SIM card for GSM and WCDMA applications (UIM signals on spec.)
  • Future extension for another PCIe lane
  • 1.5 V and 3.3 V power

Mini-SATA (mSATA) variant

An Intel mSATA SSD

Despite sharing the Mini PCI Express form factor, an mSATA slot is not necessarily electrically compatible with Mini PCI Express. For this reason, only certain notebooks are compatible with mSATA drives. Most compatible systems are based on Intel's Sandy Bridge processor architecture, using the Huron River platform. Notebooks such as Lenovo's ThinkPad T, W and X series, released in March–April 2011, have support for an mSATA SSD card in their WWAN card slot. The ThinkPad Edge E220s/E420s, and the Lenovo IdeaPad Y460/Y560/Y570/Y580 also support mSATA.[51] In contrast, the L-series, among others, can only support M.2 cards using the PCIe standard in the WWAN slot.

Some notebooks (notably the Asus Eee PC, the Apple MacBook Air, and the Dell mini9 and mini10) use a variant of the PCI Express Mini Card as an SSD. This variant uses the reserved and several non-reserved pins to implement SATA and IDE interface passthrough, keeping only USB, ground lines, and sometimes the core PCIe ×1 bus intact.[52] This makes the "miniPCIe" flash and solid-state drives sold for netbooks largely incompatible with true PCI Express Mini implementations.

Also, the typical Asus miniPCIe SSD is 71 mm long, causing the Dell 51 mm model to often be (incorrectly) referred to as half length. A true 51 mm Mini PCIe SSD was announced in 2009, with two stacked PCB layers that allow for higher storage capacity. The announced design preserves the PCIe interface, making it compatible with the standard mini PCIe slot. No working product has yet been developed.

Intel has numerous desktop boards with the PCIe ×1 Mini-Card slot that typically do not support mSATA SSD. A list of desktop boards that natively support mSATA in the PCIe ×1 Mini-Card slot (typically multiplexed with a SATA port) is provided on the Intel Support site.[53]

PCI Express M.2

M.2 replaces the mSATA standard and Mini PCIe.[54] Computer bus interfaces provided through the M.2 connector are PCI Express 3.0 or higher (up to four lanes), Serial ATA 3.0, and USB 3.0 (a single logical port for each of the latter two). It is up to the manufacturer of the M.2 host or device to choose which interfaces to support, depending on the desired level of host support and device type.

PCI Express External Cabling

PCI Express External Cabling (also known as External PCI Express, Cabled PCI Express, or ePCIe) specifications were released by the PCI-SIG in February 2007.[55][56]

Standard cables and connectors have been defined for ×1, ×4, ×8, and ×16 link widths, with a transfer rate of 250 MB/s per lane. The PCI-SIG also expects the norm to evolve to reach 500 MB/s, as in PCI Express 2.0. An example of the uses of Cabled PCI Express is a metal enclosure, containing a number of PCIe slots and PCIe-to-ePCIe adapter circuitry. This device would not be possible had it not been for the ePCIe specification.

OCuLink

OCuLink (standing for "optical-copper link", as Cu is the chemical symbol for copper) is an extension for the "cable version of PCI Express". Version 1.0 of OCuLink, released in October 2015, supports up to 4 PCIe 3.0 lanes (3.9 GB/s) over copper cabling; a fiber optic version may appear in the future.

The most recent version of OCuLink, OCuLink-2, supports 8 GB/s or 16 GB/s (PCIe 4.0 ×4 or ×8)[57] while the maximum bandwidth of a USB4 v2.0 or Thunderbolt 5 connection is 10 GB/s.

OCuLink is principally intended for PCIe (or SATA breakout) interconnections in servers, but also finds limited adoption on laptops for the connection of external GPU boxes.[58]

Derivative forms

Numerous other form factors use, or are able to use, PCIe. These include:

  • Low-height card
  • ExpressCard: Successor to the PC Card form factor (with ×1 PCIe and USB 2.0; hot-pluggable)
  • PCI Express ExpressModule: A hot-pluggable modular form factor defined for servers and workstations
  • XQD card: A PCI Express-based flash card standard by the CompactFlash Association with ×2 PCIe
  • CFexpress card: A PCI Express-based flash card by the CompactFlash Association in three form factors supporting 1 to 4 PCIe lanes
  • SD card: The SD Express bus, introduced in version 7.0 of the SD specification uses a ×1 PCIe link
  • XMC: Similar to the CMC/PMC form factor (VITA 42.3)
  • AdvancedTCA: A complement to CompactPCI for larger applications; supports serial based backplane topologies
  • AMC: A complement to the AdvancedTCA specification; supports processor and I/O modules on ATCA boards (×1, ×2, ×4 or ×8 PCIe).
  • FeaturePak: A tiny expansion card format (43 mm × 65 mm) for embedded and small-form-factor applications, which implements two ×1 PCIe links on a high-density connector along with USB, I2C, and up to 100 points of I/O
  • Universal IO: A variant from Super Micro Computer Inc designed for use in low-profile rack-mounted chassis.[59] It has the connector bracket reversed so it cannot fit in a normal PCI Express socket, but it is pin-compatible and may be inserted if the bracket is removed.
  • M.2 (formerly known as NGFF)
  • M-PCIe brings PCIe 3.0 to mobile devices (such as tablets and smartphones), over the M-PHY physical layer.[60][61]
  • Serial Attached SCSI-related ports:
    • SATA Express, U.2 (formerly known as SFF-8639), U.3 use the same port
    • SlimSAS (SFF-8654)
    • SFF-TA-1016 (M-XIO connector)
    • SFF-TA-1026, SFF-TA-1033

The PCIe slot connector can also carry protocols other than PCIe. Some 9xx series Intel chipsets support Serial Digital Video Out, a proprietary technology that uses a slot to transmit video signals from the host CPU's integrated graphics instead of PCIe, using a supported add-in.

The PCIe transaction-layer protocol can also be used over some other interconnects, which are not electrically PCIe:

  • Thunderbolt: A royalty-free (as of Thunderbolt 3) interconnect standard by Intel that combines DisplayPort and PCIe protocols in a form factor compatible with Mini DisplayPort. Thunderbolt 3.0 also combines USB 3.1 and uses the USB-C form factor as opposed to Mini DisplayPort.
    • USB4 is an extension of Thunderbolt 3.0. Thunderbolt 4 and Thunderbolt 5 are profiles of USB4 specifying higher levels of mandatory features.

History and revisions

While in early development, PCIe was initially referred to as HSI (for High Speed Interconnect), and underwent a name change to 3GIO (for 3rd Generation I/O) before finally settling on its PCI-SIG name PCI Express. A technical working group named the Arapaho Work Group (AWG) drew up the standard. For initial drafts, the AWG consisted only of Intel engineers; subsequently, the AWG expanded to include industry partners.

Since then, PCIe has undergone several major and minor revisions, improving performance and other features.

Comparison table

PCI Express link performance[62][63][64]
Version  Year            Line code                           Transfer rate        Throughput (GB/s)[i][iii]
                                                             (per lane)[i][ii]    ×1       ×2       ×4       ×8       ×16
1.0      2003            NRZ, 8b/10b                         2.5 GT/s             0.25     0.5      1        2        4
2.0      2007            NRZ, 8b/10b                         5.0 GT/s             0.5      1        2        4        8
3.0      2010            NRZ, 128b/130b                      8.0 GT/s             0.985    1.969    3.938    7.877    15.754
4.0      2017            NRZ, 128b/130b                      16.0 GT/s            1.969    3.938    7.877    15.754   31.508
5.0      2019            NRZ, 128b/130b                      32.0 GT/s            3.938    7.877    15.754   31.508   63.015
6.0      2022            PAM-4, FEC, 1b/1b, 242B/256B FLIT   64.0 GT/s            7.563    15.125   30.25    60.5     121
7.0      2025            PAM-4, FEC, 1b/1b, 242B/256B FLIT   128.0 GT/s           15.125   30.25    60.5     121      242
8.0      2028 (planned)  PAM-4, FEC, 1b/1b, 242B/256B FLIT   256.0 GT/s           30.25    60.5     121      242      484
Notes
  i. ^ a b In each direction (each lane is a dual simplex channel).
  ii. ^ Transfer rate refers to the encoded serial bit rate; 2.5 GT/s means 2.5 Gbit/s serial data rate.
  iii. ^ Throughput indicates the usable bandwidth (i.e. only including the payload, not the 8b/10b, 128b/130b, or 242B/256B encoding overhead). The PCIe 1.0 transfer rate of 2.5 GT/s per lane means a 2.5 Gbit/s serial bit rate; after applying 8b/10b encoding, this corresponds to a useful throughput of 2.0 Gbit/s = 250 MB/s.
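The throughput figures in the table can be reproduced from the transfer rate and the encoding ratio, as the last note describes. The following sketch uses a hypothetical lookup table and function name; it treats 242B/256B as a simple payload ratio in the same way as 8b/10b and 128b/130b.

```python
# (transfer rate in GT/s, payload units, encoded units) per version
GENERATIONS = {
    "1.0": (2.5, 8, 10),
    "2.0": (5.0, 8, 10),
    "3.0": (8.0, 128, 130),
    "4.0": (16.0, 128, 130),
    "5.0": (32.0, 128, 130),
    "6.0": (64.0, 242, 256),
    "7.0": (128.0, 242, 256),
}


def throughput_gb_s(version, lanes=1):
    """Usable one-direction throughput in GB/s for a version and link width."""
    rate_gt_s, payload, encoded = GENERATIONS[version]
    usable_gbit_s = rate_gt_s * payload / encoded   # strip line-code overhead
    return usable_gbit_s / 8 * lanes                # bits to bytes, then scale


for version in GENERATIONS:
    row = [f"{throughput_gb_s(version, lanes):.3f}" for lanes in (1, 2, 4, 8, 16)]
    print(version, " ".join(row))
```

For example, PCIe 3.0 gives 8 × 128/130 ÷ 8 ≈ 0.985 GB/s per lane, and sixteen such lanes give about 15.754 GB/s.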

PCI Express 1.0a

In 2003, PCI-SIG introduced PCIe 1.0a, with a per-lane data rate of 0.25 gigabytes per second (GB/s) and a transfer rate of 2.5 gigatransfers per second (GT/s).

Transfer rate is expressed in transfers per second instead of bits per second because the number of transfers includes the overhead bits, which do not provide additional throughput;[65] PCIe 1.x uses an 8b/10b encoding scheme, resulting in a 20% (= 2/10) overhead on the raw channel bandwidth.[66] So in the PCIe terminology, transfer rate refers to the encoded bit rate: 2.5 GT/s is 2.5 Gbit/s on the encoded serial link. This corresponds to 2.0 Gbit/s of pre-coded data or 0.25 GB/s, which is referred to as throughput in PCIe.

PCI Express 1.1

In 2005, PCI-SIG[67] introduced PCIe 1.1. This updated specification includes clarifications and several improvements, but is fully compatible with PCI Express 1.0a. No changes were made to the data rate.

PCI Express 2.0

A PCI Express 2.0 ×1 expansion card that provides USB 3.0 connectivity[b]

PCI-SIG announced the availability of the PCI Express Base 2.0 specification on 15 January 2007.[68] The PCIe 2.0 standard doubles the transfer rate compared with PCIe 1.0 to 5 GT/s and the per-lane throughput rises from 250 MB/s to 500 MB/s. Consequently, a 16-lane PCIe connector (×16) can support an aggregate throughput of up to 8 GB/s.

PCIe 2.0 motherboard slots are fully backward compatible with PCIe v1.x cards. PCIe 2.0 cards are also generally backward compatible with PCIe 1.x motherboards, using the available bandwidth of PCI Express 1.1. Overall, a graphics card or motherboard designed for v2.0 works with a counterpart designed for v1.1 or v1.0a.

The PCI-SIG also said that PCIe 2.0 features improvements to the point-to-point data transfer protocol and its software architecture.[69]

Intel's first PCIe 2.0 capable chipset was the X38, and boards began to ship from various vendors (Abit, Asus, Gigabyte) as of 21 October 2007.[70] AMD started supporting PCIe 2.0 with its AMD 700 chipset series and Nvidia started with the MCP72.[71] All of Intel's prior chipsets, including the Intel P35 chipset, supported PCIe 1.1 or 1.0a.[72]

Like 1.x, PCIe 2.0 uses an 8b/10b encoding scheme, therefore delivering, per-lane, an effective 4 Gbit/s max. transfer rate from its 5 GT/s raw data rate.

PCI Express 2.1

PCI Express 2.1 (with its specification dated 4 March 2009) supports a large proportion of the management, support, and troubleshooting systems planned for full implementation in PCI Express 3.0. However, the speed is the same as PCI Express 2.0. The increase in power from the slot breaks backward compatibility between PCI Express 2.1 cards and some older motherboards with 1.0/1.0a, but most motherboards with PCI Express 1.1 connectors are provided with a BIOS update by their manufacturers through utilities to support backward compatibility of cards with PCIe 2.1.

PCI Express 3.0

The PCI Express Base Specification revision 3.0 was made available in November 2010, after multiple delays. In August 2007, PCI-SIG announced that PCI Express 3.0 would carry a bit rate of 8 gigatransfers per second (GT/s), and that it would be backward compatible with existing PCI Express implementations. At that time, it was also announced that the final specification for PCI Express 3.0 would be delayed until Q2 2010.[73] New features for the PCI Express 3.0 specification included a number of optimizations for enhanced signaling and data integrity, including transmitter and receiver equalization, PLL improvements, clock data recovery, and channel enhancements of currently supported topologies.[74]

Following a six-month technical analysis of the feasibility of scaling the PCI Express interconnect bandwidth, PCI-SIG found that 8 gigatransfers per second could be manufactured in mainstream silicon process technology and deployed with existing low-cost materials and infrastructure, while maintaining full compatibility (with negligible impact) with the PCI Express protocol stack.

PCI Express 3.0 upgraded the encoding scheme to 128b/130b from the previous 8b/10b encoding, reducing the bandwidth overhead from 20% of PCI Express 2.0 to approximately 1.54% (= 2/130). PCI Express 3.0's 8 GT/s bit rate effectively delivers 985 MB/s per lane, nearly doubling the lane bandwidth relative to PCI Express 2.0.[63]
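
The per-lane figures above follow from the raw transfer rate and the line-code efficiency. The following is a minimal sketch of that arithmetic, assuming one bit per transfer (as in the NRZ-signaled generations) and decimal megabytes; the function and table names are illustrative only.

```python
from fractions import Fraction

# Line-code efficiency: payload bits divided by transmitted bits.
ENCODING = {
    "8b/10b": Fraction(8, 10),        # 20% overhead
    "128b/130b": Fraction(128, 130),  # about 1.54% overhead (2/130)
}

def lane_throughput_mb_s(raw_gt_s: float, encoding: str) -> float:
    """Usable throughput of one lane, one direction, in MB/s (10^6 bytes)."""
    usable_bits_per_s = raw_gt_s * 1e9 * float(ENCODING[encoding])
    return usable_bits_per_s / 8 / 1e6

for name, rate, enc in [
    ("PCIe 1.x", 2.5, "8b/10b"),
    ("PCIe 2.0", 5.0, "8b/10b"),
    ("PCIe 3.0", 8.0, "128b/130b"),
]:
    print(f"{name}: {lane_throughput_mb_s(rate, enc):.0f} MB/s per lane")
```

Under these assumptions the 8 GT/s case evaluates to about 984.6 MB/s, which rounds to the 985 MB/s quoted for PCI Express 3.0.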

On 18 November 2010, the PCI-SIG officially published the finalized PCI Express 3.0 specification to its members to build devices based on this new version of PCI Express.[75]

PCI Express 3.1

In September 2013, the PCI Express 3.1 specification was announced for release in late 2013 or early 2014, consolidating various improvements to the published PCI Express 3.0 specification in three areas: power management, performance and functionality.[61][76] It was released in November 2014.[77]

PCI Express 4.0

On 29 November 2011, PCI-SIG preliminarily announced PCI Express 4.0,[78] providing a 16 GT/s bit rate that doubles the bandwidth provided by PCI Express 3.0 to 31.5 GB/s in each direction for a 16-lane configuration, while maintaining backward and forward compatibility in both software support and the mechanical interface used.[79] PCI Express 4.0 specs also bring OCuLink-2, an alternative to Thunderbolt. OCuLink version 2 has up to 16 GT/s (16 GB/s total for ×8 lanes),[57] while the maximum bandwidth of a Thunderbolt 3 link is 5 GB/s.

At the 2016 PCI-SIG Developers Conference, Cadence, PLDA, and Synopsys demonstrated their development of PCIe 4.0 physical layer, controller, switch, and other IP blocks.[80]

Mellanox Technologies announced the first 100 Gbit/s network adapter with PCIe 4.0 on 15 June 2016,[81] and the first 200 Gbit/s network adapter with PCIe 4.0 on 10 November 2016.[82]

In August 2016, Synopsys presented a test setup with an FPGA clocking a lane to PCIe 4.0 speeds at the Intel Developer Forum. Their IP has been licensed to several firms planning to present their chips and products at the end of 2016.[83]

At the IEEE Hot Chips Symposium in August 2016, IBM announced the first CPU with PCIe 4.0 support, POWER9.[84][85]

PCI-SIG officially announced the release of the final PCI Express 4.0 specification on 8 June 2017.[86] The spec includes improvements in flexibility, scalability, and power consumption.

On 5 December 2017 IBM announced the first system with PCIe 4.0 slots, Power AC922.[87][88]

NETINT Technologies introduced the first NVMe SSD based on PCIe 4.0 on 17 July 2018, ahead of Flash Memory Summit 2018.[89]

AMD announced on 9 January 2019 its upcoming Zen 2-based processors and X570 chipset would support PCIe 4.0.[90] AMD had hoped to enable partial support for older chipsets, but instability caused by motherboard traces not conforming to PCIe 4.0 specifications made that impossible.[91][92]

Intel released their first mobile CPUs with PCI Express 4.0 support in mid-2020, as a part of the Tiger Lake microarchitecture.[93]

PCI Express 5.0

Three PCIe 5.0 ×16 (first and third slots at ×16, fourth slot at ×8 throughput) and two PCIe 4.0 ×16 slots (second slot at ×4, fifth slot at ×8 throughput) on a 2023 workstation mainboard.

In June 2017, PCI-SIG announced the PCI Express 5.0 preliminary specification.[86] Bandwidth was expected to increase to 32 GT/s, yielding 63 GB/s in each direction in a 16-lane configuration. The draft spec was expected to be standardized in 2019.[citation needed] Initially, 25.0 GT/s was also considered for technical feasibility.

On 7 June 2017 at PCI-SIG DevCon, Synopsys recorded the first demonstration of PCI Express 5.0 at 32 GT/s.[94]

On 31 May 2018, PLDA announced the availability of their XpressRICH5 PCIe 5.0 Controller IP, based on draft 0.7 of the PCIe 5.0 specification.[95][96]

On 10 December 2018, the PCI-SIG released version 0.9 of the PCIe 5.0 specification to its members,[97] and on 17 January 2019, PCI-SIG announced that version 0.9 had been ratified, with version 1.0 targeted for release in the first quarter of 2019.[98]

On 29 May 2019, PCI-SIG officially announced the release of the final PCI Express 5.0 specification.[99] PCI Express 5.0 retained backward compatibility with previous versions of the PCI Express specification.

On 20 November 2019, Jiangsu Huacun presented the first PCIe 5.0 Controller HC9001 in a 12 nm manufacturing process[100] and production started in 2020.

On 17 August 2020, IBM announced the Power10 processor with PCIe 5.0 and up to 32 lanes per single-chip module (SCM) and up to 64 lanes per double-chip module (DCM).[101]

On 9 September 2021, IBM announced the Power E1080 Enterprise server with a planned availability date of 17 September.[102] It can have up to 16 Power10 SCMs with a maximum of 32 slots per system, which can act as PCIe 5.0 ×8 or PCIe 4.0 ×16.[103] Alternatively they can be used as PCIe 5.0 ×16 slots for optional optical CXP converter adapters connecting to external PCIe expansion drawers.

On 27 October 2021, Intel announced the 12th Gen Intel Core CPU family, the world's first consumer x86-64 processors with PCIe 5.0 (up to 16 lanes) connectivity.[104]

On 22 March 2022, Nvidia announced the Nvidia Hopper GH100 GPU, the world's first PCIe 5.0 GPU.[105]

On 23 May 2022, AMD announced its Zen 4 architecture with support for up to 24 lanes of PCIe 5.0 connectivity on consumer platforms and 128 lanes on server platforms.[106][107]

PCI Express 6.0

On 18 June 2019, PCI-SIG announced the development of PCI Express 6.0 specification. Bandwidth is expected to increase to 64 GT/s, yielding 128 GB/s in each direction in a 16-lane configuration, with a target release date of 2021.[108] The new standard uses 4-level pulse-amplitude modulation (PAM-4) with a low-latency forward error correction (FEC) in place of non-return-to-zero (NRZ) modulation.[109] Unlike previous PCI Express versions, forward error correction is used to increase data integrity and PAM-4 is used as line code so that two bits are transferred per transfer. With 64 GT/s data transfer rate (raw bit rate), up to 121 GB/s in each direction is possible in ×16 configuration.[108]

On 24 February 2020, the PCI Express 6.0 revision 0.5 specification (a "first draft" with all architectural aspects and requirements defined) was released.[110]

On 5 November 2020, the PCI Express 6.0 revision 0.7 specification (a "complete draft" with electrical specifications validated via test chips) was released.[111]

On 6 October 2021, the PCI Express 6.0 revision 0.9 specification (a "final draft") was released.[112]

On 11 January 2022, PCI-SIG officially announced the release of the final PCI Express 6.0 specification.[113] PCI Express 6.0 retained backward compatibility with previous versions of the PCI Express specification.

PAM-4 coding results in a vastly higher bit error rate (BER) of 10⁻⁶ (vs. 10⁻¹² previously), so in place of 128b/130b encoding, a 3-way interlaced forward error correction (FEC) is used in addition to cyclic redundancy check (CRC). A fixed 256-byte Flow Control Unit (FLIT) block carries 242 bytes of data, which includes variable-sized transaction level packets (TLP) and data link layer payload (DLLP); the remaining 14 bytes are reserved for an 8-byte CRC and a 6-byte FEC.[114][115] 3-way Gray code is used in PAM-4/FLIT mode to reduce error rate; the interface does not switch to NRZ and 128b/130b encoding even when retraining to lower data rates.[116][117]
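
The FLIT layout described above also accounts for the gap between the raw and usable ×16 bandwidth figures. The sketch below is only a back-of-the-envelope check, assuming the 242/256 data fraction is the sole overhead considered; the constant and function names are illustrative.

```python
FLIT_BYTES = 256       # fixed FLIT size
FLIT_DATA_BYTES = 242  # TLP and DLLP bytes
FLIT_CRC_BYTES = 8
FLIT_FEC_BYTES = 6

def flit_efficiency() -> float:
    """Fraction of each FLIT that carries TLP/DLLP data."""
    return FLIT_DATA_BYTES / FLIT_BYTES

def link_bandwidth_gb_s(raw_gt_s: float, lanes: int = 16) -> float:
    """Usable bandwidth in GB/s, one direction, counting only FLIT overhead."""
    raw_gb_s = raw_gt_s * lanes / 8  # raw bit rate of all lanes, in bytes
    return raw_gb_s * flit_efficiency()

print(f"FLIT efficiency: {flit_efficiency():.1%}")
print(f"64 GT/s x16: {link_bandwidth_gb_s(64):.0f} GB/s per direction")
```

With 64 GT/s and 16 lanes, the raw 128 GB/s scales to 121 GB/s, matching the figure quoted for PCI Express 6.0.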

PCIe 6.0 hardware was not launched until August 2025,[118] roughly three years after the release of the final specifications and shortly after the publication of the PCIe 7.0 specifications.[119] The delay was described as unprecedented, with PCWorld noting that for many years PCIe 6.0 existed "solely on paper".[120]

PCI Express 7.0

On 21 June 2022, PCI-SIG announced the development of PCI Express 7.0 specification.[121] It will deliver 128 GT/s raw bit rate and up to 242 GB/s per direction in ×16 configuration, using the same PAM4 signaling as version 6.0. Doubling of the data rate will be achieved by fine-tuning channel parameters to decrease signal losses and improve power efficiency, but signal integrity is expected to be a challenge. The specification is expected to be finalized in 2025.

On 3 April 2024, the PCI Express 7.0 revision 0.5 specification (a "first draft") was released.[122]

On 17 January 2025, PCI-SIG announced the release of PCIe 7.0 specification version 0.7 (a "complete draft").[123]

On 19 March 2025, PCI-SIG announced the release of PCIe 7.0 specification version 0.9 (a "final draft"); planned final release is still in 2025.[124]

The following main points were formulated as objectives of the new standard:

  • Delivering 128 GT/s raw bit rate and up to 512 GB/s bi-directionally via ×16 configuration
  • Utilizing PAM4 (Pulse Amplitude Modulation with 4 levels) signaling
  • Focusing on the channel parameters and reach
  • Improving power efficiency
  • Continuing to deliver the low-latency and high-reliability targets
  • Maintaining backwards compatibility with all previous generations of PCIe technology

On 11 June 2025, PCI-SIG officially announced the release of the final PCI Express 7.0 specification.[125]

At its release, PCI-SIG commented that it did not see PCIe 7.0 coming to the PC market for some time. Instead, the interface is initially targeted at cloud computing, 800-gigabit Ethernet, and artificial intelligence applications.[120]

PCI Express 8.0

On 5 August 2025, PCI-SIG announced the development of PCI Express 8.0. The specification is planned for release by 2028. It will deliver double the speed of the previous version: a 256.0 GT/s raw bit rate and up to 1 TB/s bi-directionally via a ×16 configuration.[126]

Extensions and future directions

Some vendors offer PCIe over fiber products,[127][128][129] with active optical cables (AOC) for PCIe switching at increased distance in PCIe expansion drawers,[130][103] or in specific cases where transparent PCIe bridging is preferable to using a more mainstream standard (such as InfiniBand or Ethernet) that may require additional software to support it.

Thunderbolt was co-developed by Intel and Apple as a general-purpose high speed interface combining a logical PCIe link with DisplayPort and was originally intended as an all-fiber interface, but due to early difficulties in creating a consumer-friendly fiber interconnect, nearly all implementations are copper systems. A notable exception, the Sony VAIO Z VPC-Z2, uses a nonstandard USB port with an optical component to connect to an outboard PCIe display adapter. Apple has been the primary driver of Thunderbolt adoption through 2011, though several other vendors[131] have announced new products and systems featuring Thunderbolt. Thunderbolt 3 forms the basis of the USB4 standard.

Mobile PCIe specification (abbreviated to M-PCIe) allows PCI Express architecture to operate over the MIPI Alliance's M-PHY physical layer technology. Building on top of already existing widespread adoption of M-PHY and its low-power design, Mobile PCIe lets mobile devices use PCI Express.[132] The iPhone is one example of a device that uses integrated NVMe storage with M-PCIe.

Draft process

There are 5 primary releases/checkpoints in a PCI-SIG specification:[133]

  • Draft 0.3 (Concept): this release may have few details, but outlines the general approach and goals.
  • Draft 0.5 (First draft): this release has a complete set of architectural requirements and must fully address the goals set out in the 0.3 draft.
  • Draft 0.7 (Complete draft): this release must have a complete set of functional requirements and methods defined, and no new functionality may be added to the specification after this release. Before the release of this draft, electrical specifications must have been validated via test silicon.
  • Draft 0.9 (Final draft): this release allows PCI-SIG member companies to perform an internal review for intellectual property, and no functional changes are permitted after this draft.
  • 1.0 (Final release): this is the final and definitive specification, and any changes or enhancements are through Errata documentation and Engineering Change Notices (ECNs) respectively.

Historically, the earliest adopters of a new PCIe specification generally begin designing with the Draft 0.5 as they can confidently build up their application logic around the new bandwidth definition and often even start developing for any new protocol features. At the Draft 0.5 stage, however, there is still a strong likelihood of changes in the actual PCIe protocol layer implementation, so designers responsible for developing these blocks internally may be more hesitant to begin work than those using interface IP from external sources.

Hardware protocol summary

The PCIe link is built around dedicated unidirectional pairs of serial (1-bit), point-to-point connections known as lanes. This is in sharp contrast to the earlier PCI connection, which is a bus-based system where all the devices share the same bidirectional, 32-bit or 64-bit parallel bus.

PCI Express is a layered protocol, consisting of a transaction layer, a data link layer, and a physical layer. The Data Link Layer is subdivided to include a media access control (MAC) sublayer. The Physical Layer is subdivided into logical and electrical sublayers. The Physical logical-sublayer contains a physical coding sublayer (PCS). The terms are borrowed from the IEEE 802 networking protocol model.

Physical layer

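Connector pins and lengths

| Lanes | Pins[134]: total | Pins[134]: variable | Board connector length in mm (in): total | Board connector length in mm (in): variable | Connector slot length in mm (in): total | Connector slot length in mm (in): variable |
|---|---|---|---|---|---|---|
| ×1 | 2×18=36 | 2×7=14 | 20.3 (0.8) | 7.2 (0.28) | 25 (1.0) | 7.65 (0.30) |
| ×4 | 2×32=64 | 2×21=42 | 34.3 (1.4) | 21.2 (0.8) | 39 (1.5) | 21.65 (0.85) |
| ×8 | 2×49=98 | 2×38=76 | 51.3 (2.0) | 38.2 (1.5) | 56 (2.2) | 38.65 (1.52) |
| ×16 | 2×82=164 | 2×71=142 | 84.3 (3.3) | 71.2 (2.8) | 89 (3.5) | 71.65 (2.82) |
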
An open-end PCI Express ×1 connector lets longer cards that use more lanes be plugged while operating at ×1 speeds.

The PCIe Physical Layer (PHY, PCIEPHY, PCI Express PHY, or PCIe PHY) specification is divided into two sub-layers, corresponding to electrical and logical specifications. The logical sublayer is sometimes further divided into a MAC sublayer and a PCS, although this division is not formally part of the PCIe specification. A specification published by Intel, the PHY Interface for PCI Express (PIPE),[135] defines the MAC/PCS functional partitioning and the interface between these two sub-layers. The PIPE specification also identifies the physical media attachment (PMA) layer, which includes the SerDes (serializer/deserializer) and other analog circuitry; however, since SerDes implementations vary greatly among ASIC vendors, PIPE does not specify an interface between the PCS and PMA.

At the electrical level, each lane consists of two unidirectional differential pairs operating at 2.5, 5, 8, 16 or 32 Gbit/s, depending on the negotiated capabilities. Transmit and receive are separate differential pairs, for a total of four data wires per lane.

A connection between any two PCIe devices is known as a link, and is built up from a collection of one or more lanes. All devices must at minimum support a single-lane (×1) link. Devices may optionally support wider links composed of up to 32 lanes.[136][137] This allows for very good compatibility in two ways:

  • A PCIe card physically fits (and works correctly) in any slot that is at least as large as it is (e.g., a ×1 sized card works in any sized slot);
  • A slot of a large physical size (e.g., ×16) can be wired electrically with fewer lanes (e.g., ×1, ×4, ×8, or ×12) as long as it provides the ground connections required by the larger physical slot size.

In both cases, PCIe negotiates the highest mutually supported number of lanes. Many graphics cards, motherboards and BIOS versions are verified to support ×1, ×4, ×8 and ×16 connectivity on the same connection.
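
The outcome of width negotiation can be pictured as picking the largest lane count that both ends support. The sketch below is a simplification: real devices settle the width during link training in hardware, and the function name and the example width sets are hypothetical.

```python
def negotiate_link_width(card_widths, slot_widths):
    """Return the highest lane count supported by both link partners.

    Each argument is an iterable of supported lane counts, e.g. {1, 4, 8, 16}.
    Every device must support x1, so a common width normally exists.
    """
    common = set(card_widths) & set(slot_widths)
    if not common:
        raise ValueError("no common link width (x1 support is mandatory)")
    return max(common)

# A x16 graphics card in a physically x16 slot that is wired for only 8 lanes.
card = {1, 4, 8, 16}
slot = {1, 4, 8}
print(f"negotiated width: x{negotiate_link_width(card, slot)}")
```

A ×1 card in the same slot would negotiate ×1, since that is the only width the two sides share.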

The width of a PCIe connector is 8.8 mm, while the height is 11.25 mm, and the length is variable. The fixed section of the connector is 11.65 mm in length and contains two rows of 11 pins each (22 pins total), while the length of the other section is variable depending on the number of lanes. The pins are spaced at 1 mm intervals, and the thickness of the card going into the connector is 1.6 mm.[138][139]

Data transmission

PCIe sends all control messages, including interrupts, over the same links used for data. The serial protocol can never be blocked, so latency is still comparable to conventional PCI, which has dedicated interrupt lines. When the problem of IRQ sharing of pin based interrupts is taken into account and the fact that message signaled interrupts (MSI) can bypass an I/O APIC and be delivered to the CPU directly, MSI performance ends up being substantially better.[140]

Data transmitted on multiple-lane links is interleaved, meaning that each successive byte is sent down successive lanes. The PCIe specification refers to this interleaving as data striping. While requiring significant hardware complexity to synchronize (or deskew) the incoming striped data, striping can significantly reduce the latency of the nth byte on a link. While the lanes are not tightly synchronized, there is a limit to the lane to lane skew of 20/8/6 ns for 2.5/5/8 GT/s so the hardware buffers can re-align the striped data.[141] Due to padding requirements, striping may not necessarily reduce the latency of small data packets on a link.
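
Byte striping can be illustrated with a short round-trip sketch. It assumes ideal lanes and ignores the padding, framing symbols and lane-to-lane deskew that real hardware has to handle; the `stripe` and `destripe` names are illustrative.

```python
from typing import List

def stripe(data: bytes, lanes: int) -> List[bytes]:
    """Distribute successive bytes over successive lanes (round-robin)."""
    return [data[i::lanes] for i in range(lanes)]

def destripe(lane_data: List[bytes]) -> bytes:
    """Re-interleave per-lane byte streams into the original order."""
    lanes = len(lane_data)
    out = bytearray(sum(len(chunk) for chunk in lane_data))
    for i, chunk in enumerate(lane_data):
        out[i::lanes] = chunk  # lane i supplies bytes i, i+lanes, i+2*lanes, ...
    return bytes(out)

packet = b"ABCDEFGH"
per_lane = stripe(packet, 4)
print(per_lane)
assert destripe(per_lane) == packet
```

On a ×4 link, each lane in this sketch carries only two of the eight bytes, which is why the last byte of the packet arrives sooner than it would on a single lane.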

As with other high data rate serial transmission protocols, the clock is embedded in the signal. At the physical level, PCI Express 1.x and 2.0 utilize the 8b/10b encoding scheme[63] (line code) to ensure that strings of consecutive identical digits (zeros or ones) are limited in length. This coding is used to prevent the receiver from losing track of where the bit edges are. In this coding scheme every eight (uncoded) payload bits of data are replaced with 10 (encoded) bits of transmit data, causing a 20% overhead in the electrical bandwidth. To improve the available bandwidth, PCI Express version 3.0 instead uses 128b/130b encoding (1.54% overhead). Line encoding limits the run length of identical-digit strings in data streams and ensures the receiver stays synchronised to the transmitter via clock recovery.

A desirable balance (and therefore spectral density) of 0 and 1 bits in the data stream is achieved by XORing a known binary polynomial as a "scrambler" to the data stream in a feedback topology. Because the scrambling polynomial is known, the data can be recovered by applying the XOR a second time. Both the scrambling and descrambling steps are carried out in hardware.
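
The property that applying the XOR twice restores the data can be shown with a small linear-feedback shift register (LFSR) sketch. The tap positions below correspond to the polynomial x^16 + x^5 + x^4 + x^3 + 1; the polynomial choice, the 0xFFFF seed and the bit ordering are assumptions made for illustration, and the sketch is not bit-exact to the PCIe specification.

```python
def lfsr_keystream(n_bytes: int, seed: int = 0xFFFF) -> bytes:
    """Generate n_bytes of pseudo-random bits from a 16-bit Galois LFSR."""
    state = seed
    out = bytearray()
    for _ in range(n_bytes):
        byte = 0
        for bit in range(8):
            msb = (state >> 15) & 1
            byte |= msb << bit
            state = (state << 1) & 0xFFFF
            if msb:
                state ^= 0x0039  # feedback taps for x^5 + x^4 + x^3 + 1
        out.append(byte)
    return bytes(out)

def scramble(data: bytes, seed: int = 0xFFFF) -> bytes:
    """XOR data with the LFSR keystream; the same call also descrambles."""
    keystream = lfsr_keystream(len(data), seed)
    return bytes(d ^ k for d, k in zip(data, keystream))

payload = bytes(16)             # a long run of zeros
scrambled = scramble(payload)   # no longer all zeros
assert scramble(scrambled) == payload
print(scrambled.hex())
```

Both ends must start from the same seed and stay in step, which is why the sketch passes the seed explicitly.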

Dual simplex in PCIe means there are two simplex channels on every PCIe lane. Simplex means communication is only possible in one direction. By having two simplex channels, two-way communication is made possible. One differential pair is used for each channel.[142][1][143]

Data link layer

The data link layer performs three vital services for the PCIe link:

  1. sequence the transaction layer packets (TLPs) that are generated by the transaction layer,
  2. ensure reliable delivery of TLPs between two endpoints via an acknowledgement protocol (ACK and NAK signaling) that explicitly requires replay of unacknowledged/bad TLPs,
  3. initialize and manage flow control credits

On the transmit side, the data link layer generates an incrementing sequence number for each outgoing TLP. It serves as a unique identification tag for each transmitted TLP, and is inserted into the header of the outgoing TLP. A 32-bit cyclic redundancy check code (known in this context as Link CRC or LCRC) is also appended to the end of each outgoing TLP.

On the receive side, the received TLP's LCRC and sequence number are both validated in the link layer. If either the LCRC check fails (indicating a data error), or the sequence-number is out of range (non-consecutive from the last valid received TLP), then the bad TLP, as well as any TLPs received after the bad TLP, are considered invalid and discarded. The receiver sends a negative acknowledgement message (NAK) with the sequence-number of the invalid TLP, requesting re-transmission of all TLPs forward of that sequence-number. If the received TLP passes the LCRC check and has the correct sequence number, it is treated as valid. The link receiver increments the sequence-number (which tracks the last received good TLP), and forwards the valid TLP to the receiver's transaction layer. An ACK message is sent to the remote transmitter, indicating the TLP was successfully received (and by extension, all TLPs with past sequence-numbers).

If the transmitter receives a NAK message, or no acknowledgement (NAK or ACK) is received until a timeout period expires, the transmitter must retransmit all TLPs that lack a positive acknowledgement (ACK). Barring a persistent malfunction of the device or transmission medium, the link-layer presents a reliable connection to the transaction layer, since the transmission protocol ensures delivery of TLPs over an unreliable medium.
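
The sequence-number, LCRC and replay behaviour described above can be sketched as a toy model. Everything here is a simplification under stated assumptions: sequence numbers are taken to be 12 bits wide, `zlib.crc32` stands in for the real LCRC computation, timeouts are omitted, and the class and method names are hypothetical.

```python
import zlib
from collections import deque

SEQ_MOD = 4096  # assumption: 12-bit sequence numbers

def lcrc(seq: int, payload: bytes) -> int:
    """Stand-in 32-bit CRC over sequence number and payload (not the real LCRC)."""
    return zlib.crc32(seq.to_bytes(2, "big") + payload)

class Transmitter:
    def __init__(self):
        self.next_seq = 0
        self.replay = deque()  # frames awaiting ACK, oldest first

    def send(self, payload: bytes):
        frame = (self.next_seq, payload, lcrc(self.next_seq, payload))
        self.replay.append(frame)  # keep a copy until it is acknowledged
        self.next_seq = (self.next_seq + 1) % SEQ_MOD
        return frame

    def on_ack(self, seq: int):
        """An ACK covers the named TLP and every earlier one."""
        if all(frame[0] != seq for frame in self.replay):
            return  # stale or duplicate ACK
        while self.replay.popleft()[0] != seq:
            pass

    def on_nak(self, seq: int):
        """Return the rejected TLP and everything sent after it, for replay."""
        frames = list(self.replay)
        for i, frame in enumerate(frames):
            if frame[0] == seq:
                return frames[i:]
        return []

class Receiver:
    def __init__(self):
        self.expected = 0
        self.delivered = []

    def receive(self, frame):
        seq, payload, crc = frame
        if crc != lcrc(seq, payload) or seq != self.expected:
            return ("NAK", self.expected)  # discard and request a replay
        self.delivered.append(payload)     # hand over to the transaction layer
        self.expected = (self.expected + 1) % SEQ_MOD
        return ("ACK", seq)

tx, rx = Transmitter(), Receiver()
f0, f1, f2 = tx.send(b"a"), tx.send(b"b"), tx.send(b"c")
damaged = (f1[0], b"X", f1[2])             # payload corrupted in transit
replies = [rx.receive(f) for f in (f0, damaged, f2)]
tx.on_ack(0)
for frame in tx.on_nak(1):                 # replays TLPs 1 and 2
    rx.receive(frame)
print(replies, rx.delivered)
```

In this run the TLP that follows the damaged one is also rejected, because its sequence number is no longer the expected one; both are delivered only after the replay.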

In addition to sending and receiving TLPs generated by the transaction layer, the data-link layer also generates and consumes data link layer packets (DLLPs). ACK and NAK signals are communicated via DLLPs, as are some power management messages and flow control credit information (on behalf of the transaction layer).

In practice, the number of in-flight, unacknowledged TLPs on the link is limited by two factors: the size of the transmitter's replay buffer (which must store a copy of all transmitted TLPs until the remote receiver ACKs them), and the flow control credits issued by the receiver to a transmitter. PCI Express requires all receivers to issue a minimum number of credits, to guarantee a link allows sending PCIConfig TLPs and message TLPs.

Transaction layer

PCI Express implements split transactions (transactions with request and response separated by time), allowing the link to carry other traffic while the target device gathers data for the response.

PCI Express uses credit-based flow control. In this scheme, a device advertises an initial amount of credit for each received buffer in its transaction layer. The device at the opposite end of the link, when sending transactions to this device, counts the number of credits each TLP consumes from its account. The sending device may only transmit a TLP when doing so does not make its consumed credit count exceed its credit limit. When the receiving device finishes processing the TLP from its buffer, it signals a return of credits to the sending device, which increases the credit limit by the restored amount. The credit counters are modular counters, and the comparison of consumed credits to credit limit requires modular arithmetic. The advantage of this scheme (compared to other methods such as wait states or handshake-based transfer protocols) is that the latency of credit return does not affect performance, provided that the credit limit is not encountered. This assumption is generally met if each device is designed with adequate buffer sizes.
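
The modular comparison of consumed credits against the credit limit can be sketched as follows. The 8-bit counter width and the `CreditGate` class are assumptions for illustration; the sketch also assumes the receiver never advertises more than half the counter range at once, which is what keeps the modular comparison unambiguous.

```python
FIELD_BITS = 8          # assumption: width of the credit counters
MOD = 1 << FIELD_BITS

class CreditGate:
    """Sender-side view of one credit type on a link."""

    def __init__(self, initial_credits: int):
        self.credit_limit = initial_credits % MOD  # advertised by the receiver
        self.credits_consumed = 0                  # running total, wraps at MOD

    def can_send(self, cost: int) -> bool:
        # Modular test: sending is allowed if the limit is not "behind"
        # the consumed count after adding this TLP's cost.
        headroom = (self.credit_limit - (self.credits_consumed + cost)) % MOD
        return headroom <= MOD // 2

    def send(self, cost: int) -> bool:
        if not self.can_send(cost):
            return False  # TLP must wait for credits to be returned
        self.credits_consumed = (self.credits_consumed + cost) % MOD
        return True

    def on_credit_return(self, returned: int):
        # The receiver freed buffer space, so the limit moves forward.
        self.credit_limit = (self.credit_limit + returned) % MOD

gate = CreditGate(initial_credits=4)
print(gate.send(2), gate.send(2), gate.send(1))  # third TLP is blocked
gate.on_credit_return(2)
print(gate.send(1))                              # allowed again
```

Because both counters only ever advance and wrap, the sender never needs to know the absolute buffer occupancy, only the difference between the two values.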

PCIe 1.x is often quoted to support a data rate of 250 MB/s in each direction, per lane. This figure is a calculation from the physical signaling rate (2.5 gigabaud) divided by the encoding overhead (10 bits per byte). This means a sixteen lane (×16) PCIe card would then be theoretically capable of 16×250 MB/s = 4 GB/s in each direction. While this is correct in terms of data bytes, more meaningful calculations are based on the usable data payload rate, which depends on the profile of the traffic, which is a function of the high-level (software) application and intermediate protocol levels.

Like other high data rate serial interconnect systems, PCIe has a protocol and processing overhead due to the additional transfer robustness (CRC and acknowledgements). Long continuous unidirectional transfers (such as those typical in high-performance storage controllers) can reach more than 95% of PCIe's raw (lane) data rate. These transfers also benefit the most from an increased number of lanes (×2, ×4, etc.). But in more typical applications (such as a USB or Ethernet controller), the traffic profile is characterized as short data packets with frequent enforced acknowledgements.[144] This type of traffic reduces the efficiency of the link, due to overhead from packet parsing and forced interrupts (either in the device's host interface or the PC's CPU). Being a protocol for devices connected to the same printed circuit board, it does not require the same tolerance for transmission errors as a protocol for communication over longer distances, and thus, this loss of efficiency is not particular to PCIe.

Efficiency of the link

As for any network-like communication links, some of the raw bandwidth is consumed by protocol overhead:[145]

A PCIe 1.x lane, for example, offers a data rate on top of the physical layer of 250 MB/s (simplex). This is due to a 2.5 GT/s bit rate multiplied by the efficiency of the 8b/10b line code (see the comparison table). This is not the payload bandwidth but the physical layer bandwidth; a PCIe lane has to carry additional information for full functionality.[145]

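Gen 2 Transaction Layer Packet[145]: 3

| Layer | PHY | Data Link Layer | Transaction Layer | Transaction Layer | Transaction Layer | Data Link Layer | PHY |
|---|---|---|---|---|---|---|---|
| Data | Start | Sequence | Header | Payload | ECRC | LCRC | End |
| Size (bytes) | 1 | 2 | 12 or 16 | 0 to 4096 | 4 (optional) | 4 | 1 |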

The Gen2 overhead is then 20, 24, or 28 bytes per transaction.

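Gen 3 Transaction Layer Packet[145]: 3

| Layer | PHY | Data Link Layer | Transaction Layer | Transaction Layer | Transaction Layer | Data Link Layer |
|---|---|---|---|---|---|---|
| Data | Start | Sequence | Header | Payload | ECRC | LCRC |
| Size (bytes) | 4 | 2 | 12 or 16 | 0 to 4096 | 4 (optional) | 4 |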

The Gen3 overhead is then 22, 26 or 30 bytes per transaction.

The efficiency for a 128-byte payload is 86%, and 98% for a 1024-byte payload. For small accesses like register settings (4 bytes), the efficiency drops as low as 16%. That said, most PCIe config registers reside in a DMA region mapped to the CPU's control registers and require no bus access.[citation needed]

The maximum payload size (MPS) is set on all devices based on smallest maximum on any device in the chain. If one device has an MPS of 128 bytes, all devices of the tree must set their MPS to 128 bytes. In this case the bus will have a maximum efficiency of 86% for writes.[145]: 3 
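
The efficiency figures in this section are consistent with dividing the payload size by the payload plus per-packet overhead. The sketch below assumes the smallest Gen 2 overhead of 20 bytes and ignores DLLP traffic and other link-level effects; the function names and the example device list are hypothetical.

```python
def tlp_efficiency(payload_bytes: int, overhead_bytes: int = 20) -> float:
    """Fraction of transmitted bytes that are payload for one TLP."""
    return payload_bytes / (payload_bytes + overhead_bytes)

def effective_mps(device_mps):
    """The usable maximum payload size is the smallest one in the tree."""
    return min(device_mps)

for size in (4, 128, 1024):
    print(f"{size:>4}-byte payload: {tlp_efficiency(size):.1%}")

# One device limited to 128 bytes caps the whole tree at 128 bytes.
mps = effective_mps([512, 128, 256])
print(f"MPS {mps} bytes, best-case write efficiency {tlp_efficiency(mps):.1%}")
```

Truncated to whole percentages, the three payload sizes give the 16%, 86% and 98% values quoted in the text.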

Applications

Asus Nvidia GeForce GTX 650 Ti, a PCI Express 3.0 ×16 graphics card
The Nvidia GeForce GTX 1070, a PCI Express 3.0 ×16 Graphics card
Intel 82574L Gigabit Ethernet NIC, a PCI Express ×1 card
A Marvell-based SATA 3.0 controller, as a PCI Express ×1 card

PCI Express operates in consumer, server, and industrial applications, as a motherboard-level interconnect (to link motherboard-mounted peripherals), a passive backplane interconnect and as an expansion card interface for add-in boards.

In virtually all modern (as of 2012) PCs, from consumer laptops and desktops to enterprise servers, the PCIe bus serves as the primary motherboard-level interconnect, connecting the host system-processor with both integrated peripherals (surface-mounted ICs) and add-on peripherals (expansion cards). In some of these systems, the PCIe bus co-exists with one or more legacy PCI buses, for backward compatibility with the large body of legacy PCI peripherals.

As of 2013, PCI Express has replaced AGP as the default interface for graphics cards on new systems. Almost all models of graphics cards released since 2010 by AMD (ATI) and Nvidia use PCI Express. AMD, Nvidia, and Intel have released motherboard chipsets that support as many as four PCIe ×16 slots, allowing tri-GPU and quad-GPU card configurations.

External GPUs

Theoretically, external PCIe could give a notebook the graphics power of a desktop, by connecting a notebook with any PCIe desktop video card (enclosed in its own external housing, with a power supply and cooling); this is possible with an ExpressCard or Thunderbolt interface. An ExpressCard interface provides bit rates of 5 Gbit/s (0.5 GB/s throughput), whereas a Thunderbolt interface provides bit rates of up to 40 Gbit/s (5 GB/s throughput).

In 2006, Nvidia developed the Quadro Plex external PCIe family of GPUs that can be used for advanced graphic applications for the professional market.[146] These video cards require a PCI Express ×8 or ×16 slot for the host-side card, which connects to the Plex via a VHDCI carrying eight PCIe lanes.[147]

In 2008, AMD announced the ATI XGP technology, based on a proprietary cabling system that is compatible with PCIe ×8 signal transmissions.[148] This connector is available on the Fujitsu Amilo and the Acer Ferrari One notebooks. Fujitsu launched their AMILO GraphicBooster enclosure for XGP soon thereafter.[149] Around 2010 Acer launched the Dynavivid graphics dock for XGP.[150]

In 2010, external card hubs were introduced that can connect to a laptop or desktop through a PCI ExpressCard slot. These hubs can accept full-sized graphics cards. Examples include MSI GUS,[151] Village Instrument's ViDock,[152] the Asus XG Station, Bplus PE4H V3.2 adapter,[153] as well as more improvised DIY devices.[154] However, such solutions are limited by the size (often only ×1) and version of the available PCIe slot on a laptop.

The Intel Thunderbolt interface has provided a new option to connect with a PCIe card externally. Magma has released the ExpressBox 3T, which can hold up to three PCIe cards (two at ×8 and one at ×4).[155] MSI also released the Thunderbolt GUS II, a PCIe chassis dedicated for video cards.[156] Other products such as Sonnet's Echo Express[157] and mLogic's mLink are Thunderbolt PCIe chassis in a smaller form factor.[158]

In 2017, more fully featured external card hubs were introduced, such as the Razer Core, which has a full-length PCIe ×16 interface.[159]

Storage devices

An OCZ RevoDrive SSD, a full-height ×4 PCI Express card

The PCI Express protocol can be used as data interface to flash memory devices, such as memory cards and solid-state drives (SSDs).

The XQD card is a memory card format utilizing PCI Express, developed by the CompactFlash Association, with transfer rates of up to 1 GB/s.[160]

Many high-performance, enterprise-class SSDs are designed as PCI Express RAID controller cards.[citation needed] Before NVMe was standardized, many of these cards utilized proprietary interfaces and custom drivers to communicate with the operating system; they had much higher transfer rates (over 1 GB/s) and IOPS (over one million I/O operations per second) when compared to Serial ATA or SAS drives.[quantify][161][162] For example, in 2011 OCZ and Marvell co-developed a native PCI Express solid-state drive controller for a PCI Express 3.0 ×16 slot with a maximum capacity of 12 TB and a performance of up to 7.2 GB/s in sequential transfers and up to 2.52 million IOPS in random transfers.[163][relevant?]

SATA Express was an interface for connecting SSDs through SATA-compatible ports, optionally providing multiple PCI Express lanes as a pure PCI Express connection to the attached storage device.[164] M.2 is a specification for internally mounted computer expansion cards and associated connectors, which also uses multiple PCI Express lanes.[165]

PCI Express storage devices can implement both AHCI logical interface for backward compatibility, and NVM Express logical interface for much faster I/O operations provided by utilizing internal parallelism offered by such devices. Enterprise-class SSDs can also implement SCSI over PCI Express.[166]

Cluster interconnect


Certain data-center applications (such as large computer clusters) require the use of fiber-optic interconnects due to the distance limitations inherent in copper cabling. Typically, a network-oriented standard such as Ethernet or Fibre Channel suffices for these applications, but in some cases the overhead introduced by routable protocols is undesirable and a lower-level interconnect, such as InfiniBand, RapidIO, or NUMAlink, is needed. Local-bus standards such as PCIe and HyperTransport can in principle be used for this purpose,[167] but as of 2015, solutions are only available from niche vendors such as Dolphin ICS and TTTech Auto.

Competing protocols


PCIe 1.0 initially competed with PCI-X 2.0, with both specifications being ratified in 2003 and offering roughly the same maximum bandwidth (~4 GB/s). By 2005, however, PCIe had emerged as the dominant technology.

Other communications standards based on high-bandwidth serial architectures include InfiniBand, RapidIO, HyperTransport, Intel QuickPath Interconnect, the Mobile Industry Processor Interface (MIPI), and NVLink. Differences are based on the trade-offs between flexibility and extensibility vs latency and overhead. For example, making the system hot-pluggable, as with InfiniBand but not PCI Express, requires that software track network topology changes.[citation needed]

Another example is making the packets shorter to decrease latency (as is required if a bus must operate as a memory interface). Smaller packets mean packet headers consume a higher percentage of the packet, thus decreasing the effective bandwidth. Examples of bus protocols designed for this purpose are RapidIO and HyperTransport.[citation needed]

PCI Express falls somewhere in the middle,[clarification needed] targeted by design as a system interconnect (local bus) rather than a device interconnect or routed network protocol. Additionally, its design goal of software transparency constrains the protocol and raises its latency somewhat.[citation needed]

Delays in PCIe 4.0 implementations led to the Gen-Z consortium, the CCIX effort and an open Coherent Accelerator Processor Interface (CAPI) all being announced by the end of 2016.[168]

On 11 March 2019, Intel presented Compute Express Link (CXL), a new interconnect bus, based on the PCI Express 5.0 physical layer infrastructure. The initial promoters of the CXL specification included: Alibaba, Cisco, Dell EMC, Facebook, Google, HPE, Huawei, Intel and Microsoft.[169]

Integrators list


The PCI-SIG Integrators List lists products made by PCI-SIG member companies that have passed compliance testing. The list includes switches, bridges, NICs, SSDs, etc.[170]

from Grokipedia
PCI Express (PCIe) is a high-speed serial computer expansion bus standard for connecting hardware devices such as graphics cards, storage drives, and network adapters to a motherboard or other host systems. Developed and maintained by the PCI Special Interest Group (PCI-SIG), it defines the electrical, protocol, platform architecture, and programming interfaces necessary for interoperable devices across client, server, embedded, and communication markets. As a successor to the parallel PCI Local Bus, PCIe employs a point-to-point topology with scalable lane configurations (e.g., x1, x4, x8, x16) to deliver low-latency, high-bandwidth data transfers while supporting backward compatibility across generations.

The PCI Express Base Specification Revision 1.0 was initially released on April 29, 2002, following the renaming of the technology from 3GIO to PCI Express. Subsequent revisions have roughly doubled the per-lane transfer rate with each generation, starting with 2.5 GT/s (gigatransfers per second) in version 1.0 and advancing to 5 GT/s in 2.0 (2007), 8 GT/s in 3.0 (2010), 16 GT/s in 4.0 (2017), 32 GT/s in 5.0 (2019), 64 GT/s in 6.0 (2022), and 128 GT/s in 7.0 (June 2025). A draft of version 8.0, targeting 256 GT/s, was made available to members in 2025, with full release planned for 2028 to support emerging demands such as high-speed networking.

Key features of PCIe include its use of packet-based communication over differential signaling lanes, error detection through CRC (with forward error correction added in later generations), and power management states for energy efficiency. The architecture ensures vendor interoperability through rigorous compliance testing and supports diverse form factors, such as M.2 for solid-state drives and CEM (Card Electromechanical) for add-in cards. By 2025, PCIe had become a standard interconnect for data-intensive applications, enabling terabit-per-second aggregate bandwidth in configurations like x16 at 7.0 speeds.

Architecture

Physical Interconnect

PCI Express (PCIe) is a high-speed serial interconnect standard that implements a layered architecture over a point-to-point topology, using low-voltage differential signaling with current-mode logic (CML) for electrical communication between devices. The protocol stack consists of the transaction layer for handling data packets, the data link layer for ensuring integrity through cyclic redundancy checks and acknowledgments, and the physical layer for managing serialization, encoding, and signaling. This design enables reliable, high-bandwidth transfers in a dual-simplex manner, where each direction operates independently.

The interconnect employs a switch-based fabric to support connectivity among multiple components. At the core is the root complex, which interfaces the CPU and memory subsystem with the PCIe domain, initiating transactions and managing configuration. Endpoints are terminal devices, such as network adapters or storage controllers, that consume or produce data. Switches act as intermediaries, routing packets between the root complex and endpoints or among endpoints, effectively creating a scalable tree-like structure that mimics traditional PCI bus hierarchies while avoiding shared-medium contention.

Packet-based communication forms the basis of data exchange, with transactions encapsulated in transaction layer packets (TLPs) that include headers, a data payload, and error-checking fields. These packets traverse dedicated transmit and receive paths, each comprising a pair of differential wires signaling with CML, allowing full-duplex operation without a separate clock line because the clock is embedded in the data stream. Lanes serve as the basic building blocks, enabling aggregation for increased throughput.

This serial architecture evolved from the parallel PCI bus to overcome inherent limitations in speed and scalability. Conventional parallel PCI, operating at up to 133 MB/s on a shared bus and susceptible to signal skew, constrained system performance in expanding I/O environments. PCIe, developed by the PCI-SIG and first specified in 2002, serialized the interface into point-to-point links with low-voltage differential signaling using CML, delivering superior bandwidth density, reduced pin count, and hot-plug capabilities while preserving PCI software compatibility.

Lanes and Bandwidth

A PCI Express lane is defined as a full-duplex link composed of one differential transmit pair and one differential receive pair, enabling simultaneous bidirectional data transfer between devices. PCIe supports scalable configurations ranging from x1 (a single lane) to x16 (16 lanes), with aggregate bandwidth increasing linearly with the number of lanes, allowing devices to match their throughput requirements to available interconnect capacity.

The effective rate for a PCIe link is calculated using the formula: effective rate = (signaling rate × encoding efficiency × number of lanes) / 8 bytes per second, where the signaling rate is expressed in gigatransfers per second (GT/s), and encoding efficiency accounts for overhead from schemes like 8b/10b (80% efficiency) in earlier generations or 128b/130b (approximately 98.5% efficiency) in later ones. In a PCIe 4.0 setup at 16 GT/s with 128b/130b encoding, an x16 link achieves approximately 31.5 GB/s effective throughput per direction (a raw aggregate of 256 Gb/s across lanes, adjusted for ~1.5% overhead), compared with approximately 15.8 GB/s for an x8 link and ~7.9 GB/s for an x4 link.

High-performance graphics processing units (GPUs) typically use x16 configurations in desktop systems to maximize bandwidth for rendering and compute tasks. Discrete GPUs in laptops usually use fewer lanes, commonly PCIe 4.0 x8 (or x4 in some cases), due to constraints on power, space, and thermals, which gives them roughly half the bandwidth of an x16 link. In practice, this reduction rarely limits performance significantly for most applications, as other factors like GPU memory bandwidth dominate. Solid-state drives (SSDs) typically employ x4 configurations for efficient storage access.
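The formula above can be checked with a short calculation. The following is a minimal sketch in Python; the function name `effective_gb_per_s` and the `ENCODING_EFFICIENCY` table are illustrative names chosen here, not part of any PCIe tooling, and the result ignores protocol overhead above the encoding layer (TLP headers, flow control, and so on).

```python
from fractions import Fraction

# Encoding efficiency for the two schemes named in the text.
ENCODING_EFFICIENCY = {
    "8b/10b": Fraction(8, 10),       # 80%
    "128b/130b": Fraction(128, 130)  # about 98.5%
}


def effective_gb_per_s(rate_gt_s, encoding, lanes):
    """Effective throughput in GB/s for one direction of a link.

    rate_gt_s: per-lane signaling rate in GT/s
    encoding:  key into ENCODING_EFFICIENCY
    lanes:     link width (1, 4, 8, 16, ...)
    """
    gbits_per_s = Fraction(rate_gt_s) * ENCODING_EFFICIENCY[encoding] * lanes
    return float(gbits_per_s / 8)  # 8 bits per byte


if __name__ == "__main__":
    for lanes in (4, 8, 16):
        gbps = effective_gb_per_s(16, "128b/130b", lanes)
        print(f"PCIe 4.0 x{lanes}: {gbps:.2f} GB/s per direction")
```

Exact fractions are used so that rounding happens only once, at the end; the same function reproduces the roughly 4 GB/s figure for a first-generation x16 link when called with `2.5` and `"8b/10b"`.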

Serial Bus Operation

PCI Express functions as a serial bus by transmitting data over differential pairs known as lanes, where the clock is embedded within the serial data stream rather than carried on a separate clock line for each lane. Receivers employ Clock Data Recovery (CDR) circuits to extract the timing information directly from the incoming data transitions, enabling precise synchronization without additional clock distribution overhead. This approach supports high-speed operation by minimizing skew between clock and data, while a common reference clock (REFCLK) may be shared across devices in standard configurations to align overall system timing. Newer generations, PCIe 6.0 and beyond, employ PAM4 modulation for increased data rates per symbol.

The initialization of a PCI Express link occurs through the Link Training and Status State Machine (LTSSM), a state machine in the physical layer that coordinates the establishment of a reliable connection between devices. Upon reset or a hot-plug event, the LTSSM progresses through states such as Detect, Polling, Configuration, and Recovery to negotiate link width (number of active lanes) and speed (e.g., 2.5 GT/s to 128 GT/s depending on generation, with drafts targeting 256 GT/s in PCIe 8.0), and to perform equalization. During the Polling and Configuration states, devices exchange Training Sequence ordered sets (TS1 and TS2) containing link and lane numbers, enabling polarity inversion detection and lane alignment.

Link equalization, a critical phase within the Recovery state, adjusts transmitter equalization (pre-emphasis and de-emphasis) and receiver equalization settings to mitigate inter-symbol interference and signal attenuation over the channel. Devices propose and select from preset coefficients via TS1/TS2 ordered sets, iterating through phases until acceptable signal quality is achieved, ensuring reliable operation at the negotiated speed. Speed negotiation similarly occurs during training, where devices advertise supported rates and fall back to lower speeds if higher ones fail.

Hot-plug capabilities allow dynamic addition or removal of devices without system interruption, initiated by presence detect signals that trigger LTSSM re-training for the affected link. This feature relies on slot power controllers to sequence power delivery, maintaining stability during insertion.

For power efficiency in serial operation, PCI Express implements Active State Power Management (ASPM) with defined link states: L0 for full-speed active transmission; L0s, a low-latency standby state entered per direction, in which the transmitter places the lanes in electrical idle after an idle timeout; and L1, a deeper low-power state covering both directions. Transitions between states, such as entering L0s or L1, are negotiated via DLLPs and managed to balance latency with savings, typically reducing power by up to 90% in L1.

At the physical layer, the basic frame structure in the serial stream consists of delimited packets encoded with schemes like 8b/10b (PCIe 1.0–2.0), 128b/130b (PCIe 3.0–5.0), or FLIT-based encoding with PAM4 signaling (PCIe 6.0 and later), ensuring sufficient transitions for clock recovery. In the 8b/10b generations, each TLP frame begins with a start-of-packet symbol (STP, a K-code), followed by the sequence number, the header and data payload (scrambled to randomize bit patterns), and the link CRC for error detection, and concludes with an end-of-frame symbol (END). Control information, such as SKP ordered sets for clock compensation, is periodically inserted to maintain lane deskew without interrupting the payload flow.
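The width and speed negotiation described in this section can be illustrated with a small model. This is a deliberately simplified sketch, not the LTSSM itself: the function names are hypothetical, the `link_ok` callback stands in for whatever signal-quality check the hardware performs, and the model ignores that real links first train at 2.5 GT/s before changing speed.

```python
def negotiate_link(speeds_a, speeds_b, width_a, width_b):
    """Return (speed, width) both partners could agree on.

    speeds_*: iterable of supported per-lane rates in GT/s
    width_*:  maximum lane count each partner supports
    """
    common = set(speeds_a) & set(speeds_b)
    if not common:
        raise ValueError("partners share no data rate")
    # Highest rate both sides advertise, narrowest of the two widths.
    return max(common), min(width_a, width_b)


def train_with_fallback(common_speeds, link_ok):
    """Try rates from fastest to slowest; return the first that holds.

    link_ok(speed) is a stand-in (an assumption of this sketch) for the
    equalization and error checks that decide whether a rate is usable.
    """
    for speed in sorted(common_speeds, reverse=True):
        if link_ok(speed):
            return speed
    raise RuntimeError("link training failed at every common rate")


if __name__ == "__main__":
    speed, width = negotiate_link([2.5, 5, 8, 16], [2.5, 5, 8], 16, 4)
    print(f"negotiated {speed} GT/s x{width}")
    # Pretend the channel is too lossy for anything above 5 GT/s.
    print(train_with_fallback({2.5, 5, 8}, lambda s: s <= 5))
```

For instance, a card supporting up to 16 GT/s at x16 placed in a slot limited to 8 GT/s at x4 would, in this model, settle on 8 GT/s at x4.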

Physical Form Factors

Standard Slots and Cards

Standard PCI Express (PCIe) slots are designed in various physical lengths to accommodate different numbers of lanes, providing flexibility for add-in cards in desktop and server systems. The common configurations include x1, x4, x8, and x16 slots, where the numeral denotes the maximum number of lanes supported electrically and physically. An x1 slot supports a single lane with 36 pins (18 on each side of the connector), while an x4 slot extends to 64 pins (32 on each side), an x8 to 98 pins (49 on each side), and an x16 to 164 pins (82 on each side), with keying notches for proper insertion.

These slots ensure compatibility across card sizes, allowing a physically shorter card, such as an x1 or x4, to be inserted into a longer slot like x16, with the system negotiating the available lanes during initialization. Conversely, a longer card cannot fit into a shorter closed-end slot due to the mechanical keying and pin differences, preventing mismatches that could damage components. This design maintains backward compatibility across PCIe generations, as newer cards operate at the speed of the hosting slot if it is lower.

Power delivery in standard PCIe slots is provided through dedicated rails on the edge connector, primarily +3.3 V and +12 V, enabling up to 75 W total without auxiliary connectors. The +12 V rail supplies the majority of power at a maximum of 5.5 A (66 W), while the +3.3 V rail is limited to 3 A (9.9 W), with tolerances of ±9% on +3.3 V and ±8% on +12 V. For x16 slots, this allocation supports most low-to-mid-power add-in cards, but high-performance devices often require supplemental power via 6-pin or 8-pin connectors from the power supply unit to exceed the slot's limit.

The pinout of an x16 slot follows a standardized layout defined in the PCI Express Card Electromechanical Specification, with Side A and Side B pins arranged in a dual-row configuration for signal integrity. Key elements include multiple ground pins (GND) distributed throughout for shielding and return paths, power pins grouped near the start of the connector ahead of the key (for example, +12 V at A2/A3 and B1/B2, and +3.3 V at A9/A10 and B8), and differential pairs for transmit (PETpn/PETnn) and receive (PERpn/PERnn) signals across 16 lanes, where n ranges from 0 to 15. Presence detect pins (PRSNT1# on Side A and PRSNT2# on Side B) indicate card length to the host, while reference clock pairs (REFCLK+ and REFCLK-) and SMBus lines support clocking and management functions. This arrangement ensures low crosstalk and supports high-speed serial transmission up to 64 GT/s in recent revisions.

Non-standard video card form factors, such as dual-slot coolers, extend beyond the single-slot width (typically 20 mm) to approximately 40 mm, allowing larger heatsinks and fans for improved thermal management on high-power graphics processing units (GPUs). Electrically, these designs do not alter the core PCIe interface but often require auxiliary power connectors (up to three 8-pin connectors for 300 W or more) to supplement the 75 W slot limit, as the increased cooling demands correlate with power consumption exceeding slot capabilities. Such cards can mechanically block adjacent expansion slots, requiring careful planning, though the electrical interface remains compliant with standard pinouts.
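The fit and power rules in this section reduce to simple arithmetic, sketched below. The helper names are made up for illustration, the rail limits are the figures quoted above, and the fit check models only closed-end slots (it does not account for open-ended slots or for slots that are wired with fewer lanes than their physical length).

```python
# Rail limits quoted in the text: (volts, maximum amps).
SLOT_RAILS = {"+12V": (12.0, 5.5), "+3.3V": (3.3, 3.0)}
SLOT_LIMIT_W = 75.0  # overall slot power limit quoted in the text


def rail_power_sum_w():
    """Sum of the per-rail maxima in watts (66 W + 9.9 W)."""
    return sum(volts * amps for volts, amps in SLOT_RAILS.values())


def card_fits(card_lanes, slot_lanes):
    """A shorter card fits a longer closed-end slot, never the reverse."""
    return card_lanes <= slot_lanes


def needs_aux_power(card_watts, slot_limit_w=SLOT_LIMIT_W):
    """True when a card draws more than the slot alone may supply."""
    return card_watts > slot_limit_w


if __name__ == "__main__":
    print(f"sum of rail maxima: {rail_power_sum_w():.1f} W")
    print("x4 card in x16 slot:", card_fits(4, 16))
    print("x16 card in x4 slot:", card_fits(16, 4))
    print("300 W GPU needs aux power:", needs_aux_power(300))
```

Note that the per-rail maxima add up to slightly more than 75 W; the 75 W figure is the overall cap, so a card cannot draw both rails at their individual limits simultaneously.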

Compact and Embedded Variants

Compact and embedded variants of PCI Express address the need for high-speed connectivity in space-constrained environments such as laptops, tablets, and embedded systems, where full-sized slots are impractical. These form factors prioritize small size while maintaining compatibility with the core PCI Express protocol, enabling applications like wireless networking and solid-state storage.

The PCI Express Mini Card, introduced as an early compact solution, measures approximately 30 mm by 51 mm for the full-size version, with a 52-pin edge connector that supports a single PCI Express lane alongside USB 2.0 and SMBus interfaces. This pinout allows multiplexing of signals for diverse uses, including wireless networking modules, making it suitable for notebook expansions without occupying much internal space. Power delivery is limited to 3.3 V at up to 2.75 A peak via the auxiliary rail, ensuring compatibility with battery-powered devices.

Succeeding the Mini Card, the M.2 form factor, formerly known as Next Generation Form Factor (NGFF), offers even greater flexibility with a smaller footprint, featuring a 75-position edge connector and various keying notches to prevent mismatches. Key B supports up to two PCI Express lanes or a single SATA interface, ideal for storage and legacy compatibility, while Key M accommodates up to four PCI Express lanes for higher bandwidth needs, also sharing pins with SATA for hybrid operation. Available in sizes from 2230 (22 mm × 30 mm) to 2280 (22 mm × 80 mm), M.2 modules allow systems to route either PCI Express or SATA traffic over the same pins based on detection signals. Electrically, M.2 operates at 3.3 V with a power limit of up to 3 A, distributed across multiple pins to handle demands in dense layouts. As of 2025, M.2 supports PCIe 6.0 for enhanced performance in NVMe SSDs.

In ultrabooks and Internet of Things (IoT) devices, these variants enable efficient storage and connectivity, such as NVMe SSDs for rapid data access in thin laptops or Wi-Fi/Bluetooth combos in smart sensors, often fitting directly onto motherboards to save volume. Thermal management is critical due to the confined spaces, where high-performance components like Gen4 PCIe SSDs can reach 70–80°C under load, prompting designs with integrated heatsinks, thermal throttling algorithms, or low-power modes to maintain reliability and prevent performance degradation. For instance, embedded controllers monitor junction temperatures and reduce clock speeds if temperatures exceed 85°C, ensuring longevity in fanless IoT applications.

External Cabling and Derivatives

PCI Express external cabling enables connectivity between systems and peripherals outside the chassis, supporting standards defined by the PCI-SIG for reliable high-speed data transfer. The specification covers both passive and active cable assemblies, with passive cables relying on standard conductors without amplification, limited to a maximum length of 1 meter for configurations up to x8 lanes to maintain signal integrity at speeds up to 64 GT/s in PCIe 6.0. Active cables incorporate retimers or equalizers to extend reach up to 3 meters while supporting the same lane widths (x1, x4, x8, and x16), accommodating PCIe generations from 1.0 (2.5 GT/s) through 6.0 (64 GT/s). These cables use SFF-8614 connectors and adhere to electrical requirements such as insertion loss under 7.5 dB at relevant frequencies and jitter budgets below 0.145 UI, ensuring compatibility with storage enclosures and docking stations.

OCuLink (Optical-Copper Link) provides a compact external interface for PCIe and SAS protocols, optimized for enterprise storage and server applications. Defined under SFF-8611 by the SFF Technology Affiliate (SNIA), it supports up to four PCIe lanes in a single connector, delivering aggregate bandwidths of 32 Gbps at 8 GT/s (PCIe 3.0), 64 Gbps at 16 GT/s (PCIe 4.0), or 128 Gbps at 32 GT/s (PCIe 5.0), with SAS 4.0 extending to 24 Gb/s per lane. The pinout aligns with PCIe standards, featuring 36 pins including differential pairs for Tx/Rx signals, ground, and sideband signaling, enabling reversible cabling up to 2 meters without active components. This configuration facilitates hot-pluggable connections in data centers, bridging internal PCIe slots to external enclosures while maintaining low latency and power efficiency.

Thunderbolt serves as a prominent derivative of PCIe, encapsulating its protocol over an external cable for versatile external expansion. Thunderbolt 3, for instance, tunnels up to four lanes of PCIe 3.0 (32 Gbps total) alongside DisplayPort and USB 3.1 within a 40 Gbps bidirectional link, dynamically allocating bandwidth so that display traffic (up to two 4K@60Hz streams via DisplayPort 1.2) takes priority and PCIe uses the remainder. This sharing mechanism supports daisy-chaining of devices like external GPUs and storage arrays, with the USB-C connector also providing power delivery up to 100 W. Subsequent versions, including Thunderbolt 4, Thunderbolt 5, and integration with USB4, maintain PCIe tunneling (up to PCIe 4.0 x4, or 64 Gbps, in Thunderbolt 5) while enhancing compatibility and security features as of 2025.

ExpressCard represents a legacy derivative of PCIe, introduced as a modular expansion standard combining PCIe and USB 2.0 over a single-edge connector for laptops and compact systems. Supporting up to PCIe x1 (2.5 GT/s) or USB 2.0, it enabled add-in cards for networking and storage but has been phased out in favor of higher-bandwidth alternatives like Thunderbolt and USB4, which offer scalable PCIe lanes over USB-C without proprietary slot requirements. The standard's simplification of the earlier CardBus interface facilitated easier integration, though its limited speeds and form factor obsolescence led to discontinuation around 2010.
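The display-first sharing rule described for Thunderbolt 3 can be expressed as a one-line calculation. This is only a sketch of the arithmetic implied by the figures in the text (a 40 Gbps link and a 32 Gbps PCIe ceiling); the function name is hypothetical, the display load is a caller-supplied assumption, and real controllers apply further limits that are not modeled here.

```python
def tb3_pcie_share_gbps(display_gbps, link_gbps=40.0, pcie_cap_gbps=32.0):
    """Bandwidth left for tunneled PCIe once display traffic is served.

    display_gbps:  bandwidth currently consumed by display streams
                   (an input assumed by this sketch, not a spec value)
    link_gbps:     total link rate quoted in the text
    pcie_cap_gbps: ceiling of a PCIe 3.0 x4 tunnel quoted in the text
    """
    if not 0 <= display_gbps <= link_gbps:
        raise ValueError("display load must lie between 0 and the link rate")
    # Display goes first; PCIe gets the remainder, up to its own ceiling.
    return min(pcie_cap_gbps, link_gbps - display_gbps)


if __name__ == "__main__":
    for load in (0.0, 10.0, 20.0):
        print(f"display {load:4.1f} Gbps -> PCIe {tb3_pcie_share_gbps(load):4.1f} Gbps")
```

With no display attached the PCIe tunnel is bounded by its own 32 Gbps ceiling rather than by the link, which is why light display loads do not reduce PCIe throughput in this model.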

History and Revisions

Early Development and Versions 1.x–2.x

The PCI Special Interest Group (PCI-SIG) was established in June 1992 as an open industry consortium to develop, maintain, and promote the Peripheral Component Interconnect (PCI) family of specifications, initially focused on the parallel PCI bus standard as a successor to earlier architectures like ISA and EISA. By the late 1990s, limitations in PCI's shared parallel bus design, such as signal skew, crosstalk, and scalability constraints at higher speeds, prompted efforts to evolve the technology toward a serial interconnect. This led to the development of PCI Express (PCIe), intended to replace both PCI and the Accelerated Graphics Port (AGP) with a point-to-point serial architecture that addressed these issues through differential signaling and embedded clocking, enabling higher bandwidth and better signal integrity.

The PCI Express Base Specification Revision 1.0 was initially released on April 29, 2002, with the 1.0a update ratified in July 2002, establishing a per-lane data rate of 2.5 gigatransfers per second (GT/s) using 8b/10b encoding for DC balance and clock recovery. This encoding scheme, which adds overhead but ensures reliable transmission over serial links, supported aggregate bandwidths up to 4 GB/s for an x16 configuration after accounting for encoding inefficiency. The transition from PCI's parallel bus to PCIe required overcoming significant challenges, including managing high-speed serial signal integrity, where issues like jitter and eye diagram closure demanded precise equalization and transmitter/receiver compliance testing.

PCI Express 1.1, released in 2005, introduced refinements to the electrical specifications, including tighter jitter budgets and phase-locked loop (PLL) bandwidth requirements to improve link reliability without altering the core 2.5 GT/s rate. These updates addressed early implementation feedback on signal margins, facilitating broader interoperability.

In January 2007, the PCI-SIG released the PCI Express 2.0 specification, doubling the per-lane speed to 5 GT/s while retaining 8b/10b encoding and full backward compatibility with 1.x devices through automatic link negotiation to the lower speed. Key enhancements in 2.0 included improved Active State Power Management (ASPM) mechanisms, such as refined L0s and L1 low-power link states, to reduce idle power consumption in mobile and desktop systems without compromising performance.

Early adoption of PCI Express began with Intel's implementation in its 9xx series chipsets, such as the 925X (Alderwood) and 915P (Grantsdale), which debuted in mid-2004 and integrated PCIe lanes for graphics and general I/O, marking the shift away from AGP in mainstream platforms. These chipsets supported up to 16 PCIe lanes for graphics at 1.x speeds, enabling initial deployments in consumer desktops and servers. The parallel-to-serial paradigm shift presented deployment hurdles, including the need for new PCB layout techniques to minimize crosstalk and reflections in serial traces, as well as retraining engineers on serial protocol debugging over legacy parallel tools. Despite these, PCIe quickly gained traction, with millions of units shipping by 2005, paving the way for widespread replacement of PCI slots.

Versions 3.x–5.x and Specification Comparison

PCI Express 3.0, released in November 2010 by the PCI-SIG, marked a significant advancement over version 2.0 by raising the signaling rate to 8 GT/s while introducing 128b/130b encoding for improved efficiency over the previous 8b/10b scheme. The reduced encoding overhead is what allowed effective bandwidth to double, to approximately 985 MB/s per lane, even though the signaling rate rose by only 60%. The specification maintained backward compatibility with prior generations, facilitating widespread adoption in consumer and enterprise systems seeking higher throughput without major hardware overhauls.

PCI Express 3.1, finalized in October 2013, served as a minor revision to 3.0, retaining the 8 GT/s rate and 128b/130b encoding while introducing enhancements such as improved multi-root support for SR-IOV and refined power management for better integration in virtualized environments. These updates focused on protocol refinements rather than raw performance gains, ensuring seamless evolution for existing ecosystems. By this point, PCIe 3.x had become the de facto standard for high-speed peripherals, particularly in storage applications.

PCI Express 4.0, announced in June 2017, doubled the data rate to 16 GT/s using the same 128b/130b encoding, yielding roughly 1.97 GB/s per lane and supporting up to 31.5 GB/s for an x16 configuration. Key improvements included relaxed transmitter de-emphasis requirements to enhance signal integrity over longer channels, enabling reliable operation at higher speeds without excessive power increases. This version prioritized scalability for emerging demands in data centers, with features like extended tags to allow more outstanding requests.

PCI Express 5.0, released in May 2019, further doubled the rate to 32 GT/s, maintaining 128b/130b encoding for about 3.94 GB/s per lane and up to 63 GB/s in an x16 link. It introduced Integrity and Data Encryption (IDE) for enhanced security and supported adaptable lane configurations to optimize power in diverse systems, including early integration with protocols like Compute Express Link (CXL) via its physical layer. These advancements addressed bandwidth bottlenecks in AI workloads, with a focus on maintaining low latency.

The evolution from versions 3.x to 5.x emphasized a doubling of bandwidth with each major version, driven by encoding efficiencies established in 3.0 and refined signaling in later revisions to support denser integrations without proportional power scaling. Each generation preserved full backward and forward compatibility, allowing gradual upgrades in ecosystems like servers and workstations.
| Version | Release Year | Data Rate (GT/s) | Encoding | Max Bandwidth (x16, GB/s, approx. unidirectional) | Key Features |
|---|---|---|---|---|---|
| 3.0 | 2010 | 8 | 128b/130b | 16 | Efficient encoding for doubled bandwidth over 2.0; backward compatibility focus |
| 3.1 | 2013 | 8 | 128b/130b | 16 | SR-IOV multi-root enhancements; power management refinements |
| 4.0 | 2017 | 16 | 128b/130b | 32 | Relaxed de-emphasis for signal integrity; extended tags for scalability |
| 5.0 | 2019 | 32 | 128b/130b | 64 | IDE security; adaptable lanes for CXL compatibility; low-latency optimizations |

Adoption of these versions accelerated with application-specific needs: PCIe 3.0 gained traction in SSDs starting around 2012, enabling multi-gigabyte-per-second storage speeds in consumer PCs and enterprise arrays. PCIe 4.0 saw widespread use in GPUs from 2019 onward, powering high-end cards like AMD's Radeon RX 5000 series and NVIDIA's RTX 30 series for improved rendering and AI workloads. By 2021, PCIe 5.0 had begun deployment in servers, supporting next-generation processors and accelerators in data centers for enhanced disaggregated computing.

Versions 6.x–8.x and Future Directions

PCI Express 6.0, finalized by the PCI-SIG in January 2022, doubles the data rate of its predecessor to 64 GT/s per lane using pulse amplitude modulation with 4 levels (PAM4) signaling, which encodes two bits per symbol to achieve higher throughput while maintaining compatible channel reach. Forward error correction (FEC) is mandatory in this version to mitigate the higher bit error rates introduced by PAM4, ensuring reliable data transmission in high-speed environments. The specification also supports the Compute Express Link (CXL) 3.0 protocol, enabling cache-coherent memory expansion and pooling for AI applications over the same physical layer. Commercial adoption of PCIe 6.0 hardware, including controllers and retimers, began appearing in 2025.

Building on this foundation, PCI Express 7.0 was officially released by the PCI-SIG in June 2025, achieving 128 GT/s per lane through further refinements in PAM4 signaling and enhanced FEC mechanisms that improve error correction efficiency for sustained performance. The specification's development included version 0.9 draft approval in March 2025, with a focus on hyperscale data centers where massive parallel processing demands ultra-high bandwidth. Targeted primarily at AI training clusters, PCIe 7.0 supports up to 512 GB/s bidirectional throughput in an x16 configuration, addressing the escalating data movement needs in these domains.

In August 2025, the PCI-SIG announced the initiation of PCI Express 8.0 development, aiming for 256 GT/s per lane to deliver up to 1 TB/s bidirectional bandwidth in x16 links, representing another doubling of raw data rates. The version 0.3 draft was made available to members in September 2025, with a full specification release planned for 2028 to allow time for ecosystem maturation, including silicon validation and optical interconnect integration.

Looking ahead, the PCI-SIG's draft processes emphasize iterative workgroup approvals to incorporate advancements in signaling integrity and power efficiency, driven by the bandwidth requirements of AI workloads. These efforts prioritize backward compatibility and support for emerging interconnect technologies to sustain PCIe as the foundational I/O standard for next-generation computing infrastructures.

Protocol Layers

Physical Layer

The Physical Layer (PHY) of PCI Express serves as the lowest protocol layer, responsible for bit-level transmission over serial links using differential signaling to ensure reliable data transfer across traces or cables. It encompasses the electrical and logical specifications for transmitting and receiving data symbols, including serialization, deserialization, and equalization to mitigate losses in high-speed environments. The PHY operates on a per-lane basis, where each lane consists of a transmit (TX) and receive (RX) differential pair, enabling full-duplex communication without a shared clock, relying instead on embedded clocking.

Transceiver design in the Physical Layer employs differential pairs to transmit signals as voltage differences between two wires, which inherently rejects common-mode noise and interference, crucial for maintaining signal integrity over distances up to several inches on printed circuit boards or longer in cabled variants. To counteract attenuation and inter-symbol interference (ISI) caused by the low-pass filtering effect of transmission media, transceivers incorporate pre-emphasis at the transmitter, which boosts high-frequency components during transitions by temporarily increasing the signal amplitude for those bits, and de-emphasis, which reduces the amplitude of bits that follow a transition to prevent overdriving the receiver. These techniques are calibrated during link initialization to optimize eye opening at the receiver, with typical pre-emphasis levels ranging from 0 to 9.5 dB depending on channel characteristics. Clock data recovery (CDR) circuits at the receiver extract the embedded clock from the incoming stream using phase-locked loops or delay-locked loops, ensuring synchronization without a separate clock line and supporting data rates that scale with protocol revisions.

Encoding schemes in the Physical Layer map data bits to symbols that ensure sufficient transitions for clock recovery and aid error detection, evolving from 8b/10b in early implementations to 128b/130b in later ones for improved efficiency. The 8b/10b scheme encodes 8-bit data (plus control characters) into 10-bit symbols, incurring a 20% overhead while maintaining running disparity to control DC levels and providing comma characters for symbol boundary detection. In contrast, 128b/130b reduces overhead to about 1.5% by encoding 128-bit blocks into 130 bits with two sync header bits, relying on scrambling for balance rather than strict disparity; forward error correction (FEC) is added only in the later FLIT-based generations. For even higher speeds using PAM4 modulation, PCIe 6.0 and later introduce FLIT (Flow Control Unit) structures, fixed-length 256-byte frames that carry TLP data together with CRC and FEC fields for enhanced error handling and efficiency over multi-bit symbols.

Data transmission begins with scrambling using a linear feedback shift register (LFSR), which in the 8b/10b generations uses the polynomial $G(x) = x^{16} + x^{5} + x^{4} + x^{3} + 1$, to randomize bit patterns and prevent long runs of identical bits that could degrade CDR performance or cause baseline wander. The transmitter and receiver LFSRs are kept in step by resetting them on COM symbols, allowing the receiver to descramble without exchanging additional state information. Disparity control, primarily in 8b/10b, ensures the cumulative number of 1s and 0s remains balanced by selecting alternate symbol mappings when needed.

Link training and synchronization are managed by the Link Training and Status State Machine (LTSSM), a finite state machine that progresses through defined states to establish and maintain the link. Starting from the Detect state, where devices sense receiver termination to confirm connectivity, the process advances to Polling, where training sequences (TS1 and TS2 ordered sets) are exchanged to align symbols and recover the clock.
In the Configuration state, the link negotiates width, lane numbering, and other parameters using these sequences. At 8 GT/s and above, transmitter equalization is then tuned through a multi-phase procedure that selects among up to 11 presets. Once training and equalization succeed, the LTSSM enters the L0 state, the normal operational mode for data transfer, with Recovery states available if signal quality degrades. This sequence ensures robust initialization, with the entire process typically completing within milliseconds.

Data Link Layer

The Data Link Layer (DLL) in PCI Express is the intermediary protocol layer between the Transaction Layer and the Physical Layer, ensuring reliable, ordered delivery of Transaction Layer Packets (TLPs) across the point-to-point link. It implements link-level error detection, correction through retransmission, flow control to prevent buffer overflows, and coordination with power management states, all while maintaining low latency for high-speed serial interconnects. Unlike end-to-end reliability handled higher in the stack, the DLL focuses on local link integrity, using dedicated control packets to manage these functions without interfering with data payloads.

Central to DLL operations are Data Link Layer Packets (DLLPs), which carry control information such as acknowledgments, flow control updates, and power state transitions; these are transmitted opportunistically between TLPs and use a fixed format with a 16-bit CRC for error detection. The ACK/NAK mechanism confirms TLP receipt: after verifying a TLP's sequence number and integrity, the receiver issues an ACK DLLP specifying the highest successfully received sequence number, enabling the transmitter to purge acknowledged packets from its replay buffer. Conversely, if a TLP fails validation (because of a CRC mismatch, a sequence error, or a reception problem), the receiver sends a NAK DLLP, signaling that all TLPs not yet acknowledged must be retransmitted.
This protocol assigns 12-bit sequence numbers to TLPs to enforce ordering, detect losses, and discard out-of-sequence or duplicate packets. Flow control complements this reliability with credit-based advertising: receivers send InitFC DLLPs at initialization and UpdateFC DLLPs periodically to inform transmitters of available buffer space per virtual channel, with data credits counted in units of 4 doublewords (DW), so that transmitters pause TLP issuance only when credits run out, which avoids receiver overflows.

Error detection in the DLL relies on the 16-bit CRC appended to each DLLP to validate control packet integrity, with corrupted DLLPs discarded and logged as link errors; for TLPs, a 32-bit Link CRC (LCRC) provides frame-level checking, while sequence numbers enable detection of missing or reordered packets without relying on higher-layer semantics.

The retransmission protocol centers on a replay buffer maintained by the transmitter, which stores copies of recently sent TLPs (typically up to 32 or more, depending on implementation) for potential resending. Upon receiving a NAK DLLP or on expiration of the Replay Timer (a timeout of, for example, around 100 µs at 5 GT/s, adjusted for link speed and latency), the transmitter replays all unacknowledged TLPs in original sequence order. Repeated replay failures cause the link to be retrained, and if the Data Link Layer drops to the DL_Inactive state the contents of the replay buffer are discarded. This ensures near-zero uncorrectable errors at the link level, with retransmissions typically incurring minimal overhead because the underlying physical encoding is highly reliable.
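The ACK/NAK and replay behaviour described above can be sketched as a small transmitter-side model. This is a simplified illustration, not the specification's state machine: the class and method names are hypothetical, the buffer has no capacity limit, and the Replay Timer is omitted. It assumes a NAK names the last sequence number that was received correctly.

```python
from collections import OrderedDict

SEQ_MODULO = 1 << 12  # 12-bit sequence numbers wrap at 4096


class ReplayBuffer:
    """Transmitter-side store of TLPs that have not been acknowledged yet."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = OrderedDict()  # sequence number -> TLP, in send order

    def send(self, tlp):
        """Assign the next sequence number and keep a copy for replay."""
        seq = self.next_seq
        self.unacked[seq] = tlp
        self.next_seq = (seq + 1) % SEQ_MODULO
        return seq

    def _purge_through(self, seq):
        """Drop every stored TLP up to and including `seq`."""
        if seq not in self.unacked:
            return  # stale or duplicate acknowledgment: nothing to purge
        while self.unacked:
            oldest, _ = self.unacked.popitem(last=False)
            if oldest == seq:
                break

    def on_ack(self, seq):
        """ACK: everything up to `seq` arrived intact."""
        self._purge_through(seq)

    def on_nak(self, last_good_seq):
        """NAK: purge what did arrive, then return the TLPs to replay in order."""
        self._purge_through(last_good_seq)
        return list(self.unacked.items())


if __name__ == "__main__":
    buf = ReplayBuffer()
    for name in ("tlp0", "tlp1", "tlp2", "tlp3"):
        buf.send(name)
    buf.on_ack(1)
    print(buf.on_nak(2))
```

Keeping the stored TLPs in send order means a replay naturally resends them in their original sequence, which is the property the receiver's sequence check relies on.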
Power management integration in the DLL coordinates with the Physical Layer to support low-power states such as L0s, where a link direction enters a low-power standby after detecting idle time (for example, no TLPs or DLLPs for roughly 4 to 8 µs, configurable via registers). Before L0s entry, the DLL accumulates sufficient flow control credits to cover potential retransmissions upon exit, preventing stalls. Entry is signaled on the wire by an Electrical Idle Ordered Set (EIOS); exit is triggered by pending TLPs or DLLPs, with the transmitter sending Fast Training Sequences (FTS, up to 255 as requested by the receiver) so that the receiver can realign its clock and symbol boundaries. Power management DLLPs such as PM_Enter_L1, PM_Active_State_Request_L1, and PM_Request_Ack carry the handshake for deeper states, ensuring that acknowledgments are not lost during transitions and that replay buffer integrity is maintained across states. This coordination minimizes power while preserving the DLL's reliability guarantees, with L0s exit latencies reported in device capabilities (typically under 4 µs for modern links).
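The credit-based flow control described earlier in this section can be illustrated with a minimal transmitter-side gate. The sketch assumes one header credit per TLP and one data credit per 4 DW (16 bytes) of payload; the class name and methods are hypothetical, and credit updates are modeled as simple increments rather than the cumulative counters carried by real UpdateFC DLLPs.

```python
import math

DATA_CREDIT_BYTES = 16  # one data credit covers 4 DW (4 x 4 bytes)


class CreditGate:
    """Tracks the credits a receiver has advertised for one virtual channel."""

    def __init__(self, header_credits, data_credits):
        self.header_credits = header_credits
        self.data_credits = data_credits

    @staticmethod
    def data_credits_for(payload_bytes):
        """Round the payload up to whole 16-byte credit units."""
        return math.ceil(payload_bytes / DATA_CREDIT_BYTES)

    def try_send(self, payload_bytes):
        """Consume credits and return True, or return False if the TLP must wait."""
        needed = self.data_credits_for(payload_bytes)
        if self.header_credits < 1 or self.data_credits < needed:
            return False
        self.header_credits -= 1
        self.data_credits -= needed
        return True

    def on_update_fc(self, header_credits, data_credits):
        """The receiver freed buffer space and returned credits."""
        self.header_credits += header_credits
        self.data_credits += data_credits


if __name__ == "__main__":
    gate = CreditGate(header_credits=2, data_credits=8)
    print(gate.try_send(64), gate.try_send(100), gate.try_send(64))
```

The transmitter never guesses at receiver buffer state: a TLP is held back locally until enough credits are on hand, which is why overflow cannot occur on a correctly operating link.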

Transaction Layer

The Transaction Layer is the uppermost protocol layer in the PCI Express architecture, handling the formation and management of end-to-end transactions between devices. It abstracts application-level communications into discrete units called Transaction Layer Packets (TLPs), which encapsulate requests and completions for operations such as data transfers and signaling. This layer interfaces with the Data Link Layer below it and uses credit-based flow control to pace TLP transmission, leaving link-level delivery guarantees to that lower layer. By defining logical transaction semantics, the Transaction Layer enables scalable interconnects for diverse peripherals while maintaining compatibility with legacy PCI concepts.

A Transaction Layer Packet consists of a header (either 3 or 4 doublewords, or DWs, where 1 DW equals 32 bits), an optional data payload of 0 to 1024 DWs, and an optional end-to-end CRC (ECRC) field of 1 DW for integrity checking. The header includes fields for packet format, type, routing information, and attributes such as ordering rules and a poison bit for error indication. TLPs fall into four primary categories: memory reads and writes for accessing memory-mapped address space (with support for burst transfers and, in compatible implementations, locked semantics); I/O reads and writes for legacy port-mapped I/O, increasingly deprecated in favor of memory-mapped alternatives; configuration reads and writes to probe and configure device registers within the 4 KB configuration space of each function; and message requests, which are posted transactions used for signaling events, power management, or vendor-specific communications without requiring completions.
Header formats distinguish between 3 DW (96 bits) for packets that do not need 64-bit addressing and 4 DW (128 bits) for those requiring extended addressing or additional attributes, with the first DW containing the format, type, and length fields needed to interpret the rest. For instance, a basic memory read TLP uses a 3 DW header with address routing, specifying the starting address and a transfer length of up to 4 KB, while a configuration write uses a 3 DW header with ID routing to target a specific bus, device, and function. These formats keep serialization efficient while accommodating the diverse needs of requesters and completers in a hierarchical topology.

Virtual channels (VCs) enhance quality of service (QoS) by allowing multiple logical data streams to share a physical link, with up to eight VCs supported per link to prioritize traffic such as isochronous audio and video over bulk data transfers. Each VC operates independently with its own buffer credits and arbitration scheme, and traffic classes (TCs) are mapped to VCs during link configuration to give time-sensitive applications predictable latency. This mechanism, configured through control registers, enables flexible resource allocation without hardware reconfiguration.

Routing in the Transaction Layer directs TLPs across the interconnect fabric using three mechanisms: address routing for memory and I/O transactions, which forwards packets based on the 32- or 64-bit address in the header; ID routing for completions and configuration accesses, which uses a 16-bit requester or completer ID (bus:device.function) to navigate the hierarchy; and implicit routing for certain message TLPs, determined by a 3-bit routing code in the header for cases such as delivery to the root complex or broadcast, without explicit addressing. These methods support both upstream (endpoint to host) and downstream (host to endpoint) flows, with switches using internal tables to resolve paths efficiently. Peer-to-peer communication is also possible, allowing direct device-to-device transfers when enabled.
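The 16-bit ID used for ID routing can be made concrete with a short sketch. It assumes the conventional split of 8 bits for the bus number, 5 bits for the device number, and 3 bits for the function number; the helper names are chosen for this example only.

```python
def encode_bdf(bus, device, function):
    """Pack bus (8 bits), device (5 bits), and function (3 bits) into a 16-bit ID."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("bus is 8 bits, device is 5 bits, function is 3 bits")
    return (bus << 8) | (device << 3) | function


def decode_bdf(routing_id):
    """Split a 16-bit ID back into (bus, device, function)."""
    return (routing_id >> 8) & 0xFF, (routing_id >> 3) & 0x1F, routing_id & 0x7


def format_bdf(routing_id):
    """Render the ID in the familiar bus:device.function notation."""
    bus, device, function = decode_bdf(routing_id)
    return f"{bus:02x}:{device:02x}.{function}"


if __name__ == "__main__":
    rid = encode_bdf(0x3A, 0, 1)
    print(hex(rid), format_bdf(rid))
```

A switch forwarding an ID-routed TLP only needs the bus portion to decide which downstream port to use, which is why the bus number occupies the most significant bits.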
Interrupt handling in PCI Express leverages TLPs, replacing legacy INTx wired-OR signaling with scalable message-based interrupts. Message Signaled Interrupts (MSI) deliver an interrupt as a memory write TLP that carries a 16-bit data value to a configured 32- or 64-bit address, enabling multiple interrupt vectors per device through configurable data values. MSI-X extends this with a dedicated table of up to 2048 address/data pairs per function, stored in BAR-mapped memory, allowing per-vector masking, per-vector targeting of CPU cores, and dynamic enablement. These mechanisms reduce latency and wiring complexity in systems with many devices.
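To make the MSI-X table more tangible, the sketch below packs and unpacks a single table entry. It assumes the usual 16-byte little-endian layout of four 32-bit fields (lower address, upper address, message data, vector control with bit 0 as the mask bit). The helper names and the example address are illustrative assumptions, not values taken from the text above.

```python
import struct

ENTRY_FORMAT = "<IIII"  # addr low, addr high, message data, vector control
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)
MAX_VECTORS = 2048      # upper bound on table entries per function
MASK_BIT = 0x1          # bit 0 of vector control masks the vector


def pack_entry(address, data, masked=False):
    """Build the 16 bytes of one MSI-X table entry."""
    if not 0 <= address < (1 << 64) or address & 0x3:
        raise ValueError("address must be a DW-aligned 64-bit value")
    return struct.pack(
        ENTRY_FORMAT,
        address & 0xFFFFFFFF,
        address >> 32,
        data & 0xFFFFFFFF,
        MASK_BIT if masked else 0,
    )


def unpack_entry(raw):
    """Decode 16 bytes back into address, data, and mask state."""
    low, high, data, control = struct.unpack(ENTRY_FORMAT, raw)
    return {
        "address": (high << 32) | low,
        "data": data,
        "masked": bool(control & MASK_BIT),
    }


if __name__ == "__main__":
    entry = pack_entry(0xFEE00000, 0x0041, masked=True)  # illustrative values
    print(len(entry), unpack_entry(entry))
```

When a vector is unmasked, the device raises that interrupt by issuing an ordinary memory write of the data value to the stored address, so no dedicated interrupt wire is involved.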

Efficiency Mechanisms

PCI Express optimizes throughput and power consumption through several mechanisms that address encoding overhead, error correction, signal integrity, and idle state management. These features ensure high effective bandwidth while maintaining reliability and efficiency across varying link conditions.

Encoding schemes balance transmission reliability against bandwidth utilization. In PCIe generations 1.x and 2.x, 8b/10b encoding maps 8 data bits to 10-bit symbols to facilitate clock recovery and DC balance. The resulting 20% overhead reduces effective bandwidth to 80% of the raw signaling rate; for instance, a PCIe 2.0 link at 5 GT/s per lane delivers 4 Gbit/s (500 MB/s) of usable data per lane. Starting with PCIe 3.0, 128b/130b encoding adds only 2 synchronization bits to each block of 128 data bits, achieving 98.46% efficiency (about 1.54% overhead). This allows PCIe 3.0 to nearly double the usable bandwidth of PCIe 2.0, to about 985 MB/s per lane, while raising the signaling rate only from 5 GT/s to 8 GT/s.

To combat error rates at elevated signaling speeds, particularly with the shift to PAM4 modulation in PCIe 6.0, forward error correction (FEC) employs Reed-Solomon codes integrated into the FLIT-based architecture. This lightweight, low-latency FEC corrects multiple symbol errors per block, targeting a pre-correction first bit error rate (FBER) of around $10^{-6}$ while achieving a post-FEC bit error rate (BER) below $10^{-15}$. By enabling error correction without frequent retransmissions, it preserves throughput and reduces latency overhead compared to retry-based methods, ensuring robust operation over longer channels or in noisy environments.

Link equalization and margining further enhance efficiency by dynamically optimizing signal quality during initialization and operation.
During link training, devices negotiate adaptive transmitter presets (such as de-emphasis, preshoot, and boost levels) along with receiver continuous-time linear equalization (CTLE) and decision feedback equalization (DFE) settings. These adjustments compensate for inter-symbol interference (ISI) and channel attenuation, selecting the preset combination that maximizes eye opening and minimizes bit errors. This process avoids marginal links that might otherwise require speed downgrades or retries, typically converges within milliseconds, and supports seamless transitions across generations.

Power efficiency is achieved via Active State Power Management (ASPM), which allows links to enter lower-power states without full disconnection. In the L0 state, the link operates at full performance; L0s enables a quick partial power-down during short idle periods, while L1 and its substates (L1.1 and L1.2) reduce voltage swings, gate clocks, and lower reference voltages for deeper savings during prolonged inactivity. Power consumption in these states scales approximately as $P \approx n \times V \times I$, where $n$ is the number of lanes, $V$ is the supply voltage, and $I$ is the current draw per lane; in L1 substates, reductions in $V$ and $I$ can yield 70 to 90% lower idle power per lane compared to L0, depending on implementation, thereby extending battery life in mobile systems and reducing thermal load in servers.
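The encoding arithmetic in this section can be checked with a short calculation. The sketch below assumes the commonly quoted per-lane signaling rates for generations 1 through 5 and applies the 8b/10b or 128b/130b efficiency to each. PCIe 6.0 is left out because its FLIT-based accounting differs. The table and function names are illustrative.

```python
# generation -> (signaling rate in GT/s per lane, encoding efficiency)
GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def lane_gbytes_per_s(generation):
    """Usable bandwidth of one lane, one direction, in GB/s."""
    rate_gt_s, efficiency = GENERATIONS[generation]
    return rate_gt_s * efficiency / 8  # 8 bits per byte


def link_gbytes_per_s(generation, lanes):
    """Usable bandwidth of a multi-lane link, one direction, in GB/s."""
    return lane_gbytes_per_s(generation) * lanes


if __name__ == "__main__":
    for gen in sorted(GENERATIONS):
        print(f"PCIe {gen}.0: x1 {lane_gbytes_per_s(gen):.3f} GB/s, "
              f"x16 {link_gbytes_per_s(gen, 16):.1f} GB/s per direction")
```

These figures cover encoding overhead only; packet headers, flow control traffic, and acknowledgments reduce achievable throughput further in practice.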

Advanced Features and Draft Processes

Single Root I/O Virtualization (SR-IOV) is a specification that enables a single physical PCIe device to present multiple virtual functions (VFs) to the host system, facilitating efficient resource partitioning for virtual machines (VMs). Each VF operates as an independent PCIe function with its own dedicated resources, including memory address spaces, interrupt vectors, and configuration space, allowing direct assignment to VMs without hypervisor mediation of I/O operations. This partitioning reduces latency and overhead in virtualized environments by bypassing the software virtual switch, while the physical function (PF) retains administrative control over VF allocation and management. Resource allocation is managed through PF registers that define VF limits, such as BAR sizes and queue depths, ensuring isolation and scalability for up to 256 VFs per device in compliant implementations.

Multi-Root I/O Virtualization (MR-IOV) extends SR-IOV capabilities to multi-host topologies, allowing a single PCIe device to be shared across multiple root complexes or independent hosts. In MR-IOV, virtual functions can be dynamically assigned to different roots, with sharing coordinated by a multi-root aware switch that enforces isolation between domains. This enables scenarios such as blade servers or clustered systems where I/O resources, such as network adapters, are pooled and partitioned among VMs on separate hosts, improving utilization.

Access Control Services (ACS) provide isolation mechanisms within PCIe topologies by enforcing granular control over Transaction Layer Packet (TLP) routing at switches and downstream ports. ACS capabilities include source validation, request redirection, completion redirection, and translation blocking, which prevent unauthorized direct communication between endpoints and mitigate risks such as rogue DMA attacks in virtualized setups.
For end-to-end data protection, the Integrity and Data Encryption (IDE) feature, introduced in PCIe 6.0 and enhanced in subsequent drafts, applies AES-GCM encryption and integrity protection to TLPs across the interconnect path, including through switches and retimers, ensuring confidentiality, integrity, and replay protection without significant performance degradation. Complementing IDE, the TEE Device Interface Security Protocol (TDISP) establishes secure channels between hosts and devices through a TEE Security Manager (TSM) on the host and a Device Security Manager (DSM) on the device, supporting device attestation and isolation of trusted device interfaces in confidential computing scenarios.

PCIe supports multi-protocol coexistence by offering its physical layer to higher-level standards, enabling integration in heterogeneous systems. Compute Express Link (CXL) operates over the PCIe physical layer, multiplexing the CXL.io (PCIe-compatible I/O), CXL.cache, and CXL.mem protocols to provide cache-coherent memory access and accelerator support without requiring dedicated wiring. This allows PCIe devices and CXL-enabled components, such as memory expanders, to use the same slots and links, with protocol selection handled through alternate protocol negotiation to maintain compatibility. For chiplet interconnects, the Universal Chiplet Interconnect Express (UCIe) standard incorporates PCIe and CXL protocols in its protocol layer, facilitating high-bandwidth, low-latency die-to-die communication in multi-die packages while supporting flit-based modes for efficient resource sharing among chiplets. UCIe's design ensures compatibility with PCIe ecosystems, allowing chiplet-based accelerators to use existing PCIe software stacks for I/O and memory operations.

The PCI-SIG governs specification development through a structured process in which technical workgroups review Engineering Change Requests (ECRs) and drafts to ensure compatibility and innovation. Early-stage versions, denoted as 0.x (e.g., PCIe 8.0 v0.3 released in September 2025), undergo workgroup approval after initial reviews and are accessible exclusively to members via the PCI-SIG workspace for feedback and iteration.
This member-only phase allows collaborative refinement before public release, with final specifications like PCIe 7.0 achieving broad adoption following rigorous testing; the process emphasizes a one-tier membership model to promote timely progress toward milestones, such as full PCIe 8.0 delivery by 2028.

Applications

Consumer and Graphics Uses

In consumer computing, PCI Express (PCIe) is the primary interface for connecting graphics processing units (GPUs) to motherboards in desktops and laptops, serving everyday tasks such as video playback and web browsing while scaling to demanding applications. Desktop GPUs typically connect via PCIe x16 (for example, PCIe 4.0 x16 or PCIe 5.0 x16), providing up to about 64 GB/s bidirectional bandwidth (32 GB/s per direction) at PCIe 4.0. Laptop discrete GPUs usually use fewer lanes, commonly PCIe 4.0 x8 (or x4 in some cases), giving roughly half the bandwidth (about 32 GB/s bidirectional for PCIe 4.0 x8). This difference arises from laptop constraints on power, space, and thermals. In practice, the reduced PCIe bandwidth rarely limits performance significantly, because other factors such as GPU memory bandwidth dominate.

The x16 slot, with its 16 lanes, is the standard for installing discrete GPUs in desktop systems, and at PCIe 4.0 it supports smooth rendering and frame rates without significant bottlenecks for most modern titles. PCIe 5.0 x16 provides approximately 64 GB/s per direction, reducing transfer penalties relative to PCIe 4.0 in CPU-GPU offloading scenarios such as shared memory access, and thereby improving performance in transfer-intensive benchmarks. This setup is ubiquitous in gaming rigs and creative workstations, where GPUs handle ray tracing and AI-accelerated effects.

External GPUs (eGPUs) extend this capability to laptops via Thunderbolt enclosures, which tunnel PCIe signals over Thunderbolt connections, typically limited to the equivalent of PCIe 3.0 x4 bandwidth, or approximately 22 to 24 Gbps of practical throughput after overhead.
This creates bottlenecks for bandwidth-intensive GPUs, such as those in the RTX 40 series, where data transfer rates cap at around 3 to 4 GB/s, resulting in 10 to 20% performance losses compared to internal x16 slots in scenarios such as 4K gaming. Manufacturers such as Razer produce compact enclosures, some supporting OCuLink for direct PCIe cabling, though Thunderbolt remains dominant for consumer portability.

For gaming and content creation, PCIe supports features such as Resizable BAR, a PCIe capability that lets the CPU address the full GPU video RAM (VRAM) rather than a 256 MB window at a time, reducing latency and boosting frame rates by up to 12% in supported titles such as Cyberpunk 2077. Enabled via firmware settings on compatible hardware, such as NVIDIA RTX 30 series GPUs paired with AMD Ryzen 5000 or Intel 10th/11th-gen CPUs, it improves efficiency in x16 slots for tasks including video editing in Adobe Premiere and real-time 3D modeling.

Consumer peripherals also use lower-lane PCIe slots: x1 configurations suit sound cards such as the Creative AE-7 for high-fidelity audio processing, while x4 slots accommodate network adapters such as 10GbE cards for faster home networking. These cards often support hot-plug functionality for USB expansions, allowing dynamic addition of ports without system restarts.

Adoption of newer PCIe versions accelerated in consumer devices during the 2020s. PCIe 4.0 became standard in desktops and mid-range laptops by 2020, driven by AMD's Ryzen 3000 series and Intel's 11th-gen processors, and was widespread in new gaming PCs by 2022, doubling bandwidth over PCIe 3.0. PCIe 5.0 began appearing in premium laptops in late 2024, supported by Intel's 14th-gen and later processors that allocate x4 lanes for SSDs, enabling speeds up to 14 GB/s in models such as the 2025 Strix series. This progression supports evolving consumer needs, from 8K video editing to VR gaming, without requiring full system overhauls.
In automotive applications, PCIe connects high-speed sensors and systems in advanced driver-assistance systems (ADAS), as seen in 2025 vehicle platforms from manufacturers such as Tesla.

Storage and Enterprise Systems

Non-Volatile Memory Express (NVMe) is a scalable host controller interface protocol optimized for PCIe-based solid-state drives (SSDs), enabling efficient communication between the host and storage devices. It supports up to about 64,000 I/O queue pairs, each capable of holding up to about 64,000 commands, which allows for massive parallelism in command submission and completion. This design contrasts sharply with the Advanced Host Controller Interface (AHCI), which is limited to 32 ports and 32 commands per port, resulting in more serialized access and higher overhead for multi-threaded operations. NVMe's 64-byte command format includes all the information needed for an operation such as a 4 KB read directly in the command, reducing memory-mapped I/O (MMIO) accesses to just two register writes per command cycle, compared to AHCI's 6 to 9 reads and writes. Consequently, NVMe achieves lower latency (around 2.8 microseconds for command processing versus 6 microseconds for AHCI) while supporting multiple MSI-X interrupts for enhanced throughput in high-I/O workloads.

In enterprise storage environments, NVMe SSDs commonly adopt the U.2 (formerly SFF-8639) and U.3 (SFF-TA-1001) form factors, which are 2.5-inch standards designed for hot-pluggable, high-density deployments in servers and data centers. The U.2 interface supports up to four PCIe lanes alongside SAS/SATA compatibility, while U.3 extends this with a unified connector for PCIe, SAS, and SATA, ensuring backward compatibility and simplified wiring. These form factors enable PCIe 4.0 x4 configurations, delivering effective bandwidth exceeding 7 GB/s per device after accounting for 128b/130b encoding overhead on 16 GT/s signaling. For instance, enterprise NVMe SSDs in PCIe 4.0 setups routinely achieve sequential read/write speeds of 7 GB/s or more, supporting the intensive I/O demands of database and other data-intensive applications without the bottlenecks of legacy interfaces.

RAID configurations in enterprise storage leverage Host Bus Adapters (HBAs) that integrate PCIe switches to manage multi-drive arrays efficiently.
These HBAs, such as Microchip's SmartHBA series, use embedded PCIe switches like the SmartIOC 2200 to provide direct-path I/O, enabling low-latency connectivity to 16 or more NVMe/SAS/SATA drives per adapter. The switches expand a single PCIe host interface (e.g., x8 or x16) into multiple downstream ports, supporting RAID levels 0, 1, 5, and 10 across arrays while minimizing latency through tri-mode support for NVMe, SAS-4, and SATA. In large-scale setups, this allows scaling to dozens of drives, as seen in Broadcom's 94xx series HBAs, which handle enterprise workloads with PCIe Gen4 bandwidth for sustained performance in storage enclosures.

Server adoption of PCIe in enterprise storage has advanced with dual-socket systems that use PCIe 5.0 bifurcation to enable flexible storage pooling. In platforms like Intel's Server D40AMP family, dual processors provide up to 128 PCIe 5.0 lanes in total, configurable via firmware settings to split x16 slots into x8x8, x8x4x4, or x4x4x4x4 configurations, allowing direct attachment of multiple x4 NVMe SSDs for pooled resources. This bifurcation, managed through Intel Volume Management Device (VMD) 2.0, supports pooling of up to 24 or 32 E1.L NVMe drives per chassis, optimizing shared storage in virtualized environments without dedicated controllers. Such setups deliver aggregate bandwidth exceeding 60 GB/s for pooled I/O in hyperscale data centers.
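The bifurcation modes named above (x8x8, x8x4x4, x4x4x4x4) follow a simple rule that a short sketch can check: every resulting link has a standard width, and the widths add up to the lanes of the slot. The function names are hypothetical, and the check is deliberately loose. Real platforms expose only the specific modes their firmware and processor support.

```python
import re

VALID_WIDTHS = {1, 2, 4, 8, 16}  # standard link widths considered in this sketch


def parse_bifurcation(mode):
    """Turn a mode string such as 'x8x4x4' into a list of link widths."""
    if not re.fullmatch(r"(x\d+)+", mode):
        raise ValueError(f"unrecognized bifurcation mode: {mode!r}")
    return [int(width) for width in re.findall(r"x(\d+)", mode)]


def is_valid_split(slot_lanes, widths):
    """Check that each link width is standard and the split uses every lane."""
    return all(w in VALID_WIDTHS for w in widths) and sum(widths) == slot_lanes


if __name__ == "__main__":
    for mode in ("x16", "x8x8", "x8x4x4", "x4x4x4x4", "x8x8x4"):
        widths = parse_bifurcation(mode)
        print(mode, widths, is_valid_split(16, widths))
```

In the x4x4x4x4 case each of the four links can host one x4 NVMe SSD directly, which is how a single x16 slot serves several drives without an intervening switch.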

High-Performance and Cluster Interconnects

In high-performance computing (HPC) and cluster environments, PCI Express (PCIe) serves as a foundational interconnect for scaling computational resources across multiple nodes, enabling efficient data transfer between processors, accelerators, and memory subsystems. By using PCIe fabrics (networks of switches and links), systems can extend connectivity beyond single nodes, supporting workloads that demand massive parallelism, such as scientific simulations and large-scale data analytics. This approach contrasts with traditional bus architectures by providing scalable bandwidth and low-latency paths, which are crucial for maintaining performance in distributed setups.

Cluster interconnects that run PCIe over a fabric allow GPU clusters to share resources dynamically, using the fabric for both intra-node I/O and inter-node communication. In such configurations, PCIe switches enable direct data movement between GPUs across nodes, reducing bottlenecks in resource-intensive tasks. Complementing this, Compute Express Link (CXL), built on the PCIe physical layer, introduces RDMA-like features that facilitate kernel-bypass data transfers, pinning user process pages for direct access without CPU mediation, akin to InfiniBand's capabilities. These features enhance efficiency in fabric-based clusters by supporting cache-coherent sharing and minimizing overhead in multi-node environments.

For AI and machine learning acceleration, PCIe enables nodes with multiple x16 GPUs, where each accelerator connects via full-bandwidth links to maximize data throughput during training and inference. Systems often deploy 8 or more GPUs per node, balanced across PCIe topologies to distribute lanes evenly and avoid contention, supporting aggregate bandwidths of hundreds of GB/s for parallel model processing. PCIe 6.0, at 64 GT/s per lane, supports emerging 2025 systems, doubling PCIe 5.0's capacity to handle the escalating data demands of petascale AI models in supercomputing clusters.
PCIe bifurcation is a technique that splits a single PCIe slot, such as an x16 slot, into multiple narrower links, for example x8/x8, to support dual GPUs with a dedicated link for each device. This approach reduces bottlenecks in multi-GPU servers for AI tasks such as model inference and training by minimizing lane contention and optimizing resource utilization in dense configurations. In multi-GPU configurations for large language model inference, a PCIe 4.0 x4 link is limited to approximately 8 GB/s per direction in theory (6 to 7 GB/s in practice), which can bottleneck the inter-GPU data transfers required for techniques such as layer splitting, pipeline parallelism, or tensor parallelism, often resulting in poor scaling relative to single-GPU performance.

Disaggregated computing further leverages PCIe and CXL for memory pooling, allowing hyperscalers to allocate resources across nodes dynamically and reduce latency in resource-constrained workloads. CXL's protocol enables coherent access to pooled memory via PCIe links, eliminating redundant copies and enabling elastic scaling for AI-driven applications. This pooling model supports tiered memory hierarchies, where distant pools provide overflow capacity with latencies under 100 ns for local-like access, optimizing utilization in large-scale data centers.

A prominent case study is NVIDIA's DGX systems, where 8 GPUs are interconnected via NVLink and NVSwitch for high-bandwidth GPU-to-GPU communication (up to 900 GB/s bidirectional), with PCIe Gen5 x16 links connecting each GPU to the CPUs. This architecture achieves high aggregate bandwidth for distributed AI workloads, powering exascale-level computations by combining local fabrics with external networking for cluster-wide operations. In edge AI applications, compact PCIe-based accelerators like Intel's Habana Gaudi cards enable efficient inference in devices such as autonomous drones and smart cameras as of 2025.

Competing Protocols

Direct Alternatives

USB4 and Thunderbolt represent the primary modern direct alternatives to PCI Express for high-speed peripheral expansion in personal computers and servers, offering external connectivity that competes on bandwidth while prioritizing user convenience. USB4 provides up to 40 Gbps of bidirectional bandwidth using the USB-C connector, enabling integration with a wide range of devices without requiring specialized slots. USB4 Version 2.0, specified in 2022 and seeing initial device adoption as of 2025, supports up to 80 Gbps symmetric or 120 Gbps asymmetric bandwidth, further closing the gap with higher-speed internal PCIe configurations. Thunderbolt 4 matches the 40 Gbps speed but adds certified support for PCIe tunneling, allowing external enclosures to use up to 32 Gbps of PCIe 3.0 bandwidth for storage or GPU acceleration, though with protocol overhead that falls short of native internal PCIe performance. Thunderbolt 5, introduced in 2023 and gaining adoption by 2025, doubles the baseline to 80 Gbps bidirectional and reaches up to 120 Gbps with Bandwidth Boost for asymmetric, display-heavy workloads, while supporting PCIe 4.0 tunneling at 64 Gbps. Both standards emphasize ease of use through hot-swappable, plug-and-play connections over a single cable, in contrast to internal PCIe cards, which require slot installation and a system reboot. Additionally, they provide power delivery up to 100 W, enabling charging of laptops or powering of peripherals directly over the cable, whereas standard PCIe relies on separate power rails.

Older standards such as PCI-X and AGP served as predecessors to PCIe but have been largely supplanted due to architectural limitations. PCI-X, a parallel bus extension of the original PCI, operated at effective clock rates from 66 MHz to 533 MHz in 64-bit mode, delivering maximum bandwidths of 1.06 GB/s at 133 MHz and approximately 4.3 GB/s in its 533 MHz half-duplex configuration, with rare 1066 MHz double-data-rate variants approaching 8.5 GB/s.
This parallel design suffered from signal integrity issues at higher speeds and shared bandwidth among devices, making it unsuitable for modern scalable expansions. AGP, specifically tailored for graphics accelerators, provided dedicated point-to-point bandwidth of up to 2.1 GB/s in its 8x version at an effective 533 MHz, so graphics traffic did not compete with other peripherals on the PCI bus. However, AGP's single-purpose focus limited its versatility, and it was progressively phased out starting in 2004 as PCIe offered greater flexibility and higher speeds for graphics and general use.

Key trade-offs between PCIe and these alternatives center on latency, ease of use, and cost. PCIe achieves sub-microsecond end-to-end latency for transfers, ideal for real-time applications such as computing clusters, whereas USB4 and Thunderbolt introduce additional latency due to protocol encapsulation, though this remains negligible for most tasks. USB's plug-and-play simplicity allows instant device swapping without opening the chassis, a major convenience over PCIe's fixed internal connections, but at the cost of lower peak efficiency for sustained high-throughput workloads. High-lane-count PCIe configurations, such as x16 or x32 for GPUs or NVMe arrays, incur higher costs due to complex routing and chipsets (often 20 to 50% more expensive than equivalent USB hubs) while delivering scalable bandwidth of up to 64 GB/s in PCIe 5.0 x16 setups without external cabling limitations.

As of 2025, PCIe maintains dominance in internal expansion for PCs and servers, powering the majority of add-in cards such as GPUs and storage controllers thanks to its low latency and high bandwidth. In contrast, USB4 and Thunderbolt capture the external connectivity market, leveraging universal compatibility over PCIe's ecosystem lock-in. This division underscores PCIe's role in core system performance versus the emphasis of USB and Thunderbolt on accessible, versatile external devices.
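The peak figures quoted for the older parallel buses follow directly from bus width multiplied by effective transfer rate. The sketch below reproduces that arithmetic; the function name is illustrative, and the transfer rates are the nominal effective values used in this section, treated here as assumptions.

```python
def parallel_bus_mbytes_per_s(width_bits, mega_transfers_per_s):
    """Peak bandwidth of a parallel bus: bytes per transfer times transfers per second."""
    return width_bits / 8 * mega_transfers_per_s


EXAMPLES = {
    "PCI-X 133 (64-bit)": (64, 133),
    "PCI-X 533 (64-bit)": (64, 533),
    "PCI-X 1066 (64-bit)": (64, 1066),
    "AGP 8x (32-bit)": (32, 533),
}

if __name__ == "__main__":
    for name, (width, rate) in EXAMPLES.items():
        print(f"{name}: {parallel_bus_mbytes_per_s(width, rate) / 1000:.2f} GB/s")
```

Unlike a PCIe link, these peak values are shared by every device on the bus and apply to one direction at a time, so the per-device bandwidth actually available is lower.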

Complementary Standards

Compute Express Link (CXL) is a complementary standard that builds directly on the PCI Express (PCIe) physical layer to enable cache-coherent interconnects for processors, memory expansion, and accelerators. It uses PCIe 5.0 and 6.0 signaling for high-bandwidth, low-latency connections while adding protocols for coherency, allowing CPUs to access and share device-attached memory. CXL defines three device types: Type 1 devices, accelerators that cache host memory but have no device-attached memory of their own; Type 2 devices, accelerators that combine caching with their own attached memory for coherent sharing; and Type 3 devices, which focus on memory expansion and pooling across systems. The CXL 3.2 specification, released in 2024, introduces enhancements for monitoring, management, and security (including the TEE Security Protocol, TSP), while retaining backward compatibility with earlier versions.

Universal Chiplet Interconnect Express (UCIe) extends PCIe principles to the die-to-die level, serving as a standardized interconnect for multi-chip modules in advanced packaging. By adapting PCIe and CXL standards, UCIe defines the physical layer, protocols, and software stack for chiplet-based system-on-chip (SoC) designs, enabling interoperability across vendors. Key versions include UCIe 1.0 for basic die-to-die I/O, UCIe 1.1 with automotive reliability features and compliance testing, UCIe 2.0 supporting 3D packaging at bump pitches from 1 to 25 microns, and UCIe 3.0 offering data rates up to 64 GT/s for higher bandwidth and efficiency. This allows modular construction of complex SoCs, overcoming die size limits and reducing design costs through customizable, scalable architectures.

PCIe fabrics integrate with storage protocols such as NVMe over Fabrics (NVMe-oF), which extends the NVMe command set, originally optimized for direct PCIe attachment, across networked fabrics while preserving low-latency performance.
In PCIe-based implementations, NVMe-oF uses message-based queueing and scatter-gather lists for data transfers, adapting PCIe's memory-mapped model to support scalable, disaggregated storage pools with minimal added latency (under 10 µs).

For system management, the DMTF Redfish standard provides RESTful APIs to handle PCIe and CXL resources, including a dedicated CXL-to-Redfish mapping for device discovery and monitoring. Collaboration between PCI-SIG and DMTF enables management and security objects to be transported over PCIe via the Management Component Transport Protocol (MCTP) or configuration space mailboxes, simplifying security in multi-device environments.

These standards extend PCIe to disaggregated computing architectures by enabling pooling of resources such as memory and accelerators without replacing the core PCIe infrastructure, resulting in lower latency, reduced power consumption, and improved scalability for AI and other demanding workloads. For instance, CXL and UCIe facilitate efficient data movement in pooled systems, supporting electrical and optical links for extended reach while maintaining PCIe's low-power modes.
