Peripheral Component Interconnect

PCI
PCI Local Bus
Three 5-volt 32-bit PCI expansion slots on a motherboard (PC bracket on left side)
Year created: June 22, 1992 (1992-06-22)[1]
Created by: Intel
Supersedes: ISA, EISA, MCA, VLB
Superseded by: AGP for graphics (1997), PCI Express (2004)
Width in bits: 32 or 64
Speed: Half-duplex:[2]
  133 MB/s (32-bit at 33 MHz – the standard configuration)
  266 MB/s (32-bit at 66 MHz)
  266 MB/s (64-bit at 33 MHz)
  533 MB/s (64-bit at 66 MHz)
Style: Parallel
Hotplugging interface: Optional
Website: www.pcisig.com/home

Peripheral Component Interconnect (PCI)[3] is a local computer bus for attaching hardware devices in a computer and is part of the PCI Local Bus standard. The PCI bus supports the functions found on a processor bus but in a standardized format that is independent of any given processor's native bus. Devices connected to the PCI bus appear to a bus master to be connected directly to its own bus and are assigned addresses in the processor's address space.[4] It is a parallel bus, synchronous to a single bus clock. Attached devices can take either the form of an integrated circuit fitted onto the motherboard (called a planar device in the PCI specification) or an expansion card that fits into a slot. The PCI Local Bus was first implemented in IBM PC compatibles, where it displaced the combination of several slow Industry Standard Architecture (ISA) slots and one fast VESA Local Bus (VLB) slot as the bus configuration. It has subsequently been adopted for other computer types. Typical PCI cards used in PCs include: network cards, sound cards, modems, extra ports such as Universal Serial Bus (USB) or serial, TV tuner cards and hard disk drive host adapters. PCI video cards replaced ISA and VLB cards until rising bandwidth needs outgrew the abilities of PCI. The preferred interface for video cards then became Accelerated Graphics Port (AGP), a superset of PCI, before giving way to PCI Express.[5]

The first version of PCI found in retail desktop computers was a 32-bit bus using a 33 MHz bus clock and 5 V signaling, although the PCI 1.0 standard provided for a 64-bit variant as well.[6] These have one locating notch in the card. Version 2.0 of the PCI standard introduced 3.3 V slots, physically distinguished by a flipped physical connector to prevent accidental insertion of 5 V cards. Universal cards, which can operate on either voltage, have two notches. Version 2.1 of the PCI standard introduced optional 66 MHz operation. A server-oriented variant of PCI, PCI Extended (PCI-X), operated at frequencies up to 133 MHz for PCI-X 1.0 and up to 533 MHz for PCI-X 2.0. An internal connector for laptop cards, called Mini PCI, was introduced in version 2.2 of the PCI specification. The PCI bus was also adopted for an external laptop connector standard – the CardBus.[7] The first PCI specification was developed by Intel, but subsequent development of the standard became the responsibility of the PCI Special Interest Group (PCI-SIG).[8]

PCI and PCI-X sometimes are referred to as either Parallel PCI or Conventional PCI[9] to distinguish them technologically from their more recent successor, PCI Express, which adopted a serial, lane-based architecture.[10][11] PCI's heyday in the desktop computer market was approximately 1995 to 2005.[10] PCI and PCI-X have become obsolete for most purposes and have largely disappeared from modern motherboards since 2013; however, as of 2020 they remain common on some desktops for backward compatibility and because of their relatively low production cost. Another common modern application of parallel PCI is in industrial PCs, where many specialized expansion cards never transitioned to PCI Express, just as with some ISA cards. Many kinds of devices formerly available on PCI expansion cards are now commonly integrated onto motherboards or available in USB and PCI Express versions.

History

A typical 32-bit, 5 V-only PCI card, in this case, a SCSI adapter from Adaptec
A motherboard with two 32-bit PCI slots and two sizes of PCI Express slots

Work on PCI began at the Intel Architecture Labs (IAL, also Architecture Development Lab) c. 1990. A team of primarily IAL engineers defined the architecture and developed a proof of concept chipset and platform (Saturn) partnering with teams in the company's desktop PC systems and core logic product organizations.

PCI was immediately put to use in servers, replacing Micro Channel architecture (MCA) and Extended Industry Standard Architecture (EISA) as the server expansion bus of choice. In mainstream PCs, PCI was slower to replace VLB, and did not gain significant market penetration until late 1994 in second-generation Pentium PCs. By 1996, VLB was all but extinct, and manufacturers had adopted PCI even for Intel 80486 (486) computers.[12] EISA continued to be used alongside PCI through 2000. Apple Computer adopted PCI for professional Power Macintosh computers (replacing NuBus) in mid-1995, and the consumer Performa product line (replacing LC Processor Direct Slot (PDS)) in mid-1996.

Outside the server market, the 64-bit version of plain PCI remained rare in practice,[13] although it was used, for example, by all (post-iMac) G3 and G4 Power Macintosh computers.[14]

Later revisions of PCI added new features and performance improvements, including a 66 MHz 3.3 V standard and 133 MHz PCI-X, and the adaptation of PCI signaling to other form factors. Both PCI-X 1.0b and PCI-X 2.0 are backward compatible with some PCI standards. These revisions were used on server hardware but consumer PC hardware remained nearly all 32-bit, 33 MHz and 5 volt.

The PCI-SIG introduced the serial PCI Express in about 2004. Since then, motherboard manufacturers have gradually included fewer or no PCI slots in favor of the new standard. Bridge adapters allow the use of legacy PCI cards with PCI Express motherboards.

PCI history[15]
Spec Year Change summary[16]
PCI 1.0 1992 Original issue
PCI 2.0 1993 Incorporated connector and add-in card specification
PCI 2.1 1995 Incorporated clarifications and added 66 MHz chapter
PCI 2.2 1998 Incorporated ECNs, and improved readability
PCI 2.3 2002 Incorporated ECNs, errata, and deleted 5 volt only keyed add-in cards
PCI 3.0 2004 Removed support for 5.0 volt keyed system board connector

Auto configuration


PCI provides separate memory and memory-mapped I/O port address spaces for the x86 processor family, 64 and 32 bits, respectively. Addresses in these address spaces are assigned by software. A third address space, called the PCI Configuration Space, which uses a fixed addressing scheme, allows software to determine the amount of memory and I/O address space needed by each device. Each device can request up to six areas of memory space or input/output (I/O) port space via its configuration space registers.
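
The six areas are described by Base Address Registers (BARs) at offsets 0x10 through 0x24 of the standard configuration header. The sketch below illustrates the usual sizing procedure, in which software writes all-ones to a BAR and decodes the size from the bits that read back as zero; the pci_cfg_read32/pci_cfg_write32 helpers are hypothetical placeholders for whatever configuration-space accessors a platform provides.

  #include <stdint.h>
  #include <stdbool.h>

  /* Hypothetical platform accessors for PCI configuration space. */
  uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off);
  void     pci_cfg_write32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off, uint32_t value);

  /* Size one 32-bit BAR by writing all-ones and reading back. */
  uint32_t pci_bar_size(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t bar_off)
  {
      uint32_t orig = pci_cfg_read32(bus, dev, fn, bar_off);

      pci_cfg_write32(bus, dev, fn, bar_off, 0xFFFFFFFFu);
      uint32_t probe = pci_cfg_read32(bus, dev, fn, bar_off);
      pci_cfg_write32(bus, dev, fn, bar_off, orig);      /* restore original value */

      if (probe == 0)
          return 0;                                      /* BAR not implemented */

      bool     is_io = probe & 1u;                       /* bit 0: 1 = I/O, 0 = memory */
      uint32_t mask  = is_io ? ~0x3u : ~0xFu;            /* mask off the read-only type bits */
      return ~(probe & mask) + 1;                        /* decoded size in bytes */
  }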

In a typical system, the firmware (or operating system) queries all PCI buses at startup time (via PCI Configuration Space) to find out what devices are present and what system resources (memory space, I/O space, interrupt lines, etc.) each needs. It then allocates the resources and tells each device what its allocation is.
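
A minimal sketch of such a startup scan follows, assuming the legacy x86 configuration access mechanism (I/O ports 0xCF8/0xCFC); the outl/inl port helpers and the single-bus loop are illustrative simplifications, not a complete enumerator.

  #include <stdint.h>

  /* Hypothetical port I/O helpers provided by the platform. */
  void     outl(uint16_t port, uint32_t value);
  uint32_t inl(uint16_t port);

  #define PCI_CFG_ADDR 0xCF8
  #define PCI_CFG_DATA 0xCFC

  /* Legacy x86 configuration mechanism: write an address to 0xCF8,
     then read the selected dword from 0xCFC. */
  static uint32_t cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off)
  {
      uint32_t addr = (1u << 31)              /* enable bit */
                    | ((uint32_t)bus << 16)
                    | ((uint32_t)dev << 11)
                    | ((uint32_t)fn  << 8)
                    | (off & 0xFC);
      outl(PCI_CFG_ADDR, addr);
      return inl(PCI_CFG_DATA);
  }

  /* Scan one bus: a vendor ID of 0xFFFF (all-ones) means no device responded. */
  void scan_bus(uint8_t bus)
  {
      for (uint8_t dev = 0; dev < 32; dev++) {
          uint32_t id = cfg_read32(bus, dev, 0, 0x00);
          if ((id & 0xFFFF) == 0xFFFF)
              continue;                       /* empty slot */
          /* id[15:0] = vendor ID, id[31:16] = device ID: record the device
             and assign memory/I/O resources here. */
      }
  }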

The PCI configuration space also contains a small amount of device type information, which helps an operating system choose device drivers for it, or at least to have a dialogue with a user about the system configuration.

Devices may have an on-board read-only memory (ROM) containing executable code for x86 or PA-RISC processors, an Open Firmware driver, or an Option ROM. These are typically needed for devices used during system startup, before device drivers are loaded by the operating system.

In addition, PCI Latency Timers are a mechanism for PCI bus-mastering devices to share the PCI bus fairly. "Fair" in this case means that devices will not use such a large portion of the available PCI bus bandwidth that other devices are not able to get needed work done. Note that this does not apply to PCI Express.

Each PCI device that can operate in bus-master mode is required to implement a timer, called the Latency Timer, that limits the time that device can hold the PCI bus. The timer starts when the device gains bus ownership, and counts down at the rate of the PCI clock. When the counter reaches zero, the device is required to release the bus. If no other devices are waiting for bus ownership, it may simply grab the bus again and transfer more data.[17]
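
The Latency Timer is a single byte at offset 0x0D of the standard configuration header and is programmed by firmware or the operating system. A minimal sketch, assuming a hypothetical pci_cfg_write8 accessor:

  #include <stdint.h>

  /* Hypothetical 8-bit configuration-space write accessor. */
  void pci_cfg_write8(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off, uint8_t value);

  #define PCI_LATENCY_TIMER 0x0D   /* byte offset in the standard header */

  /* Limit a bus-mastering device to 64 PCI clocks of bus tenure
     (roughly 1.9 microseconds at 33 MHz). */
  void set_latency_timer(uint8_t bus, uint8_t dev, uint8_t fn)
  {
      pci_cfg_write8(bus, dev, fn, PCI_LATENCY_TIMER, 64);
  }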

Interrupts


Devices are required to follow a protocol so that the interrupt-request (IRQ) lines can be shared. The PCI bus includes four interrupt lines, INTA# through INTD#, all of which are available to each device. Up to eight PCI devices share the same IRQ line (INTA# through INTH#) in APIC-enabled x86 systems. Interrupt lines are not wired in parallel as are the other PCI bus lines. The positions of the interrupt lines rotate between slots, so what appears to one device as the INTA# line is INTB# to the next and INTC# to the one after that. Single-function devices usually use their INTA# for interrupt signaling, so the device load is spread fairly evenly across the four available interrupt lines. This alleviates a common problem with sharing interrupts.

The mapping of PCI interrupt lines onto system interrupt lines, through the PCI host bridge, is implementation-dependent. Platform-specific firmware or operating system code is meant to know this, and set the "interrupt line" field in each device's configuration space indicating which IRQ it is connected to.
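
The slot-to-slot rotation described above is often expressed as a modulo-4 "swizzle". The sketch below uses the convention commonly applied behind PCI-to-PCI bridges (pin encoded as in configuration space, 1 = INTA#); the exact mapping on any given motherboard is implementation-dependent.

  /* INTx rotation ("swizzle"): what a card drives as INTA# appears on the
     bus as a different line depending on its device number.
     pin: 1 = INTA#, 2 = INTB#, 3 = INTC#, 4 = INTD# (as in config space). */
  unsigned swizzle_intx(unsigned device, unsigned pin)
  {
      return ((pin - 1 + device) % 4) + 1;   /* resulting line, 1 = INTA# */
  }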

PCI interrupt lines are level-triggered. This was chosen over edge-triggering to gain an advantage when servicing a shared interrupt line, and for robustness: edge-triggered interrupts are easy to miss.

Later revisions of the PCI specification add support for message-signaled interrupts. In this system, a device signals its need for service by performing a memory write, rather than by asserting a dedicated line. This alleviates the problem of scarcity of interrupt lines. Even if interrupt vectors are still shared, it does not suffer the sharing problems of level-triggered interrupts. It also resolves the routing problem, because the memory write is not unpredictably modified between device and host. Finally, because the message signaling is in-band, it resolves some synchronization problems that can occur with posted writes and out-of-band interrupt lines.

PCI Express does not have physical interrupt lines at all. It uses message-signaled interrupts exclusively.

Conventional hardware specifications

Diagram showing the different key positions for 32-bit and 64-bit PCI cards

These specifications represent the most common version of PCI used in normal PCs:

  • 33.33 MHz clock with synchronous transfers
  • Peak transfer rate of 133 MB/s (133 megabytes per second) for 32-bit bus width (33.33 MHz × 32 bits ÷ 8 bits/byte = 133 MB/s)
  • 32-bit bus width
  • 32- or 64-bit memory address space (4 GiB or 16 EiB)
  • 32-bit I/O port space
  • 256-byte (per device) configuration space
  • 5-volt signaling
  • Reflected-wave switching

The PCI specification also provides options for 3.3 V signaling, 64-bit bus width, and 66 MHz clocking, but these are not commonly encountered outside of PCI-X support on server motherboards.

The PCI bus arbiter performs bus arbitration among multiple masters on the PCI bus. Any number of bus masters can reside on the PCI bus and request use of it; one pair of request and grant signals is dedicated to each bus master.

Card voltage and keying

A PCI-X Gigabit Ethernet expansion card with both 5 V and 3.3 V support notches, side B toward the camera

Typical PCI cards have either one or two key notches, depending on their signaling voltage. Cards requiring 3.3 volts have a notch 56.21 mm from the card backplate in pin positions 12 and 13; those requiring 5 volts have a notch 104.41 mm from the backplate in pin positions 50 and 51. This allows cards to be fitted only into slots with a voltage they support. "Universal cards" accepting either voltage have both key notches.

Connector pinout


The PCI connector is defined as having 62 contacts on each side of the edge connector, but two or four of them are replaced by key notches, so a card has 60 or 58 contacts on each side. Side A refers to the 'solder side' and side B refers to the 'component side': if the card is held with the connector pointing down, a view of side A will have the backplate on the right, whereas a view of side B will have the backplate on the left. The pinout of B and A sides are as follows, looking down into the motherboard connector (pins A1 and B1 are closest to backplate).[16][18][19]

32-bit PCI connector pinout
Pin Side B Side A Comments
1 −12 V TRST# JTAG port pins (optional)
2 TCK +12 V
3 Ground TMS
4 TDO TDI
5 +5 V +5 V
6 +5 V INTA# Interrupt pins (open-drain)
7 INTB# INTC#
8 INTD# +5 V
9 PRSNT1# Reserved Pulled low to indicate 7.5 or 25 W power required
10 Reserved IOPWR +5 V or +3.3 V
11 PRSNT2# Reserved Pulled low to indicate 7.5 or 15 W power required
12 Ground Ground Key notch for 3.3 V-capable cards
13 Ground Ground
14 Reserved 3.3 V aux Standby power (optional)
15 Ground RST# Bus reset
16 CLK IOPWR 33/66 MHz clock
17 Ground GNT# Bus grant from motherboard to card
18 REQ# Ground Bus request from card to motherboard
19 IOPWR PME# Power management event (optional) 3.3 V, open drain, active low.[20]
20 AD[31] AD[30] Address/data bus (upper half)
21 AD[29] +3.3 V
22 Ground AD[28]
23 AD[27] AD[26]
24 AD[25] Ground
25 +3.3 V AD[24]
26 C/BE[3]# IDSEL
27 AD[23] +3.3 V
28 Ground AD[22]
29 AD[21] AD[20]
30 AD[19] Ground
31 +3.3 V AD[18]
32 AD[17] AD[16]
33 C/BE[2]# +3.3 V
34 Ground FRAME# Bus transfer in progress
35 IRDY# Ground Initiator ready
36 +3.3 V TRDY# Target ready
37 DEVSEL# Ground Target selected
38 PCIXCAP Ground STOP# PCI-X capable; Target requests halt
39 LOCK# +3.3 V Locked transaction
40 PERR# SMBCLK SDONE Parity error; SMBus clock or Snoop done (obsolete)
41 +3.3 V SMBDAT SBO# SMBus data or Snoop backoff (obsolete)
42 SERR# Ground System error
43 +3.3 V PAR Even parity over AD[31:00] and C/BE[3:0]#
44 C/BE[1]# AD[15] Address/data bus (lower half)
45 AD[14] +3.3 V
46 Ground AD[13]
47 AD[12] AD[11]
48 AD[10] Ground
49 M66EN Ground AD[09]
50 Ground Ground Key notch for 5 V-capable cards
51 Ground Ground
52 AD[08] C/BE[0]# Address/data bus (lower half)
53 AD[07] +3.3 V
54 +3.3 V AD[06]
55 AD[05] AD[04]
56 AD[03] Ground
57 Ground AD[02]
58 AD[01] AD[00]
59 IOPWR IOPWR
60 ACK64# REQ64# For 64-bit extension; no connect for 32-bit devices.
61 +5 V +5 V
62 +5 V +5 V

64-bit PCI extends this by an additional 32 contacts on each side which provide AD[63:32], C/BE[7:4]#, the PAR64 parity signal, and a number of power and ground pins.

Legend
Ground pin Zero volt reference
Power pin Supplies power to the PCI card
Output pin Driven by the PCI card, received by the motherboard
Initiator output Driven by the master/initiator, received by the target
I/O signal May be driven by initiator or target, depending on operation
Target output Driven by the target, received by the initiator/master
Input Driven by the motherboard, received by the PCI card
Open drain May be pulled low and/or sensed by multiple cards
Reserved Not presently used, do not connect

Most lines are connected to each slot in parallel. The exceptions are:

  • Each slot has its own REQ# output to, and GNT# input from the motherboard arbiter.
  • Each slot has its own IDSEL line, usually connected to a specific AD line.
  • TDO is daisy-chained to the following slot's TDI. Cards without JTAG support must connect TDI to TDO so as not to break the chain.
  • PRSNT1# and PRSNT2# for each slot have their own pull-up resistors on the motherboard. The motherboard may (but does not have to) sense these pins to determine the presence of PCI cards and their power requirements.
  • REQ64# and ACK64# are individually pulled up on 32-bit only slots.
  • The interrupt pins INTA# through INTD# are connected to all slots in different orders. (INTA# on one slot is INTB# on the next and INTC# on the one after that.)

Notes:

  • IOPWR is +3.3 V or +5 V, depending on the backplane. The slots also have a ridge in one of two places which prevents insertion of cards that do not have the corresponding key notch, indicating support for that voltage standard. Universal cards have both key notches and use IOPWR to determine their I/O signal levels.
  • The PCI SIG strongly encourages 3.3 V PCI signaling,[16] requiring support for it since standard revision 2.3,[18] but most PC motherboards use the 5 V variant. Thus, while many currently available PCI cards support both, and have two key notches to indicate that, there are still a large number of 5 V-only cards on the market.[needs update]
  • The M66EN pin is an additional ground on 5 V PCI buses found in most PC motherboards. Cards and motherboards that do not support 66 MHz operation also ground this pin. If all participants support 66 MHz operation, a pull-up resistor on the motherboard raises this signal high and 66 MHz operation is enabled. The pin is still connected to ground via coupling capacitors on each card to preserve its AC shielding function.[needs update]
  • The PCIXCAP pin is an additional ground on PCI buses and cards. If all cards and the motherboard support the PCI-X protocol, a pull-up resistor on the motherboard raises this signal high and PCI-X operation is enabled. The pin is still connected to ground via coupling capacitors on each card to preserve its AC shielding function.
  • At least one of PRSNT1# and PRSNT2# must be grounded by the card. The combination chosen indicates the total power requirements of the card (25 W, 15 W, or 7.5 W).
  • SBO# and SDONE are signals from a cache controller to the current target. They are not initiator outputs, but are colored that way because they are target inputs.
  • PME# (19 A) – Power management event (optional) which is supported in PCI version 2.2 and higher. It is a 3.3 V, open drain, active low signal.[20] PCI cards may use this signal to send and receive PME via the PCI socket directly, which eliminates the need for a special Wake-on-LAN cable.[21]

Mixing of 32-bit and 64-bit PCI cards in different width slots

A semi-inserted PCI-X card in a 32-bit PCI slot, illustrating the need for the rightmost notch and the extra room on the motherboard to remain backward compatible
64-bit SCSI card working in a 32-bit PCI slot

Most 32-bit PCI cards will function properly in 64-bit PCI-X slots, but the bus clock rate will be limited to the clock frequency of the slowest card, an inherent limitation of PCI's shared bus topology. For example, when a PCI 2.3, 66-MHz peripheral is installed into a PCI-X bus capable of 133 MHz, the entire bus backplane will be limited to 66 MHz. To get around this limitation, many motherboards have two or more PCI/PCI-X buses, with one bus intended for use with high-speed PCI-X peripherals, and the other bus intended for general-purpose peripherals.

Many 64-bit PCI-X cards are designed to work in 32-bit mode if inserted in shorter 32-bit connectors, with some loss of performance.[22][23] An example of this is the Adaptec 29160 64-bit SCSI interface card.[24] However, some 64-bit PCI-X cards do not work in standard 32-bit PCI slots.[25][unreliable source?]

Installing a 64-bit PCI-X card in a 32-bit slot will leave the 64-bit portion of the card edge connector not connected and overhanging. This requires that there be no motherboard components positioned so as to mechanically obstruct the overhanging portion of the card edge connector.

Physical dimensions


PCI bracket heights:

  • Standard: 120.02 mm;[26]
  • Low Profile: 79.20 mm.[27]

PCI Card lengths (Standard Bracket & 3.3 V):[28]

  • Short Card: 169.52 mm;
  • Long Card: 313.78 mm.

PCI Card lengths (Low Profile Bracket & 3.3 V):[29]

  • MD1: 121.79 mm;
  • MD2: 169.52 mm;
  • MD3: 243.18 mm.

Mini PCI

A Mini PCI slot
Mini PCI Wi-Fi card Type IIIB
PCI-to-MiniPCI converter Type III
MiniPCI and MiniPCI Express cards in comparison

Mini PCI was added to PCI version 2.2 for use in laptops and some routers;[citation needed] it uses a 32-bit, 33 MHz bus with powered connections (3.3 V only; 5 V is limited to 100 mA) and support for bus mastering and DMA. The standard size for Mini PCI cards is approximately a quarter of their full-sized counterparts. There is no access to the card from outside the case, unlike desktop PCI cards with brackets carrying connectors. This limits the kinds of functions a Mini PCI card can perform.

Many Mini PCI devices were developed such as Wi-Fi, Fast Ethernet, Bluetooth, modems (often Winmodems), sound cards, cryptographic accelerators, SCSI, IDE (ATA), SATA controllers and combination cards. Mini PCI cards can be used with regular PCI-equipped hardware, using Mini PCI-to-PCI converters. Mini PCI has been superseded by the much narrower PCI Express Mini Card.

Technical details of Mini PCI


Mini PCI cards have a 2 W maximum power consumption, which limits the functionality that can be implemented in this form factor. They also are required to support the CLKRUN# PCI signal used to start and stop the PCI clock for power management purposes.

There are three card form factors: Type I, Type II, and Type III cards. Types I and II use a 100-pin stacking connector, while Type III uses a 124-pin edge connector, i.e. the connector is on the edge of the card, as with a SO-DIMM. The additional 24 pins provide the extra signals required to route I/O back through the system connector (audio, AC-Link, LAN, phone-line interface). Type II cards have RJ11 and RJ45 mounted connectors. These cards must be located at the edge of the computer or docking station so that the RJ11 and RJ45 ports can be mounted for external access.

Type  Card on outer edge of host system  Connector          Size (mm × mm × mm)  Comments
IA    No                                 100-pin stacking   7.5 × 70 × 45        Large Z dimension (7.5 mm)
IB    No                                 100-pin stacking   5.5 × 70 × 45        Smaller Z dimension (5.5 mm)
IIA   Yes                                100-pin stacking   17.44 × 70 × 45      Large Z dimension (17.44 mm)
IIB   Yes                                100-pin stacking   5.5 × 78 × 45        Smaller Z dimension (5.5 mm)
IIIA  No                                 124-pin card edge  2.4 × 59.6 × 50.95   Larger Y dimension (50.95 mm)
IIIB  No                                 124-pin card edge  2.4 × 59.6 × 44.6    Smaller Y dimension (44.6 mm)

Mini PCI is distinct from 144-pin Micro PCI.[30]

PCI bus transactions


PCI bus traffic consists of a series of PCI bus transactions. Each transaction consists of an address phase followed by one or more data phases. The direction of the data phases may be from initiator to target (write transaction) or vice versa (read transaction), but all of the data phases must be in the same direction. Either party may pause or halt the data phases at any point. (One common example is a low-performance PCI device that does not support burst transactions, and always halts a transaction after the first data phase.)

Any PCI device may initiate a transaction. First, it must request permission from a PCI bus arbiter on the motherboard. The arbiter grants permission to one of the requesting devices. The initiator begins the address phase by broadcasting a 32-bit address plus a 4-bit command code, then waits for a target to respond. All other devices examine this address and one of them responds a few cycles later.

64-bit addressing is done using a two-stage address phase. The initiator broadcasts the low 32 address bits, accompanied by a special "dual address cycle" command code. Devices that do not support 64-bit addressing can simply not respond to that command code. The next cycle, the initiator transmits the high 32 address bits, plus the real command code. The transaction operates identically from that point on. To ensure compatibility with 32-bit PCI devices, it is forbidden to use a dual address cycle if not necessary, i.e. if the high-order address bits are all zero.
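
A sketch of that rule, splitting a 64-bit address into the one or two address phases an initiator would drive; the command-code constants follow the table in the PCI command codes section below, and the function is illustrative only.

  #include <stdint.h>

  #define CMD_MEM_READ            0x6   /* 0110 */
  #define CMD_DUAL_ADDRESS_CYCLE  0xD   /* 1101 */

  /* Split a 64-bit target address into the AD values and commands of the
     address phase(s).  Returns the number of phases (1 or 2). */
  int build_address_phases(uint64_t addr, uint32_t ad[2], uint8_t cmd[2])
  {
      uint32_t hi = (uint32_t)(addr >> 32);
      if (hi == 0) {                       /* DAC is forbidden when not needed */
          ad[0]  = (uint32_t)addr;
          cmd[0] = CMD_MEM_READ;
          return 1;
      }
      ad[0]  = (uint32_t)addr;             /* first phase: low bits + DAC command */
      cmd[0] = CMD_DUAL_ADDRESS_CYCLE;
      ad[1]  = hi;                         /* second phase: high bits + real command */
      cmd[1] = CMD_MEM_READ;
      return 2;
  }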

While the PCI bus transfers 32 bits per data phase, the initiator transmits 4 active-low byte enable signals indicating which 8-bit bytes are to be considered significant. In particular, a write must affect only the enabled bytes in the target PCI device. They are of little importance for memory reads, but I/O reads might have side effects. The PCI standard explicitly allows a data phase with no bytes enabled, which must behave as a no-op.
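
A sketch of how the active-low C/BE[3:0]# value can be derived for a transfer confined to one 32-bit word; the helper name and result encoding are illustrative.

  #include <stdint.h>

  /* Compute the active-low C/BE[3:0]# value for a transfer of `len` bytes
     (1..4) starting at byte offset `off` (0..3) within the 32-bit word.
     Assumes off + len <= 4; a clear bit means "this byte lane is significant". */
  uint8_t byte_enables(unsigned off, unsigned len)
  {
      uint8_t enabled = ((1u << len) - 1) << off;   /* active-high byte lanes */
      return (uint8_t)(~enabled & 0xF);             /* invert: active-low C/BE# */
  }

  /* Example: a 2-byte write at offset 1 enables lanes 0b0110, so
     C/BE[3:0]# = 0b1001.  All four bits set (0xF) is the legal no-op phase. */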

PCI address spaces


PCI has three address spaces: memory, I/O address, and configuration.

Memory addresses are 32 bits (optionally 64 bits) in size, support caching and can be burst transactions.

I/O addresses are for compatibility with the Intel x86 architecture's I/O port address space. Although the PCI bus specification allows burst transactions in any address space, most devices only support it for memory addresses and not I/O.

Finally, PCI configuration space provides access to 256 bytes of special configuration registers per PCI device. Each PCI slot gets its own configuration space address range. The registers are used to configure each device's memory and I/O address ranges, i.e. the ranges it should respond to from transaction initiators. When a computer is first turned on, all PCI devices respond only to their configuration space accesses. The computer's BIOS scans for devices and assigns memory and I/O address ranges to them.

If an address is not claimed by any device, the transaction initiator's address phase will time out causing the initiator to abort the operation. In case of reads, it is customary to supply all-ones for the read data value (0xFFFFFFFF) in this case. PCI devices therefore generally attempt to avoid using the all-ones value in important status registers, so that such an error can be easily detected by software.

PCI command codes


There are 16 possible 4-bit command codes, and 12 of them are assigned. With the exception of the unique dual address cycle, the least significant bit of the command code indicates whether the following data phases are a read (data sent from target to initiator) or a write (data sent from an initiator to target). PCI targets must examine the command code as well as the address and not respond to address phases that specify an unsupported command code.
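
A small sketch of that read/write rule; the constant name is illustrative.

  #include <stdbool.h>
  #include <stdint.h>

  #define CMD_DUAL_ADDRESS_CYCLE 0xD

  /* Except for the dual address cycle, bit 0 of the command code tells a
     target whether the data phases are a read (0) or a write (1). */
  bool command_is_write(uint8_t cmd)
  {
      if (cmd == CMD_DUAL_ADDRESS_CYCLE)
          return false;   /* direction comes from the command in the second address phase */
      return cmd & 1;
  }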

The commands that refer to cache lines depend on the PCI configuration space cache line size register being set up properly; they may not be used until that has been done.

0000: Interrupt Acknowledge
This is a special form of read cycle implicitly addressed to the interrupt controller, which returns an interrupt vector. The 32-bit address field is ignored. One possible implementation is to generate an interrupt acknowledge cycle on an ISA bus using a PCI/ISA bus bridge. This command is for IBM PC compatibility; if there is no Intel 8259 style interrupt controller on the PCI bus, this cycle need never be used.
0001: Special Cycle
This cycle is a special broadcast write of system events that PCI cards may be interested in. The address field of a special cycle is ignored, but it is followed by a data phase containing a payload message. The currently defined messages announce that the processor is stopping for some reason (e.g. to save power). No device ever responds to this cycle; it is always terminated with a master abort after leaving the data on the bus for at least 4 cycles.
0010: I/O Read
This performs a read from I/O space. All 32 bits of the read address are provided, so that a device may (for compatibility reasons) implement less than 4 bytes worth of I/O registers. If the byte enables request data not within the address range supported by the PCI device (e.g. a 4-byte read from a device which only supports 2 bytes of I/O address space), it must be terminated with a target abort. Multiple data cycles are permitted, using linear (simple incrementing) burst ordering.
The PCI standard discourages the use of I/O space in new devices, preferring that as much as possible be done through main memory mapping.
0011: I/O Write
This performs a write to I/O space.
010x: Reserved
A PCI device must not respond to an address cycle with these command codes.
0110: Memory Read
This performs a read cycle from memory space. Because the smallest memory space a PCI device is permitted to implement is 16 bytes,[18][16]: §6.5.2.1  the two least significant bits of the address are not needed during the address phase; equivalent information will arrive during the data phases in the form of byte select signals. They instead specify the order in which burst data must be returned.[18][16]: §3.2.2.2  If a device does not support the requested order, it must provide the first word and then disconnect.
If a memory space is marked as "prefetchable", then the target device must ignore the byte-select signals on a memory read and always return 32 valid bits.
0111: Memory Write
This operates similarly to a memory read. The byte select signals are more important in a write, as unselected bytes must not be written to memory.
Generally, PCI writes are faster than PCI reads, because a device may buffer the incoming write data and release the bus faster. For a read, it must delay the data phase until the data has been fetched.
100x: Reserved
A PCI device must not respond to an address cycle with these command codes.
1010: Configuration Read
This is similar to an I/O read, but reads from PCI configuration space. A device must respond only if the low 11 bits of the address specify a function and register that it implements, and if the special IDSEL signal is asserted. It must ignore the high 21 bits. Burst reads (using linear incrementing) are permitted in PCI configuration space.
Unlike I/O space, standard PCI configuration registers are defined so that reads never disturb the state of the device. It is possible for a device to have configuration space registers beyond the standard 64 bytes which have read side effects, but this is rare.[31]
Configuration space accesses often have a few cycles of delay to allow the IDSEL lines to stabilize, which makes them slower than other forms of access. Also, a configuration space access requires a multi-step operation rather than a single machine instruction. Thus, it is best to avoid them during routine operation of a PCI device.
1011: Configuration Write
This operates analogously to a configuration read.
1100: Memory Read Multiple
This command is identical to a generic memory read, but includes the hint that a long read burst will continue beyond the end of the current cache line, and the target should internally prefetch a large amount of data. A target is always permitted to consider this a synonym for a generic memory read.
1101: Dual Address Cycle
When accessing a memory address that requires more than 32 bits to represent, the address phase begins with this command and the low 32 bits of the address, followed by a second cycle with the actual command and the high 32 bits of the address. PCI targets that do not support 64-bit addressing may simply treat this as another reserved command code and not respond to it. This command code may only be used with a non-zero high-order address word; it is forbidden to use this cycle if not necessary.
1110: Memory Read Line
This command is identical to a generic memory read, but includes the hint that the read will continue to the end of the cache line. A target is always permitted to consider this a synonym for a generic memory read.
1111: Memory Write and Invalidate
This command is identical to a generic memory write, but comes with the guarantee that one or more whole cache lines will be written, with all byte selects enabled. This is an optimization for write-back caches snooping the bus. Normally, a write-back cache holding dirty data must interrupt the write operation long enough to write its own dirty data first. If the write is performed using this command, the data to be written back is guaranteed to be irrelevant, and may simply be invalidated in the write-back cache.
This optimization only affects the snooping cache, and makes no difference to the target, which may treat this as a synonym for the memory write command.

PCI bus latency


Soon after promulgation of the PCI specification, it was discovered that lengthy transactions by some devices, due to slow acknowledgments, long data bursts, or some combination, could cause buffer underrun or overrun in other devices. Recommendations on the timing of individual phases in Revision 2.0 were made mandatory in revision 2.1:[32]: 3 

  • A target must be able to complete the initial data phase (assert TRDY# and/or STOP#) within 16 cycles of the start of a transaction.
  • An initiator must complete each data phase (assert IRDY#) within 8 cycles.

Additionally, as of revision 2.1, all initiators capable of bursting more than two data phases must implement a programmable latency timer. The timer starts counting clock cycles when a transaction starts (initiator asserts FRAME#). If the timer has expired and the arbiter has removed GNT#, then the initiator must terminate the transaction at the next legal opportunity. This is usually the next data phase, but Memory Write and Invalidate transactions must continue to the end of the cache line.

Delayed transactions


Devices unable to meet those timing restrictions must use a combination of posted writes (for memory writes) and delayed transactions (for other writes and all reads). In a delayed transaction, the target records the transaction (including the write data) internally and aborts (asserts STOP# rather than TRDY#) the first data phase. The initiator must retry exactly the same transaction later. In the interim, the target internally performs the transaction, and waits for the retried transaction. When the retried transaction is seen, the buffered result is delivered.

A device may be the target of other transactions while completing one delayed transaction; it must remember the transaction type, address, byte selects and (if a write) data value, and only complete the correct transaction.

If the target has a limit on the number of delayed transactions that it can record internally (simple targets may impose a limit of 1), it will force those transactions to retry without recording them. They will be dealt with when the current delayed transaction is completed. If two initiators attempt the same transaction, a delayed transaction begun by one may have its result delivered to the other; this is harmless.

A target abandons a delayed transaction when a retry succeeds in delivering the buffered result, the bus is reset, or when 2^15 = 32768 clock cycles (approximately 1 ms) elapse without seeing a retry. The latter should never happen in normal operation, but it prevents a deadlock of the whole bus if one initiator is reset or malfunctions.
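
A sketch of the bookkeeping a simple target (at most one outstanding delayed transaction) might keep; all field and function names are illustrative.

  #include <stdbool.h>
  #include <stdint.h>

  struct delayed_txn {
      bool     valid;          /* a transaction has been recorded             */
      bool     completed;      /* internal access finished, result ready      */
      uint8_t  command;        /* 4-bit PCI command code                      */
      uint32_t address;
      uint8_t  byte_enables;   /* C/BE[3:0]# as sampled (active low)          */
      uint32_t write_data;     /* only meaningful for writes                  */
      uint32_t result;         /* read data to return when the retry arrives  */
      uint32_t discard_timer;  /* abandon after 2^15 clocks without a retry   */
  };

  /* A retried transaction matches only if every recorded field matches. */
  static bool matches(const struct delayed_txn *t, uint8_t cmd, uint32_t addr,
                      uint8_t be, uint32_t wdata, bool is_write)
  {
      return t->valid && t->command == cmd && t->address == addr &&
             t->byte_enables == be && (!is_write || t->write_data == wdata);
  }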

PCI bus bridges


The PCI standard permits multiple independent PCI buses to be connected by bus bridges that will forward operations on one bus to another when required. Although conventional PCI tends not to use many bus bridges, PCI Express systems use many PCI-to-PCI bridges, usually called PCI Express Root Ports; each PCI Express slot appears to be a separate bus, connected by a bridge to the others. The PCI host bridge (usually the northbridge in x86 platforms) interconnects the CPU, main memory, and the PCI bus.[33]

Posted writes


Generally, when a bus bridge sees a transaction on one bus that must be forwarded to the other, the original transaction must wait until the forwarded transaction completes before a result is ready. One notable exception occurs in the case of memory writes. Here, the bridge may record the write data internally (if it has room) and signal completion of the write before the forwarded write has completed. Or, indeed, before it has begun. Such "sent but not yet arrived" writes are referred to as "posted writes", by analogy with a postal mail message. Although they offer great opportunity for performance gains, the rules governing what is permissible are somewhat intricate.[34]

Combining, merging, and collapsing


The PCI standard permits bus bridges to convert multiple bus transactions into one larger transaction under certain situations. This can improve the efficiency of the PCI bus.

Combining


Write transactions to consecutive addresses may be combined into a longer burst write, as long as the order of the accesses in the burst is the same as the order of the original writes. It is permissible to insert extra data phases with all byte enables turned off if the writes are almost consecutive.

Merging


Multiple writes to disjoint portions of the same word may be merged into a single write with multiple byte enables asserted. In this case, writes that were presented to the bus bridge in a particular order are merged so they occur at the same time when forwarded.
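
A sketch of such a merge, under the constraint that the byte lanes must be disjoint (overlapping lanes would amount to collapsing, which the next subsection forbids); the structure and names are illustrative.

  #include <stdbool.h>
  #include <stdint.h>

  struct posted_write {
      uint32_t address;       /* word-aligned address                    */
      uint8_t  lanes;         /* active-high byte lanes actually written */
      uint32_t data;
  };

  /* Merge write b into write a when both target the same word and touch
     disjoint byte lanes. */
  bool try_merge(struct posted_write *a, const struct posted_write *b)
  {
      if (a->address != b->address || (a->lanes & b->lanes))
          return false;                       /* different word, or would collapse */
      for (unsigned i = 0; i < 4; i++) {
          if (b->lanes & (1u << i)) {
              uint32_t mask = 0xFFu << (8 * i);
              a->data = (a->data & ~mask) | (b->data & mask);
          }
      }
      a->lanes |= b->lanes;
      return true;
  }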

Collapsing


Multiple writes to the same byte or bytes may not be combined, for example, by performing only the second write and skipping the first write that was overwritten. This is because the PCI specification permits writes to have side effects.

PCI bus signals


PCI bus transactions are controlled by five main control signals, two driven by the initiator of a transaction (FRAME# and IRDY#), and three driven by the target (DEVSEL#, TRDY#, and STOP#). There are two additional arbitration signals (REQ# and GNT#) that are used to obtain permission to initiate a transaction.[6] All are active-low, meaning that the active or asserted state is a low voltage. Pull-up resistors on the motherboard ensure they will remain high (inactive or deasserted) if not driven by any device, but the PCI bus does not depend on the resistors to change the signal level; all devices drive the signals high for one cycle before ceasing to drive the signals.

Signal timing


All PCI bus signals are sampled on the rising edge of the clock. Signals nominally change on the falling edge of the clock, giving each PCI device approximately one half a clock cycle to decide how to respond to the signals it observed on the rising edge, and one half a clock cycle to transmit its response to the other device.

The PCI bus requires that every time the device driving a PCI bus signal changes, one turnaround cycle must elapse between the time the one device stops driving the signal and the other device starts. Without this, there might be a period when both devices were driving the signal, which would interfere with bus operation.

The combination of this turnaround cycle and the requirement to drive a control line high for one cycle before ceasing to drive it means that each of the main control lines must be high for a minimum of two cycles when changing owners. The PCI bus protocol is designed so this is rarely a limitation; only in a few special cases (notably fast back-to-back transactions) is it necessary to insert additional delay to meet this requirement.

Arbitration


Any device on a PCI bus that is capable of acting as a bus master may initiate a transaction with any other device. To ensure that only one transaction is initiated at a time, each master must first wait for a bus grant signal, GNT#, from an arbiter located on the motherboard. Each device has a separate request line REQ# that requests the bus, but the arbiter may "park" the bus grant signal at any device if there are no current requests.

The arbiter may remove GNT# at any time. A device that loses GNT# may complete its current transaction, but may not start one (by asserting FRAME#) unless it observes GNT# asserted the cycle before it begins.

The arbiter may also provide GNT# at any time, including during another master's transaction. During a transaction, either FRAME# or IRDY# or both are asserted; when both are deasserted, the bus is idle. A device may initiate a transaction at any time that GNT# is asserted and the bus is idle.
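
The specification does not dictate an arbitration algorithm. The sketch below shows one plausible round-robin scheme with parking, evaluated once per clock from the sampled request lines; it is illustrative only.

  #include <stdbool.h>

  #define NUM_MASTERS 4

  /* Minimal round-robin arbiter with "parking": one grant is kept asserted
     even when no request is pending.  Requests are shown active-high here
     for readability, although the bus signals are active-low. */
  int arbitrate(const bool req[NUM_MASTERS], int current_grant)
  {
      for (int i = 1; i <= NUM_MASTERS; i++) {
          int candidate = (current_grant + i) % NUM_MASTERS;
          if (req[candidate])
              return candidate;        /* rotate priority among requesters     */
      }
      return current_grant;            /* no requests: park on the last owner  */
  }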

Address phase


A PCI bus transaction begins with an address phase. The initiator (usually a chipset), seeing that it has GNT# and the bus is idle, drives the target address onto the AD[31:0] lines, the associated command (e.g. memory read, or I/O write) on the C/BE[3:0]# lines, and pulls FRAME# low.

Each other device examines the address and command and decides whether to respond as the target by asserting DEVSEL#. A device must respond by asserting DEVSEL# within 3 cycles. Devices that promise to respond within 1 or 2 cycles are said to have "fast DEVSEL" or "medium DEVSEL", respectively. (Actually, the time to respond is 2.5 cycles, since PCI devices must transmit all signals half a cycle early so that they can be received three cycles later.)

A device must latch the address on the first cycle; the initiator is required to remove the address and command from the bus on the following cycle, even before receiving a DEVSEL# response. The additional time is available only for interpreting the address and command after it is captured.

On the fifth cycle of the address phase (or earlier if all other devices have medium DEVSEL or faster), a catch-all "subtractive decoding" is allowed for some address ranges. This is commonly used by an ISA bus bridge for addresses within its range (24 bits for memory and 16 bits for I/O).

On the sixth cycle, if there has been no response, the initiator may abort the transaction by deasserting FRAME#. This is known as master abort termination and it is customary for PCI bus bridges to return all-ones data (0xFFFFFFFF) in this case. PCI devices, therefore, are generally designed to avoid using the all-ones value in important status registers, so that such an error can be easily detected by software.

Address phase timing


Notes:

  • GNT# is irrelevant after the cycle has started.
  • The address is only valid for one cycle.
  • C/BE[3:0]# carries the command, followed by the byte enables for the first data phase.

On the rising edge of clock 0, the initiator observes FRAME# and IRDY# both high, and GNT# low, so it drives the address, command, and asserts FRAME# in time for the rising edge of clock 1. Targets latch the address and begin decoding it. They may respond with DEVSEL# in time for clock 2 (fast DEVSEL), 3 (medium) or 4 (slow). Subtractive decode devices, seeing no other response by clock 4, may respond on clock 5. If the master does not see a response by clock 5, it will terminate the transaction and remove FRAME# on clock 6.

TRDY# and STOP# are deasserted (high) during the address phase. The initiator may assert IRDY# as soon as it is ready to transfer data, which could theoretically be as soon as clock 2.

Dual-cycle address


To allow 64-bit addressing, a master will present the address over two consecutive cycles. First, it sends the low-order address bits with a special "dual-cycle address" command on the C/BE[3:0]#. On the following cycle, it sends the high-order address bits and the actual command. Dual-address cycles are forbidden if the high-order address bits are zero, so devices that do not support 64-bit addressing can simply not respond to dual-cycle commands.

              _  0_  1_  2_  3_  4_  5_  6_
        CLK _/ \_/ \_/ \_/ \_/ \_/ \_/ \_/
            ___
       GNT#    \___/XXXXXXXXXXXXXXXXXXXXXXX
            _______
     FRAME#        \_______________________
                    ___ ___
   AD[31:0] -------<___X___>--------------- (Low, then high bits)
                    ___ ___ _______________
 C/BE[3:0]# -------<___X___X_______________ (DAC, then actual command)
            ___________________________
    DEVSEL#                \___\___\___\___
                         Fast Med Slow
              _   _   _   _   _   _   _   _
        CLK _/ \_/ \_/ \_/ \_/ \_/ \_/ \_/
                 0   1   2   3   4   5   6

Configuration access


Addresses for PCI configuration space access use special decoding. For these, the low-order address lines specify the offset of the desired PCI configuration register, and the high-order address lines are ignored. Instead, an additional address signal, the IDSEL input, must be high before a device may assert DEVSEL#. Each slot connects a different high-order address line to the IDSEL pin and is selected using one-hot encoding on the upper address lines.
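
A sketch of the AD[31:0] value an initiator might drive for such a type-0 configuration access; wiring device N's IDSEL to AD[16+N] is a common convention rather than a requirement of the specification, and the function is illustrative only.

  #include <stdint.h>

  /* Form the AD[31:0] pattern for the address phase of a type-0
     configuration access, assuming device N's IDSEL is wired to AD[16+N]. */
  uint32_t type0_config_address(unsigned device,     /* 0..15              */
                                unsigned function,   /* 0..7               */
                                unsigned reg_offset) /* dword-aligned, 0..255 */
  {
      return (1u << (16 + device))      /* one-hot IDSEL on an upper AD line */
           | (function << 8)
           | (reg_offset & 0xFC);
  }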

Data phases


After the address phase (specifically, beginning with the cycle that DEVSEL# goes low) comes a burst of one or more data phases. In all cases, the initiator drives active-low byte select signals on the C/BE[3:0]# lines, but the data on the AD[31:0] may be driven by the initiator (in case of writes) or target (in case of reads).

During data phases, the C/BE[3:0]# lines are interpreted as active-low byte enables. In case of a write, the asserted signals indicate which of the four bytes on the AD bus are to be written to the addressed location. In the case of a read, they indicate which bytes the initiator is interested in. For reads, it is always legal to ignore the byte-enable signals and simply return all 32 bits; cacheable memory resources are required to always return 32 valid bits. The byte enables are mainly useful for I/O space accesses where reads have side effects.

A data phase with all four C/BE# lines deasserted is explicitly permitted by the PCI standard, and must have no effect on the target other than to advance the address in the burst access in progress.

The data phase continues until both parties are ready to complete the transfer and continue to the next data phase. The initiator asserts IRDY# (initiator ready) when it no longer needs to wait, while the target asserts TRDY# (target ready). Whichever side is providing the data must drive it on the AD bus before asserting its ready signal.

Once one of the participants asserts its ready signal, it may not become un-ready or otherwise alter its control signals until the end of the data phase. The data recipient must latch the AD bus each cycle until it sees both IRDY# and TRDY# asserted, which marks the end of the current data phase and indicates that the just-latched data is the word to be transferred.

To maintain full burst speed, the data sender then has half a clock cycle after seeing both IRDY# and TRDY# asserted to drive the next word onto the AD bus.

             0_  1_  2_  3_  4_  5_  6_  7_  8_  9_
        CLK _/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/
                ___         _______     ___ ___ ___
   AD[31:0] ---<___XXXXXXXXX_______XXXXX___X___X___ (If a write)
                ___             ___ _______ ___ ___
   AD[31:0] ---<___>~~~<XXXXXXXX___X_______X___X___ (If a read)
                ___ _______________ _______ ___ ___
 C/BE[3:0]# ---<___X_______________X_______X___X___ (Must always be valid)
            _______________      |  ___  |   |   |
      IRDY#              x \_______/ x \___________
            ___________________  |       |   |   |
      TRDY#              x   x \___________________
            ___________          |       |   |   |
    DEVSEL#            \___________________________
            ___                  |       |   |   |
     FRAME#    \___________________________________
              _   _   _   _   _  |_   _  |_  |_  |_
        CLK _/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/
             0   1   2   3   4   5   6   7   8   9

This continues the address cycle illustrated above, assuming a single address cycle with medium DEVSEL, so the target responds in time for clock 3. However, at that time, neither side is ready to transfer data. For clock 4, the initiator is ready, but the target is not. On clock 5, both are ready, and a data transfer takes place (as indicated by the vertical lines). For clock 6, the target is ready to transfer, but the initiator is not. On clock 7, the initiator becomes ready, and data is transferred. For clocks 8 and 9, both sides remain ready to transfer data, and data is transferred at the maximum possible rate (32 bits per clock cycle).

In case of a read, clock 2 is reserved for turning around the AD bus, so the target is not permitted to drive data on the bus even if it is capable of fast DEVSEL.

Fast DEVSEL# on reads


A target that supports fast DEVSEL could in theory begin responding to a read on the cycle after the address is presented. This cycle is, however, reserved for AD bus turnaround. Thus, a target may not drive the AD bus (and thus may not assert TRDY#) on the second cycle of a transaction. Most targets will not be this fast and will not need any special logic to enforce this condition.

Ending transactions


Either side may request that a burst end after the current data phase. Simple PCI devices that do not support multi-word bursts will always request this immediately. Even devices that do support bursts will have some limit on the maximum length they can support, such as the end of their addressable memory.

Initiator burst termination


The initiator can mark any data phase as the final one in a transaction by deasserting FRAME# at the same time as it asserts IRDY#. The cycle after the target asserts TRDY#, the final data transfer is complete, both sides deassert their respective RDY# signals, and the bus is idle again. The master may not deassert FRAME# before asserting IRDY#, nor may it deassert FRAME# while waiting, with IRDY# asserted, for the target to assert TRDY#.

The only minor exception is a master abort termination, when no target responds with DEVSEL#. Obviously, it is pointless to wait for TRDY# in such a case. However, even in this case, the master must assert IRDY# for at least one cycle after deasserting FRAME#. (Commonly, a master will assert IRDY# before receiving DEVSEL#, so it must simply hold IRDY# asserted for one cycle longer.) This is to ensure that bus turnaround timing rules are obeyed on the FRAME# line.

Target burst termination


The target requests the initiator end a burst by asserting STOP#. The initiator will then end the transaction by deasserting FRAME# at the next legal opportunity; if it wishes to transfer more data, it will continue in a separate transaction. There are several ways for the target to do this:

Disconnect with data
If the target asserts STOP# and TRDY# at the same time, this indicates that the target wishes this to be the last data phase. For example, a target that does not support burst transfers will always do this to force single-word PCI transactions. This is the most efficient way for a target to end a burst.
Disconnect without data
If the target asserts STOP# without asserting TRDY#, this indicates that the target wishes to stop without transferring data. STOP# is considered equivalent to TRDY# for the purpose of ending a data phase, but no data is transferred.
Retry
A Disconnect without data before transferring any data is a retry, and unlike other PCI transactions, PCI initiators are required to pause slightly before continuing the operation. See the PCI specification for details.
Target abort
Normally, a target holds DEVSEL# asserted through the last data phase. However, if a target deasserts DEVSEL# before disconnecting without data (asserting STOP#), this indicates a target abort, which is a fatal error condition. The initiator may not retry, and typically treats it as a bus error. A target may not deassert DEVSEL# while waiting with TRDY# or STOP# low; it must do this at the beginning of a data phase.

It will always take at least one cycle for the initiator to notice a target-initiated disconnection request and respond by deasserting FRAME#. There are two sub-cases, which take the same amount of time, but one requires an additional data phase:

Disconnect-A
If the initiator observes STOP# before asserting its own IRDY#, then it can end the burst by deasserting FRAME# at the same time as it asserts IRDY#, ending the burst after the current data phase.
Disconnect-B
If the initiator has already asserted IRDY# (without deasserting FRAME#) by the time it observes the target's STOP#, it is committed to an additional data phase. The target must wait through an additional data phase without data, holding STOP# asserted without TRDY#, before the transaction can end.

If the initiator ends the burst at the same time as the target requests disconnection, there is no additional bus cycle.

Burst addressing


For memory space accesses, the words in a burst may be accessed in several orders. The unnecessary low-order address bits AD[1:0] are used to convey the initiator's requested order. A target which does not support a particular order must terminate the burst after the first word. Some of these orders depend on the cache line size, which is configurable on all PCI devices.

PCI burst ordering
A[1] A[0] Burst order (with 16-byte cache line)
0 0 Linear incrementing (0x0C, 0x10, 0x14, 0x18, 0x1C, ...)
0 1 Cacheline toggle (0x0C, 0x08, 0x04, 0x00, 0x1C, 0x18, ...)
1 0 Cacheline wrap (0x0C, 0x00, 0x04, 0x08, 0x1C, 0x10, ...)
1 1 Reserved (disconnect after first transfer)

If the starting offset within the cache line is zero, all of these modes reduce to the same order.

Cache line toggle and cache line wrap modes are two forms of critical-word-first cache line fetching. Toggle mode XORs the supplied address with an incrementing counter. This is the native order for Intel 486 and Pentium processors. It has the advantage that it is not necessary to know the cache line size to implement it.

PCI version 2.1 obsoleted toggle mode and added the cache line wrap mode,[32]: 2  where fetching proceeds linearly, wrapping around at the end of each cache line. When one cache line is completely fetched, fetching jumps to the starting offset in the next cache line.
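
A sketch reproducing the three orderings from the table above for a 16-byte cache line; n is the index of the data phase within the burst, the start address is assumed dword-aligned, and the function is illustrative only.

  #include <stdint.h>

  #define CACHE_LINE 16u   /* bytes; the size is configurable on real devices */

  /* Address of the n-th dword of a burst starting at `start`, for the
     AD[1:0] orderings in the table above. */
  uint32_t burst_address(uint32_t start, unsigned n, unsigned ad10)
  {
      uint32_t line = start & ~(CACHE_LINE - 1);
      uint32_t off  = start &  (CACHE_LINE - 1);

      switch (ad10) {
      case 0:  /* 00: linear incrementing */
          return start + 4 * n;
      case 1:  /* 01: cacheline toggle: XOR the address with an incrementing counter */
          return start ^ (4 * n);
      case 2:  /* 10: cacheline wrap: finish the line, then the same offset in the next */
          return line + (n / (CACHE_LINE / 4)) * CACHE_LINE
                      + ((off + 4 * n) % CACHE_LINE);
      default: /* 11: reserved; targets disconnect after the first word */
          return start;
      }
  }

Starting at 0x0C, case 1 yields 0x0C, 0x08, 0x04, 0x00, 0x1C, 0x18, ... and case 2 yields 0x0C, 0x00, 0x04, 0x08, 0x1C, 0x10, ..., matching the sequences in the table.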

Most PCI devices only support a limited range of typical cache line sizes; if the cache line size is programmed to an unexpected value, they force single-word access.

PCI also supports burst access to I/O and configuration space, but only linear mode is supported. (This is rarely used, and may be buggy in some devices; they may not support it, but not properly force single-word access either.)

Transaction examples


This is the highest-possible speed four-word write burst, terminated by the master:

             0_  1_  2_  3_  4_  5_  6_  7_
        CLK _/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \
                ___ ___ ___ ___ ___
   AD[31:0] ---<___X___X___X___X___>---<___>
                ___ ___ ___ ___ ___
 C/BE[3:0]# ---<___X___X___X___X___>---<___>
                     |   |   |   |  ___
      IRDY# ^^^^^^^^\______________/   ^^^^^
                     |   |   |   |  ___
      TRDY# ^^^^^^^^\______________/   ^^^^^
                     |   |   |   |  ___
    DEVSEL# ^^^^^^^^\______________/   ^^^^^
            ___      |   |   |  ___
     FRAME#    \_______________/ | ^^^^\____
              _   _  |_  |_  |_  |_   _   _
        CLK _/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \
             0   1   2   3   4   5   6   7

On clock edge 1, the initiator starts a transaction by driving an address, command, and asserting FRAME#. The other signals are idle (indicated by ^^^), pulled high by the motherboard's pull-up resistors. That might be their turnaround cycle. On cycle 2, the target asserts both DEVSEL# and TRDY#. As the initiator is also ready, a data transfer occurs. This repeats for three more cycles, but before the last one (clock edge 5), the master deasserts FRAME#, indicating that this is the end. On clock edge 6, the AD bus and FRAME# are undriven (turnaround cycle) and the other control lines are driven high for 1 cycle. On clock edge 7, another initiator can start a different transaction. This is also the turnaround cycle for the other control lines.

The equivalent read burst takes one more cycle, because the target must wait 1 cycle for the AD bus to turn around before it may assert TRDY#:

             0_  1_  2_  3_  4_  5_  6_  7_  8_
        CLK _/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \
                ___     ___ ___ ___ ___
   AD[31:0] ---<___>---<___X___X___X___>---<___>
                ___ _______ ___ ___ ___
 C/BE[3:0]# ---<___X_______X___X___X___>---<___>
            ___          |   |   |   |  ___
      IRDY#    ^^^^\___________________/   ^^^^^
            ___    _____ |   |   |   |  ___
      TRDY#    ^^^^     \______________/   ^^^^^
            ___          |   |   |   |  ___
    DEVSEL#    ^^^^\___________________/   ^^^^^
            ___          |   |   |  ___
     FRAME#    \___________________/ | ^^^^\____
              _   _   _  |_  |_  |_  |_   _   _
        CLK _/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \
             0   1   2   3   4   5   6   7   8

A high-speed burst terminated by the target will have an extra cycle at the end:

             0_  1_  2_  3_  4_  5_  6_  7_  8_
        CLK _/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \
                ___     ___ ___ ___ ___
   AD[31:0] ---<___>---<___X___X___X___XXXX>----
                ___ _______ ___ ___ ___ ___
 C/BE[3:0]# ---<___X_______X___X___X___X___>----
                         |   |   |   |      ___
      IRDY# ^^^^^^^\_______________________/
                   _____ |   |   |   |  _______
      TRDY# ^^^^^^^     \______________/
                   ________________  |      ___
      STOP# ^^^^^^^      |   |   | \_______/
                         |   |   |   |      ___
    DEVSEL# ^^^^^^^\_______________________/
            ___          |   |   |   |  ___
     FRAME#    \_______________________/   ^^^^
              _   _   _  |_  |_  |_  |_   _   _
        CLK _/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \
             0   1   2   3   4   5   6   7   8

On clock edge 6, the target indicates that it wants to stop (with data), but the initiator is already holding IRDY# low, so there is a fifth data phase (clock edge 7), during which no data is transferred.
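
The rule underlying all three diagrams is that a data word moves on exactly those rising clock edges where initiator and target are ready at the same time. A toy model of that rule (not part of the specification; 0 means asserted, since the signals are active-low):

    #include <stdio.h>

    /* One entry per clock edge: 0 = asserted (active low), 1 = deasserted.
     * These example waveforms mimic the write burst above: both agents
     * are ready for four consecutive data phases.                        */
    static const int irdy[] = { 1, 1, 0, 0, 0, 0, 1 };
    static const int trdy[] = { 1, 1, 0, 0, 0, 0, 1 };

    int main(void)
    {
        int transferred = 0;
        for (int clk = 0; clk < 7; clk++) {
            /* A data phase completes only when IRDY# and TRDY# are both low. */
            if (irdy[clk] == 0 && trdy[clk] == 0)
                printf("clock %d: data word %d transferred\n", clk, transferred++);
            else
                printf("clock %d: wait state or idle\n", clk);
        }
        return 0;
    }

Either agent can stretch a data phase simply by keeping its ready signal deasserted for extra clocks; the word transfers on the first edge where both are low.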

Parity


The PCI bus detects parity errors, but does not attempt to correct them by retrying operations; it is purely a failure indication. Because of this, there is no need to detect a parity error before the corresponding transfer has completed, and the PCI bus actually flags it a few cycles later. During a data phase, whichever device is driving the AD[31:0] lines computes even parity over them and the C/BE[3:0]# lines, and sends that out the PAR line one cycle later. All access rules and turnaround cycles for the AD bus apply to the PAR line, just one cycle later. The device listening on the AD bus checks the received parity and asserts the PERR# (parity error) line one cycle after that. This generally generates a processor interrupt, and the processor can search the PCI bus for the device which detected the error.

The PERR# line is only used during data phases, once a target has been selected. If a parity error is detected during an address phase (or the data phase of a Special Cycle), the devices which observe it assert the SERR# (System error) line.

Even when some bytes are masked by the C/BE# lines and not in use, they must still have some defined value, and this value must be used to compute the parity.
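
A sketch of the parity rule itself (the function name is ours): PAR is chosen so that AD[31:0], C/BE[3:0]# and PAR together contain an even number of ones.

    #include <stdint.h>

    /* Compute the PAR bit for one bus phase: even parity over the 32 AD
     * lines and the 4 C/BE# lines, so that the total number of ones,
     * including PAR itself, is even.                                   */
    static unsigned pci_par(uint32_t ad, uint8_t cbe)
    {
        uint64_t bits = ((uint64_t)(cbe & 0xF) << 32) | ad;
        unsigned ones = 0;
        while (bits) {            /* count the set bits */
            ones += (unsigned)(bits & 1u);
            bits >>= 1;
        }
        return ones & 1u;         /* 1 if an odd count must be evened out */
    }

The receiving device performs the same computation one cycle later and, if its result disagrees with the PAR value it observed, asserts PERR# on the following cycle as described above.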

Fast back-to-back transactions


Due to the need for a turnaround cycle between different devices driving PCI bus signals, in general it is necessary to have an idle cycle between PCI bus transactions. However, in some circumstances it is permitted to skip this idle cycle, going directly from the final cycle of one transfer (IRDY# asserted, FRAME# deasserted) to the first cycle of the next (FRAME# asserted, IRDY# deasserted).

An initiator may only perform back-to-back transactions when:

  • they are by the same initiator (or there would be no time to turn around the C/BE# and FRAME# lines),
  • the first transaction was a write (so there is no need to turn around the AD bus), and
  • the initiator still has permission (from its GNT# input) to use the PCI bus.

Additional timing constraints may come from the need to turn around the target control lines, particularly DEVSEL#. The target deasserts DEVSEL#, driving it high, in the cycle following the final data phase, which in the case of back-to-back transactions is the first cycle of the address phase. The second cycle of the address phase is then reserved for DEVSEL# turnaround, so if the target is different from the prior one, it must not assert DEVSEL# until the third cycle (medium DEVSEL speed).

One case where this problem cannot arise is if the initiator knows somehow (presumably because the addresses share sufficient high-order bits) that the second transfer is addressed to the same target as the prior one. In that case, it may perform back-to-back transactions. All PCI targets must support this.

It is also possible for the target to keep track of the requirements. If it never does fast DEVSEL, they are met trivially. If it does, it must wait until medium DEVSEL time unless:

  • the current transaction was preceded by an idle cycle (is not back-to-back), or
  • the prior transaction was to the same target, or
  • the current transaction began with a double address cycle.

Targets that have this ability indicate it by a special bit in a PCI configuration register, and if all targets on a bus have it, all initiators may use back-to-back transfers freely.

A subtractive decoding bus bridge must know to expect this extra delay in the event of back-to-back cycles, to advertise back-to-back support.
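
Taken together, the rules above amount to a simple check on the initiator side. The following sketch is our own formulation, not taken from the specification; the parameter names are illustrative:

    #include <stdbool.h>

    /* Conditions under which an initiator may start a new transaction
     * immediately after the previous one, with no idle cycle between.  */
    static bool may_fast_back_to_back(bool same_initiator,
                                      bool prev_was_write,
                                      bool still_granted,
                                      bool same_target,
                                      bool all_targets_advertise_b2b)
    {
        if (!same_initiator || !prev_was_write || !still_granted)
            return false;
        /* Either the initiator knows it is addressing the same target as
         * before, or every target on the bus has set the fast back-to-back
         * capable bit in its configuration status register.              */
        return same_target || all_targets_advertise_b2b;
    }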

64-bit PCI


Starting from revision 2.1,[clarification needed] the PCI specification includes optional 64-bit support. This is provided via an extended connector which provides the 64-bit bus extensions AD[63:32], C/BE[7:4]#, and PAR64, and a number of additional power and ground pins. The 64-bit PCI connector can be distinguished from a 32-bit connector by the additional 64-bit segment.

Memory transactions between 64-bit devices may use all 64 bits to double the data transfer rate. Non-memory transactions (including configuration and I/O space accesses) may not use the 64-bit extension. During a 64-bit burst, burst addressing works just as in a 32-bit transfer, but the address is incremented twice per data phase. The starting address must be 64-bit aligned; i.e. AD2 must be 0. The data corresponding to the intervening addresses (with AD2 = 1) is carried on the upper half of the AD bus.
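
As a sketch of that lane assignment (helper name is ours): in a 64-bit burst each data phase carries two consecutive dwords, the even-numbered one on AD[31:0] and the odd-numbered one on AD[63:32].

    #include <stdint.h>

    /* For data phase n of a 64-bit burst starting at 'start' (which must be
     * 8-byte aligned, i.e. AD2 = 0), return the dword addresses carried on
     * the low and high halves of the AD bus.                              */
    static void burst64_addrs(uint64_t start, unsigned n,
                              uint64_t *low_half, uint64_t *high_half)
    {
        uint64_t base = start + 8ull * n;  /* address advances by 8 per phase */
        *low_half  = base;        /* dword with AD2 = 0, on AD[31:0]  */
        *high_half = base + 4;    /* dword with AD2 = 1, on AD[63:32] */
    }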

To initiate a 64-bit transaction, the initiator drives the starting address on the AD bus and asserts REQ64# at the same time as FRAME#. If the selected target can support a 64-bit transfer for this transaction, it replies by asserting ACK64# at the same time as DEVSEL#. A target may decide on a per-transaction basis whether to allow a 64-bit transfer.

If REQ64# is asserted during the address phase, the initiator also drives the high 32 bits of the address and a copy of the bus command on the high half of the bus. If the address requires 64 bits, a dual address cycle is still required, but the high half of the bus carries the upper half of the address and the final command code during both address phase cycles; this allows a 64-bit target to see the entire address and begin responding earlier.

If the initiator sees DEVSEL# asserted without ACK64#, it performs 32-bit data phases. The data which would have been transferred on the upper half of the bus during the first data phase is instead transferred during the second data phase. Typically, the initiator drives all 64 bits of data before seeing DEVSEL#. If ACK64# is missing, it may cease driving the upper half of the data bus.

The REQ64# and ACK64# lines are held asserted for the entire transaction save the last data phase and deasserted at the same time as FRAME# and DEVSEL#, respectively.

The PAR64 line operates just like the PAR line, but provides even parity over AD[63:32] and C/BE[7:4]#. It is only valid for address phases if REQ64# is asserted. PAR64 is only valid for data phases if both REQ64# and ACK64# are asserted.

Cache snooping (obsolete)


PCI originally included optional support for write-back cache coherence. This required support by cacheable memory targets, which would listen to two pins from the cache on the bus, SDONE (snoop done) and SBO# (snoop backoff).[35]

Because this was rarely implemented in practice, it was deleted from revision 2.2 of the PCI specification,[16][36] and the pins re-used for SMBus access in revision 2.3.[18]

The cache would watch all memory accesses, without asserting DEVSEL#. If it noticed an access that might be cached, it would drive SDONE low (snoop not done). A coherence-supporting target would avoid completing a data phase (asserting TRDY#) until it observed SDONE high.

In the case of a write to data that was clean in the cache, the cache would only have to invalidate its copy, and would assert SDONE as soon as this was established. However, if the cache contained dirty data, it would have to be written back before the access could proceed, so the cache would assert SBO# when raising SDONE. This would signal the active target to assert STOP# rather than TRDY#, causing the initiator to disconnect and retry the operation later. In the meantime, the cache would arbitrate for the bus and write its data back to memory.

Targets supporting cache coherency are also required to terminate bursts before they cross cache lines.

Development tools

A PCI POST card that displays power-on self-test (POST) numbers during BIOS startup

When developing and/or troubleshooting the PCI bus, examination of hardware signals can be very important. Logic analyzers and bus analyzers are tools that collect, analyze, and decode signals for users to view in useful ways.

from Grokipedia
Peripheral Component Interconnect (PCI) is an industry-standard local bus architecture designed for connecting hardware components, such as add-in cards and peripherals, to a computer's motherboard. Developed by Intel as a response to fragmented bus standards like ISA and VESA's VL-Bus, the original PCI specification was released in 1992 and first implemented in 1993 alongside the Pentium processor. The PCI Special Interest Group (PCI-SIG), an open industry consortium established in 1992 that grew to more than 1,000 member companies, was formed to maintain and evolve the standard, ensuring broad industry compatibility through royalty-free licensing. As a parallel bus operating at 33 MHz with a 32-bit data width (expandable to 64 bits), conventional PCI supported a maximum theoretical throughput of 133 MB/s and featured plug-and-play auto-configuration for resources such as interrupts and addressing. It quickly became ubiquitous in PCs, enabling faster data transfer for devices like graphics cards and network adapters, and was named PC Magazine's Product of the Year for its role in standardizing hardware integration. Later revisions introduced 66 MHz operation and 3.3 V signaling for improved efficiency, while PCI-X extended the design for servers with bandwidth up to roughly 1 GB/s. By the early 2000s, limitations of the parallel design prompted the transition to PCI Express (PCIe), a serial point-to-point interface launched in 2004, which offers scalable lanes and dramatically higher speeds while maintaining compatibility with PCI software. Today, while legacy PCI slots are rare in consumer hardware, its foundational principles underpin modern descendants such as PCIe 7.0, supporting data rates up to 128 GT/s for applications in AI, storage, and networking.

Overview

Definition and Purpose

The Peripheral Component Interconnect (PCI) is a high-speed parallel computer expansion bus standard developed by Intel and introduced in 1992 as a local bus system for connecting peripheral devices to a computer's motherboard. Designed to enable modular hardware expansion, PCI provides a standardized interface for add-in cards—such as graphics accelerators, sound cards, and network interfaces—to interface directly with the central processing unit (CPU) and system memory. The primary purpose of PCI is to facilitate efficient, high-bandwidth communication between the host processor and peripheral devices, supporting burst-mode data transfers at speeds up to 133 MB/s in its original 32-bit configuration operating at 33 MHz. This capability addressed the limitations of earlier expansion buses like the Industry Standard Architecture (ISA), which was constrained to 8.33 MB/s, and the VESA Local Bus (VLB), a short-lived interim solution offering theoretical bandwidth up to 133 MB/s at 33 MHz but lacking robust standardization, electrical stability for multiple devices, and plug-and-play support. By incorporating auto-configuration mechanisms, PCI simplified device installation and resource allocation, promoting broad adoption in personal computers during the mid-1990s. Fundamentally, PCI employs a shared parallel bus with multiple expansion slots connected via a common set of address, data, and control lines, allowing up to about five devices per bus segment. Transactions occur in a master-slave model, where a bus master (such as the CPU or a peripheral card) initiates read or write operations to a target device, enabling efficient, synchronized data exchange across the system. Later revisions expanded these foundations to include 66 MHz clock rates and 64-bit data widths for enhanced performance.

Key Features and Advantages

The PCI bus operates synchronously, utilizing a shared clock to coordinate all transactions among connected devices, which ensures predictable timing and simplifies protocol implementation compared to asynchronous buses. The base specification defines a 33 MHz clock, delivering theoretical peak bandwidth of 133 MB/s for 32-bit transfers, with later revisions supporting 66 MHz for doubled performance. It employs a multiplexed 32-bit address and data bus, which can be extended to 64 bits via optional signals for enhanced capacity in high-bandwidth applications. Architecturally, PCI supports up to 32 devices per bus through unique device numbering in its configuration mechanism, though electrical loading constraints typically limit unbuffered implementations to around 10 loads, including the host bridge and slots. A primary advantage of PCI is its burst transfer mode, which enables multiple consecutive data phases following a single address phase, allowing efficient access to memory or I/O without repeated addressing overhead. This contrasts sharply with the ISA bus, where each data transfer requires a dedicated address cycle, capping ISA throughput at approximately 8 MB/s even at its 8 MHz clock, while PCI achieves significantly higher effective rates for burst-oriented operations like graphics or disk I/O. Bus-mastering capabilities further reduce CPU involvement by permitting peripheral devices to initiate direct memory access (DMA) transactions, offloading data movement and minimizing processor interrupts for sustained transfers. PCI's plug-and-play auto-configuration, facilitated by a 256-byte configuration space per device accessible via standardized reads and writes during system initialization, enables dynamic resource assignment by the BIOS or operating system, obviating the manual jumper or switch settings common in ISA systems. This promotes ease of use and scalability across diverse hardware. The bus also ensures interoperability with slower devices, as all components adhere to the same protocol but can signal readiness at reduced speeds without disrupting higher-speed peers. In specialized implementations, the PCI Hot-Plug specification allows runtime insertion or removal of cards, including detection of surprise removal, enhancing reliability in server and industrial environments.

History and Development

Origins and Initial Design

The Peripheral Component Interconnect (PCI) standard originated in the early 1990s as a response to the growing performance demands of personal computers, particularly with the impending release of Intel's Pentium processor. Intel Architecture Labs began developing the PCI local bus around 1990 to create a high-performance, processor-independent interface for connecting peripherals directly to the CPU, bypassing the limitations of existing expansion buses. The primary motivations were the shortcomings of the Industry Standard Architecture (ISA) bus, which operated at only 8.33 MHz with a 16-bit data width, resulting in a maximum throughput of about 8 MB/s and lacking support for efficient bus mastering or plug-and-play configuration, and the Extended Industry Standard Architecture (EISA) bus, which, while offering 32-bit addressing and transfers at up to 8.33 MHz (around 33 MB/s), was overly complex, expensive to implement, and primarily suited for servers rather than desktops. In late 1991, Intel collaborated with key industry partners—including IBM, Compaq, and Digital Equipment Corporation (DEC)—to refine the design and promote it as an open standard, culminating in the formation of the PCI Special Interest Group (PCI-SIG) in June 1992. The PCI-SIG, with these founding members at its core, aimed to ensure broad adoption by managing compliance and evolution of the specification. The initial PCI Local Bus Specification, version 1.0, was released by Intel in June 1992, defining a 32-bit bus operating at 33 MHz for a theoretical maximum bandwidth of 133 MB/s, supporting both burst transfers and plug-and-play resource allocation to simplify system integration. This design targeted desktop and server systems, emphasizing simplicity, low cost, and scalability over proprietary or fragmented alternatives like the VESA Local Bus. Early adoption accelerated in 1993 following the launch of Intel's Pentium processor in March of that year, with the company's 430LX chipset (codenamed Mercury) integrating PCI support as the first such implementation for Pentium-based systems. Shown publicly at major trade shows in November 1993, PCI quickly gained traction in PC manufacturing, enabling faster I/O for graphics, networking, and storage peripherals in an era of rapidly advancing CPU speeds. By integrating PCI into mainstream chipsets, Intel and its partners marked the transition to a unified, high-speed expansion standard that dominated PC architectures for the next decade.

Standardization and Revisions

The PCI Special Interest Group (PCI-SIG) was established in 1992 by Intel, IBM, Compaq, DEC, and other prominent industry players to govern the PCI specification, ensuring its evolution through collaborative development and compliance testing. The group quickly grew to include hundreds of members, fostering widespread adoption by standardizing the interface for peripheral connectivity across diverse hardware ecosystems. Subsequent revisions to the PCI Local Bus Specification refined its capabilities to meet emerging computational demands. Version 2.0, released on April 30, 1993, formalized the core connector design, pinout, and electrical signaling, providing a stable foundation for implementation. Version 2.1, issued June 1, 1995, introduced support for 66 MHz operation to double potential bandwidth over the original 33 MHz clock and added optional 64-bit address and data extensions for enhanced performance in high-end systems. These updates enabled broader compatibility with faster processors while maintaining backward compatibility with earlier designs. Further enhancements came in Version 2.2, published December 18, 1998, which incorporated protocol refinements, including better support for low-power states, and hot-plug capabilities through companion specifications. Version 2.3, effective March 29, 2002, addressed limitations in 64-bit addressing for systems exceeding 4 GB of RAM by refining the configuration space handling of large address mappings, while deprecating 5 V signaling in favor of 3.3 V for improved efficiency and safety. These revisions solidified PCI as an industry standard, with implementations in chipsets from numerous vendors enabling seamless integration in billions of personal computers and servers. By 2003, the PCI-SIG shifted primary development efforts toward PCI Express, recognizing the need for serial interconnects to support escalating bandwidth requirements, though conventional PCI continued to receive errata updates and legacy support thereafter. This transition marked the maturation of PCI as a foundational interconnect technology, with its specifications remaining influential in embedded and industrial applications.

Physical and Electrical Specifications

Connector Design and Pinout

The PCI connector utilizes an edge-card design with gold-plated contacts, known as "gold fingers," on the add-in card that insert into a slot on the motherboard or backplane. The standard 32-bit PCI connector consists of 62 pins per side (124 total contacts), with 120 dedicated to signals and 4 serving as keying positions to prevent incompatible insertions. For 64-bit PCI support, an extension adds 30 pins per side (60 total additional contacts), enabling wider data paths while maintaining compatibility with 32-bit cards, for a total of 92 pins per side (184 contacts). Key signal pins are assigned as follows: the multiplexed address and data lines AD[31:0] occupy designated positions across both sides (e.g., A20 for AD31, A31/B31 for AD0/AD1), allowing bidirectional transfer of 32-bit addresses and data. Bus command signals C/BE[3:0]# (e.g., A32 for C/BE0#, B32 for C/BE1#, A28 for C/BE2#, B28 for C/BE3#) indicate the type of transaction, such as memory read or I/O write. Control signals include FRAME# (A34) to delineate the start and duration of a bus transaction, IRDY# (A35) and TRDY# (B35) for initiator and target ready states, DEVSEL# (B36) for device select assertion, and STOP# (A36) to request transaction termination. Power and ground pins are distributed throughout, with +5V (e.g., A23, B23), +3.3V (e.g., A42, B42 in 3.3V keyed slots), and multiple GND connections (e.g., A3, B3) for stable operation. Signals are grouped logically for efficient routing and signal integrity: address/data and parity pins form the core multiplexed bus in the middle of the connector, while frame and control signals cluster near the card's leading and trailing ends. Key slots at specific positions (pins A12/A13 and B12/B13) differentiate 5V-only, 3.3V-only, and universal voltage environments, ensuring electrical compatibility. A 32-bit PCI card, using only the first 62 pins, can insert into a 64-bit slot if the slot features universal keying, though the extension remains unused; conversely, 64-bit cards require a full 64-bit slot to access the additional AD[63:32] and C/BE[7:4]# pins.
Signal Group         | Example Pins (Side A/B)                                             | Description
Address/Data (AD)    | A20 (AD31), A31 (AD0) / B31 (AD1), B20 (AD30)                       | Multiplexed 32-bit lines for addresses and data
Bus Commands (C/BE#) | A32 (0#), A28 (2#) / B32 (1#), B28 (3#)                             | Command/byte enable signals (4 bits for 32-bit)
Transaction Control  | A34 (FRAME#), A35 (IRDY#), A36 (STOP#) / B35 (TRDY#), B36 (DEVSEL#) | Bus phase and handshake signals
Power/Ground         | A23 (+5V), A42 (+3.3V), A3 (GND) / B23 (+5V), B42 (+3.3V), B3 (GND) | Supply and reference voltages
64-bit Extension     | A64-A93, B64-B93 (approx.)                                          | Additional AD[63:32], C/BE[7:4]#, parity, and REQ64#/ACK64#
This table illustrates representative pin assignments from the 32-bit base; full details span all 124 positions in the specification.

Voltage Levels and Keying

The original PCI Local Bus Specification, released in 1992, supported only 5V signaling and power supply for add-in cards and slots. To address increasing power demands and enable lower consumption in denser systems, 3.3V signaling was introduced in Revision 2.0 of the specification in 1993, with further refinements for universal compatibility in Revision 2.1 in 1995. Universal slots accommodate both voltage levels by providing separate power pins—VCC for 5V and VCC3.3 for 3.3V—allowing cards to detect the available voltage through the VI/O pin and configure their I/O buffers accordingly. Mechanical keying prevents the insertion of incompatible cards into slots by using notches on the card's edge connector that align with raised tabs in the slot. 3.3V-only cards feature a notch between pins 12 and 13 (approximately 56 mm from the card's backplate), while 5V-only cards have a notch between pins 32 and 33 (approximately 104 mm from the backplate); universal cards include both notches to fit either slot type. These keying positions ensure that a 3.3V card cannot be inserted into a 5V-only slot (and vice versa), avoiding potential electrical mismatches. Pin assignments for the power rails are detailed in the connector design specifications. Power delivery to PCI slots occurs primarily through the +5V and +3.3V rails, with add-in cards limited to a maximum of 25 W combined from these rails, as encoded by the card's presence detect pins (PRSNT1# and PRSNT2#) in increments of 7.5 W up to that limit. Auxiliary +12 V and -12 V rails are available for specialized needs, such as analog components or EEPROM programming, typically supporting up to 1 A on +12 V and 0.5 A on -12 V, though these are optional and depend on system implementation. Inserting a 5V-only card into a 3.3V-only slot can lead to compatibility issues, including improper signaling levels that may cause unreliable operation or component damage due to voltage mismatches. Conversely, the greater risk arises from inserting a 3.3V card into a 5V slot, where the higher signaling voltage can exceed the card's tolerances and cause immediate failure, particularly in hot-plug scenarios without proper sequencing. These mechanisms collectively ensure safe and reliable voltage handling in PCI systems.

Form Factors and Compatibility

PCI add-in cards adhere to defined form factors to ensure compatibility with various chassis sizes while maintaining a standardized edge connector for insertion into slots. The full-length form factor measures 312 mm (12.28 inches) in length, providing ample space for components requiring extensive board area. Half-length cards are limited to 175 mm (6.9 inches), suitable for systems with restricted internal dimensions. Low-profile variants, intended for slimline cases, utilize shorter lengths—MD1 at 119.91 mm (4.72 inches) for basic 32-bit cards and MD2 up to 167.64 mm (6.6 inches) for more complex designs—with a maximum height of 64.41 mm (2.54 inches) including the connector, yet all employ the identical 32-bit or 64-bit edge connector as full-size cards. Compatibility across form factors emphasizes backward and forward integration. A 32-bit PCI card fits securely into a 64-bit slot, occupying the initial 32-bit portion of the longer connector without requiring an adapter, though performance remains limited to 32-bit capabilities. Universal slots and cards facilitate voltage compatibility by supporting both 3.3 V and 5 V signaling through dual-keying mechanisms that prevent incorrect insertions. Mini PCI, a compact variant introduced by the PCI-SIG in late 1999, addresses space constraints in portable devices like laptops with a reduced board size of approximately 59.6 mm × 50.95 mm. It supports 32-bit operation at 33 MHz and integrates directly into the motherboard via an internal connector. The specification defines three types for varying stacking needs: Type I for single-height cards, Type II for dual-height configurations allowing stacked components such as modems, and Type III for even taller stacking in thicker assemblies. Type I and II use a 100-pin connector, while Type III employs a 124-pin interface to accommodate additional pins for power and signals. Voltage keying in Mini PCI mirrors standard PCI practices to avoid electrical mismatches. Furthermore, Mini PCI cards can interface with CardBus bridges to enable hot-plug capabilities in supported systems.

Configuration Mechanisms

Auto-Configuration Process

The auto-configuration process in PCI allows the system to dynamically discover, identify, and initialize connected devices during startup without requiring manual jumper settings or switches. This software-driven mechanism is initiated by the host bridge under BIOS or operating system control, which systematically scans the PCI bus hierarchy starting from bus 0. The scan probes each possible bus (0-255), device (0-31), and function (0-7 for multifunction devices) by issuing configuration read transactions to the 256-byte configuration space allocated per device/function. These transactions use Type 00h cycles for devices on the local bus and Type 01h cycles for propagating to downstream buses via bridges, enabling enumeration of the entire hierarchy. PCI defines two configuration access mechanisms to facilitate this probing, with Mechanism #1 serving as the primary method. Mechanism #1 employs I/O-mapped ports—0x0CF8 for setting a 32-bit configuration address (including bus, device, function, and register offset) and 0x0CFC for data transfer—while the host bridge decodes that address to assert the selected device's IDSEL line for targeted access. Version 2.0 deprecated Mechanism #2 for new designs, retaining it only for legacy compatibility using a system-defined I/O window in the range 0xC000-0xCFFF (or equivalent); Mechanism #1 remains the standard for auto-configuration in subsequent revisions. Central to device identification are standardized registers in the first 64 bytes of the configuration space header (offsets 00h-3Fh). The 16-bit Vendor ID at offset 00h uniquely identifies the manufacturer (e.g., 0x8086 for Intel), and a value of 0xFFFF indicates no device is present, allowing the scan to skip empty slots. The adjacent 16-bit Device ID at 02h specifies the exact product variant. The 8-bit Revision ID at offset 08h identifies the silicon revision, and the 24-bit Class Code at offsets 09h-0Bh (programming interface at 09h, subclass at 0Ah, base class at 0Bh) defines the device's functional category, such as 0x010000 for SCSI controllers or 0x020000 for Ethernet adapters, enabling software to recognize and load appropriate drivers. These fields, read early in the scan, confirm device presence and type before proceeding to resource setup. Resource allocation follows detection and relies on the six Base Address Registers (BARs) at offsets 10h-24h in the configuration header, which describe the device's memory or I/O space needs. To determine requirements, software writes 0xFFFFFFFF to a BAR and reads back the value; the bits that remain hard-wired to 0 reveal the alignment and size (bit 0 distinguishes I/O from memory space, while bits [2:1] of a memory BAR distinguish 32-bit from 64-bit addressing). The BIOS or OS then allocates non-overlapping base addresses—writing them back to the BARs—for memory regions, I/O ports, and expansion ROM, ensuring devices can map into the host's address space. Interrupt resources are assigned similarly via the Interrupt Pin and Line registers, integrating with broader interrupt handling mechanisms. This allocation completes device enablement by setting the Command register bits for bus mastering, memory/I/O access, and other functions.
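
A minimal sketch of Configuration Mechanism #1 as used on x86 systems. The port I/O helpers outl() and inl() are assumed to exist (in practice they are compiler intrinsics, inline assembly, or OS-provided routines); the 32-bit value written to 0x0CF8 packs the enable bit, bus, device, function, and register offset:

    #include <stdint.h>

    /* Hypothetical port-I/O helpers; not defined here. */
    extern void     outl(uint16_t port, uint32_t value);
    extern uint32_t inl(uint16_t port);

    #define PCI_CONFIG_ADDRESS 0x0CF8
    #define PCI_CONFIG_DATA    0x0CFC

    /* Read one 32-bit register from a function's configuration space
     * using Mechanism #1: bit 31 = enable, bits 23:16 = bus,
     * bits 15:11 = device, bits 10:8 = function, bits 7:2 = register. */
    static uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev,
                                   uint8_t func, uint8_t offset)
    {
        uint32_t addr = (1u << 31)
                      | ((uint32_t)bus << 16)
                      | ((uint32_t)(dev  & 0x1F) << 11)
                      | ((uint32_t)(func & 0x07) << 8)
                      | (uint32_t)(offset & 0xFC);
        outl(PCI_CONFIG_ADDRESS, addr);
        return inl(PCI_CONFIG_DATA);
    }

    /* Example use during enumeration: a Vendor ID of 0xFFFF means no
     * device responds at this bus/device/function.                     */
    static int device_present(uint8_t bus, uint8_t dev, uint8_t func)
    {
        return (pci_cfg_read32(bus, dev, func, 0x00) & 0xFFFF) != 0xFFFF;
    }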

Interrupt Handling

In traditional PCI systems, interrupt requests from peripheral devices are managed using four dedicated signal lines per expansion slot: INTA#, INTB#, INTC#, and INTD#. These lines are optional for devices but provide a standardized mechanism for signaling events to the host processor. The signals operate as level-sensitive interrupts, asserted low (active low) using open-drain output buffers, which enables wired-OR sharing among multiple devices connected to the same line without electrical conflicts. The interrupt handling process begins when a device asserts its assigned INTx# line to indicate an event requiring CPU attention. This assertion is routed through PCI bridges or directly to the system's interrupt controller, such as the Intel 8259 Programmable Interrupt Controller (PIC) or the Advanced Programmable Interrupt Controller (APIC), where it is mapped to a specific system IRQ line based on configuration space settings established during the auto-configuration process. The interrupt controller then notifies the CPU, which suspends its current execution, saves the context, and vectors to the corresponding interrupt service routine (ISR) via the interrupt vector table. Since the interrupts are level-sensitive, the device must deassert the INTx# line only after the ISR has serviced the request to avoid continuous triggering; shared lines require all asserting devices to deassert before the interrupt can be cleared. In multi-slot or hierarchical PCI topologies, interrupt lines are routed via PCI-to-PCI bridges, which typically remap downstream INTx# signals to upstream lines using a rotational offset (e.g., INTA# from a downstream device may appear as INTD# at the bridge) to balance load and enable sharing across segments. This routing ensures scalability in systems with multiple buses while maintaining compatibility. To address limitations of pin-based interrupts, such as the fixed number of lines and sharing overhead, Message Signaled Interrupts (MSI) were introduced as an optional feature in Revision 2.2 of the PCI Local Bus Specification. With MSI, a device signals an interrupt by issuing a dedicated memory write transaction to a system-assigned address with a specific data value, rather than asserting a physical pin; this write is treated as a posted transaction and routed through the PCI fabric to the interrupt controller. MSI supports up to 32 vectors per device (using a 16-bit message data field) and employs edge semantics, where each write is a distinct event without requiring deassertion, enhancing efficiency in high-device-density environments. Configuration occurs via capability structures in the device's configuration space, where the system assigns the target address and data value during initialization. Interrupt signaling via the INTx# pins operates independently of bus arbitration for data transactions; while devices compete for bus mastery using separate REQ# and GNT# signals, an INTx# line can be asserted at any time without owning the bus (an MSI, by contrast, is delivered as an ordinary posted memory write by a bus master). This allows low-latency event notification even when the bus is occupied by other operations.
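
The conventional rotation applied across a PCI-to-PCI bridge (the "barber-pole" swizzle widely used by firmware and operating systems) can be written as a one-line mapping; this sketch assumes pins are numbered 0-3 for INTA#-INTD#:

    /* Map a device's interrupt pin (0 = INTA# ... 3 = INTD#) on a
     * secondary bus to the pin seen on the primary side of the bridge,
     * using the conventional (device + pin) modulo-4 rotation.        */
    static unsigned intx_swizzle(unsigned device_number, unsigned pin)
    {
        return (device_number + pin) % 4;
    }
    /* Example: device 2 asserting INTA# (0) appears upstream as INTC# (2). */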

Bus Architecture and Operations

Address Spaces and Memory Mapping

The PCI bus utilizes three primary address spaces to enable host-to-device communication: the configuration space, the I/O space, and the memory space. The configuration space is a per-function register space limited to 256 bytes, accessed through specialized mechanisms distinct from standard I/O or memory transactions, allowing enumeration and setup of devices during system initialization. The I/O space provides a flat addressing model for legacy device control, supporting either a 16-bit range (up to 64 KB total) or a 32-bit extension (up to 4 GB), depending on the host bridge implementation. In contrast, the memory space facilitates memory-mapped I/O operations, offering a 32-bit range by default (up to 4 GB) with optional 64-bit extensions for larger systems. Device memory mapping is managed through Base Address Registers (BARs) located in the configuration space header (offsets 0x10 to 0x24 for standard devices), where each BAR specifies the type, size, and location of the device's addressable regions. During enumeration, the operating system probes each BAR by writing all 1s to it and reading back the value; the bits that remain 0 (the hard-wired low-order bits) indicate the device's requested region size, which must be a power of 2 (e.g., 4 KB, 16 KB, 1 MB, or up to 2 GB per BAR). The OS then assigns non-overlapping base addresses from the available I/O or memory space, writing these values back to the BARs to map the device's registers or buffers into the system's address map, ensuring isolation and avoiding conflicts across multiple devices. Within the memory space, BARs distinguish between prefetchable and non-prefetchable regions to optimize read performance. A prefetchable BAR (indicated by bit 3 set in the BAR) denotes a region without read side effects, allowing the host CPU or bridges to perform speculative burst reads across 4 KB boundaries and cache line alignments without risking data corruption or unnecessary disconnects, which enhances throughput for sequential access patterns like DMA transfers. Non-prefetchable regions (bit 3 clear) are used for areas with potential side effects on reads, such as control registers, and restrict prefetching to prevent errors, though they may incur higher latency due to aligned access requirements. For systems exceeding 4 GB of addressable memory, PCI supports 64-bit addressing through extensions in the memory space. A 64-bit BAR is signaled by setting bits [2:1] to 10b in the lower BAR, consuming two consecutive 32-bit BARs: the first holds the lower 32 bits of the base address, while the second provides the upper 32 bits. Transactions targeting such addresses employ a dual address cycle mechanism, in which the low 32 bits are driven in the first address phase together with the Dual Address Cycle command, and the high 32 bits in the second phase together with the actual transaction command, enabling devices to respond to addresses beyond the 32-bit limit while maintaining compatibility with legacy 32-bit systems. This extension is particularly vital for prefetchable regions in high-memory environments, as it allows mapping large device buffers without fragmentation.
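
The BAR sizing handshake described above reduces to a few register accesses. A sketch, reusing the hypothetical pci_cfg_read32() helper from the configuration-mechanism example (a matching pci_cfg_write32() is assumed) and ignoring I/O and 64-bit BARs for brevity:

    #include <stdint.h>

    /* Assumed helpers; see the configuration-mechanism sketch above. */
    extern uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t func, uint8_t off);
    extern void     pci_cfg_write32(uint8_t bus, uint8_t dev, uint8_t func, uint8_t off,
                                    uint32_t value);

    /* Return the size in bytes of a 32-bit memory BAR, or 0 if unimplemented.
     * bar_index 0..5 selects one of the six BARs at offsets 0x10..0x24.      */
    static uint32_t pci_bar_size(uint8_t bus, uint8_t dev, uint8_t func, int bar_index)
    {
        uint8_t  off  = (uint8_t)(0x10 + 4 * bar_index);
        uint32_t orig = pci_cfg_read32(bus, dev, func, off);

        pci_cfg_write32(bus, dev, func, off, 0xFFFFFFFF);   /* probe */
        uint32_t probe = pci_cfg_read32(bus, dev, func, off);
        pci_cfg_write32(bus, dev, func, off, orig);          /* restore */

        if (probe == 0)
            return 0;                      /* BAR not implemented */
        probe &= 0xFFFFFFF0;               /* mask the low type/prefetch bits */
        return ~probe + 1;                 /* size is a power of two */
    }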

Command Codes and Transaction Types

In the PCI bus protocol, bus commands are encoded on the C/BE[3:0]# lines during the address phase to specify the type of transaction a master device intends to perform. These four-bit encodings allow for 16 possible commands, though some are reserved or specific to extensions. The primary commands include Interrupt Acknowledge (0000), Special Cycle (0001), I/O Read (0010), I/O Write (0011), Memory Read (0110), Memory Write (0111), Configuration Read (1010), and Configuration Write (1011), with additional memory-related variants such as Memory Read Multiple (1100), Dual Address Cycle (1101), Memory Read Line (1110), and Memory Write and Invalidate (1111).
Command                     | Encoding (C/BE[3:0]#) | Description
Interrupt Acknowledge       | 0000                  | Master reads interrupt vector from an interrupting device; implicitly addressed to the interrupt controller.
Special Cycle               | 0001                  | Broadcast message to all agents on the bus, without a target response; used for system-wide signals like shutdown.
I/O Read                    | 0010                  | Master reads from I/O space; supports single or burst transfers, non-posted to ensure completion acknowledgment.
I/O Write                   | 0011                  | Master writes to I/O space; non-posted, requiring target acknowledgment before completion.
Reserved                    | 0100                  | Not used in standard PCI.
Memory Read                 | 0110                  | Master reads from memory space; supports single or burst transfers, targeting regions such as system memory or expansion ROM.
Memory Write                | 0111                  | Master writes to memory space; posted, allowing the master to proceed without waiting for target acknowledgment to improve performance.
Reserved                    | 1000                  | Not used in standard PCI.
Configuration Read          | 1010                  | Master reads from a device's configuration space for initialization; uses Type 0 or Type 1 addressing.
Configuration Write         | 1011                  | Master writes to a device's configuration space; non-posted.
Memory Read Multiple        | 1100                  | Optimized memory read supporting bursts across multiple cache lines.
Dual Address Cycle          | 1101                  | Precedes a transaction that requires a 64-bit address.
Memory Read Line            | 1110                  | Memory read optimized for filling a full cache line in a burst.
Memory Write and Invalidate | 1111                  | Memory write that invalidates cache lines, combining write and coherency operations.
Transaction types in PCI are categorized as reads and writes, with variations for single or burst modes to transfer multiple doublewords efficiently. Reads are generally non-posted, requiring the target to complete the data transfer before the master proceeds, while memory writes are posted to decouple the master from target latency; I/O and configuration transactions remain non-posted for reliability. These commands target distinct address spaces, such as I/O for legacy device control or memory for bulk data access. Masters initiate transactions by asserting the FRAME# signal during the address phase, driving the command on C/BE[3:0]# and the target address on AD[31:0], after having obtained bus ownership through REQ#/GNT# arbitration. Targets respond to valid commands by asserting DEVSEL# to indicate that they claim the transaction, with decode timing classified as fast (DEVSEL# asserted one or two clock cycles after FRAME#), medium (three cycles), or slow (four or more cycles) to accommodate varying device decoding speeds. Devices must claim transactions matching their enabled address ranges via the configuration space registers, ensuring only the intended target responds. Special Cycles differ by not requiring DEVSEL#, as they are broadcasts without a specific target.

Latency Management and Delayed Transactions

In the PCI bus architecture, latency arises primarily from the time required for a target device to respond to an initiator's request, with the specification mandating that the target complete the initial data phase—by asserting TRDY# for ready or STOP# for termination—within 16 clock cycles from the assertion of FRAME#. This response window, often ranging from 7 to 15 cycles in practice depending on device capabilities and bus conditions, ties up the shared bus and reduces overall throughput in multi-device configurations where fast initiators must wait for slower targets. Delayed transactions were introduced in the PCI Local Bus Specification revision 2.1 to mitigate these latency constraints, allowing an initiator to issue a request that the target accepts but cannot immediately fulfill. The initiator then releases the bus after the target signals an incomplete transfer, parking the request internally while the target processes it asynchronously; completion occurs later when the initiator retries the exact same transaction, at which point the target provides the data or acknowledgment without requiring re-decoding of the address or command. The core mechanism for delayed transactions employs the STOP# signal to disconnect the current bus cycle and the DEVSEL# signal to confirm the target's claim of the transaction during the address phase. If DEVSEL# is asserted but no data is transferred (TRDY# remains deasserted), the target issues a retry via STOP#, prompting the initiator to relinquish the bus and attempt completion on a future cycle. This supports delayed read transactions fully and limited delayed write transactions (such as configuration writes), with compatibility for up to 64-bit data widths through PCI's optional 64-bit extensions. By decoupling the request acceptance from immediate completion, delayed transactions enhance system performance by permitting bus reuse for other masters during the target's processing delay, avoiding idle cycles that would otherwise bottleneck the interconnect. This capability is especially vital for PCI bridges interfacing with slower subsystems, such as legacy I/O buses, where native response times exceed PCI's strict initial latency limits of 16 cycles.

Bridge Functionality

PCI-to-PCI Bridges

PCI-to-PCI bridges facilitate the expansion of PCI systems beyond a single bus by establishing connections between a primary bus—typically the one interfacing with the host processor—and a secondary bus, thereby supporting hierarchical topologies that allow for greater device connectivity without overwhelming the main bus. The core function of the bridge involves address decoding to determine which requests belong to the other bus's address ranges, selective forwarding of transactions initiated by masters on either side to appropriate targets on the opposite bus, and traffic isolation to ensure that activities on the secondary bus do not propagate unnecessarily to the primary bus or vice versa, thus preserving performance and electrical loading limits across segments. Central to the bridge's operation are its configuration registers, which include dedicated fields for specifying bus numbers—each an 8-bit value ranging from 0 to 255—for the primary bus, secondary bus, and subordinate bus (the highest-numbered bus downstream, inclusive of subordinates), enabling precise routing of configuration transactions during system initialization. These registers work in conjunction with programmable address windows for memory and I/O spaces, which define the ranges of addresses to be forwarded upstream (from secondary to primary) or downstream (from primary to secondary), allowing the bridge to direct traffic efficiently based on decoded address matches. In handling write transactions, PCI-to-PCI bridges implement support for posted writes, particularly for memory write operations, by queuing the data internally and forwarding it to the target without waiting for an acknowledgment, which prevents the originating master from stalling and assumes reliable eventual delivery to minimize latency in pipelined systems. The bridge's internal arbitration typically prioritizes requests from the primary bus over those from the secondary bus to favor host-side operations and maintain overall system latency, while the bridge may also implement subtractive decoding, whereby it claims and forwards to the secondary bus any transaction whose address is not claimed by a positive decode on the primary bus, when this capability is enabled.
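
The forwarding decision itself is a simple range check against the bridge's programmed windows and bus numbers. A sketch under the assumption of a single 32-bit memory window; the structure and field names are ours and the register layout is simplified:

    #include <stdbool.h>
    #include <stdint.h>

    /* Simplified view of a PCI-to-PCI bridge's routing state.  A real
     * bridge holds the memory window in base/limit configuration
     * registers with coarse granularity; here they are plain values.   */
    struct p2p_bridge {
        uint8_t  primary_bus, secondary_bus, subordinate_bus;
        uint32_t mem_base, mem_limit;       /* inclusive downstream window */
    };

    /* Forward a memory transaction seen on the primary bus downstream
     * only if its address falls inside the bridge's window.             */
    static bool forward_downstream(const struct p2p_bridge *br, uint32_t addr)
    {
        return addr >= br->mem_base && addr <= br->mem_limit;
    }

    /* A configuration cycle is routed by bus number instead: claimed if
     * the target bus lies in [secondary, subordinate]; it is converted
     * to a Type 0 cycle on the secondary bus, or passed along as Type 1
     * for a deeper bridge to claim.                                     */
    static bool claims_config(const struct p2p_bridge *br, uint8_t bus)
    {
        return bus >= br->secondary_bus && bus <= br->subordinate_bus;
    }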

Write Optimization Techniques

PCI bridges employ several techniques to optimize write transactions, particularly posted memory writes, by reducing latency and improving bus utilization across the primary and secondary interfaces. A key method is the use of posted writes, where a bridge accepts write commands from the primary bus without waiting for acknowledgment from the target on the secondary bus. This posting buffers the write data internally, allowing the initiator on the primary side to proceed immediately, thereby minimizing wait states and enhancing overall system throughput. The PCI Local Bus Specification defines posted writes as applicable to the memory write and memory write and invalidate commands, enabling bridges to decouple transaction completion between the two buses. To further streamline posted writes, bridges may implement write combining, which merges single-dword writes targeting sequential doubleword addresses into one larger burst transaction on the secondary bus. For instance, two consecutive 32-bit writes to adjacent dword addresses can be combined into a single two-dword burst, reducing the number of bus cycles required while preserving transaction order. This optimization is recommended for bridges handling posted memory writes, as it decreases overhead and boosts bandwidth utilization without altering the semantic outcome of the operations; the specification emphasizes that combining must maintain the order of writes to ensure data integrity. Merging complements combining by consolidating writes to adjacent addresses, or partial bytes within a dword, into a contiguous burst transaction. Bridges can, for example, assemble several partial-byte writes that fill the byte lanes of one dword into a single full-dword write, transforming non-contiguous operations into efficient linear bursts. This technique is particularly beneficial for sequential data transfers, such as graphics or DMA operations, where it minimizes delays and maximizes throughput per transaction. Byte merging, a subset of this process, specifically handles sub-dword writes by assembling them into full dwords before forwarding. The PCI-to-PCI Bridge Specification details how merging preserves ordering and supports burst modes to optimize secondary bus performance.
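
As an illustration of the combining rule (a simplified model of our own, not the bridge's actual queue logic): consecutive posted dword writes whose addresses are strictly sequential can leave the bridge as one burst.

    #include <stdint.h>
    #include <stdio.h>

    /* A posted memory write held in the bridge's internal queue. */
    struct posted_write { uint32_t addr; uint32_t data; };

    /* Count how many queued writes, starting at index 0, have sequential
     * dword addresses and can therefore be emitted as a single burst
     * (one address phase followed by that many data phases).            */
    static unsigned combinable(const struct posted_write *q, unsigned n)
    {
        if (n == 0)
            return 0;
        unsigned run = 1;
        while (run < n && q[run].addr == q[run - 1].addr + 4)
            run++;
        return run;
    }

    int main(void)
    {
        struct posted_write q[] = {
            { 0x1000, 0x11111111 }, { 0x1004, 0x22222222 },
            { 0x1008, 0x33333333 }, { 0x2000, 0x44444444 },
        };
        printf("%u writes combine into one burst\n", combinable(q, 4)); /* 3 */
        return 0;
    }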

Signal Protocol and Timing

Core Bus Signals

The core bus signals of the Peripheral Component Interconnect (PCI) form the electrical interface that enables communication between the host and peripheral devices on the parallel bus. These signals are defined in the PCI Local Bus Specification and are categorized into multiplexed address/data lines, control signals for transaction management, arbitration signals for bus access, and power and status lines. All signals operate synchronously to the PCI clock except for reset and interrupts, with most being tri-state to allow shared bus usage among multiple agents.

Multiplexed Signals

The multiplexed address and data signals are central to PCI transactions, allowing efficient use of pins by reusing the same lines for both addressing and data transfer. The AD[31:0] lines serve as the bidirectional, tri-state multiplexed address and data bus, carrying a 32-bit address during the address phase and 32-bit data during the subsequent data phases of a transaction. These signals support burst transfers with one or more data phases, where the initiator drives the lines during the address phase and the data source (initiator for writes, target for reads) drives them during data phases. Accompanying the AD lines, the PAR signal provides even parity coverage for the AD[31:0] and C/BE[3:0]# signals during both address and data phases; it is driven by the agent asserting FRAME# for the address phase and by the data source during data phases, ensuring error detection through parity checking. The C/BE[3:0]# (Command/Byte Enable) lines, also bidirectional and tri-state with active-low assertion, encode the transaction command (such as memory read, I/O write, or configuration access) during the address phase and specify which byte lanes are active during data phases, enabling partial-width transfers for efficiency.

Control Signals

Control signals manage the timing and flow of individual transactions on the PCI bus. The FRAME# signal, driven by the initiator and active low, delineates the start and duration of a transaction: it is asserted to begin the address phase and deasserted before the final data transfer to signal completion or early termination. The IRDY# (Initiator Ready) signal, active low and driven by the current bus master, indicates when the initiator has valid data on the AD lines (for writes) or is ready to accept data (for reads), allowing the initiator to insert wait states and control the pacing of the transfer. Complementing this, TRDY# (Target Ready), also active low and driven by the target device, signals that the target is ready to accept or provide data, enabling wait states if necessary without halting the bus. The DEVSEL# (Device Select) signal, active low and driven by the target, acknowledges selection: the target asserts it within a specified number of clocks after the address phase, confirming that it has decoded the transaction as intended for it; fast, medium, and slow timings are supported to accommodate varying device latencies. Similarly, STOP#, active low and target-driven, requests that the initiator halt the current transaction, either for retry or for disconnect (to free the bus), preventing bus lockup under error or congestion conditions.

Arbitration Signals

Arbitration signals facilitate fair access to the shared bus among multiple potential masters. Each PCI master has a dedicated REQ# (Request) line, active low, which the device asserts to signal its intent to become the bus master; the central arbiter monitors these lines to decide who gets access. The corresponding GNT# (Grant) line per device, active low and driven solely by the arbiter, indicates permission to use the bus. These point-to-point signals support centralized arbitration, with bus parking allowing the most recent master to retain its grant while the bus is idle, minimizing latency for its next transaction. The CLK (Clock) signal provides the synchronous timing reference for all PCI operations, distributed to every device as a free-running input at 33 MHz (or 66 MHz in optional modes), with all other signals sampled or driven on the rising edge except the asynchronous reset. The RST# (Reset) signal, active low and asynchronous, initializes all PCI devices upon system power-up or reset, held asserted for at least 1 ms and ensuring all outputs are tri-stated and configuration registers return to their default state.

Power and Status Signals

Power and ground signals ensure reliable operation, with VCC supplying 5 V (or 3.3 V in later variants) to devices and GND providing the reference ground, supporting up to 25 W per slot. Interrupt signals INTA# through INTD#, active low and open-drain, allow up to four interrupts per device, shared across slots with level-sensitive assertion; these are used for legacy interrupt delivery to the host processor. The SERR# (System Error) signal, active low and open-drain, reports critical errors such as address parity failures or other system errors not covered by the per-transaction data parity mechanism, allowing system-wide signaling across the bus. The PERR# (Parity Error) signal, active low and open-drain, reports data parity errors detected during data phases, asserted by the receiving agent two clock cycles after the erroneous data transfer. For 64-bit PCI extensions, additional signals such as AD[63:32], C/BE[7:4]#, and PAR64 expand the bus width while maintaining compatibility with 32-bit modes.

Arbitration and Access Control

In PCI, bus access is managed through a centralized scheme implemented by the host controller or a dedicated arbiter, which grants control to bus masters via individual GNT# (Grant) lines, with one line per potential master device to ensure dedicated signaling. This approach allows multiple devices to compete for the bus without centralized contention beyond the arbiter itself, supporting up to 16 or more masters depending on system design. The algorithm is not strictly defined in the specification but typically employs methods, such as round-robin, to balance access among requesters while incorporating a parking mechanism that defaults the bus grant to the last active master when no other requests are pending, thereby minimizing latency for subsequent transactions from the same device. The request process begins when a bus master requiring access asserts its REQ# (Request) signal while the bus is idle or during an ongoing transaction from another master, as arbitration overlaps with data phases to hide latency. Upon detecting the request, the arbiter evaluates priorities and, if granting access, first deasserts the current GNT# line (if active) to release the bus, followed by a single-clock turnaround cycle to prevent signal contention and ensure stable voltage levels before asserting the new GNT# for the requesting master. The master samples its GNT# on the rising clock edge and, upon assertion, may initiate a transaction after one additional clock if the bus is idle; this overlapped arbitration enables efficient bus utilization without dedicated idle cycles for granting. To tolerate varying arbitration delays, the PCI protocol allows up to 16 clock cycles between a master's REQ# assertion and the corresponding GNT# assertion, accommodating complex arbiter decisions in multi-master environments while maintaining overall low latency. Additionally, each master's configuration space includes a programmable Latency Timer register, which counts bus clocks during ownership and forces the master to release the bus (by deasserting FRAME#) once the timer expires if other REQ# signals are pending, thereby preventing any single device from monopolizing the bus and ensuring equitable access. The REQ# and GNT# signals, as core PCI bus lines, facilitate this point-to-point communication between each master and the arbiter. Multi-function PCI devices, which integrate multiple independent functions on a single chip or card, utilize a shared REQ#/GNT# pair across all functions to present only one electrical load and arbitration interface to the bus, simplifying wiring and arbiter complexity while requiring internal coordination among functions for request prioritization. This shared-pair design ensures that the device as a whole competes as a single master, with the internal logic arbitrating among its functions before asserting REQ#.
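
A round-robin arbiter with bus parking, of the kind described above, can be sketched in a few lines. This is a model of typical behaviour rather than a mandated algorithm, and the function and parameter names are ours:

    #define MASTERS 4

    /* req[i] is nonzero while master i asserts its REQ# line.
     * 'parked' is the master that currently holds the (idle) bus.
     * Returns the master that should receive GNT# next.              */
    static int arbitrate(const int req[MASTERS], int parked)
    {
        /* Scan in round-robin order, starting after the parked master. */
        for (int i = 1; i <= MASTERS; i++) {
            int candidate = (parked + i) % MASTERS;
            if (req[candidate])
                return candidate;
        }
        /* No requests pending: keep the bus parked on the last owner,
         * so it can start its next transaction with no arbitration delay. */
        return parked;
    }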

Address and Data Phases

In PCI bus transactions, the process begins with an address phase followed by one or more data phases, enabling efficient multiplexed transfer of addressing and payload information on the shared lines. The address phase spans exactly one clock cycle in standard 32-bit operations, during which the initiator places the target address on the AD[31:0] lines, encodes the transaction command on the C/BE[3:0]# lines, and asserts FRAME# to signal the transaction's initiation to all potential targets on the bus. The command on C/BE[3:0]# during the address phase indicates the operation type, such as a memory read or I/O write. For transactions requiring 64-bit addressing, a dual address cycle extends the addressing to two clock cycles to accommodate the full address width on the 32-bit AD bus: the first clock drives the lower 32 address bits on AD[31:0] together with the Dual Address Cycle command on C/BE[3:0]#, while the second clock drives the upper 32 address bits on AD[31:0] together with the actual transaction command. Following the address phase(s), data phases commence, with FRAME# remaining asserted until the final data phase and then deasserted, allowing variable-length transfers in burst mode to optimize bus utilization for sequential accesses. Each data phase operates via a ready handshake: the initiator asserts IRDY# when its data is stable (for writes) or when it is prepared to latch incoming data (for reads), while the target asserts TRDY# to confirm its readiness, and a data transfer occurs only on a clock edge at which both signals are asserted. During data phases, the C/BE[3:0]# lines function as byte enables driven by the initiator, each bit qualifying the corresponding byte lane on AD[31:0] (e.g., C/BE0# low enables the least significant byte), permitting partial-word transfers without affecting unselected bytes. Accesses to a PCI device's 256-byte configuration space, which holds registers for device identification, capabilities, and base addresses, employ specialized addressing distinct from memory or I/O space. In a Type 0 configuration cycle, used for devices on the local bus, AD[1:0] are driven to 00, AD[7:2] select the register and AD[10:8] the function, and the target device is chosen by asserting its dedicated IDSEL line, which is typically wired to one of the upper AD lines. A Type 1 configuration cycle, used in multi-bus hierarchies with bridges, sets AD[1:0] to 01 and encodes the bus number in AD[23:16], the device number in AD[15:11], and the function number in AD[10:8], allowing PCI-to-PCI bridges to route the access to the destination bus, where it is converted to a Type 0 cycle, without relying on IDSEL routing from the host.

Transaction Termination and Burst Modes

In PCI, transaction termination is managed through specific signal interactions during the data phases so that bus activity completes, or is interrupted, in an orderly way. Normal termination is initiator-driven: during the last data phase the initiator deasserts FRAME# while keeping IRDY# asserted, the target asserts TRDY# for that final transfer, and once it completes both parties deassert their remaining control signals, returning the bus to idle. This avoids bus contention and frees the bus immediately for the next transaction. An initiator therefore ends an error-free burst simply by choosing its length, deasserting FRAME# only after the desired number of data phases, provided the target has not asserted STOP# first. Bursts keep the bus efficient because the address advances implicitly, one doubleword (four bytes) per data phase, avoiding the overhead of a separate single-word transaction for each access; the initiator may insert wait states within a data phase but must assert IRDY# within eight clocks.

Targets terminate transactions to cope with resource limits or errors, using the STOP# signal in combination with DEVSEL# and TRDY#. A disconnect, suitable when a target's buffers fill partway through a burst, occurs when the target asserts STOP# during a data phase; the initiator finishes that phase, releases the bus, and may continue the transfer later from the point of interruption. A target abort signals an unrecoverable error: the target asserts STOP# while deasserting DEVSEL#, the transaction ends immediately, and the fault is recorded in the status registers. A retry occurs when the target asserts STOP# with DEVSEL# still asserted before any data has transferred; the initiator promptly deasserts FRAME# and repeats the request later, which lets slow targets implement delayed transactions without stalling the bus.

Burst addressing is selected by the initiator during the address phase using the AD[1:0] encoding of memory read and write commands. Linear mode increments the address by one doubleword per data phase with no boundary restrictions, is mandatory for all targets, and allows bursts to cross cache line boundaries. The optional cache line wrap mode instead wraps the address back to the start of the cache line once its boundary is reached (after four doublewords for a 16-byte line, for example), which suits cache line fills that begin partway into the line; the wrap length follows the cache line size configured for the device. These termination and burst mechanisms sit on top of the data phase handshake, with IRDY# and TRDY# synchronizing each transfer before any termination takes effect.
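
The two burst address sequences can be illustrated with a minimal sketch, assuming a 16-byte cache line for the wrap example; the helper names are illustrative.

    #include <stdint.h>
    #include <stdio.h>

    /* Linear burst: each data phase advances the address by one doubleword. */
    static uint32_t next_linear(uint32_t addr)
    {
        return addr + 4;
    }

    /*
     * Cache line wrap burst: the address advances by one doubleword but wraps
     * back to the start of the cache line once the line boundary is reached.
     */
    static uint32_t next_wrap(uint32_t addr, uint32_t line_size)
    {
        uint32_t line_base = addr & ~(line_size - 1);
        return line_base + ((addr + 4) & (line_size - 1));
    }

    int main(void)
    {
        uint32_t a = 0x1008;    /* start in the middle of a 16-byte line */
        for (int i = 0; i < 4; i++, a = next_wrap(a, 16))
            printf("wrap phase %d: address 0x%08x\n", i, (unsigned)a);
        /* prints 0x1008, 0x100c, 0x1000, 0x1004: the whole line is covered */
        printf("linear next after 0x1008: 0x%08x\n", (unsigned)next_linear(0x1008));
        return 0;
    }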

64-Bit Extensions and Parity

The 64-bit PCI extension is an optional enhancement of the standard 32-bit bus that widens the data path for higher bandwidth. Defined in the PCI Local Bus Specification, it extends the connector with an additional keyed segment carrying 64 extra contacts (32 positions, with a contact on each side of the card); 64-bit slots are correspondingly longer than 32-bit ones, and 32-bit cards can still be installed in them using only the shorter portion. The added signals include AD[63:32] for the upper address/data lines, C/BE[7:4]# for the corresponding byte enables, PAR64 for parity over those lines, and the REQ64#/ACK64# handshake described below. Separately, the M66EN signal indicates whether a slot and card support 66 MHz operation, allowing a bus segment to run at the higher clock when every agent on it is capable.

Addressing beyond 32 bits uses the dual address cycle mechanism. When a master starts a transaction that needs a 64-bit address, it issues the Dual Address Cycle (DAC) command: the first clock carries the lower 32 address bits on AD[31:0] with the DAC encoding on C/BE[3:0]#, and the next clock carries the upper 32 address bits together with the actual transaction command. To negotiate a 64-bit data path, the master asserts REQ64# during the address phase; a target that can respond in kind asserts ACK64# along with DEVSEL#, and if it does not, the transaction proceeds in 32-bit mode using only AD[31:0] and C/BE[3:0]#. This negotiation preserves interoperability with 32-bit devices.

Parity protects the multiplexed lines against transmission errors. Even parity is generated over the 36 bits of AD[31:0] and C/BE[3:0]# and carried on the PAR signal, and in 64-bit mode PAR64 covers the additional 36 bits of AD[63:32] and C/BE[7:4]#; the driving agent presents PAR and PAR64 one clock after the corresponding AD and C/BE# values to allow time for computation. Agents check parity when enabled through their configuration registers: a mismatch detected during a data phase causes the receiving agent to assert PERR# two clocks after the affected data, flagging a data parity error for software to handle (for example by retrying the operation), while more severe problems, such as a parity error during the address phase or other catastrophic faults, are reported on SERR#, which the host typically escalates to a system-level error such as a non-maskable interrupt.

PCI also defines low-latency optimizations. With fast DEVSEL# decoding, a target able to decode its address quickly asserts DEVSEL# on the clock immediately following the address phase, while medium and slow decoders assert it one or two clocks later, so faster decoding shortens the initiator's wait. Fast back-to-back transactions let the same initiator start a new transaction immediately after the previous one ends, omitting the usual idle turnaround clock, provided that either the new transaction addresses the same target as the previous one or every target on the bus advertises fast back-to-back capability in its status register; this trims idle cycles and improves efficiency for bursty workloads.
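
Because PAR is defined so that AD[31:0], C/BE[3:0]# and PAR together contain an even number of ones, the parity bit is simply the XOR of the 36 covered bits, and PAR64 is computed the same way over the upper lines. A minimal sketch of that calculation follows; the helper names are illustrative.

    #include <stdint.h>
    #include <stdio.h>

    /* XOR-reduce a word to one bit: 1 if it contains an odd number of ones. */
    static unsigned xor_reduce32(uint32_t v)
    {
        v ^= v >> 16;
        v ^= v >> 8;
        v ^= v >> 4;
        v ^= v >> 2;
        v ^= v >> 1;
        return v & 1u;
    }

    /*
     * PAR makes AD[31:0], C/BE[3:0]# and PAR contain an even number of ones,
     * so PAR is the XOR of all 36 covered bits. PAR64 is computed the same
     * way over AD[63:32] and C/BE[7:4]#.
     */
    static unsigned pci_par(uint32_t ad, uint8_t cbe)
    {
        return xor_reduce32(ad) ^ xor_reduce32(cbe & 0xf);
    }

    int main(void)
    {
        printf("PAR for AD=0x12345678, C/BE#=0x6: %u\n", pci_par(0x12345678, 0x6));
        return 0;
    }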

Legacy and Modern Relevance

Obsolete Features

The original PCI Local Bus specification provided optional hardware support for cache snooping so that bus-master writes could remain coherent with a write-back CPU cache. It used two dedicated pins, SDONE (snoop done) and SBO# (snoop backoff), through which the cache controller reported the result of snooping the address of the current transaction: SDONE indicated that the snoop had completed, while SBO# signalled a hit on a modified line so the access could be stalled until the line was written back. This let the bus notify the cache of potential invalidations or flushes and prevented stale data in systems where the cache logic sat directly on the PCI bus. As processors integrated their caches on-chip (beginning with parts such as the Intel Pentium in 1993) and coherency was handled inside the CPU and host bridge, the bus-level mechanism became unnecessary; PCI revision 2.2 (1998) designated these pins obsolete, and they must be left unconnected in subsequent implementations.

Special cycles were a broadcast transaction type with no addressed target, intended for system-level signalling such as shutdown or halt indications and vendor-specific messages. They used their own command encoding on the C/BE# lines and were visible to every device on a segment, but were not forwarded across PCI-to-PCI bridges, which limited their scope. Because they were rarely adopted and more dependable alternatives existed, such as configuration space accesses, special cycles saw little real-world use and were not carried forward into PCI Express.

PCI version 2.2 introduced foundational power management, defining device power states from D0 (fully operational) to D3 (powered off) and supporting suspend and resume operations through registers in configuration space, so software could reduce the power and clocking of idle peripherals. These capabilities, defined in the PCI Power Management Interface Specification 1.0 (released in 1997 and integrated into PCI 2.2), provided basic energy savings but little system-wide coordination; the Advanced Configuration and Power Interface (ACPI), beginning with version 1.0 in 1996, incorporated the PCI device power states into an OS-directed framework for coordinated device and platform power control, so the PCI-level registers are now exercised under operating-system and ACPI policy rather than on their own.

Early PCI interrupt handling relied on four shared, level-triggered, open-drain (wired-OR) lines, INTA# through INTD#. A device requests service by driving its line low, and because lines may be shared by several devices, the operating system must poll every driver registered on a line to identify the actual source, while platform firmware maintains routing tables mapping each slot's interrupt pins to system interrupt inputs. This approach is simple but scales poorly in dense configurations, where sharing increases latency and servicing overhead. Message Signaled Interrupts (MSI) were therefore introduced as an optional feature in PCI 2.2: a device signals an interrupt by performing a memory write to a special system address, typically decoded by an interrupt controller such as an APIC, eliminating the physical interrupt wires and allowing up to 32 distinct vectors per function, which improves behaviour in multi-function devices and dense configurations. New designs, particularly dense server and embedded systems, consequently moved to MSI and later MSI-X in place of the shared interrupt lines.
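
Both the power-management registers and the MSI registers are located through the configuration-space capability list: if bit 4 of the Status register is set, the byte at offset 0x34 points to a chain of capability structures, each starting with an ID byte and a next-pointer byte (Power Management has ID 0x01, MSI has ID 0x05). The following is a minimal sketch of walking that list, using an in-memory array as a stand-in for a device's configuration space; the helper names and the faked device layout are illustrative only.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define PCI_STATUS_REG   0x06   /* Status register; bit 4 = capability list */
    #define PCI_CAP_POINTER  0x34   /* offset of the first capability pointer */
    #define PCI_CAP_ID_PM    0x01   /* Power Management capability ID */
    #define PCI_CAP_ID_MSI   0x05   /* Message Signaled Interrupts capability ID */

    /* Stand-in for one function's 256-byte configuration space (illustrative). */
    static uint8_t cfg[256];

    static uint8_t  cfg_read8(uint8_t off)  { return cfg[off]; }
    static uint16_t cfg_read16(uint8_t off) { return (uint16_t)(cfg[off] | (cfg[off + 1] << 8)); }

    /* Walk the capability list; return the offset of the capability with the
     * requested ID, or 0 if the function does not implement it. */
    static uint8_t find_capability(uint8_t id)
    {
        if (!(cfg_read16(PCI_STATUS_REG) & (1u << 4)))
            return 0;                                   /* no capability list */

        uint8_t off = cfg_read8(PCI_CAP_POINTER) & 0xfc;
        for (int guard = 0; off != 0 && guard < 48; guard++) {
            if (cfg_read8(off) == id)
                return off;                             /* capability found */
            off = cfg_read8(off + 1) & 0xfc;            /* follow next pointer */
        }
        return 0;
    }

    int main(void)
    {
        /* Fake a function exposing a PM capability at 0x40 and MSI at 0x50. */
        memset(cfg, 0, sizeof cfg);
        cfg[PCI_STATUS_REG] = 1u << 4;                  /* capability list present */
        cfg[PCI_CAP_POINTER] = 0x40;
        cfg[0x40] = PCI_CAP_ID_PM;  cfg[0x41] = 0x50;   /* PM, next -> 0x50 */
        cfg[0x50] = PCI_CAP_ID_MSI; cfg[0x51] = 0x00;   /* MSI, end of list */

        printf("PM capability at 0x%02x, MSI capability at 0x%02x\n",
               find_capability(PCI_CAP_ID_PM), find_capability(PCI_CAP_ID_MSI));
        return 0;
    }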

Transition to Successors

As bandwidth demands escalated in the late 1990s and early 2000s, the inherent limitations of PCI's parallel shared-bus architecture became increasingly apparent, prompting the development of a successor. Because every device competed for the same bus, contention grew and effective throughput fell as more peripherals were added; even in its highest configuration of 64-bit width at 66 MHz, PCI delivered a theoretical maximum of only 533 MB/s shared across all devices. In contrast, PCI Express (PCIe) introduced a serial, point-to-point architecture that eliminated bus contention by providing dedicated lanes to each device, enabling scalable bandwidth that began at roughly 250 MB/s per lane per direction in its initial version, with later revisions raising the signaling rate to 32 GT/s per lane and beyond. The PCI Special Interest Group (PCI-SIG) formalized this shift by releasing the PCIe 1.0 specification in 2003, establishing the new high-speed serial interconnect intended to supplant conventional PCI while maintaining essential compatibility. PCIe incorporated backward compatibility at the software level, allowing legacy PCI drivers and configuration software to detect and operate PCIe devices transparently, and hardware bridges such as PCI-to-PCIe converters enabled older PCI cards to function in newer PCIe-based systems by translating between the parallel and serial domains.

Despite the dominance of PCIe, conventional PCI retains niche relevance, particularly in embedded and industrial systems where legacy hardware compatibility is prioritized over peak performance, such as control panels, legacy servers, and specialized industrial equipment as of 2025. Adapters that convert PCIe slots to PCI interfaces further support this persistence, allowing older expansion cards to be used in contemporary systems without a full overhaul. The era of active PCI development effectively concluded with the PCI Local Bus Specification Revision 3.0 of February 2004, which removed support for 5 V signaling while maintaining 3.3 V compatibility; no further revisions were issued by the PCI-SIG, marking the strategic pivot to PCIe. In consumer personal computers, the migration to PCIe was complete by the mid-2010s as motherboard manufacturers phased out PCI slots, with Intel notably dropping native PCI support from its chipsets around 2010.
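
The headline figures above follow from simple arithmetic on clock rate, bus width, and line coding. The following small worked sketch reproduces the two numbers quoted in the text; the rounding and constants are illustrative.

    #include <stdio.h>

    int main(void)
    {
        /* Conventional PCI: one half-duplex parallel bus shared by all devices. */
        double pci_peak = 66.67e6 * 8.0;                /* 66 MHz x 8 bytes/clock */

        /* PCI Express 1.0: 2.5 GT/s per lane, 8b/10b line coding, per direction. */
        double pcie_lane = 2.5e9 * (8.0 / 10.0) / 8.0;  /* payload bits -> bytes */

        printf("PCI 64-bit @ 66 MHz : ~%.0f MB/s total, shared\n", pci_peak / 1e6);
        printf("PCIe 1.0, one lane  : ~%.0f MB/s per direction\n", pcie_lane / 1e6);
        return 0;
    }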
