| Designer | Intel, AMD |
|---|---|
| Bits | 16-bit, 32-bit and 64-bit |
| Introduced | 1978 (16-bit), 1985 (32-bit), 2003 (64-bit) |
| Design | CISC |
| Type | Register–memory |
| Encoding | Variable (1 to 15 bytes) |
| Branching | Condition code |
| Endianness | Little |
| Page size | 8086–i286: None i386, i486: 4 KB pages P5 Pentium: added 4 MB pages (Legacy PAE: 4 KB→2 MB) x86-64: added 1 GB pages |
| Extensions | x87, IA-32, x86-64, MMX, 3DNow!, SSE, MCA, ACPI, SSE2, NX bit, SMT, SSE3, SSSE3, SSE4, SSE4.2, AES-NI, CLMUL, SM3, SM4, RDRAND, SHA, MPX, SME, SGX, XOP, F16C, ADX, BMI, FMA, AVX, AVX2, AVX-VNNI, AVX512, AVX10, AMX, VT-x, VT-d, AMD-V, AMD-Vi, TSX, ASF, TXT, APX |
| Open | Mixed |
x86 (also known as 80x86[1] or the 8086 family)[2] is a family of complex instruction set computer (CISC) instruction set architectures[a] initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. The 8086 was introduced in 1978 as a fully 16-bit extension of Intel's 8-bit 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486. Colloquially, their names were "186", "286", "386" and "486".
The term is not synonymous with IBM PC compatibility, as this implies a multitude of other computer hardware. Embedded systems and general-purpose computers used x86 chips before the PC-compatible market started,[b] some of them before the IBM PC's debut in 1981.
As of June 2022, most desktop and laptop computers sold are based on the x86 architecture family,[3] while mobile categories such as smartphones or tablets are dominated by ARM. At the high end, x86 continues to dominate computation-intensive workstation and cloud computing segments.[4]
Overview
In the 1980s and early 1990s, when the 8088 and 80286 were still in common use, the term x86 usually represented any 8086-compatible CPU. Today, however, x86 usually implies binary compatibility with the 32-bit instruction set of the 80386. This is because this instruction set has become something of a lowest common denominator for many modern operating systems, and probably also because the term became common after the introduction of the 80386 in 1985.
A few years after the introduction of the 8086 and 8088, Intel added some complexity to its naming scheme and terminology as the "iAPX" of the ambitious but ill-fated Intel iAPX 432 processor was tried on the more successful 8086 family of chips,[c] applied as a kind of system-level prefix. An 8086 system, including coprocessors such as 8087 and 8089, and simpler Intel-specific system chips,[d] was thereby described as an iAPX 86 system.[5][e] There were also terms iRMX (for operating systems), iSBC (for single-board computers), and iSBX (for multimodule boards based on the 8086 architecture), all together under the heading Microsystem 80.[6][7] However, this naming scheme was quite temporary, lasting for a few years during the early 1980s.[f]
Although the 8086 was primarily developed for embedded systems and small multi-user or single-user computers, largely as a response to the successful 8080-compatible Zilog Z80,[8] the x86 line soon grew in features and processing power. Today, x86 is ubiquitous in both stationary and portable personal computers, and is also used in midrange computers, workstations, servers, and most new supercomputer clusters of the TOP500 list. A large amount of software, including a large list of x86 operating systems, runs on x86-based hardware.
Modern x86 is relatively uncommon in embedded systems, however; small low power applications (using tiny batteries), and low-cost microprocessor markets, such as home appliances and toys, lack significant x86 presence.[g] Simple 8- and 16-bit based architectures are common here, as well as simpler RISC architectures like ARM and RISC-V, although the x86-compatible VIA C7, VIA Nano, AMD's Geode, Athlon Neo and Intel Atom are examples of 32- and 64-bit designs used in some relatively low-power and low-cost segments.
There have been several attempts, including by Intel, to end the market dominance of the "inelegant" x86 architecture designed directly from the first simple 8-bit microprocessors. Examples of this are the iAPX 432 (a project originally named the Intel 8800[9]), the Intel i960, Intel i860 and the Intel/Hewlett-Packard Itanium architecture. However, the continuous refinement of x86 microarchitectures, circuitry and semiconductor manufacturing would make it hard to replace x86 in many segments. AMD's 64-bit extension of x86 (which Intel eventually responded to with a compatible design)[10] and the scalability of x86 chips in the form of modern multi-core CPUs underline x86 as an example of how continuous refinement of established industry standards can resist the competition from completely new architectures.[11]
For some advanced features, x86 may require a license from Intel, though some do not need it;[citation needed] x86-64 may require an additional license from AMD. The Pentium Pro processor (and NetBurst) has been on the market for more than 21 years[12] and so cannot be subject to patent claims. The i686 subset of the x86 architecture is therefore fully open. The Opteron 1000 series processors have been on the market for more than 21 years[13] and so cannot be subject to patent claims. The AMD K8 subset of the x86 architecture is therefore fully open.
Chronology
The table below lists processor models and model series implementing various architectures in the x86 family, in chronological order. Each line item is characterized by significantly improved or commercially successful processor microarchitecture designs.
| Era | Introduction | Prominent CPU models | Address space | Notable features | |||
|---|---|---|---|---|---|---|---|
| Linear | Virtual | Physical | |||||
| x86-16 | 1st | 1978 | Intel 8086, Intel 8088 (1979) | 16-bit | NA | 20-bit | 16-bit ISA, IBM PC (8088), IBM PC/XT (8088) |
| 1982 | Intel 80186, Intel 80188 NEC V20/V30 (1983) |
8086-2 ISA, embedded (80186/80188) | |||||
| 2nd | Intel 80286 and clones | 30-bit | 24-bit | protected mode, IBM PC/XT 286, IBM PC/AT | |||
| IA-32 | 3rd | 1985 | Intel 80386, AMD Am386 (1991) | 32-bit | 46-bit | 32-bit | 32-bit ISA, paging, IBM PS/2 |
| 4th (pipelining, cache) | 1989 | Intel 80486 Cyrix Cx486S, DLC (1992) AMD Am486 (1993), Am5x86 (1995) |
pipelining, on-die x87 FPU (486DX), on-die cache | ||||
| 5th (Superscalar) |
1993 | Intel Pentium, Pentium MMX (1996) | Superscalar, 64-bit databus, faster FPU, MMX (Pentium MMX), APIC, SMP | ||||
| 1994 | NexGen Nx586 AMD 5k86/K5 (1996) |
Discrete microarchitecture (μ-op translation) | |||||
| 1995 | Cyrix Cx5x86 Cyrix 6x86/MX (1997)/MII (1998) |
dynamic execution | |||||
| 6th (PAE, μ-op translation) |
1995 | Intel Pentium Pro | 36-bit (PAE) | μ-op translation, conditional move instructions, dynamic execution, speculative execution, 3-way x86 superscalar, superscalar FPU, PAE, on-chip L2 cache | |||
| 1997 | Intel Pentium II, Pentium III (1999) Celeron (1998), Xeon (1998) |
on-package (Pentium II) or on-die (Celeron) L2 Cache, SSE (Pentium III), Slot 1, Socket 370 or Slot 2 (Xeon) | |||||
| 1997 | AMD K6/K6-2 (1998)/K6-III (1999) | 32-bit | 3DNow!, 3-level cache system (K6-III) | ||||
| Enhanced Platform | 1999 | AMD Athlon Athlon XP/MP (2001) Duron (2000) Sempron (2004) |
36-bit | MMX+, 3DNow!+, double-pumped bus, Slot A or Socket A | |||
| 2000 | Transmeta Crusoe | 32-bit | CMS powered x86 platform processor, VLIW-128 core, on-die memory controller, on-die PCI bridge logic | ||||
| Intel Pentium 4 | 36-bit | SSE2, HTT (Northwood), NetBurst, quad-pumped bus, Trace Cache, Socket 478 | |||||
| 2003 | Intel Pentium M Intel Core (2006) Pentium Dual-Core (2007) |
μ-op fusion, XD bit (Dothan) (Intel Core "Yonah") | |||||
| Transmeta Efficeon | CMS 6.0.4, VLIW-256, NX bit, HT | ||||||
| IA-64 | 64-bit Transition 1999–2005 |
2001 | Intel Itanium (2001–2017) | 52-bit |
64-bit EPIC architecture, 128-bit VLIW instruction bundle, on-die hardware IA-32 H/W enabling x86 OSes & x86 applications (early generations), software IA-32 EL enabling x86 applications (Itanium 2), Itanium register files are remapped to x86 registers | ||
| x86-64 | 64-bit Extended since 2001 |
x86-64 is the 64-bit extended architecture of x86; its Legacy Mode preserves the entire x86 architecture unaltered. The native architecture of x86-64 processors, residing in 64-bit Mode, lacks the segmentation access modes and presents a 64-bit architectural linear address space; an adapted IA-32 architecture residing in Compatibility Mode, alongside 64-bit Mode, is provided to support most x86 applications | |||||
| 2003 | Athlon 64/FX/X2 (2005), Opteron Sempron (2004)/X2 (2008) Turion 64 (2005)/X2 (2006) |
40-bit |
AMD64 (except some Sempron processors presented as purely x86 processors), on-die memory controller, HyperTransport, on-die dual-core (X2), AMD-V (Athlon 64 Orleans), Socket 754/939/940 or AM2 | ||||
| 2004 | Pentium 4 (Prescott) Celeron D, Pentium D (2005) |
36-bit |
EM64T (enabled on selected models of Pentium 4 and Celeron D), SSE3, 2nd gen. NetBurst pipelining, dual-core (on-die: Pentium D 8xx, on-chip: Pentium D 9xx), Intel VT (Pentium 4 6x2), socket LGA 775 | ||||
| 2006 | Intel Core 2 Pentium Dual-Core (2007) Celeron Dual-Core (2008) |
Intel 64 (formerly EM64T), SSSE3 (65 nm), wide dynamic execution, μ-op fusion, macro-op fusion in 16-bit and 32-bit mode,[14][15] on-chip quad-core (Core 2 Quad), Smart Shared L2 Cache (Intel Core 2 "Merom") | |||||
| 2007 | AMD Phenom/II (2008) Athlon II (2009) Turion II (2009) |
48-bit |
Monolithic quad-core (X4)/triple-core (X3), SSE4a, Rapid Virtualization Indexing (RVI), HyperTransport 3, AM2+ or AM3 | ||||
| 2008 | Intel Core 2 (45 nm) | 40-bit |
SSE4.1 | ||||
| Intel Atom | netbook or low power smart device processor, P54C core reused | ||||||
| Intel Core i7 Core i5 (2009) Core i3 (2010) |
QuickPath, on-chip GMCH (Clarkdale), SSE4.2, Extended Page Tables (EPT) for virtualization, macro-op fusion in 64-bit mode,[14][15] (Intel Xeon "Bloomfield" with Nehalem microarchitecture) | ||||||
| VIA Nano | hardware-based encryption; adaptive power management | ||||||
| 2010 | AMD FX | 48-bit |
octa-core, CMT(Clustered Multi-Thread), FMA, OpenCL, AM3+ | ||||
| 2011 | AMD APU A and E Series (Llano) | 40-bit |
on-die GPGPU, PCI Express 2.0, Socket FM1 | ||||
| AMD APU C, E and Z Series (Bobcat) | 36-bit |
low power smart device APU | |||||
| Intel Core i3, Core i5 and Core i7 (Sandy Bridge/Ivy Bridge) |
Internal Ring connection, decoded μ-op cache, LGA 1155 socket | ||||||
| 2012 | AMD APU A Series (Bulldozer, Trinity and later) | 48-bit |
AVX, Bulldozer-based APU, Socket FM2 or Socket FM2+ | ||||
| Intel Xeon Phi (Knights Corner) | PCI-E add-on card coprocessor for XEON based system, Manycore Chip, In-order P54C, very wide VPU (512-bit SSE), LRBni instructions (8× 64-bit) | ||||||
| 2013 | AMD Jaguar (Athlon, Sempron) |
SoC, game console and low power smart device processor | |||||
| Intel Silvermont (Atom, Celeron, Pentium) |
36-bit |
SoC, low/ultra-low power smart device processor | |||||
| Intel Core i3, Core i5 and Core i7 (Haswell/Broadwell) | 39-bit |
AVX2, FMA3, TSX, BMI1, and BMI2 instructions, LGA 1150 socket | |||||
| 2015 | Intel Broadwell-U (Intel Core i3, Core i5, Core i7, Core M, Pentium, Celeron) |
SoC, on-chip Broadwell-U PCH-LP (Multi-chip module) | |||||
| 2015–2020 | Intel Skylake/Kaby Lake/Cannon Lake/Coffee Lake/Rocket Lake (Intel Pentium/Celeron Gold, Core i3, Core i5, Core i7, Core i9) |
46-bit |
AVX-512 (restricted to Cannon Lake-U and workstation/server variants of Skylake) | ||||
| 2016 | Intel Xeon Phi (Knights Landing) | 48-bit |
Manycore CPU and coprocessor for Xeon systems, Airmont (Atom) based core | ||||
| 2016 | AMD Bristol Ridge (AMD (Pro) A6/A8/A10/A12) |
Integrated FCH on die, SoC, AM4 socket | |||||
| 2017 | AMD Ryzen Series/AMD Epyc Series | AMD's implementation of SMT, on-chip multiple dies | |||||
| 2017 | Zhaoxin WuDaoKou (KX-5000, KH-20000) | Zhaoxin's first brand new x86-64 architecture | |||||
| 2018–2021 | Intel Sunny Cove (Ice Lake-U and Y), Cypress Cove (Rocket Lake) | 57-bit |
Intel's first implementation of AVX-512 for the consumer segment. Addition of Vector Neural Network Instructions (VNNI) | ||||
| 2019 | AMD Matisse | 48-bit |
Multiple Chip Module design with I/O die separate from CPU die(s), Support for PCIe Gen4 | ||||
| 2020 | Intel Willow Cove (Tiger Lake-Y/U/H) | 57-bit |
Dual ring interconnect architecture, updated Gaussian Neural Accelerator (GNA2), new AVX-512 Vector Intersection Instructions, addition of Control-Flow Enforcement Technology (CET) | ||||
| 2021 | Intel Alder Lake | Hybrid design with performance (Golden Cove) and efficiency cores (Gracemont), support for PCIe Gen5 and DDR5, updated Gaussian Neural Accelerator (GNA3). AVX-512 not officially supported | |||||
| 2022 | AMD Vermeer (5800X3D) | 48-bit |
X3D chips have an additional 64 MB 3D vertically stacked L3 cache (3D V-Cache) for up to 96 MB L3 Cache | ||||
| 2022 | AMD Raphael | AMD's first implementation of AVX-512 for the consumer segment, iGPU now standard on Ryzen CPUs with 2 RDNA 2 compute cores | |||||
History
Designers and manufacturers
At various times, companies such as IBM, VIA, NEC,[h] AMD, TI, STM, Fujitsu, OKI, Siemens, Cyrix, Intersil, C&T, NexGen, UMC, and DM&P started to design or manufacture[i] x86 processors (CPUs) intended for personal computers and embedded systems. Other companies that designed or manufactured x86 or x87 processors include ITT Corporation, National Semiconductor, ULSI System Technology, and Weitek.
Such x86 implementations were seldom simple copies but often employed different internal microarchitectures and different solutions at the electronic and physical levels. Quite naturally, early compatible microprocessors were 16-bit, while 32-bit designs were developed much later. For the personal computer market, real quantities started to appear around 1990 with i386 and i486 compatible processors, often named similarly to Intel's original chips.
After the fully pipelined i486, in 1993 Intel introduced the Pentium brand name (which, unlike numbers, could be trademarked) for their new set of superscalar x86 designs. With the x86 naming scheme now legally cleared, other x86 vendors had to choose different names for their x86-compatible products, and initially some chose to continue with variations of the numbering scheme: IBM partnered with Cyrix to produce the 5x86 and then the very efficient 6x86 (M1) and 6x86MX (MII) lines of Cyrix designs, which were the first x86 microprocessors implementing register renaming to enable speculative execution.
AMD meanwhile designed and manufactured the advanced but delayed 5k86 (K5), which, internally, was closely based on AMD's earlier 29K RISC design; similar to NexGen's Nx586, it used a strategy such that dedicated pipeline stages decode x86 instructions into uniform and easily handled micro-operations, a method that has remained the basis for most x86 designs to this day.
Some early versions of these microprocessors had heat dissipation problems. The 6x86 was also affected by a few minor compatibility problems, the Nx586 lacked a floating-point unit (FPU) and (the then crucial) pin-compatibility, while the K5 had somewhat disappointing performance when it was (eventually) introduced.
Customer ignorance of alternatives to the Pentium series further contributed to these designs being comparatively unsuccessful, despite the fact that the K5 had very good Pentium compatibility and the 6x86 was significantly faster than the Pentium on integer code.[j] AMD later managed to grow into a serious contender with the K6 set of processors, which gave way to the very successful Athlon and Opteron.
There were also other contenders, such as Centaur Technology (formerly IDT), Rise Technology, and Transmeta. VIA Technologies' energy efficient C3 and C7 processors, which were designed by the Centaur company, were sold for many years following their release in 2005. Centaur's 2008 design, the VIA Nano, was their first processor with superscalar and speculative execution. It was introduced at about the same time (in 2008) as Intel introduced the Intel Atom, its first "in-order" processor after the P5 Pentium.
Many additions and extensions have been added to the original x86 instruction set over the years, almost consistently with full backward compatibility.[k] The architecture family has been implemented in processors from Intel, Cyrix, AMD, VIA Technologies and many other companies; there are also open implementations, such as the Zet SoC platform (currently inactive).[16] Nevertheless, of those, only Intel, AMD, VIA Technologies, and DM&P Electronics hold x86 architectural licenses, and from these, only the first two actively produce modern 64-bit designs, leading to what has been called a "duopoly" of Intel and AMD in x86 processors.
However, in 2014 the Shanghai-based Chinese company Zhaoxin, a joint venture between a Chinese company and VIA Technologies, began designing VIA based x86 processors for desktops and laptops. The release of its newest "7" family[17] of x86 processors (e.g. KX-7000), which are not quite as fast as AMD or Intel chips but are still state of the art,[18] had been planned for 2021; as of March 2022 the release had not taken place, however.[19]
From 16-bit and 32-bit to 64-bit architecture
The instruction set architecture has twice been extended to a larger word size. In 1985, Intel released the 32-bit 80386 (later known as i386) which gradually replaced the earlier 16-bit chips in computers (although typically not in embedded systems) during the following years; this extended programming model was originally referred to as the i386 architecture (like its first implementation) but Intel later dubbed it IA-32 when introducing its (unrelated) IA-64 architecture.
In 1999–2003, AMD extended this 32-bit architecture to 64 bits and referred to it as x86-64 in early documents and later as AMD64. Intel soon adopted AMD's architectural extensions under the name IA-32e, later using the name EM64T and finally using Intel 64. Microsoft and Sun Microsystems/Oracle also use the term "x64", while many Linux distributions and the BSDs use the term "amd64". Microsoft Windows, for example, designates its 32-bit versions as "x86" and 64-bit versions as "x64", while installation files of 64-bit Windows versions are required to be placed into a directory called "AMD64".[20]
Continued support for 16-bit and 32-bit execution modes
In 2023, Intel proposed a major change to the architecture referred to as X86S (formerly known as X86-S). The S in X86S stood for "simplification", which aimed to remove support for legacy execution modes and instructions.
The draft specification received multiple updates, reaching version 1.2 by June 2024. It was eventually abandoned as of December 2024, following the formation of the x86 Ecosystem Advisory Group by Intel and AMD.[21]
A processor implementing this proposal would have lacked support for legacy mode, started execution directly in long mode and provided a way to switch to 5-level paging without going through the unpaged mode.
The new architecture would have removed support for 16-bit and 32-bit operating systems. 32-bit code would have only been supported for user applications running in ring 3, and would have used the same simplified segmentation as long mode.[22][23]
Specific removed features would have included:[24]
- Segmentation gates
- 32-bit ring 0
  - VT-x would no longer emulate this feature
- Rings 1 and 2
- Ring 3 I/O port (IN/OUT) access; see port-mapped I/O
- String port I/O (INS/OUTS)
- Real mode (including huge real mode), 16-bit protected mode, VM86
- 16-bit addressing mode
  - VT-x would no longer provide unrestricted mode
- 8259 support; the only APIC supported would be X2APIC
- Some unused operating system mode bits
- 16-bit and 32-bit Startup IPI (SIPI)
Basic properties of the architecture
The x86 architecture is a variable instruction length, primarily "CISC" design with emphasis on backward compatibility. The instruction set is not typical CISC, however, but basically an extended version of the simple eight-bit 8008 and 8080 architectures. Byte-addressing is enabled and words are stored in memory with little-endian byte order. Memory access to unaligned addresses is allowed for almost all instructions. The largest native size for integer arithmetic and memory addresses (or offsets) is 16, 32 or 64 bits depending on architecture generation (newer processors include direct support for smaller integers as well). Multiple scalar values can be handled simultaneously via the SIMD unit present in later generations, as described below.[l] Immediate addressing offsets and immediate data may be expressed as 8-bit quantities for the frequently occurring cases or contexts where a −128..127 range is enough. Typical instructions are therefore 2 or 3 bytes in length (although some are much longer, and some are single-byte).
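Two of the properties described above, little-endian storage and the 8-bit form for small immediates, can be illustrated with a short Python sketch. The helper names `to_little_endian` and `fits_imm8` are hypothetical and chosen only for this illustration; the sketch models byte layout and range checks, not actual instruction encoding.

```python
def to_little_endian(value: int, width_bytes: int) -> bytes:
    """Return the bytes of an unsigned value as they would sit in memory,
    least significant byte at the lowest address."""
    return value.to_bytes(width_bytes, "little")


def fits_imm8(value: int) -> bool:
    """True if a signed offset or immediate fits the -128..127 range
    that allows the short 8-bit form."""
    return -128 <= value <= 127


# The 32-bit value 0x12345678 is stored as 78 56 34 12.
layout = to_little_endian(0x12345678, 4)
print(layout.hex(" "))

# Small stack offsets typically fit; larger ones need a wider field.
print(fits_imm8(-8), fits_imm8(127), fits_imm8(128))
```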
To further conserve encoding space, most registers are expressed in opcodes using three or four bits, the latter via an opcode prefix in 64-bit mode, while at most one operand to an instruction can be a memory location.[m] However, this memory operand may also be the destination (or a combined source and destination), while the other operand, the source, can be either register or immediate. Among other factors, this contributes to a code size that rivals eight-bit machines and enables efficient use of instruction cache memory. The relatively small number of general registers (also inherited from its 8-bit ancestors) has made register-relative addressing (using small immediate offsets) an important method of accessing operands, especially on the stack. Much work has therefore been invested in making such accesses as fast as register accesses—i.e., a one cycle instruction throughput, in most circumstances where the accessed data is available in the top-level cache.
Floating point and SIMD
A dedicated floating-point processor with 80-bit internal registers, the 8087, was developed for the original 8086. This microprocessor subsequently developed into the extended 80387, and later processors incorporated a backward compatible version of this functionality on the same microprocessor as the main processor. In addition to this, modern x86 designs also contain a SIMD-unit (see SSE below) where instructions can work in parallel on (one or two) 128-bit words, each containing two or four floating-point numbers (each 64 or 32 bits wide respectively), or alternatively, 2, 4, 8 or 16 integers (each 64, 32, 16 or 8 bits wide respectively).
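The way one 128-bit word holds different numbers of elements can be sketched in Python using the standard `struct` module. The names `XMM_BITS`, `lane_count` and `register_image` are assumptions made for this illustration; the code only models the byte layout of a register, not any real SIMD instruction.

```python
import struct

XMM_BITS = 128  # width of one SSE register, as described above


def lane_count(element_bits: int) -> int:
    """Number of elements of the given width that fit in a 128-bit word."""
    if element_bits <= 0 or XMM_BITS % element_bits != 0:
        raise ValueError("element width must evenly divide 128")
    return XMM_BITS // element_bits


# Four 32-bit floats packed little-endian occupy exactly 16 bytes.
register_image = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)

# The same 16 bytes can be reinterpreted as four 32-bit integers
# or as sixteen 8-bit integers.
as_u32 = struct.unpack("<4I", register_image)
as_u8 = struct.unpack("<16B", register_image)

print(lane_count(64), lane_count(32), lane_count(16), lane_count(8))
print([hex(x) for x in as_u32])
```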
The presence of wide SIMD registers means that existing x86 processors can load or store up to 128 bits of memory data in a single instruction and also perform bitwise operations (although not integer arithmetic[n]) on full 128-bit quantities in parallel. Intel's Sandy Bridge processors added the Advanced Vector Extensions (AVX) instructions, widening the SIMD registers to 256 bits. The Intel Initial Many Core Instructions implemented by the Knights Corner Xeon Phi processors, and the AVX-512 instructions implemented by the Knights Landing Xeon Phi processors and by Skylake-X processors, use 512-bit wide SIMD registers.
Current implementations
During execution, current x86 processors employ a few extra decoding steps to split most instructions into smaller pieces called micro-operations. These are then handed to a control unit that buffers and schedules them in compliance with x86-semantics so that they can be executed, partly in parallel, by one of several (more or less specialized) execution units. These modern x86 designs are thus pipelined, superscalar, and also capable of out-of-order and speculative execution (via branch prediction, register renaming, and memory dependence prediction), which means they may execute multiple (partial or complete) x86 instructions simultaneously, and not necessarily in the same order as given in the instruction stream.[25] Some Intel CPUs (Xeon Foster MP, some Pentium 4, and some Nehalem and later Intel Core processors) and AMD CPUs (starting from Zen) are also capable of simultaneous multithreading with two threads per core (Xeon Phi has four threads per core). Some Intel CPUs support transactional memory (TSX).
When introduced, in the mid-1990s, this method was sometimes referred to as a "RISC core" or as "RISC translation", partly for marketing reasons, but also because these micro-operations share some properties with certain types of RISC instructions. However, traditional microcode (used since the 1950s) also inherently shares many of the same properties; the new method differs mainly in that the translation to micro-operations now occurs asynchronously. Not having to synchronize the execution units with the decode steps opens up possibilities for more analysis of the (buffered) code stream, and therefore permits detection of operations that can be performed in parallel, simultaneously feeding more than one execution unit.
The latest processors also do the opposite when appropriate; they combine certain x86 sequences (such as a compare followed by a conditional jump) into a more complex micro-op which fits the execution model better and thus can be executed faster or with fewer machine resources involved.
Another way to try to improve performance is to cache the decoded micro-operations, so the processor can directly access the decoded micro-operations from a special cache, instead of decoding them again. Intel followed this approach with the Execution Trace Cache feature in their NetBurst microarchitecture (for Pentium 4 processors) and later in the Decoded Stream Buffer (for Core-branded processors since Sandy Bridge).[26]
Transmeta used a completely different method in their Crusoe x86 compatible CPUs. They used just-in-time translation to convert x86 instructions to the CPU's native VLIW instruction set. Transmeta argued that their approach allows for more power efficient designs since the CPU can forgo the complicated decode step of more traditional x86 implementations.
Addressing modes
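Addressing modes for 16-bit processor modes can be summarized by the formula:[27][28]
{CS, DS, SS or ES} : [BX or BP] + [SI or DI] + [displacement]
Addressing modes for 32-bit x86 processor modes[29] can be summarized by the formula:[30]
{CS, DS, SS, ES, FS or GS} : [EAX, EBX, ECX, EDX, ESP, EBP, ESI or EDI] + [(EAX, EBX, ECX, EDX, EBP, ESI or EDI) × (1, 2, 4 or 8)] + [displacement]
Addressing modes for the 64-bit processor mode can be summarized by the formula:[30]
{FS or GS} : [general-purpose register] + [(general-purpose register other than RSP) × (1, 2, 4 or 8)] + [displacement], or alternatively RIP + [displacement]
In each formula, the components in square brackets are optional.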
Instruction relative addressing in 64-bit code (RIP + displacement, where RIP is the instruction pointer register) simplifies the implementation of position-independent code (as used in shared libraries in some operating systems).[31]
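As a rough illustration of the base + index × scale + displacement form used by the 32-bit and 64-bit addressing modes, the following Python sketch computes an effective address. The function name `effective_address` and its parameters are hypothetical; segment bases and RIP-relative addressing are deliberately left out, and the wraparound to the address width is modelled with a simple mask.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1,
                      disp: int = 0, bits: int = 32) -> int:
    """Compute base + index * scale + disp, truncated to the address width.

    scale must be 1, 2, 4 or 8; disp may be negative (signed displacement).
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    if bits not in (32, 64):
        raise ValueError("this sketch only models 32-bit and 64-bit modes")
    return (base + index * scale + disp) & ((1 << bits) - 1)


# Element 4 of an array of 8-byte items that starts 16 bytes below 0x1000.
addr = effective_address(base=0x1000, index=4, scale=8, disp=-16)
print(hex(addr))
```

In assembly syntax the example corresponds to an operand such as `[ebx + ecx*8 - 16]` with EBX = 0x1000 and ECX = 4.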
The 8086 had 64 KB of eight-bit (or alternatively 32 K-word of 16-bit) I/O space, and a 64 KB (one segment) stack in memory supported by computer hardware. Only words (two bytes) can be pushed to the stack. The stack grows toward numerically lower addresses, with SS:SP pointing to the most recently pushed item. There are 256 interrupts, which can be invoked by both hardware and software. The interrupts can cascade, using the stack to store the return address.
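The stack behaviour described above (word-sized pushes, growth toward lower addresses, SP pointing at the most recently pushed item) can be modelled with a small Python sketch. The class name `Stack8086` is hypothetical, and the model covers only offsets within a single 64 KB stack segment; it does not model the SS segment base or interrupts.

```python
class Stack8086:
    """Toy model of a 64 KB stack segment addressed by a 16-bit SP."""

    def __init__(self, sp: int = 0x0100) -> None:
        self.mem = bytearray(0x10000)  # one 64 KB segment
        self.sp = sp & 0xFFFF

    def push(self, word: int) -> None:
        # SP is decremented by 2 first, then the word is stored
        # little-endian at the new top of stack.
        self.sp = (self.sp - 2) & 0xFFFF
        self.mem[self.sp] = word & 0xFF
        self.mem[(self.sp + 1) & 0xFFFF] = (word >> 8) & 0xFF

    def pop(self) -> int:
        # The word at SP is read, then SP is incremented by 2.
        low = self.mem[self.sp]
        high = self.mem[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF
        return low | (high << 8)


stack = Stack8086(sp=0x0100)
stack.push(0x1234)
print(hex(stack.sp))     # SP moved down by two bytes
print(hex(stack.pop()))  # the same word comes back
```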
x86 registers
16-bit
The original Intel 8086 and 8088 have fourteen 16-bit registers. Four of them (AX, BX, CX, DX) are general-purpose registers (GPRs), although each may have an additional purpose; for example, only CX can be used as a counter with the loop instruction. Each can be accessed as two separate bytes (thus BX's high byte can be accessed as BH and low byte as BL). Two pointer registers have special roles: SP (stack pointer) points to the "top" of the stack, and BP (base pointer) is often used to point at some other place in the stack, typically above the local variables (see frame pointer). The registers SI, DI, BX and BP are address registers, and may also be used for array indexing.
One of four possible 'segment registers' (CS, DS, SS and ES) is used to form a memory address. In the original 8086 / 8088 / 80186 / 80188 every address was built from a segment register and one of the general purpose registers. For example, ds:si is the notation for an address formed as [16 * ds + si] to allow 20-bit addressing rather than 16 bits, although this changed in later processors. At that time only certain combinations were supported.
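The [16 * ds + si] calculation can be written out as a short Python sketch. The function name `physical_address` is hypothetical; the final mask models the 20-bit address bus of the original 8086, on which addresses past 1 MB wrap around to zero.

```python
def physical_address(segment: int, offset: int) -> int:
    """Form a 20-bit physical address from a 16-bit segment and offset."""
    linear = (segment & 0xFFFF) * 16 + (offset & 0xFFFF)
    return linear & 0xFFFFF  # 20 address lines on the 8086


# Different segment:offset pairs can name the same physical byte.
print(hex(physical_address(0x1234, 0x0010)))
print(hex(physical_address(0x1235, 0x0000)))
```

Because the segment is shifted by only four bits, segments overlap every 16 bytes, which is why many segment:offset pairs alias the same location.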
The FLAGS register contains flags such as carry flag, overflow flag and zero flag. Finally, the instruction pointer (IP) points to the next instruction that will be fetched from memory and then executed; this register cannot be directly accessed (read or written) by a program.[32]
The Intel 80186 and 80188 are essentially an upgraded 8086 or 8088 CPU, respectively, with on-chip peripherals added, and they have the same CPU registers as the 8086 and 8088 (in addition to interface registers for the peripherals).
The 8086, 8088, 80186, and 80188 can use an optional floating-point coprocessor, the 8087. The 8087 appears to the programmer as part of the CPU and adds eight 80-bit wide registers, st(0) to st(7), each of which can hold numeric data in one of seven formats: 32-, 64-, or 80-bit floating point, 16-, 32-, or 64-bit (binary) integer, and 80-bit packed decimal integer.[7]: S-6, S-13..S-15 It also has its own 16-bit status register accessible through the fstsw instruction, and it is common to simply use some of its bits for branching by copying it into the normal FLAGS.[33]
In the Intel 80286, to support protected mode, three special registers hold descriptor table addresses (GDTR, LDTR, IDTR), and a fourth task register (TR) is used for task switching. The 80287 is the floating-point coprocessor for the 80286 and has the same registers as the 8087 with the same data formats.
32-bit
With the advent of the 32-bit 80386 processor, the 16-bit general-purpose registers, base registers, index registers, instruction pointer, and FLAGS register, but not the segment registers, were expanded to 32 bits. The nomenclature represented this by prefixing an "E" (for "extended") to the register names in x86 assembly language. Thus, the AX register corresponds to the lower 16 bits of the new 32-bit EAX register, SI corresponds to the lower 16 bits of ESI, and so on. The general-purpose registers, base registers, and index registers can all be used as the base in addressing modes, and all of those registers except for the stack pointer can be used as the index in addressing modes.
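The relationship between an extended register and its older sub-registers can be sketched in Python as views onto one 32-bit value. The class `Reg32` and its property names are hypothetical; the sketch assumes that a write to the 16-bit view leaves the upper 16 bits unchanged.

```python
class Reg32:
    """Toy model of a register such as EAX with its AX, AH and AL views."""

    def __init__(self, value: int = 0) -> None:
        self.e = value & 0xFFFFFFFF  # full 32-bit value (e.g. EAX)

    @property
    def x(self) -> int:
        """Lower 16 bits (e.g. AX)."""
        return self.e & 0xFFFF

    @x.setter
    def x(self, value: int) -> None:
        # Writing the 16-bit view keeps the upper half of the register.
        self.e = (self.e & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def h(self) -> int:
        """Bits 8..15 (e.g. AH)."""
        return (self.e >> 8) & 0xFF

    @property
    def l(self) -> int:
        """Bits 0..7 (e.g. AL)."""
        return self.e & 0xFF


eax = Reg32(0x12345678)
print(hex(eax.x), hex(eax.h), hex(eax.l))
eax.x = 0xABCD
print(hex(eax.e))
```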
Two new segment registers (FS and GS) were added. With a greater number of registers, instructions and operands, the machine code format was expanded. To provide backward compatibility, segments with executable code can be marked as containing either 16-bit or 32-bit instructions. Special prefixes allow inclusion of 32-bit instructions in a 16-bit segment or vice versa.
The 80386 had an optional floating-point coprocessor, the 80387; it had eight 80-bit wide registers: st(0) to st(7),[34] like the 8087 and 80287. The 80386 could also use an 80287 coprocessor.[35] With the 80486 and all subsequent x86 models, the floating-point processing unit (FPU) is integrated on-chip.
The Pentium MMX added eight 64-bit MMX integer vector registers (MM0 to MM7, which share lower bits with the 80-bit-wide FPU stack).[36] With the Pentium III, Intel added a 32-bit Streaming SIMD Extensions (SSE) control/status register (MXCSR) and eight 128-bit SSE floating-point registers (XMM0 to XMM7).[37]
64-bit
Starting with the AMD Opteron processor, the x86 architecture extended the 32-bit registers into 64-bit registers in a way similar to how the 16 to 32-bit extension took place. An R-prefix (for "register") identifies the 64-bit registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP), and eight additional 64-bit general registers (R8–R15) were also introduced in the creation of x86-64. Also, eight more SSE vector registers (XMM8–XMM15) were added. However, these extensions are only usable in 64-bit mode, which is one of the two sub-modes available only in long mode. The addressing modes were not dramatically changed from 32-bit mode, except that addressing was extended to 64 bits, virtual addresses are now sign-extended to 64 bits (in order to disallow mode bits in virtual addresses), and other selector details were dramatically reduced. In addition, an addressing mode was added to allow memory references relative to RIP (the instruction pointer), to ease the implementation of position-independent code, used in shared libraries in some operating systems.
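The sign-extension rule for virtual addresses can be illustrated with a small Python sketch. It assumes a 48-bit implemented virtual address width, which is only one possible value (the width is implementation-dependent), and the names `sign_extend` and `is_canonical` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend(addr: int, va_bits: int = 48) -> int:
    """Copy bit (va_bits - 1) of addr into all higher bits up to bit 63."""
    low = addr & ((1 << va_bits) - 1)
    sign = 1 << (va_bits - 1)
    return ((low ^ sign) - sign) & MASK64


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if the upper bits of addr already equal the sign extension."""
    return sign_extend(addr, va_bits) == (addr & MASK64)


# The top of the lower half and the bottom of the upper half are valid;
# an address just past the lower half, with zero upper bits, is not.
print(is_canonical(0x00007FFFFFFFFFFF))
print(is_canonical(0x0000800000000000))
print(is_canonical(0xFFFF800000000000))
```

Requiring the upper bits to match in this way prevents software from storing flag or mode bits in the unused part of a pointer.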
128-bit
SIMD registers XMM0–XMM15 (XMM0–XMM31 when AVX-512 is supported).
256-bit
SIMD registers YMM0–YMM15 (YMM0–YMM31 when AVX-512 is supported). The lower half of each YMM register maps onto the corresponding XMM register.
512-bit
SIMD registers ZMM0–ZMM31. The lower half of each ZMM register maps onto the corresponding YMM register.
Miscellaneous/special purpose
x86 processors that have a protected mode, i.e. the 80286 and later processors, also have three descriptor registers (GDTR, LDTR, IDTR) and a task register (TR).
32-bit x86 processors (starting with the 80386) also include various special/miscellaneous registers such as control registers (CR0 through 4, CR8 for 64-bit only), debug registers (DR0 through 3, plus 6 and 7), test registers (TR3 through 7; 80486 only), and model-specific registers (MSRs, appearing with the Pentium[o]).
AVX-512 has eight extra 64-bit mask registers K0–K7 for selecting elements in a vector register. Depending on the vector register and element widths, only a subset of bits of the mask register may be used by a given instruction.
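How many mask bits an instruction consumes follows from simple division, as the following Python sketch shows. The helper names are hypothetical, and `merge_masked` models only the merging behaviour (unselected elements keep their old value), not any particular instruction.

```python
def mask_bits_used(vector_bits: int, element_bits: int) -> int:
    """Number of low-order mask bits that matter: one per vector element."""
    if vector_bits % element_bits:
        raise ValueError("element width must divide vector width")
    return vector_bits // element_bits

def merge_masked(dst: list, src: list, mask: int) -> list:
    """Take element i from src where bit i of mask is set, else keep dst."""
    return [s if (mask >> i) & 1 else d
            for i, (d, s) in enumerate(zip(dst, src))]
```

For example, a 512-bit operation on 8-bit elements uses all 64 mask bits, whereas a 128-bit operation on 64-bit elements uses only the lowest two.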
Purpose
Although the main registers (with the exception of the instruction pointer) are "general-purpose" in the 32-bit and 64-bit versions of the instruction set and can be used for anything, it was originally envisioned that they be used for the following purposes:
- AL/AH/AX/EAX/RAX: Accumulator
- CL/CH/CX/ECX/RCX: Counter (for use with loops and strings)
- DL/DH/DX/EDX/RDX: Extend the precision of the accumulator (e.g. combine 32-bit EAX and EDX for 64-bit integer operations in 32-bit code)
- BL/BH/BX/EBX/RBX: Base index (for use with arrays)
- SP/ESP/RSP: Stack pointer for top address of the stack.
- BP/EBP/RBP: Stack base pointer for holding the address of the current stack frame.
- SI/ESI/RSI: Source index for string operations.
- DI/EDI/RDI: Destination index for string operations.
- IP/EIP/RIP: Instruction pointer. Holds the program counter, the address of next instruction.
Segment registers:
- CS: Code
- DS: Data
- SS: Stack
- ES: Extra data
- FS: Extra data #2
- GS: Extra data #3
No particular purposes were envisioned for the other 8 registers available only in 64-bit mode.
Some instructions compile and execute more efficiently when using these registers for their designed purpose. For example, using AL as an accumulator and adding an immediate byte value to it produces the efficient add to AL opcode of 04h, whilst using the BL register produces the generic and longer add to register opcode of 80C3h. Another example is double precision division and multiplication that works specifically with the AX and DX registers.
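The two encodings mentioned above can be reproduced with a small Python sketch. The function name and the register table are introduced here purely for illustration; the sketch covers only the 8-bit "add immediate to register" case.

```python
# 3-bit register numbers used in the ModRM byte for 8-bit operands
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}

def encode_add_r8_imm8(reg: str, imm: int) -> bytes:
    """Encode ADD reg8, imm8, using the short accumulator form for AL."""
    imm &= 0xFF
    if reg == "al":
        return bytes([0x04, imm])      # dedicated opcode: ADD AL, imm8
    modrm = 0xC0 | REG8[reg]           # mod=11 (register), /0 selects ADD
    return bytes([0x80, modrm, imm])   # generic form: opcode 80h + ModRM
```

Adding 5 to AL yields the two bytes `04 05`, while adding 5 to BL yields the three bytes `80 C3 05`, one byte longer.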
Modern compilers benefited from the introduction of the SIB byte (scale-index-base byte) that allows registers to be treated uniformly (minicomputer-like). However, using the SIB byte universally is non-optimal, as it produces longer encodings than only using it selectively when necessary. (The main benefit of the SIB byte is the orthogonality and more powerful addressing modes it provides, which make it possible to save instructions and the use of registers for address calculations such as scaling an index.) Some special instructions lost priority in the hardware design and became slower than equivalent small code sequences. A notable example is the LODSW instruction.
Structure
General-purpose registers (A, B, C and D):

| Register | Bits | Width |
|---|---|---|
| R?X | 63–0 | 64 bits |
| E?X | 31–0 | 32 bits |
| ?X | 15–0 | 16 bits |
| ?H | 15–8 | 8 bits |
| ?L | 7–0 | 8 bits |

General-purpose registers available only in 64-bit mode (? stands for R8 to R15):

| Register | Bits | Width |
|---|---|---|
| ? | 63–0 | 64 bits |
| ?D | 31–0 | 32 bits |
| ?W | 15–0 | 16 bits |
| ?B | 7–0 | 8 bits |

Segment registers (C, D, S, E, F and G):

| Register | Bits | Width |
|---|---|---|
| ?S | 15–0 | 16 bits |

Pointer registers (S and B):

| Register | Bits | Width |
|---|---|---|
| R?P | 63–0 | 64 bits |
| E?P | 31–0 | 32 bits |
| ?P | 15–0 | 16 bits |
| ?PL | 7–0 | 8 bits |

Note: The ?PL registers are only available in 64-bit mode.

Index registers (S and D):

| Register | Bits | Width |
|---|---|---|
| R?I | 63–0 | 64 bits |
| E?I | 31–0 | 32 bits |
| ?I | 15–0 | 16 bits |
| ?IL | 7–0 | 8 bits |

Note: The ?IL registers are only available in 64-bit mode.

Instruction pointer register:

| Register | Bits | Width |
|---|---|---|
| RIP | 63–0 | 64 bits |
| EIP | 31–0 | 32 bits |
| IP | 15–0 | 16 bits |
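The nesting of the A register's sub-registers can be expressed as masks and shifts. The following Python sketch is illustrative only; the function name is hypothetical, and it models how the names alias the same storage rather than any instruction behaviour.

```python
def a_register_views(rax: int) -> dict:
    """Return the values seen through each name that aliases RAX."""
    rax &= 0xFFFF_FFFF_FFFF_FFFF
    return {
        "rax": rax,                  # bits 63-0
        "eax": rax & 0xFFFF_FFFF,    # bits 31-0
        "ax": rax & 0xFFFF,          # bits 15-0
        "ah": (rax >> 8) & 0xFF,     # bits 15-8
        "al": rax & 0xFF,            # bits 7-0
    }
```

With RAX holding 0x1122334455667788, EAX reads 0x55667788, AX reads 0x7788, AH reads 0x77 and AL reads 0x88.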
Operating modes
Real mode
Real Address mode,[38] commonly called Real mode, is an operating mode of 8086 and later x86-compatible CPUs. Real mode is characterized by a 20-bit segmented memory address space (meaning that only slightly more than 1 MiB of memory can be addressed[p]), direct software access to peripheral hardware, and no concept of memory protection or multitasking at the hardware level. All x86 CPUs in the 80286 series and later start up in real mode at power-on; 80186 CPUs and earlier had only one operational mode, which is equivalent to real mode in later chips. (On the IBM PC platform, direct software access to the IBM BIOS routines is available only in real mode, since BIOS is written for real mode. However, this is not a property of the x86 CPU but of the IBM BIOS design.)
In order to use more than 64 KB of memory, the segment registers must be used. This created great complications for compiler implementors who introduced odd pointer modes such as "near", "far" and "huge" to leverage the implicit nature of segmented architecture to different degrees, with some pointers containing 16-bit offsets within implied segments and other pointers containing segment addresses and offsets within segments. It is technically possible to use up to 256 KB of memory for code and data, with up to 64 KB for code, by setting all four segment registers once and then only using 16-bit offsets (optionally with default-segment override prefixes) to address memory, but this puts substantial restrictions on the way data can be addressed and memory operands can be combined, and it violates the architectural intent of the Intel designers, which is for separate data items (e.g. arrays, structures, code units) to be contained in separate segments and addressed by their own segment addresses, in new programs that are not ported from earlier 8-bit processors with 16-bit address spaces.
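The real-mode address calculation (a 16-bit segment multiplied by 16, plus a 16-bit offset) can be sketched in Python as follows. The function and its `a20_enabled` parameter are hypothetical names; the parameter models whether bit 20 of the result reaches the address bus, as on the 80286 and later, or is dropped, as on earlier CPUs with 20 address lines.

```python
def real_mode_address(segment: int, offset: int, a20_enabled: bool = True) -> int:
    """Combine a 16-bit segment and 16-bit offset into a physical address."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        linear &= 0xFFFFF  # only 20 address lines: wrap around at 1 MiB
    return linear
```

Different segment:offset pairs can name the same byte: 1234h:0010h and 1235h:0000h both resolve to 12350h. The highest reachable address, FFFFh:FFFFh, is 10FFEFh, or FFEFh when the address wraps at 1 MiB.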
Unreal mode
Unreal mode is used by some 16-bit operating systems and some 32-bit boot loaders.
System Management Mode
The System Management Mode (SMM) is only used by the system firmware (BIOS/UEFI), not by operating systems and applications software. SMM code runs in SMRAM.
Protected mode
In addition to real mode, the Intel 80286 supports protected mode, expanding addressable physical memory to 16 MB and addressable virtual memory to 1 GB, and providing protected memory, which prevents programs from corrupting one another. This is done by using the segment registers only for storing an index into a descriptor table that is stored in memory. There are two such tables, the Global Descriptor Table (GDT) and the Local Descriptor Table (LDT), each holding up to 8192 segment descriptors, each segment giving access to 64 KB of memory. In the 80286, a segment descriptor provides a 24-bit base address, and this base address is added to a 16-bit offset to create an absolute address. The base address from the table fulfills the same role that the literal value of the segment register fulfills in real mode; the segment registers have been converted from direct registers to indirect registers. Each segment can be assigned one of four ring levels used for hardware-based computer security. Each segment descriptor also contains a segment limit field which specifies the maximum offset that may be used with the segment. Because offsets are 16 bits, segments are still limited to 64 KB each in 80286 protected mode.[39]
Each time a segment register is loaded in protected mode, the 80286 must read a 6-byte segment descriptor from memory into a set of hidden internal registers. Thus, loading segment registers is much slower in protected mode than in real mode, and changing segments very frequently is to be avoided. Actual memory operations using protected mode segments are not slowed much because the 80286 and later have hardware to check the offset against the segment limit in parallel with instruction execution.
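As an illustration, the following Python sketch decodes a protected-mode selector and forms an 80286 address from a descriptor's base and limit. The selector bit layout used here (index in bits 15–3, table indicator in bit 2, requested privilege level in bits 1–0) is the documented Intel format, but it is not spelled out in the text above; the function names are hypothetical.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit selector into table index, table choice and RPL."""
    return {
        "index": (selector >> 3) & 0x1FFF,           # one of 8192 descriptors
        "table": "LDT" if selector & 0x4 else "GDT",
        "rpl": selector & 0x3,                       # requested ring, 0 to 3
    }

def translate_286(base: int, limit: int, offset: int) -> int:
    """Add a 16-bit offset to a 24-bit base after checking the limit."""
    if not 0 <= offset <= limit:
        raise ValueError("offset exceeds segment limit")
    return (base + offset) & 0xFFFFFF

# Two tables x 8192 descriptors x 64 KB per segment = 1 GB of virtual memory
VIRTUAL_SPACE_286 = 2 * 8192 * 64 * 1024
```

The final constant reproduces the 1 GB figure for addressable virtual memory on the 80286.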
The Intel 80386 extended offsets and also the segment limit field in each segment descriptor to 32 bits, enabling a segment to span the entire memory space. It also introduced support in protected mode for paging, a mechanism making it possible to use paged virtual memory (with 4 KB page size). Paging allows the CPU to map any page of the virtual memory space to any page of the physical memory space. To do this, it uses additional mapping tables in memory called page tables. Protected mode on the 80386 can operate with paging either enabled or disabled; the segmentation mechanism is always active and generates virtual addresses that are then mapped by the paging mechanism if it is enabled. The segmentation mechanism can also be effectively disabled by setting all segments to have a base address of 0 and size limit equal to the whole address space; this also requires a minimally-sized segment descriptor table of only four descriptors (since the FS and GS segments need not be used).[q]
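A minimal Python sketch of the 80386 page-table walk with 4 KB pages follows. It assumes the standard two-level layout (a 10-bit page-directory index, a 10-bit page-table index and a 12-bit offset) and represents the tables as plain dictionaries holding physical frame numbers; real entries also carry present and permission bits, which are omitted here. All names are hypothetical.

```python
def split_linear(addr: int) -> tuple:
    """Split a 32-bit linear address into (directory, table, offset)."""
    return (addr >> 22) & 0x3FF, (addr >> 12) & 0x3FF, addr & 0xFFF

def translate(addr: int, page_directory: dict) -> int:
    """Walk a toy two-level table; a missing entry raises KeyError."""
    directory_index, table_index, offset = split_linear(addr)
    page_table = page_directory[directory_index]
    frame = page_table[table_index]   # physical frame number
    return (frame << 12) | offset
```

For instance, with `page_directory = {768: {257: 0x1F}}`, the linear address 0xC0101234 maps to the physical address 0x1F234.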
Paging is used extensively by modern multitasking operating systems. Linux, 386BSD and Windows NT were developed for the 386 because it was the first Intel architecture CPU to support paging and 32-bit segment offsets. The 386 architecture became the basis of all further development in the x86 series.
x86 processors that support protected mode boot into real mode for backward compatibility with the older 8086 class of processors. Upon power-on (a.k.a. booting), the processor initializes in real mode, and then begins executing instructions. Operating system boot code, which might be stored in read-only memory, may place the processor into the protected mode to enable paging and other features. Conversely, segment arithmetic, a common practice in real mode code, is not allowed in protected mode.
Virtual 8086 mode
There is also a sub-mode of operation in 32-bit protected mode (a.k.a. 80386 protected mode) called virtual 8086 mode, also known as V86 mode. This is basically a special hybrid operating mode that allows real mode programs and operating systems to run while under the control of a protected mode supervisor operating system. This allows for a great deal of flexibility in running both protected mode programs and real mode programs simultaneously. This mode is exclusively available for the 32-bit version of protected mode; it does not exist in the 16-bit version of protected mode, or in long mode.
Long mode
In the mid-1990s, it was obvious that the 32-bit address space of the x86 architecture was limiting its performance in applications requiring large data sets. A 32-bit address space would allow the processor to directly address only 4 GB of data, a size surpassed by applications such as video processing and database engines. Using 64-bit addresses, it is possible to directly address 16 EiB of data, although most 64-bit architectures do not support access to the full 64-bit address space; for example, AMD64 supports only 48 bits from a 64-bit address, split into four paging levels.
In 1999, AMD published a (nearly) complete specification for a 64-bit extension of the x86 architecture, which it called x86-64, and stated its intention to produce processors implementing it. That design is currently used in almost all x86 processors, with some exceptions intended for embedded systems.
Mass-produced x86-64 chips for the general market became available four years later, in 2003, after working prototypes had been tested and refined; at about the same time, the initial name x86-64 was changed to AMD64. The success of the AMD64 line of processors coupled with lukewarm reception of the IA-64 architecture forced Intel to release its own implementation of the AMD64 instruction set. Intel had previously implemented support for AMD64[40] but opted not to enable it in hopes that AMD would not bring AMD64 to market before Itanium's new IA-64 instruction set was widely adopted. It branded its implementation of AMD64 as EM64T, and later rebranded it Intel 64.
In its literature and product version names, Microsoft and Sun refer to AMD64/Intel 64 collectively as x64 in the Windows and Solaris operating systems. Linux distributions refer to it either as "x86-64", its variant "x86_64", or "amd64". BSD systems use "amd64" while macOS uses "x86_64".
Long mode is mostly an extension of the 32-bit instruction set, but unlike the 16–to–32-bit transition, many instructions were dropped in the 64-bit mode. This does not affect actual binary backward compatibility (which would execute legacy code in other modes that retain support for those instructions), but it changes the way assemblers and compilers for new code have to work.
This was the first time that a major extension of the x86 architecture was initiated and originated by a manufacturer other than Intel. It was also the first time that Intel accepted technology of this nature from an outside source.
Extensions
x87
Early x86 processors could be extended with floating-point hardware in the form of a series of floating-point numerical co-processors with names like 8087, 80287 and 80387, abbreviated x87. This was also known as the NPX (Numeric Processor eXtension), an apt name since the coprocessors, while used mainly for floating-point calculations, also performed integer operations on both binary and decimal formats. With very few exceptions, the 80486 and subsequent x86 processors then integrated this x87 functionality on chip which made the x87 instructions a de facto integral part of the x86 instruction set.
Each x87 register, known as ST(0) through ST(7), is 80 bits wide and stores numbers in the IEEE floating-point standard double extended precision format. These registers are organized as a stack with ST(0) as the top. This was done in order to conserve opcode space, and the registers are therefore randomly accessible only for either operand in a register-to-register instruction; ST0 must always be one of the two operands, either the source or the destination, regardless of whether the other operand is ST(x) or a memory operand. However, random access to the stack registers can be obtained through an instruction which exchanges any specified ST(x) with ST(0).
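The stack-relative naming of the x87 registers can be modelled with a short Python sketch. This is a simplified, hypothetical model: it tracks eight physical slots and a top-of-stack index, and it ignores the tag word, stack overflow and underflow checks, and 80-bit precision (Python floats are 64-bit).

```python
class X87Stack:
    """Toy model of the x87 register stack addressed relative to its top."""

    def __init__(self):
        self.regs = [0.0] * 8   # eight physical registers
        self.top = 0            # index of the register currently named ST(0)

    def _phys(self, i: int) -> int:
        return (self.top + i) & 7

    def st(self, i: int) -> float:
        return self.regs[self._phys(i)]

    def fld(self, value: float) -> None:
        """Push: move the top down by one, then store into the new ST(0)."""
        self.top = (self.top - 1) & 7
        self.regs[self.top] = value

    def fxch(self, i: int) -> None:
        """Exchange ST(0) with ST(i), giving random access to the stack."""
        a, b = self._phys(0), self._phys(i)
        self.regs[a], self.regs[b] = self.regs[b], self.regs[a]

    def fadd(self, i: int) -> None:
        """ST(0) = ST(0) + ST(i); ST(0) is always one of the operands."""
        self.regs[self._phys(0)] += self.st(i)
```

After pushing 1.0, 2.0 and 3.0, ST(0) holds 3.0 and ST(2) holds 1.0; `fxch(2)` swaps them, after which `fadd(1)` leaves 1.0 + 2.0 = 3.0 in ST(0).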
The operations include arithmetic and transcendental functions, including trigonometric and exponential functions, and instructions that load common constants (such as 0; 1; e, the base of the natural logarithm; log2(10); and log10(2)) into one of the stack registers. While the integer ability is often overlooked, the x87 can operate on larger integers with a single instruction than the 8086, 80286, 80386, or any x86 CPU without 64-bit extensions can, and repeated integer calculations even on small values (e.g., 16-bit) can be accelerated by executing integer instructions on the x86 CPU and the x87 in parallel. (The x86 CPU keeps running while the x87 coprocessor calculates, and the x87 sets a signal to the x86 when it is finished or interrupts the x86 if it needs attention because of an error.)
PAE
The Physical Address Extension (PAE) was first added in the Intel Pentium Pro, and later by AMD in the Athlon processors,[41] to allow up to 64 GB of RAM to be addressed. Without PAE, physical RAM in 32-bit protected mode is usually limited to 4 GB. PAE defines a different page table structure with wider page table entries and a third level of page table, allowing additional bits of physical address. Although the initial implementations on 32-bit processors theoretically supported up to 64 GB of RAM, chipset and other platform limitations often restricted what could actually be used. x86-64 processors define page table structures that theoretically allow up to 52 bits of physical address, although again, chipset and other platform concerns (like the number of DIMM slots available, and the maximum RAM possible per DIMM) prevent such a large physical address space from being realized. On x86-64 processors PAE mode must be active before the switch to long mode, and must remain active while long mode is active, so while in long mode there is no "non-PAE" mode. PAE mode does not affect the width of linear or virtual addresses.
MMX
MMX is a SIMD instruction set designed by Intel and introduced in 1997 for the Pentium MMX microprocessor.[42] The MMX instruction set was developed from a similar concept first used on the Intel i860. It is supported on most subsequent IA-32 processors by Intel and other vendors. MMX is typically used for video processing (in multimedia applications, for instance).[43]
MMX added 8 new registers to the architecture, known as MM0 through MM7 (henceforth referred to as MMn). In reality, these new registers were just aliases for the existing x87 FPU stack registers. Hence, anything that was done to the floating-point stack would also affect the MMX registers. Unlike the FP stack, these MMn registers were fixed, not relative, and therefore they were randomly accessible. The instruction set did not adopt the stack-like semantics so that existing operating systems could still correctly save and restore the register state when multitasking without modifications.[42]
Each of the MMn registers holds a 64-bit integer. However, one of the main concepts of the MMX instruction set is the concept of packed data types, which means instead of using the whole register for a single 64-bit integer (quadword), one may use it to contain two 32-bit integers (doubleword), four 16-bit integers (word) or eight 8-bit integers (byte). Given that the MMX's 64-bit MMn registers are aliased to the FPU stack and each of the floating-point registers is 80 bits wide, the upper 16 bits of the floating-point registers are unused in MMX. These bits are set to all ones by any MMX instruction, which corresponds to the floating-point representation of NaNs or infinities.[42]
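Packed data types can be illustrated in Python by viewing one 64-bit value as several independent lanes. The helper names below are hypothetical; `packed_add` models wrap-around addition, in which a carry out of one lane never spills into its neighbour, and does not model the saturating variants.

```python
import struct

def unpack_lanes(value: int, lane_bits: int) -> list:
    """View a 64-bit value as lanes of 8, 16, 32 or 64 bits, lowest first."""
    fmt = {8: "8B", 16: "4H", 32: "2I", 64: "1Q"}[lane_bits]
    return list(struct.unpack("<" + fmt, struct.pack("<Q", value)))

def packed_add(a: int, b: int, lane_bits: int) -> int:
    """Add two 64-bit values lane by lane with wrap-around in each lane."""
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        lane = ((a >> shift) + (b >> shift)) & mask
        result |= lane << shift
    return result
```

Adding 0xFF and 0x01 as packed bytes gives 0x00 in the lowest lane with no carry into the next one, whereas the same addition on 16-bit lanes gives 0x0100.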
3DNow!
In 1997, AMD introduced 3DNow!.[44] The introduction of this technology coincided with the rise of 3D entertainment applications and was designed to improve the CPU's vector processing performance of graphic-intensive applications. 3D video game developers and 3D graphics hardware vendors use 3DNow! to enhance their performance on AMD's K6 and Athlon series of processors.[45]
3DNow! was designed to be the natural evolution of MMX from integers to floating point. As such, it uses exactly the same register naming convention as MMX, that is MM0 through MM7.[46] The only difference is that instead of packing integers into these registers, two single-precision floating-point numbers are packed into each register. The advantage of aliasing the FPU registers is that the same instruction and data structures used to save the state of the FPU registers can also be used to save 3DNow! register states. Thus no special modifications are required to be made to operating systems which would otherwise not know about them.[47]
SSE
In 1999, Intel introduced the Streaming SIMD Extensions (SSE) instruction set, following in 2000 with SSE2. The first addition allowed offloading of basic floating-point operations from the x87 stack and the second made MMX almost obsolete and allowed the instructions to be realistically targeted by conventional compilers. Introduced in 2004 along with the Prescott revision of the Pentium 4 processor, SSE3 added specific memory and thread-handling instructions to boost the performance of Intel's HyperThreading technology. AMD licensed the SSE3 instruction set and implemented most of the SSE3 instructions for its revision E and later Athlon 64 processors. The Athlon 64 does not support HyperThreading and lacks those SSE3 instructions used only for HyperThreading.[48]
SSE discarded all legacy connections to the FPU stack. This also meant that this instruction set discarded all legacy connections to previous generations of SIMD instruction sets like MMX. But it freed the designers up, allowing them to use larger registers, not limited by the size of the FPU registers. The designers created eight 128-bit registers, named XMM0 through XMM7. (In AMD64, the number of SSE XMM registers has been increased from 8 to 16.) However, the downside was that operating systems had to have an awareness of this new set of instructions in order to be able to save their register states. So Intel created a slightly modified version of Protected mode, called Enhanced mode which enables the usage of SSE instructions, whereas they stay disabled in regular Protected mode. An OS that is aware of SSE will activate Enhanced mode, whereas an unaware OS will only enter into traditional Protected mode.
SSE is a SIMD instruction set that works only on floating-point values, like 3DNow!. However, unlike 3DNow! it severs all legacy connection to the FPU stack. Because it has larger registers than 3DNow!, SSE can pack twice the number of single precision floats into its registers. The original SSE was limited to only single-precision numbers, like 3DNow!. The SSE2 introduced the capability to pack double precision numbers too, which 3DNow! had no possibility of doing since a double precision number is 64-bit in size which would be the full size of a single 3DNow! MMn register. At 128 bits, the SSE XMMn registers could pack two double precision floats into one register. Thus SSE2 is much more suitable for scientific calculations than either SSE1 or 3DNow!, which were limited to only single precision. SSE3 does not introduce any additional registers.[48]
x86-64
By the 2000s, 32-bit x86 processors' limits in memory addressing were an obstacle to their use in high-performance computing clusters and powerful desktop workstations. The aged 32-bit x86 was competing with much more advanced 64-bit RISC architectures which could address much more memory. Intel and the whole x86 ecosystem needed 64-bit memory addressing if x86 was to survive the 64-bit computing era, as workstation and desktop software applications were soon to start hitting the limits of 32-bit memory addressing. However, Intel felt that it was the right time to make a bold step and use the transition to 64-bit desktop computers for a transition away from the x86 architecture in general, an experiment which ultimately failed.
In 2001, Intel attempted to introduce a non-x86 64-bit architecture named IA-64 in its Itanium processor, initially aiming for the high-performance computing market, hoping that it would eventually replace the 32-bit x86.[49] While IA-64 was incompatible with x86, the Itanium processor did provide emulation abilities for translating x86 instructions into IA-64, but this affected the performance of x86 programs so badly that it was rarely, if ever, actually useful to the users: programmers had to rewrite x86 programs for the IA-64 architecture, or their performance on Itanium would be orders of magnitude worse than on a true x86 processor. The market rejected the Itanium processor since it broke backward compatibility and preferred to continue using x86 chips, and very few programs were rewritten for IA-64.
AMD decided to take another path toward 64-bit memory addressing, making sure backward compatibility would not suffer. In April 2003, AMD released the first x86 processor with 64-bit general-purpose registers, the Opteron, capable of addressing much more than 4 GB of virtual memory using the new x86-64 extension (also known as AMD64 or x64). The 64-bit extensions to the x86 architecture were enabled only in the newly introduced long mode, therefore 32-bit and 16-bit applications and operating systems could simply continue using an AMD64 processor in protected or other modes, without even the slightest sacrifice of performance[50] and with full compatibility back to the original instructions of the 16-bit Intel 8086.[51]: 13–14 The market responded positively, adopting the 64-bit AMD processors for both high-performance applications and business or home computers.
Seeing the market rejecting the incompatible Itanium processor and Microsoft supporting AMD64, Intel had to respond and introduced its own x86-64 processor, the Prescott Pentium 4, in July 2004.[52] As a result, the Itanium processor with its IA-64 instruction set is rarely used and x86, through its x86-64 incarnation, is still the dominant CPU architecture in non-embedded computers.
x86-64 also introduced the NX bit, which offers some protection against security bugs caused by buffer overruns.
As a result of AMD's 64-bit contribution to the x86 lineage and its subsequent acceptance by Intel, the 64-bit RISC architectures ceased to be a threat to the x86 ecosystem and almost disappeared from the workstation market. x86-64 began to be utilized in powerful supercomputers (in its AMD Opteron and Intel Xeon incarnations), a market which was previously the natural habitat for 64-bit RISC designs (such as the IBM Power microprocessors or SPARC processors). The great leap toward 64-bit computing and the maintenance of backward compatibility with 32-bit and 16-bit software enabled the x86 architecture to become an extremely flexible platform today, with x86 chips being utilized from small low-power systems (for example, Intel Quark and Intel Atom) to fast gaming desktop computers (for example, Intel Core i7 and AMD FX/Ryzen), and even dominate large supercomputing clusters, effectively leaving only the ARM 32-bit and 64-bit RISC architecture as a competitor in the smartphone and tablet market.
AMD-V and VT-x
Prior to 2005, x86 architecture processors were unable to meet the Popek and Goldberg virtualization requirements – a set of conditions for efficient virtualization created in 1974 by Gerald J. Popek and Robert P. Goldberg. However, both proprietary and open-source x86 virtualization hypervisor products were developed using software-based virtualization. Proprietary systems include Hyper-V, Parallels Workstation, VMware ESX, VMware Workstation, VMware Workstation Player and Windows Virtual PC, while free and open-source systems include QEMU, Kernel-based Virtual Machine, VirtualBox, and Xen.
The introduction of the AMD-V and Intel VT-x instruction sets in 2005 allowed x86 processors to meet the Popek and Goldberg virtualization requirements.[53]
AES-NI
The Advanced Encryption Standard New Instructions (AES-NI) instruction set extension is designed to accelerate AES encryption and decryption operations. It was first proposed by Intel in 2008.
AVX
The Advanced Vector Extensions (AVX) doubled the size of SSE registers to 256-bit YMM registers. It also introduced the VEX coding scheme to accommodate the larger registers, plus a few instructions to permute elements. AVX2 did not introduce extra registers, but was notable for the addition of masking, gather, and shuffle instructions.
AVX-512 features yet another expansion to 32 512-bit ZMM registers and a new EVEX scheme. Unlike its predecessors featuring a monolithic extension, it is divided into many subsets that specific models of CPUs can choose to implement.
APX
The Advanced Performance Extensions (APX) are extensions to double the number of general-purpose registers from 16 to 32 and add new features to improve general-purpose performance.[54][55][56][57] These extensions have been called "generational"[58] and "the biggest x86 addition since 64 bits".[59] Intel contributed APX support to GNU Compiler Collection (GCC) 14.[60]
According to the architecture specification,[61] the main features of APX are:
- 16 additional general-purpose registers R16-R31, called the Extended GPRs (EGPRs)
- Three-operand instruction formats for many integer instructions
- New conditional instructions for loads, stores, and comparisons with common instructions that do not modify flags
- Optimized register save/restore operations
- A 64-bit absolute direct jump instruction
Extended GPRs for general purpose instructions are encoded using a 2-byte REX2 prefix, while new instructions and extended operands for existing AVX/AVX2/AVX-512 instructions are encoded with an extended EVEX prefix which has four variants used for different groups of instructions.
See also
- Comparison of instruction set architectures
- x86 calling conventions
- x86 instruction listings
- CPUID
- 680x0, a competing architecture in the 16-bit and early 32-bit eras
- PowerPC, a competing architecture in the later 32-bit and 64-bit eras
- List of AMD processors
- List of Intel processors
- List of Intel CPU microarchitectures
- List of VIA microprocessor cores
- List of x86 manufacturers
- Interrupt request
- Speculative execution CPU vulnerabilities
- Tick–tock model
- Virtual legacy wires
Notes
- ^ Unlike the microarchitecture (and specific electronic and physical implementation) used for a specific microprocessor design.
- ^ The GRID Compass laptop, for instance.
- ^ Including the 8088, 80186, 80188 and 80286 processors.
- ^ Such a system also contained the usual mix of standard 7400 series support components, including multiplexers, buffers, and glue logic.
- ^ The actual meaning of iAPX was Intel Advanced Performance Architecture, or sometimes Intel Advanced Processor Architecture.
- ^ late 1981 to early 1984, approximately
- ^ The embedded processor market is populated by more than 25 different architectures, which, due to the price sensitivity, low power, and hardware simplicity requirements, outnumber the x86.
- ^ The NEC V20 and V30 also provided the older 8080 instruction set, allowing PCs equipped with these microprocessors to operate CP/M applications at full speed (i.e., without the need to simulate an 8080 by software).
- ^ Fabless companies designed the chip and contracted another company to manufacture it, while fabbed companies would do both the design and the manufacturing themselves. Some companies started as fabbed manufacturers and later became fabless designers, one such example being AMD.
- ^ It had a slower FPU however, which is slightly ironic as Cyrix started out as a designer of fast floating-point units for x86 processors.
- ^ Intel abandoned its "x86" naming scheme with the P5 Pentium during 1993 (as numbers could not be trademarked). However, the term x86 was already established among technicians, compiler writers etc.
- ^ 16-bit and 32-bit microprocessors were introduced during 1978 and 1985 respectively; plans for 64-bit were announced during 1999, and 64-bit processors were gradually introduced from 2003 onwards.
- ^ Some "CISC" designs, such as the PDP-11, may use two.
- ^ That is because integer arithmetic generates carry between subsequent bits (unlike simple bitwise operations).
- ^ Two MSRs of particular interest are SYSENTER_EIP_MSR and SYSENTER_ESP_MSR, introduced on the Pentium® II processor, which store the address of the kernel mode system service handler and corresponding kernel stack pointer. Initialized during system startup, SYSENTER_EIP_MSR and SYSENTER_ESP_MSR are used by the SYSENTER (Intel) or SYSCALL (AMD) instructions to achieve Fast System Calls, about three times faster than the software interrupt method used previously.
- ^ Because a segmented address is the sum of a 16-bit segment multiplied by 16 and a 16-bit offset, the maximum address is 1,114,095 (10FFEF hex), for an addressability of 1,114,096 bytes = 1 MB + 65,520 bytes. Before the 80286, x86 CPUs had only 20 physical address lines (address bit signals), so the 21st bit of the address, bit 20, was dropped and addresses past 1 MB were mirrors of the low end of the address space (starting from address zero). Since the 80286, all x86 CPUs have at least 24 physical address lines, and bit 20 of the computed address is brought out onto the address bus in real mode, allowing the CPU to address the full 1,114,096 bytes reachable with an x86 segmented address. On the popular IBM PC platform, switchable hardware to disable the 21st address bit was added to machines with an 80286 or later so that all programs designed for 8088/8086-based models could run, while newer software could take advantage of the "high" memory in real mode and the full 16 MB or larger address space in protected mode—see A20 gate.
- ^ An extra descriptor record at the top of the table is also required, because the table starts at zero but the minimum descriptor index that can be loaded into a segment register is 1; the value 0 is reserved to represent a segment register that points to no segment.
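The address arithmetic in the note on segmented addresses above can be checked with a short sketch. The function name `real_mode_address` and the `a20_enabled` parameter are illustrative assumptions, not part of any real API; masking to 20 bits is a simplified model of the wraparound seen on CPUs with 20 address lines or with the 21st address bit disabled.

```python
def real_mode_address(segment: int, offset: int, a20_enabled: bool = True) -> int:
    """Return the address formed from a 16-bit segment:offset pair.

    Hypothetical helper for illustration only.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # The segment is multiplied by 16 (shifted left 4 bits) and added to the offset.
    address = (segment << 4) + offset
    if not a20_enabled:
        # Assumption: model 20 address lines by dropping bit 20 and above,
        # so addresses past 1 MB wrap to the low end of the address space.
        address &= 0xFFFFF
    return address


highest = real_mode_address(0xFFFF, 0xFFFF)
print(hex(highest))   # 0x10ffef
print(highest + 1)    # 1114096 addressable bytes = 1 MB + 65,520 bytes
print(hex(real_mode_address(0xFFFF, 0x0010, a20_enabled=False)))  # 0x0
```

The last call shows the mirroring described in the note: FFFF:0010 is one byte past 1 MB, and with bit 20 dropped it lands on address zero.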
References
- ^ Rao, P.V.S. (2009). Computer System Architecture. Prentice-Hall of India. p. 402 (Section 19.1, The x86 family of processors). ISBN 978-81-203-3594-3.
- ^ Mhatre, Swapneel Chandrakant (2012). Microprocessors and Interfacing Techniques: For S. E. (Computer Engineering) Semester II of University of Pune. Jaico Publishing House. ISBN 978-81-8495-325-1.
- ^ Alcorn, Paul (February 9, 2022). "AMD Sets All-Time CPU Market Share Record as Intel Gains in Desktop and Notebook PCs". Tom's Hardware.
- ^ Brandon, Jonathan (April 15, 2015). "The cloud beyond x86: How old architectures are making a comeback". ICloud PE. Business Cloud News. Archived from the original on August 19, 2021. Retrieved November 23, 2020.
Despite the dominance of x86 in the datacentre it is difficult to ignore the noise vendors have been making over the past couple of years around non-x86 architectures like ARM...
- ^ Dvorak, John C. "Whatever Happened to the Intel iAPX432?". Dvorak.org. Archived from the original on November 25, 2017. Retrieved April 18, 2014.
- ^ iAPX 286 Programmer's Reference (PDF). Intel. 1983. Archived (PDF) from the original on August 28, 2017. Retrieved August 28, 2017.
- ^ a b iAPX 86, 88 User's Manual (PDF). Intel. August 1981. Archived (PDF) from the original on August 28, 2017. Retrieved August 28, 2017.
- ^ Edwards, Benj (June 16, 2008). "Birth of a Standard: The Intel 8086 Microprocessor". PCWorld. Archived from the original on September 26, 2010. Retrieved September 14, 2014.
- ^ Stanley Mazor (January–March 2010). "Intel's 8086". IEEE Annals of the History of Computing. 32 (1): 75–79. doi:10.1109/MAHC.2010.22. S2CID 16451604.
- ^ "AMD Discloses New Technologies At Microprocessor Forum" (Press release). AMD. October 5, 1999. Archived from the original on March 2, 2000.
"Time and again, processor architects have looked at the inelegant x86 architecture and declared it cannot be stretched to accommodate the latest innovations," said Nathan Brookwood, principal analyst, Insight 64.
- ^ Burt, Jeff (April 5, 2010). "Microsoft to End Intel Itanium Support". eWeek. Retrieved June 2, 2022.
- ^ Pryce, Dave (May 11, 1989). "80486 32-bit CPU breaks new ground in chip density and operating performance. (Intel Corp.) (product announcement) EDN" (Press release).
- ^ Swoyer, Stephen (April 24, 2003). "AMD introduces 64-bit Opteron Chip (ESJ) (news article)".
- ^ a b "Intel 64 and IA-32 Architectures Optimization Reference Manual" (PDF). Intel. September 2019. 3.4.2.2 Optimizing for Macro-fusion. Archived (PDF) from the original on February 14, 2020. Retrieved March 7, 2020.
- ^ a b Fog, Agner. "The microarchitecture of Intel, AMD and VIA CPUs" (PDF). p. 107. Archived (PDF) from the original on March 22, 2019. Retrieved March 7, 2020.
Core2 can do macro-op fusion only in 16-bit and 32-bit mode. Core Nehalem can also do this in 64-bit mode.
- ^ "Zet: The x86 (IA-32) open implementation: Overview". OpenCores. November 4, 2013. Archived from the original on February 11, 2018. Retrieved January 5, 2014.
- ^ "Zhaoxin Preparing Linux Kernel Support For 7-Series Centaur CPUs". www.phoronix.com. Retrieved April 5, 2022.
- ^ "Zhaoxin aiming at 2021 release for its 7nm x86 CPUs - CPU - News - HEXUS.net". m.hexus.net. Retrieved April 5, 2022.
- ^ "Zhaoxin Finally Adding "Lujiazui" x86_64 CPU Tuning To GCC". www.phoronix.com. Retrieved April 5, 2022.
- ^ "Setup and installation considerations for Windows x64 Edition-based computers". Archived from the original on September 11, 2014. Retrieved September 14, 2014.
- ^ Paul Alcorn (December 19, 2024). "Intel terminates x86S initiative — unilateral quest to de-bloat x86 instruction set comes to an end". Tom's Hardware. Retrieved December 20, 2024.
- ^ "Envisioning a Simplified Intel Architecture". Intel.
- ^ Larabel, Michael (May 20, 2023). "Intel Publishes "X86-S" Specification For 64-bit Only Architecture". Phoronix. Retrieved May 20, 2023.
- ^ "Envisioning a Simplified Intel Architecture for the Future". Intel.
- ^ "Processors — What mode of addressing do the Intel Processors use?". Archived from the original on September 11, 2014. Retrieved September 14, 2014.
- ^ "DSB Switches". Intel VTune Amplifier 2013. Intel. Archived from the original on December 2, 2013. Retrieved August 26, 2013.
- ^ "The 8086 Family User's Manual" (PDF). Intel Corporation. October 1979. p. 2-68. Archived (PDF) from the original on April 4, 2018. Retrieved March 28, 2018.
- ^ "iAPX 286 Programmer's Reference Manual" (PDF). Intel Corporation. 1983. 2.4.3 Memory Addressing Modes. Archived (PDF) from the original on August 28, 2017. Retrieved August 28, 2017.
- ^ 80386 Programmer's Reference Manual (PDF). Intel Corporation. 1986. 2.5.3.2 EFFECTIVE-ADDRESS COMPUTATION. Archived (PDF) from the original on December 28, 2018. Retrieved March 28, 2018.
- ^ a b Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture. Intel Corporation. March 2018. Chapter 3. Archived from the original on January 26, 2012. Retrieved March 19, 2014.
- ^ Andriesse, Dennis (2019). "6.5 Effects of Compiler Settings on Disassembly". Practical binary analysis: build your own Linux tools for binary instrumentation, analysis, and disassembly. San Francisco, CA: No Starch Press, Inc. ISBN 978-1-59327-913-4. OCLC 1050453850.
- ^ "Guide to x86 Assembly". Cs.virginia.edu. September 11, 2013. Archived from the original on March 24, 2020. Retrieved February 6, 2014.
- ^ "FSTSW/FNSTSW — Store x87 FPU Status Word". Archived from the original on January 25, 2022. Retrieved January 15, 2020.
The FNSTSW AX form of the instruction is used primarily in conditional branching...
- ^ Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture (PDF). Intel. March 2013. Chapter 8. Archived (PDF) from the original on April 2, 2013. Retrieved April 23, 2013.
- ^ "Intel 80287 family". CPU-world. Archived from the original on August 9, 2016. Retrieved July 21, 2016.
- ^ Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture (PDF). Intel. March 2013. Chapter 9. Archived (PDF) from the original on April 2, 2013. Retrieved April 23, 2013.
- ^ Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture (PDF). Intel. March 2013. Chapter 10. Archived (PDF) from the original on April 2, 2013. Retrieved April 23, 2013.
- ^ iAPX 286 Programmer's Reference (PDF). Intel. 1983. Section 1.2, "Modes of Operation". Archived (PDF) from the original on August 28, 2017. Retrieved January 27, 2014.
- ^ iAPX 286 Programmer's Reference (PDF). Intel. 1983. Chapter 6, "Memory Management and Virtual Addressing". Archived (PDF) from the original on August 28, 2017. Retrieved January 27, 2014.
- ^ "Intel's Yamhill Technology: x86-64 compatible |Geek.com". Archived from the original on September 5, 2012. Retrieved July 18, 2008.
- ^ AMD, Inc. (February 2002). "Appendix E" (PDF). AMD Athlon™ Processor x86 Code Optimization Guide (Revision K ed.). p. 250. Archived (PDF) from the original on April 13, 2017. Retrieved April 13, 2017.
A 2-bit index consisting of PCD and PWT bits of the page table entry is used to select one of four PAT register fields when PAE (page address extensions) is enabled, or when the PDE doesn't describe a large page.
- ^ a b c "Programming With the Intel MMX™ Technology". Embedded Pentium® Processor Family Technical Information Center. Intel. Archived from the original on July 25, 2003. Retrieved June 5, 2022.
- ^ Krishnaprasad, S. (January 1, 2004). "SIMD programming illustrated using Intel's MMX instruction set". Journal of Computing Sciences in Colleges. 19 (3): 268–277. ISSN 1937-4771.
- ^ Sexton, Michael Justin Allen (April 21, 2017). "The History Of AMD CPUs". Tom's Hardware. Retrieved June 5, 2022.
- ^ Shimpi, Anand Lal (October 29, 1998). "AMD's K6-2 350: Something to do..." AnandTech. Archived from the original on July 14, 2012. Retrieved June 5, 2022.
- ^ "Intel's MMX and AMD's 3DNow! SIMD Operations". web.mit.edu. Archived from the original on July 27, 2022. Retrieved June 5, 2022.
- ^ "3DNow!™ Technology Manual" (PDF). Advanced Micro Devices. Retrieved June 5, 2022.
- ^ a b "Upgrading And Repairing PCs 21st Edition: Processor Features". Tom's Hardware. October 31, 2013. Retrieved June 5, 2022.
- ^ Manek Dubash (July 20, 2006). "Will Intel abandon the Itanium?". Techworld. Archived from the original on February 19, 2011. Retrieved December 19, 2010.
Once touted by Intel as a replacement for the x86 product line, expectations for Itanium have been throttled well back.
- ^ "IBM WebSphere Application Server 64-bit Performance Demystified" (PDF). IBM Corporation. September 6, 2007. p. 14. Archived (PDF) from the original on January 25, 2022. Retrieved April 9, 2010.
Figures 5, 6 and 7 also show the 32-bit version of WAS runs applications at full native hardware performance on the POWER and x86-64 platforms. Unlike some 64-bit processor architectures, the POWER and x86-64 hardware does not emulate 32-bit mode. Therefore applications that do not benefit from 64-bit features can run with full performance on the 32-bit version of WebSphere running on the above mentioned 64-bit platforms.
- ^ "Volume 2: System Programming" (PDF). AMD64 Architecture Programmer's Manual. AMD Corporation. March 2024. Archived (PDF) from the original on April 4, 2024. Retrieved April 24, 2024.
- ^ Charlie Demerjian (September 26, 2003). "Why Intel's Prescott will use AMD64 extensions". The Inquirer. Archived from the original on October 10, 2009. Retrieved October 7, 2009.
- ^ Adams, Keith; Agesen, Ole (October 21–25, 2006). A Comparison of Software and Hardware Techniques for x86 Virtualization (PDF). Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, USA, 2006. ACM 1-59593-451-0/06/0010. Archived (PDF) from the original on August 20, 2010. Retrieved December 22, 2006.
- ^ Winkel, Sebastian; Agron, Jason. "Advanced Performance Extensions (APX)". Intel. Retrieved October 22, 2023.
- ^ Robinson, Dan. "Intel adds fresh x86 and vector instructions for future chips". The Register. Retrieved October 22, 2023.
- ^ Bonshor, Gavin. "Intel Unveils AVX10 and APX Instruction Sets: Unifying AVX-512 For Hybrid Architectures". AnandTech. Archived from the original on September 28, 2023. Retrieved October 22, 2023.
- ^ Alcorn, Paul (July 24, 2023). "Intel's New AVX10 Brings AVX-512 Capabilities to E-Cores". Tom's Hardware. Retrieved October 22, 2023.
- ^ Shah, Agam (August 9, 2023). "Intel's Generational On-Chip Change APX Will Make All the Apps Faster". The New Stack. Retrieved October 22, 2023.
- ^ Byrne, Joseph. "APX is Biggest x86 Addition Since 64 Bits". Tech Insights.
- ^ Larabel, Michael. "Intel APX Code Begins Landing Within The GCC Compiler". Phoronix. Retrieved October 22, 2023.
- ^ "Intel® Advanced Performance Extensions (Intel® APX) Architecture Specification". Intel. July 21, 2023. Retrieved October 22, 2023.
Further reading
- Rosenblum, Mendel; Garfinkel, Tal (May 2005). "Virtual machine monitors: current technology and future trends". IEEE Computer. 38 (5): 39–47. CiteSeerX 10.1.1.614.9870. doi:10.1109/MC.2005.176. S2CID 10385623.
External links
- Why Intel can't seem to retire the x86
- 32/64-bit x86 Instruction Reference
- Intel Intrinsics Guide, an interactive reference tool for Intel intrinsic instructions
- Intel® 64 and IA-32 Architectures Software Developer's Manuals
- AMD Developer Guides, Manuals & ISA Documents, AMD64 Architecture
Overview
Definition and Core Characteristics
The x86 instruction set architecture (ISA) is a complex instruction set computing (CISC) design that originated with Intel's 8086 microprocessor introduced in 1978.[5][6] As a CISC architecture, x86 emphasizes a rich set of instructions capable of performing complex operations in a single command, contrasting with reduced instruction set computing (RISC) approaches that favor simpler, fixed-length instructions.[5] This foundational ISA has powered generations of processors, forming the basis for both 32-bit IA-32 and 64-bit Intel 64 extensions while maintaining core principles of the original design.[5]
Key characteristics of the x86 ISA include variable-length instructions ranging from 1 to 15 bytes, allowing for compact encoding of simple operations while accommodating more complex ones with operands and modifiers.[5] It employs little-endian byte order, where multi-byte data is stored with the least significant byte at the lowest memory address, facilitating efficient processing of varying data sizes.[5] Early implementations, such as the 8086, utilized a segmented memory model to address up to 1 MB of memory through segment registers and offsets, though later modes shifted toward flat addressing.[5] A hallmark trait is its commitment to backward compatibility, ensuring that software written for prior generations executes on newer processors without modification.[5]
The basic execution model follows a fetch-decode-execute cycle, where the processor retrieves instructions from memory, decodes them into executable actions, and performs the operations.[5] In modern superscalar implementations, this process incorporates pipelining to overlap stages across multiple instructions, enhancing throughput by processing several instructions concurrently.[5] Complex x86 instructions are typically broken down into simpler micro-operations (μops) during decoding, which are then scheduled and executed out-of-order for improved performance while preserving the semantic behavior of the original instruction.[5]
Central to x86's design are concepts like relative orthogonality in register usage, where general-purpose registers can generally serve as sources or destinations for most instructions without restrictions tied to specific operations.[5] The flags register (EFLAGS in 32-bit mode or RFLAGS in 64-bit mode) plays a crucial role by capturing condition codes (such as the zero, carry, sign, and overflow flags) generated during arithmetic and logical operations, enabling conditional branching and control flow decisions.[5]
Historical and Modern Significance
The x86 architecture has maintained a dominant position in the computing landscape, powering over 85% of personal computers and servers worldwide as of early 2025, with Intel and AMD collectively holding the vast majority of the market in these segments.[7] This prevalence stems from its early adoption in personal computing and server environments, where it underpins major operating systems including Windows and Linux distributions, which continue to rely heavily on x86 for desktop, laptop, and enterprise deployments.[8] From Apple's adoption of Intel processors in 2006 until its transition to ARM-based processors with the M1 chip in 2020, macOS also ran on x86 hardware, further solidifying its role in consumer ecosystems during the architecture's peak expansion.[9]
A key factor in x86's enduring influence is its commitment to backward compatibility, which enables modern processors to execute software binaries developed decades earlier without modification, preserving vast libraries of legacy applications and reducing the costs of software migration.[9] This feature has profoundly shaped operating system development, as seen in the evolution from MS-DOS to Windows NT kernels, where x86's instruction set provided a stable foundation for layering advanced features like multitasking and graphical interfaces.[10] Similarly, Linux's initial design and widespread adoption were tailored to x86 hardware, fostering an open-source ecosystem that leverages this compatibility for server and embedded applications.[11]
Economically, x86 drives substantial revenue for its primary manufacturers, with Intel's Client Computing Group, which encompasses x86 processors for PCs, generating $30.3 billion in 2024, while AMD's Client and Data Center segments, heavily reliant on x86 designs like Ryzen and EPYC, contributed $7.1 billion and $12.6 billion respectively in the same year.[12] This financial scale extends to broader industry effects, influencing standards for peripherals and interconnects such as PCI Express, which originated from x86-centric designs and remain integral to PC and server hardware compatibility.[8]
Despite its strengths, x86 faces ongoing debates regarding power efficiency compared to RISC alternatives like ARM, particularly in mobile and data center applications where ARM's simpler instruction set can yield better performance per watt under low-power constraints.[13] However, proponents including AMD argue that x86 has closed much of this gap through architectural optimizations and hybrid designs, ensuring its continued relevance in high-performance computing even as ARM gains traction in niche areas.[14]
History
Origins and Early Development
The origins of the x86 architecture trace back to Intel's early microprocessor efforts in the 1970s, which laid the groundwork for subsequent designs. The Intel 8008, introduced in 1972, was an 8-bit processor developed for CRT terminal applications, featuring a 16 KB memory addressing capability and 66 instructions.[15] This was followed by the 8080 in 1974, an enhanced 8-bit microprocessor that extended the 8008's architecture with support for 64 KB of memory, 111 instructions, and limited 16-bit data handling facilities.[15] These processors established Intel's foundation in microprocessor design, influencing the transition toward more capable systems.
The 8086, released in 1978, marked the inception of the x86 family under the leadership of architect Stephen Morse, with refinements by Bruce Ravenel.[15] As a 16-bit processor, it featured a 20-bit external address bus enabling access to 1 MB of memory and introduced a segmented memory model using 64 KB segments defined by segment registers and offsets.[15] This design supported 133 instructions, including 8-bit and 16-bit signed/unsigned arithmetic operations, along with 9 status flags, while maintaining assembly-level compatibility with the 8080 to facilitate software migration.[15]
Key early processors built on the 8086 foundation included the 8088, introduced in 1979 and selected as the central processing unit for the IBM PC in 1981; it mirrored the 8086 internally but used an 8-bit external data bus for compatibility with lower-cost peripherals.[16] The 80186, released in 1982, enhanced the architecture with integrated peripherals such as timers, a DMA controller, and an interrupt controller, while retaining the 16-bit data bus and 1 MB memory limit in real mode.[17] The 80286, launched in 1982, advanced x86 capabilities by introducing protected mode, which supported multitasking, memory protection via segment descriptors, a 24-bit address bus for 16 MB of physical memory, and up to 1 GB of virtual memory.[17]
The design philosophy of the 8086 and its successors emphasized a Complex Instruction Set Computing (CISC) approach to optimize for high-level language compilation, drawing influences from IBM's System/360 for its instruction richness and the PDP-11 for register-based addressing and orthogonality.[15] This focus aimed to achieve approximately 10 times the throughput of the 8080 while supporting larger memory spaces and efficient code density.[15]
Evolution from 16-bit to 64-bit Eras
The transition to 32-bit architectures marked a significant evolution in the x86 family, beginning with the Intel 80386 microprocessor introduced in 1985, which implemented protected mode featuring a flat 32-bit memory model, hardware paging, and virtual memory capabilities to support multitasking operating systems.[18] This design allowed for up to 4 GB of addressable memory and simplified memory management compared to the segmented 16-bit real mode, enabling more efficient protection and sharing of memory regions among processes.[18] The 80386's paging unit translated virtual addresses to physical ones using page tables, facilitating demand-paged virtual memory systems that became foundational for modern operating systems like Windows NT and Linux.[18]
Building on this foundation, the Intel 80486, released in 1989, integrated the floating-point unit (FPU) directly on-chip, eliminating the need for a separate coprocessor and improving performance for numerical computations by reducing latency in floating-point operations.[19] The 80486 also added an 8 KB on-chip cache and pipelined execution, enhancing overall instruction throughput while maintaining full binary compatibility with 80386 software.[19]
The Pentium series, starting with the original Pentium processor in 1993, introduced superscalar execution with two parallel pipelines, allowing simultaneous processing of integer instructions and incorporating dynamic branch prediction to mitigate pipeline stalls from conditional jumps.[20] This shift improved instruction-level parallelism, delivering roughly double the performance of the 80486 at similar clock speeds.[20] Subsequent advancements culminated in the Pentium Pro in 1995, which adopted out-of-order execution, dynamic data flow analysis, and advanced branch prediction, enabling the processor to reorder instructions speculatively for better utilization of execution resources while preserving x86 compatibility.[21]
The move to 64-bit computing addressed the limitations of 32-bit addressing, with AMD pioneering the extension through its AMD64 architecture announced in 1999 and first implemented in the Opteron processor launched in April 2003, which doubled the number of general-purpose registers to 16 (each 64 bits wide) and supported 64-bit virtual addressing limited to 48 bits (2^48 bytes, or 256 terabytes) in initial implementations.[22][23] Intel responded with its EM64T (Extended Memory 64 Technology), integrated into Pentium 4 processors starting in 2004, adopting a compatible extension that similarly expanded registers and addressing while ensuring seamless operation with existing 32-bit software ecosystems.[24][22]
A key challenge in this 64-bit transition was preserving backward compatibility with vast legacy codebases, addressed through operating modes that allowed processors to emulate prior environments. Legacy mode replicated 32-bit and 16-bit protected and real modes, enabling unmodified x86 software to run without recompilation, while long mode provided native 64-bit execution with a compatibility submode for running 32-bit applications under a 64-bit operating system.[25][26] These mechanisms, including segmented addressing in compatibility mode and flat 64-bit addressing in native mode, minimized disruption but introduced complexity in mode switching and pointer size handling, requiring careful operating system design to manage transitions efficiently.[25]
Key Designers and Manufacturers
The x86 architecture originated at Intel, where it has remained the dominant force in design and manufacturing since the introduction of the 8086 microprocessor in 1978, evolving through generations to the modern Core series processors that power the majority of personal computers, servers, and data centers worldwide.[1] Key early contributors at Intel included Marcian "Ted" Hoff, who as the company's twelfth employee pioneered the concept of the microprocessor with the 4004 in 1971, laying the foundational principles of integrated processing that influenced the x86 lineage.[27][28] More recently, under the leadership of CEO Pat Gelsinger (who joined Intel in 1979 and contributed to the design of early x86 processors like the 80286 and 80386), the company has overseen advancements in x86 efficiency, manufacturing processes, and ecosystem integration.[29][30]
Advanced Micro Devices (AMD) emerged as Intel's primary rival through licensed second-sourcing and innovative extensions to the x86 architecture. In 1982, AMD produced the Am8086, a licensed clone of Intel's 8086 that enabled broader market adoption by providing alternative supply during high demand.[31] AMD's most transformative contribution came in 2000 with the design of x86-64, a backward-compatible 64-bit extension to the x86 instruction set that addressed limitations in the original architecture and became the industry standard after its debut in the Opteron processor in 2003.[32][33] By 2017, AMD's Zen microarchitecture marked a competitive resurgence, delivering high-performance x86 cores that challenged Intel's market leadership in consumer and server segments through improved efficiency and multi-threading capabilities.[34][35]
Other manufacturers played niche roles in x86 development, often through licensing or acquisitions amid legal battles over intellectual property. Cyrix, founded in 1988, initially specialized in high-speed floating-point units compatible with x86 systems before producing full processors like the 6x86 in the 1990s, though it faced repeated lawsuits from Intel for patent infringements and was acquired by National Semiconductor in 1997 and later by VIA Technologies in 1999.[36][37] NexGen, known for its innovative Nx586 processor in 1994, was acquired by AMD in 1995, integrating its designs into AMD's early x86 offerings.[38] VIA Technologies, after acquiring Cyrix's assets, continued low-power x86 production, particularly for embedded systems.[39] These dynamics were shaped by ongoing licensing disputes, culminating in a landmark 2009 settlement where Intel paid AMD $1.25 billion and granted a six-year cross-licensing agreement for x86 patents, resolving antitrust claims and enabling mutual innovation.[40][41][42]
As of 2025, x86 market leadership reflects collaborative efforts to ensure longevity amid competition from ARM-based architectures. In October 2024, Intel and AMD formed the x86 Ecosystem Advisory Group with partners including Broadcom, Cisco, Dell Technologies, Google Cloud, HPE, Lenovo, Meta, Microsoft, and Oracle to standardize instruction sets and promote interoperability, fostering developer innovation and architectural consistency.[43][44] Additionally, Intel has pursued partnerships beyond traditional x86 rivals, such as a September 2025 collaboration with NVIDIA involving a $5 billion investment to co-develop AI-focused system-on-chips that integrate x86 CPUs with NVIDIA GPUs for data centers and, with RTX graphics, for personal computing.[45][46]
Architectural Fundamentals
Registers and Data Types
The x86 architecture employs a set of general-purpose registers (GPRs) that serve as the primary storage for operands in arithmetic, logical, and data transfer operations. In the original 16-bit implementation of the Intel 8086 processor, there were eight 16-bit GPRs: AX (accumulator), BX (base), CX (counter), DX (data), SI (source index), DI (destination index), BP (base pointer), and SP (stack pointer). All eight could be accessed in their full 16-bit form, and AX, BX, CX, and DX could additionally be split into 8-bit halves, such as AH and AL for the high and low bytes of AX.[47] With the transition to 32-bit processors in the Intel 80386, these registers were extended to 32 bits, prefixed with an 'E' (e.g., EAX, EBX), adding 16 upper bits to each while maintaining backward compatibility for 16-bit and 8-bit accesses.[47]
The evolution to 64-bit mode, introduced in the AMD64 architecture and adopted by Intel as IA-32e or Intel 64, further extended the registers to 64 bits (e.g., RAX, RBX) and added eight new GPRs (R8 through R15), resulting in 16 GPRs total, each capable of holding 64-bit values while supporting sub-register accesses for smaller sizes.[47] This expansion enhances performance by reducing memory accesses and enabling larger address spaces, with the REX prefix used in 64-bit mode to access the extended registers and full 64-bit widths.[47]
The following table illustrates the evolution and size variants of the GPRs across x86 modes:
| Mode | Register Count | 8-bit Access Examples | 16-bit Access Examples | 32-bit Access Examples | 64-bit Access Examples |
|---|---|---|---|---|---|
| 16-bit | 8 | AL, AH, BL, BH, etc. | AX, BX, CX, DX, SI, DI, BP, SP | N/A | N/A |
| 32-bit | 8 | AL, AH, etc. (same) | AX, BX, etc. (same) | EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP | N/A |
| 64-bit | 16 | AL, AH, etc. (same); SPL, BPL, SIL, DIL, R8B-R15B | AX, BX, etc. (same); R8W-R15W | EAX, EBX, etc. (same); R8D-R15D | RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, R8-R15[47] |
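The sub-register names in the table are views onto the same storage, which a small model can make concrete. The sketch below is a toy under stated assumptions: the class `GPR` and its method names are hypothetical, and it models one register such as RAX in 64-bit mode. It also encodes a rule the text above does not spell out but which is standard x86-64 behavior: a 32-bit write zero-extends into the full 64-bit register, while 8-bit and 16-bit writes leave the remaining bits unchanged.

```python
MASK64 = (1 << 64) - 1


class GPR:
    """Toy model of one 64-bit general-purpose register (e.g. RAX)."""

    def __init__(self, value: int = 0) -> None:
        self.r = value & MASK64  # full 64-bit contents (RAX)

    # Reads: each narrower name is a view of the low-order bits.
    def read32(self) -> int:      # EAX
        return self.r & 0xFFFF_FFFF

    def read16(self) -> int:      # AX
        return self.r & 0xFFFF

    def read8_low(self) -> int:   # AL
        return self.r & 0xFF

    def read8_high(self) -> int:  # AH (bits 8-15)
        return (self.r >> 8) & 0xFF

    # Writes: only the 32-bit form clears the upper half in 64-bit mode.
    def write32(self, v: int) -> None:
        self.r = v & 0xFFFF_FFFF

    def write16(self, v: int) -> None:
        self.r = (self.r & ~0xFFFF & MASK64) | (v & 0xFFFF)

    def write8_low(self, v: int) -> None:
        self.r = (self.r & ~0xFF & MASK64) | (v & 0xFF)

    def write8_high(self, v: int) -> None:
        self.r = (self.r & ~0xFF00 & MASK64) | ((v & 0xFF) << 8)


rax = GPR(0x1122334455667788)
print(hex(rax.read32()), hex(rax.read16()))          # 0x55667788 0x7788
print(hex(rax.read8_high()), hex(rax.read8_low()))   # 0x77 0x88
rax.write16(0xABCD)
print(hex(rax.r))                                    # 0x112233445566abcd
rax.write32(1)
print(hex(rax.r))                                    # 0x1
```

Keeping a single integer and masking on access mirrors how the names alias one physical register, rather than storing AL, AX, and EAX separately.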
Addressing Modes
x86 addressing modes provide mechanisms for computing effective memory addresses within instructions, allowing flexible access to data in memory or registers. These modes have evolved across the architecture's history, supporting various operand types from simple constants to complex combinations of registers and displacements. The primary modes include immediate, register, direct, register indirect, and based-indexed with scaling.[48]
In the immediate mode, the operand is a constant value embedded directly in the instruction, such as an 8-bit, 16-bit, 32-bit, or 64-bit immediate value, used for operations like loading a fixed value into a register.[48] Register mode accesses data directly from one of the processor's general-purpose registers, such as EAX or RAX, without involving memory.[48] Direct mode specifies an absolute memory address using a displacement, typically 32 bits in 32-bit mode or 64 bits in 64-bit mode, to point to a fixed location.[48] Register indirect mode uses the contents of a register as the memory address, for example, dereferencing the value in EBX to access [EBX].[48] The based-indexed mode with scaling combines a base register, an index register scaled by a factor of 1, 2, 4, or 8, and an optional displacement to form the effective address, expressed as [base + index * scale + displacement], enabling efficient array access.[48]
In 16-bit real mode, addressing relies on a segmented memory model where the physical address is calculated as (segment << 4) + offset, allowing up to 20 bits of addressing for 1 MB of memory.[48] Segment overrides, specified by prefixes like 2EH for CS or 26H for ES, allow explicit selection of segment registers such as DS, CS, or SS for the base, overriding the default data segment.[48] This segment:offset scheme, with 16-bit segment and 16-bit offset values, supports legacy compatibility but limits the address space compared to later modes.[48]
The transition to 32-bit protected mode expands addressing to a flat 32-bit linear model, where segmentation is optional and often ignored, providing direct access to 4 GB of memory without segment:offset calculations.[48] In x86-64 long mode, the flat model extends to 64 bits, using a linear address space up to 2^48 bytes in typical implementations, with segment registers like DS and ES treated as flat (base 0).[48] A key addition in x86-64 is RIP-relative addressing, where the effective address is computed as RIP + displacement, using a 32-bit signed offset for position-independent code with a ±2 GB range.[48]
Addressing modes in x86 have specific limitations to balance complexity and performance. Base modes, including register indirect and based-indexed, do not support automatic increment or decrement of registers after access, unlike some architectures, requiring separate instructions for such adjustments.[48] Displacement fields are restricted to 8-bit or 32-bit sizes in most cases, sign-extended as needed, which constrains the offset range but keeps instruction encoding compact.[48] These constraints reflect the CISC heritage, prioritizing a rich set of modes over simplicity.
Instruction Set Characteristics
The x86 instruction set employs a variable-length encoding scheme, where individual instructions range from 1 to 15 bytes in total length, with primary opcodes typically consisting of 1 to 3 bytes to allow for a dense yet extensible format.[48] This structure includes optional legacy prefixes (up to four), the opcode itself, a ModR/M byte for operand specification, an optional Scale-Index-Base (SIB) byte, displacement fields, and immediate operands.[48] The ModR/M byte, an 8-bit field, breaks down into a 2-bit Mod field for addressing modes (such as register-to-register or memory with displacement), a 3-bit Reg/Opcode field for register selection or opcode extension, and a 3-bit R/M field for the register or memory operand.[48] In 64-bit mode, the REX prefix—a single byte ranging from 0x40 to 0x4F—extends this encoding by specifying 64-bit operand sizes via its W bit and accessing extended registers (R8–R15) through its R, X, and B bits.[48] x86 instructions are categorized into several functional groups that reflect their role in general-purpose computing. 
Data movement instructions, such as MOV for register-to-register or memory transfers and PUSH/POP for stack operations, move data between memory, registers, and the stack.[48] Arithmetic instructions include ADD and SUB for basic addition and subtraction, as well as MUL (unsigned) and IMUL (signed) for multiplication, often producing results in specific registers like EAX/RAX.[48] Control flow instructions, exemplified by JMP for unconditional jumps, CALL and RET for subroutine management, and conditional branches like Jcc (e.g., JE for jump if equal), enable program sequencing and decision-making.[48] String operations, such as MOVS for block transfers, CMPS for comparisons, and LODS for loading strings, support repetitive memory operations with auto-increment/decrement based on the direction flag (DF).[48]
As a hallmark of Complex Instruction Set Computing (CISC), the x86 architecture incorporates instructions that perform multiple operations in a single instruction, improving code density at the cost of decoder complexity.[48] For instance, ENTER constructs stack frames by allocating space, saving the previous frame pointer, and linking to higher-level frames for nested procedures, while LEAVE reverses this by restoring the stack pointer and frame pointer.[48] Such fused operations, like those combining arithmetic with condition codes (e.g., ADC for add with carry), exemplify how x86 instructions can encapsulate sequences that might require multiple steps in simpler ISAs.
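The ModR/M and REX field layouts described in this section can be illustrated with a small decoder. This is a minimal sketch in Python, not a full instruction decoder: the names `ModRM`, `Rex`, `decode_modrm`, `decode_rex`, and `register_numbers` are hypothetical helpers introduced here for illustration, and the sketch assumes 64-bit mode, since outside that mode the bytes 0x40 to 0x4F are not REX prefixes.

```python
from typing import NamedTuple, Optional


class ModRM(NamedTuple):
    mod: int  # bits 7:6, addressing form (0b11 means register-direct)
    reg: int  # bits 5:3, register number or opcode extension
    rm: int   # bits 2:0, register or memory operand


class Rex(NamedTuple):
    w: int  # 1 selects a 64-bit operand size
    r: int  # extends the ModR/M Reg field
    x: int  # extends the SIB index field
    b: int  # extends the ModR/M R/M field (or the SIB base)


def decode_modrm(byte: int) -> ModRM:
    """Split a ModR/M byte into its 2-bit Mod, 3-bit Reg, and 3-bit R/M fields."""
    return ModRM(mod=(byte >> 6) & 0b11, reg=(byte >> 3) & 0b111, rm=byte & 0b111)


def decode_rex(byte: int) -> Optional[Rex]:
    """Return the W/R/X/B bits if the byte lies in 0x40-0x4F, else None.

    Assumption: the caller is decoding 64-bit-mode code.
    """
    if (byte & 0xF0) != 0x40:
        return None
    return Rex(w=(byte >> 3) & 1, r=(byte >> 2) & 1, x=(byte >> 1) & 1, b=byte & 1)


def register_numbers(rex: Optional[Rex], modrm: ModRM) -> tuple:
    """Combine REX.R and REX.B with the 3-bit fields into 4-bit register numbers."""
    r = rex.r if rex else 0
    b = rex.b if rex else 0
    return (r << 3) | modrm.reg, (b << 3) | modrm.rm


# Example byte sequence 4D 89 C1: REX prefix, opcode 0x89, ModR/M byte.
rex = decode_rex(0x4D)
modrm = decode_modrm(0xC1)
print(rex, modrm, register_numbers(rex, modrm))
```

For the bytes 4D 89 C1, the sketch reports REX.W, REX.R, and REX.B set, a register-direct Mod field, and register numbers 8 and 9, which is how an instruction reaches R8 and R9 even though the ModR/M fields are only 3 bits wide.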
The perceived inefficiency of x86 compared to ARM stems from its CISC roots involving complex instruction decoding and legacy features, but this is overstated; modern x86 implementations mitigate these issues by decoding complex instructions into simpler micro-operations (micro-ops), allowing execution pipelines to operate similarly to RISC architectures like ARM, with micro-op caches further reducing decode overhead.[49]
Backward compatibility is maintained through prefix bytes that override default operand and address sizes, ensuring legacy code portability across modes. The 66h prefix toggles operand sizes, such as switching from the 32-bit default to 16-bit operands in 64-bit mode for instructions like MOV.[48] Similarly, the 67h prefix adjusts address sizes, selecting 32-bit addressing in 64-bit mode or 16-bit addressing in 32-bit mode, which is useful for mixed-mode code.[48]
Operating Modes
The x86 architecture supports multiple operating modes that define the processor's execution environment, including memory addressing, protection mechanisms, and privilege levels. These modes enable backward compatibility while providing advanced features for modern operating systems. Real mode serves as the foundational state, emulating the original 8086 processor for simple, direct memory access.[50]
In real mode, the processor operates with a 20-bit physical address space limited to 1 MB, using a segmented memory model where addresses are calculated as segment × 16 + offset. This mode lacks memory protection and privilege levels, allowing unrestricted access to the entire address space, which simplifies legacy software execution but poses risks in multitasking environments. The A20 line, a hardware signal, controls the 21st address bit (A20): when it is disabled, A20 is forced to zero so that addresses wrap around at 1 MB as on the 8086, and when it is enabled, software can reach memory above 1 MB, including the roughly 64 KB high memory area addressable from real mode.[50]
Protected mode, introduced in 16-bit form with the Intel 80286 and extended to 32 bits with the 80386, expands the address space to 4 GB using 32-bit linear addresses and introduces robust memory protection through segmentation and, from the 80386 onward, paging. It employs a ring-based privilege system with four levels (rings 0 through 3), where ring 0 denotes kernel-level access and ring 3 user-level, enforced via the current privilege level (CPL) to prevent unauthorized operations. Paging, when enabled, divides memory into 4 KB pages, supporting virtual memory and further isolation. This mode forms the basis for modern multitasking operating systems by isolating processes and protecting system resources.[50]
Long mode, the native 64-bit extension of protected mode, provides a flat 64-bit addressing model with a virtual address space of up to 2^48 bytes (256 TiB) and a physical address space of up to 2^52 bytes (4 PB) in modern implementations (the first implementations supported 40 bits, or 1 TB).[50] It requires paging to be active and supports a compatibility sub-mode for executing 32-bit protected mode code within a 64-bit environment. Privilege rings remain the same as in protected mode, ensuring continuity for operating system ports. The A20 line is ignored in this mode, as addressing no longer relies on segmented 20-bit limits.[50]
Additional specialized modes include virtual 8086 (VM86) mode and system management mode (SMM). VM86 mode allows protected mode to emulate real mode for running 16-bit DOS applications, confining each task to a 1 MB address space while operating at ring 3 under supervision from ring 0 via a virtual machine monitor. It supports paging if enabled globally and emulates the A20 line through software control. SMM provides a transparent, high-privilege environment for power management and hardware error handling, entered via a system management interrupt (SMI) and using isolated system management RAM (SMRAM) separate from main memory. Paging and the A20 line are disabled upon SMM entry, and it operates outside normal privilege rings with hardware-enforced isolation.[50]
Transitions between modes are controlled primarily through control register 0 (CR0) bits and specific instructions. Setting the PE bit in CR0 enables protected mode from real mode, while clearing it reverts to real mode; however, this requires careful segment descriptor reloading to avoid faults. Paging is activated by setting the PG bit in CR0 after PE is enabled, optionally with physical address extension (PAE) for larger addresses. Entering long mode from protected mode involves enabling PAE via CR4, setting the long-mode enable (LME) bit in the extended feature enable register (EFER), and then setting PG. VM86 mode is entered by setting the VM flag in EFLAGS during a task switch or interrupt return (IRET), and SMM transitions occur automatically via SMI, with exit via the resume (RSM) instruction restoring the prior state. Hardware task switches, supported in protected and VM86 modes via task state segments (TSS), facilitate context changes but are not available in long mode, where operating systems rely on software-managed context switching.[50]
Extensions
Mathematical and Vector Processing Extensions
The x87 floating-point unit (FPU), initially implemented as the separate 8087 coprocessor, was introduced by Intel in 1980 to accelerate numerical computations on the 8086 processor.[51] It features eight 80-bit stack-based registers (ST0 through ST7) that support single-precision (32-bit), double-precision (64-bit), and extended-precision (80-bit) floating-point formats, enabling operations like addition, multiplication, and transcendental functions with high accuracy for scientific and engineering applications.[2] The stack architecture allows efficient operand handling, where ST0 serves as the top of the stack and instructions implicitly use register positions relative to it; integration of the FPU into the CPU core from the 80486 onward made a separate coprocessor unnecessary.[2]
In 1996, Intel extended the x87 FPU registers for integer SIMD processing with MMX technology, introducing 57 instructions for 64-bit packed data types including 8-bit, 16-bit, and 32-bit integers.[52] These operations, such as parallel additions and multiplications, reuse the low 64 bits of the eight x87 registers (aliased as MM0 through MM7) to boost multimedia workloads like video encoding and image processing without requiring additional register state.[52] MMX emphasized saturation arithmetic to prevent overflow in signal processing, marking the first SIMD extension in the x86 architecture and paving the way for broader vectorization.[52]
Building on MMX, Intel introduced Streaming SIMD Extensions (SSE) in 1999 with the Pentium III processor, adding dedicated 128-bit XMM registers (XMM0 through XMM7) for single-precision floating-point and integer operations.[52] SSE provided 70 new instructions for packed and scalar single-precision floats, enabling four-way parallelism for tasks in 3D graphics, video processing, and scientific simulations, while introducing cache prefetch hints to optimize data movement.[52] SSE2 followed in 2000 with the Pentium 4, adding 144 instructions that included double-precision floating-point and 128-bit packed integers across the same XMM registers, supporting two-way double-precision parallelism for enhanced numerical accuracy in applications like fluid dynamics.[52]
AMD responded to MMX with 3DNow! in May 1998, implemented in the K6-2 processor, which added 21 SIMD instructions for packed single-precision floating-point operations using the MMX registers.[53] These instructions, such as PFADD for parallel addition and PFMUL for multiplication, targeted 3D graphics and multimedia acceleration by processing two 32-bit floats per 64-bit register, with reciprocal and reciprocal-square-root approximations refined via Newton-Raphson iteration to improve performance over scalar methods.[53] The extension included utility instructions like FEMMS for faster transitions between MMX and x87 modes and PREFETCH and PREFETCHW for cache optimization; it was enhanced in 1999 with 3DNow!+ on Athlon processors, which added five instructions aimed at digital signal processing.[53]
Intel advanced vector processing further with Advanced Vector Extensions (AVX) in 2011, introducing 256-bit YMM registers (YMM0 through YMM15) and a VEX encoding scheme to support wider SIMD operations on single- and double-precision floats and integers.[52] AVX enabled eight single-precision or four double-precision operations per instruction, doubling throughput for compute-intensive workloads like matrix multiplications in machine learning, while preserving compatibility with 128-bit SSE by aliasing the XMM registers to the lower halves of the YMM registers.[52] AVX-512, specified in 2013 and first implemented in Xeon processors in 2017, extended this to 512-bit ZMM registers (ZMM0 through ZMM31) with over 1,000 instructions for packed floats, integers, and advanced math functions, allowing 16 single-precision or eight double-precision elements per vector to accelerate high-performance computing tasks such as simulations and data analytics.[54]
A key innovation in AVX-512 is the EVEX encoding prefix, which enables flexible vector lengths (128, 256, or 512 bits) and introduces eight 64-bit mask registers (k0 through k7) for predication, allowing conditional execution within vectors to avoid branching and improve efficiency in sparse computations.[55] Masking operates in either merging mode, where destination elements whose mask bit is clear keep their previous values, or zeroing mode, where those elements are set to zero; encoding k0 as the write mask selects no masking, so every element is written. This helps in algorithms like neural network training where only active elements need processing.[55] This predication, combined with broadcast and gather/scatter operations, reduces overhead in irregular data patterns common in vectorized numerical code.[55]
Memory and Addressing Extensions
The Physical Address Extension (PAE), introduced by Intel in 1995 with the Pentium Pro processor, enables 32-bit processors to access up to 64 GB of physical memory through 36-bit physical addressing.[56] In 32-bit protected mode, PAE employs a 3-level paging hierarchy consisting of a page directory pointer table (PDPT), page directories (PD), and page tables (PT), where the CR3 register points to the PDPT containing four PDPTEs to support the extended address space.[56] This structure is activated by setting the PAE bit (bit 5) in the CR4 register, allowing linear-to-physical address translation beyond the standard 32-bit limit while maintaining compatibility with existing 32-bit operating systems.[56]
Page Size Extensions (PSE and PSE-36), first implemented in the Intel Pentium processor in 1993 and fully documented with the Pentium Pro, permit the use of 4 MB pages in addition to the default 4 KB pages to alleviate Translation Lookaside Buffer (TLB) pressure.[56] PSE is enabled by setting the PSE bit (bit 4) in CR4, with the page size (PS) bit (bit 7) in a page directory entry (PDE) indicating a 4 MB page when set, which maps larger memory regions and reduces the number of TLB entries required for address translation.[56] PSE-36 extends this capability to 36-bit physical addressing for 4 MB pages, letting systems that do not use PAE reach up to 64 GB of physical memory without the additional paging level that PAE introduces.[56]
The No-Execute (NX) bit, introduced by AMD in 2003 as part of the AMD64 architecture and later adopted by Intel, provides page-level protection to prevent execution of code from data-only memory regions, enhancing security against exploits like buffer overflows.[56] Implemented as bit 63 in paging structure entries such as page table entries (PTEs), page directory entries (PDEs), and PDPTEs, the NX bit is enabled via the NXE bit (bit 11) in the IA32_EFER MSR (address 0xC0000080) and requires PAE paging (CR4.PAE=1).[56] When set (NX=1), it disables instruction fetches from the page, triggering a page-fault exception (#PF, interrupt 14) on execution attempts, while NX=0 permits execution; hardware support is detected via CPUID function 80000001H:EDX bit 20.[56]
In the x86-64 memory model, introduced with AMD's AMD64 architecture in 2003 and implemented in Intel processors starting with later Pentium 4 models, virtual addressing is restricted to 48 bits using canonical form, where the upper 16 bits (63:48) are sign-extended from bit 47 to ensure valid addresses within a 256 TB address space.[47] Non-canonical addresses, where bits 63:48 do not match bit 47, cause general-protection (#GP) or stack-segment (#SS) exceptions to maintain address space integrity.[47] Physical addressing supports up to 52 bits in later implementations, such as those in Intel Xeon and Core i7 processors, allowing access to 4 PB of memory; paging in 64-bit mode (IA-32e) uses a 4-level hierarchy that supports 2 MB and 1 GB pages in addition to 4 KB pages.[47]
Extended Page Tables (EPT), part of Intel's VT-x virtualization technology, support huge pages of 2 MB and 1 GB to optimize memory translation in virtual machines by reducing page table walks and TLB misses.[57] EPT translates guest-physical addresses to host-physical addresses using a separate paging hierarchy (up to 5 levels), enabled by the "enable EPT" control (bit 1 in secondary processor-based VM-execution controls) and configured via the EPTP field in the VMCS, which specifies the EPT PML4 base and page-walk length.[57] For 2 MB pages, bit 7 in a PDE is set, using bits 51:21 from the PDE and bits 20:0 from the guest-physical address; for 1 GB pages, bit 7 in a PDPTE is set, using bits 51:30 from the PDPTE and bits 29:0 from the address. Both sizes are available alongside 4 KB pages to minimize virtualization overhead in large-memory workloads.[57]
64-bit and Compatibility Extensions
The x86-64 instruction set architecture (ISA), initially specified by AMD as AMD64, extends the 32-bit x86 ISA to support 64-bit addressing and computation while preserving compatibility with existing software ecosystems. This architecture doubles the number of general-purpose registers (GPRs) available in 64-bit mode to 16, introducing R8 through R15 as new 64-bit registers that are accessed using the REX prefix on instructions. These additional GPRs, along with their 32-bit (R8D–R15D), 16-bit (R8W–R15W), and 8-bit (R8B–R15B) subregisters, reduce register spilling in compilers and enhance performance for data-intensive 64-bit applications by providing more flexibility for temporary values and loop counters.
AMD64 also incorporates specialized instructions for efficient system-level operations and synchronization. The SYSCALL instruction facilitates fast transitions from user mode to kernel mode by saving the return instruction pointer (RIP) to RCX and the flags (RFLAGS) to R11, then loading the kernel entry point from the IA32_LSTAR model-specific register (MSR), all without segment-based privilege checks. Its counterpart, SYSRET, reverses this process for returning to user mode by restoring RIP from RCX and RFLAGS from R11, enabling low-latency system calls essential for modern operating systems. For atomic operations, CMPXCHG16B performs a 128-bit compare-and-exchange on a memory location, comparing it with RDX:RAX and, on a match, storing RCX:RBX and setting the zero flag (ZF); on a mismatch it clears ZF and loads the memory value into RDX:RAX. When prefixed with LOCK the operation is atomic, which is particularly valuable for lock-free data structures in 64-bit multithreaded environments.
Intel implemented a compatible version known as Intel 64, with additional extensions to broaden utility. 
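The compare-and-exchange behavior of CMPXCHG16B described above can be modeled in a few lines. This is only a sketch of the architectural register and flag effects, following the vendor manuals' pairing of RDX:RAX as the expected value and RCX:RBX as the replacement (high half first); the function name `cmpxchg16b` and its tuple return are illustrative assumptions, and the model does not capture atomicity, the LOCK prefix, or the 16-byte alignment requirement.

```python
MASK64 = (1 << 64) - 1


def cmpxchg16b(mem: int, rdx: int, rax: int, rcx: int, rbx: int):
    """Model one CMPXCHG16B step on a 128-bit memory value.

    Returns (new_mem, new_rdx, new_rax, zf). Hypothetical helper: it mirrors
    the register and flag effects only, not atomicity or alignment checks.
    """
    expected = ((rdx & MASK64) << 64) | (rax & MASK64)
    if mem == expected:
        # Match: store RCX:RBX to memory and set ZF.
        replacement = ((rcx & MASK64) << 64) | (rbx & MASK64)
        return replacement, rdx, rax, 1
    # Mismatch: load the current memory value into RDX:RAX and clear ZF.
    return mem, (mem >> 64) & MASK64, mem & MASK64, 0


# A successful exchange followed by a failed one against the updated value.
mem = 5
mem, rdx, rax, zf = cmpxchg16b(mem, rdx=0, rax=5, rcx=1, rbx=2)
print(hex(mem), zf)
mem, rdx, rax, zf = cmpxchg16b(mem, rdx=0, rax=5, rcx=3, rbx=4)
print(hex(mem), rdx, rax, zf)
```

The second call fails because memory no longer holds the expected value, and the caller receives the current contents in the RDX:RAX pair, which is what lets lock-free code retry with fresh data.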
Support for the LAHF (load AH from flags) and SAHF (store AH into flags) instructions in 64-bit mode allows direct manipulation of the lower eight bits of the RFLAGS register (SF, ZF, AF, PF, and CF) into the AH register, requiring the CPUID feature flag LAHF_SAHF to be set; this enables legacy code relying on these instructions to function seamlessly without emulation overhead.[58] In 2008, Intel introduced the POPCNT instruction as part of SSE4.2, which counts the number of set bits (population count) in a 32-bit or 64-bit operand and stores the result in the destination register, clearing most flags except ZF (set if the source is zero); this accelerates bitwise algorithms like Hamming weights in cryptography and compression.[58] Compatibility with prior x86 generations is maintained through structured operating modes within the IA-32e paging mode. Legacy mode fully emulates the 32-bit protected mode environment, including segment descriptors and 32-bit addressing, allowing unmodified IA-32 applications to execute as if on a 32-bit processor.[58] The IA-32e mode further subdivides into a compatibility sub-mode, where 32-bit code segments (with the L-bit clear in the code descriptor) run under a 64-bit operating system, supporting legacy SSE instructions on the first eight XMM registers while enforcing 32-bit default operand and address sizes, with transitions to 64-bit mode possible via far calls to segments with the L-bit set.[58] The 64-bit application binary interface (ABI), as defined in the System V ABI for AMD64, introduces differences from 32-bit conventions to optimize for the expanded register set and larger address space. 
Integer and pointer parameters are passed in registers RDI, RSI, RDX, RCX, R8, and R9 (up to six arguments), with floating-point values in XMM0 through XMM7; excess arguments are passed on the stack, pushed right-to-left. This register-based passing minimizes memory accesses compared to the stack-heavy 32-bit System V ABI.[59] A 128-byte "red zone" immediately below the stack pointer (RSP) serves as scratch space for leaf functions, which can use it without explicit stack adjustment. The ABI guarantees that user-space signal and interrupt handlers do not modify it, whereas kernel code, whose stack can be overwritten by hardware interrupts, is typically compiled without a red zone (e.g., via -mno-red-zone).[59] The stack must be 16-byte aligned at the point of each call (32-byte aligned when 256-bit vectors are passed on the stack), with callee-saved registers including RBX, RBP, and R12–R15, while caller-saved registers such as RAX, RCX, RDX, RSI, RDI, R8–R11, and XMM0–XMM15 hold temporaries; return values use RAX/RDX for integers or XMM0/XMM1 for floats.[59]
Implementations of x86-64 processors incorporate branch prediction enhancements tailored to 64-bit workloads, such as larger history tables and improved indirect branch prediction to manage the expanded control flow possibilities from 64-bit RIP-relative addressing and more registers, reducing misprediction penalties in performance-critical code.[60]
Security and Virtualization Extensions
x86 security and virtualization extensions provide hardware support for protecting sensitive data and enabling efficient virtual machine execution, addressing vulnerabilities in shared environments and supporting trusted computing paradigms. These features, developed primarily by Intel and AMD, include virtualization technologies that allow multiple operating systems to run securely on a single processor, as well as cryptographic accelerations and memory protection mechanisms to mitigate attacks like side-channel exploits and unauthorized memory access.[61][62] Intel introduced Virtualization Technology (VT-x) in 2005 to accelerate virtual machine monitors (VMMs) on IA-32 and Intel 64 architectures. VT-x operates in two modes: VMX root mode for the VMM with full privileges and VMX non-root mode for guest software with restricted access to sensitive instructions. Transitions occur via VM-entry, which loads guest state from the virtual-machine control structure (VMCS) using VMLAUNCH or VMRESUME, and VM-exit, which saves guest state and returns control to the VMM upon events like interrupts or exceptions. This hardware assistance reduces the overhead of software-based virtualization by handling mode switches directly in the processor.[61] AMD responded with AMD-V in 2006, providing comparable hardware virtualization support through Secure Virtual Machine (SVM) mode. SVM enables efficient guest execution similar to VT-x, with the VMM managing virtual machines via dedicated instructions. A key feature is Nested Page Tables (NPT), which implements two-level address translation to accelerate memory virtualization and reduce VMM involvement in page faults. 
NPT allows the processor to walk both guest and nested page tables in hardware, improving performance for memory-intensive workloads in virtualized environments.[63]
For cryptographic security, Intel's Advanced Encryption Standard New Instructions (AES-NI), announced in 2008, accelerate AES operations critical for data protection. AES-NI includes instructions such as AESENC and AESDEC for performing encryption and decryption rounds, respectively, each handling round steps like ShiftRows, SubBytes, MixColumns, and AddRoundKey in a single instruction. Key expansion is supported by AESKEYGENASSIST for generating round keys and AESIMC for the InverseMixColumns step used when preparing decryption round keys, enabling up to 10x performance gains in bulk encryption modes like CBC or GCM while minimizing timing side-channel vulnerabilities.[64]
Intel Trusted Execution Technology (TXT), introduced around 2006, enhances platform security through measured launch mechanisms for trusted computing. TXT uses a dynamic root of trust to verify the integrity of the BIOS, hypervisor, and OS at launch, employing cryptographic measurements stored in a Trusted Platform Module (TPM). Integrated with VT-x, it creates protected execution environments that prevent malware from compromising the launch process, supporting secure virtual machine isolation against rootkits and firmware attacks.[65]
AMD's Secure Virtual Machine (SVM), part of AMD-V, extends virtualization with trusted computing support similar to TXT, enabling measured launches and attested boots. SVM Lock functionality guards the SVM enable/disable setting against unauthorized changes, helping preserve the integrity of virtual machine configurations.[63]
In 2017, AMD introduced Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV) to protect against memory-based attacks in virtualized setups. SME uses a single system-wide key, generated by the AMD Secure Processor, to encrypt all physical memory transparently, defending against physical attacks like cold-boot exploits. SEV extends this with per-VM keys, encrypting guest memory to prevent the hypervisor or host OS from accessing or leaking VM data, thus isolating guests from malicious hosts and reducing risks in cloud environments. SEV requires coordination between the hypervisor and guest OS for key provisioning via the Secure Processor.[62]
Recent Developments and Future Extensions
In the 2020s, x86 architectures have seen significant enhancements focused on artificial intelligence workloads, building on prior vector processing capabilities to deliver greater efficiency and performance. Intel's Deep Learning Boost (DL Boost), incorporating Vector Neural Network Instructions (VNNI) and bfloat16 (BF16) support within AVX-512, has been expanded for broader adoption in AI training and inference, enabling faster low-precision computations essential for deep learning models. These features, integrated into processors like the 4th-generation Intel Xeon Scalable (Sapphire Rapids) released in 2023, provide up to 2x throughput improvements for INT8 and BF16 operations compared to standard AVX-512, as demonstrated in neural network benchmarks.[66][67] A major advancement came in 2023 with Intel's announcement of Advanced Performance Extensions (APX), which augments the x86-64 instruction set by increasing general-purpose registers from 16 to 32, introducing new instructions for conditional operations, and optimizing register renaming to reduce spills in software stacks. APX aims to boost general-purpose performance by 10-20% in integer-heavy workloads without significant increases in power consumption or die area, and it is slated for integration into future Intel processors, including Nova Lake expected in 2026, on the 18A process node and beyond. This extension also enhances compatibility with existing x86 code, facilitating smoother transitions for legacy applications.[68][69] On the AMD side, the introduction of the XDNA 2 neural processing unit (NPU) architecture in the Ryzen AI Max PRO series processors, unveiled at CES 2025, marks a dedicated hardware acceleration path for x86-based AI tasks at the edge. These processors deliver up to 50 TOPS of AI performance through the NPU, combined with Zen 5 CPU cores and RDNA 3.5 integrated graphics, targeting efficient on-device inference for generative AI and machine learning in laptops and workstations. 
The XDNA 2 design emphasizes low-power operation, achieving up to 126 TOPS system-wide while maintaining compatibility with x86 software ecosystems.[70][71]
Collaborative efforts have further shaped x86's trajectory, exemplified by the formation of the x86 Ecosystem Advisory Group (EAG) in October 2024 by Intel and AMD, in partnership with companies like Broadcom, Dell, Google, Meta, and Microsoft. The EAG focuses on enhancing cross-platform interoperability, particularly addressing ARM's rise by standardizing x86 extensions for software portability and simplifying development across heterogeneous environments. By October 2025, the group reported progress on initiatives like AVX10, a next-generation vector extension that refines 512-bit operations for higher throughput in AI and HPC, while ensuring backward compatibility. In November 2025, Intel confirmed support for AVX10.2 and APX in the upcoming Nova Lake processors, advancing vector processing for AI and high-performance computing.[44][72][73]
In September 2025, Intel and NVIDIA announced a multi-year collaboration to develop x86-based system-on-chips (SoCs) integrating NVIDIA RTX GPU chiplets, targeting AI-optimized personal computing and datacenter infrastructure. The planned SoCs combine x86 cores with NVIDIA's CUDA ecosystem to accelerate AI workloads. This partnership underscores x86's pivot toward hybrid architectures for edge and cloud AI.[45]
Looking ahead, x86 extensions are poised to prioritize power efficiency for edge AI deployments, with APX and AVX10 enabling sub-10W operations in mobile scenarios while scaling to datacenter demands. Ongoing EAG efforts suggest further innovations in vector widths and instruction fusion to compete with ARM's efficiency, potentially incorporating wider data paths beyond 512 bits in post-2025 iterations, though specifics remain under development as of late 2025.[74][75]
Implementations
Modern Hardware Implementations
Intel's Core Ultra processors represent a cornerstone of modern x86 hardware, emphasizing hybrid core designs and integrated AI acceleration. The Meteor Lake generation, introduced in late 2023, features up to 16 cores (combining performance, efficient, and low-power efficient types) and marks the debut of a dedicated Neural Processing Unit (NPU) delivering up to 11 TOPS for AI tasks, with total platform AI performance up to 34 TOPS, enabling efficient on-device processing without relying solely on CPU or GPU resources.[76] Built on the Intel 4 process node, Meteor Lake operates in power envelopes from 15W to 55W for mobile variants and supports AVX2 and AVX-VNNI for vector computations; AVX-512 is not enabled on these hybrid client parts.[77]
The Arrow Lake architecture, released in 2024 for desktop and mobile platforms, advances this lineage with refined hybrid cores (up to 24 in high-end models) and enhanced NPU performance reaching 13 TOPS, while maintaining compatibility with x86-64 extensions. Fabricated using TSMC's N3B process for the compute tile and Intel's processes for other components, Arrow Lake-S desktop SKUs target 65W to 125W TDPs, delivering balanced multi-threaded performance suitable for productivity and content creation workloads.[78] Like Meteor Lake, it exposes AVX2 and AVX-VNNI rather than AVX-512, so wide vector code for AI and scientific computing on this platform uses 256-bit vectors.[79]
Panther Lake, Intel's 2025 flagship on the 18A process node, integrates up to 16 performance-cores and efficient-cores in a disaggregated tile-based design, achieving over 50% gains in CPU and GPU performance relative to Arrow Lake and Lunar Lake equivalents. The 18A node employs RibbonFET gate-all-around transistors and PowerVia backside power delivery, yielding up to 15% better performance per watt and over 30% transistor density improvement compared to prior nodes like Intel 3. This enables sustained operation at Lunar Lake-level efficiency (around 15-28W for mobile) while scaling to 125W for desktops, with NPU enhancements pushing total platform AI capability beyond 100 TOPS.[80][81][82]
AMD's Ryzen lineup, powered by the Zen 5 microarchitecture since 2024, emphasizes high instructions-per-clock (IPC) uplifts and chiplet scalability for x86 dominance in consumer and enterprise segments. Desktop Ryzen 9000 series processors offer up to 16 Zen 5 cores with a 16% IPC increase over Zen 4 and support for DDR5-5600 memory across 65-170W TDPs. Fabricated on TSMC's N4P (4nm-class) process, these chips excel in multi-threaded scenarios, such as rendering and simulation; mobile variants include integrated XDNA AI engines providing up to 50 TOPS for inference.[34][83]
Zen 6 previews from 2025 highlight a shift to TSMC's 2nm node for core complex dies (CCDs) and 3nm for the I/O die, promising further density and efficiency advances while retaining x86 compatibility and XDNA 2 AI acceleration. This architecture targets similar power envelopes but with enhanced per-core performance for AI-driven applications. In the server domain, the EPYC 9005 series, launched in October 2024 with embedded variants following in 2025, scales to 192 Zen 5 or Zen 5c cores in a modular chiplet design, supporting up to 12 DDR5-6000 channels and TDPs from 155W to 500W for data center use. These processors deliver up to 17% better IPC in AI and virtualization workloads compared to prior generations.[84][85][86]
Beyond Intel and AMD, pure x86 implementations remain limited: VIA's Nano series, last updated in the mid-2010s, offers low-power, embedded options but lacks recent advancements or broad adoption in 2025. Qualcomm's Snapdragon X Elite, introduced in 2024 as an ARM-based platform, incorporates x86 emulation via Microsoft's Prism layer, achieving 80-90% native performance for many Windows applications on up to 12 Oryon cores, though it diverges from native x86 silicon.[87][88]
Modern x86 processors from Intel and AMD predominantly utilize leading-edge process nodes: Intel's 18A (1.8nm-class) for 2025 designs and TSMC's 3nm family for AMD's upcoming Zen 6 components, alongside 4nm for current Zen 5, enabling transistor densities approaching 250 million per mm² and power efficiency critical for AI PCs. Desktop configurations typically span 15W to 125W TDPs, balancing mobility and performance.[81][84][89]
In benchmarks, these implementations demonstrate strong multi-threaded scaling. For instance, Intel's Arrow Lake Core Ultra 9 285K achieves Cinebench R23 multi-core scores around 42,000, while AMD's Ryzen 9 9950X (Zen 5) surpasses 42,800, highlighting Zen 5's edge in threaded rendering. SPECint 2017 results for EPYC 9005 show up to 20% multi-threaded integer throughput gains over Intel's Xeon 6 series, underscoring x86's prowess in server virtualization.[85]
| Processor | Cores/Threads | Process Node | TDP Range (Desktop) | Representative Benchmark (Cinebench R23 Multi-Core) |
|---|---|---|---|---|
| Intel Core Ultra (Arrow Lake) | Up to 24/24 | TSMC N3B + Intel | 65-125W | ~42,000 |
| AMD Ryzen 9 (Zen 5) | 16/32 | TSMC N4P | 65-170W | ~42,800 |
| AMD EPYC 9965 (Zen 5c) | 192/384 | TSMC N4P | 400-500W | N/A (Server; SPECint ~1,500 rate multi-threaded (single-socket))[83] |
Software Support and Ecosystem
The x86 architecture enjoys broad operating system compatibility and serves as the foundation for major platforms. Microsoft Windows, built on the NT kernel, supports x86 natively in both 32-bit and 64-bit (x86-64) forms, although Windows 11 is available only as a 64-bit system that still runs 32-bit applications.[90] Linux supports x86 and x86-64 through the kernel's architecture-specific code, allowing deployment on everything from embedded systems to servers.[91] For legacy software such as MS-DOS applications, emulators like DOSBox reproduce an x86 DOS environment on modern systems, preserving compatibility without the original hardware.[92]
Compilers and development tools form a robust ecosystem for x86 programming. The GNU Compiler Collection (GCC) includes dedicated backends for x86 targets, with options such as -m32 and -m64 selecting 32-bit or 64-bit code generation, and supports inline assembly for low-level performance tuning.[93] Similarly, the LLVM project, which powers Clang, offers comprehensive x86 backends that enable cross-compilation and advanced optimizations and that integrate with build systems like CMake.[94] These tools cover the full range of x86 software development, from kernel modules to user-space applications.
Emulation and compatibility layers extend x86's reach beyond native hardware. Apple's Rosetta 2 translates x86-64 binaries to ARM code on Apple silicon Macs, allowing Intel-based macOS apps to run with minimal overhead on M-series chips.[95] Wine provides a compatibility layer for executing Windows x86 executables on Linux and other POSIX systems, bridging OS boundaries without full virtualization.[96]
The x86 ecosystem faces ongoing challenges, particularly in security and in evolving the instruction set.
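As a rough illustration of the 32-bit and 64-bit modes discussed in this section, the following Python sketch classifies the machine string reported by the standard library's platform.machine() and reads the interpreter's pointer width. The helper names (classify_machine, pointer_width_bits) and the sets of machine strings are assumptions chosen for this example, not an exhaustive or authoritative list; operating systems may report other spellings.

```python
import platform
import struct

# Machine strings commonly reported for x86 systems (assumed, not exhaustive).
X86_64_NAMES = {"x86_64", "amd64"}
X86_32_NAMES = {"i386", "i486", "i586", "i686", "x86"}


def classify_machine(name: str) -> str:
    """Map a machine string to 'x86-64', 'x86 (32-bit)', or 'other'."""
    key = name.strip().lower()
    if key in X86_64_NAMES:
        return "x86-64"
    if key in X86_32_NAMES:
        return "x86 (32-bit)"
    return "other"


def pointer_width_bits() -> int:
    """Return the pointer size of the running interpreter, in bits."""
    # struct format "P" is a native void* pointer; calcsize gives its size in bytes.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    machine = platform.machine()
    print(f"machine: {machine or 'unknown'} -> {classify_machine(machine)}")
    print(f"interpreter pointer width: {pointer_width_bits()} bits")
```

The two values can disagree: a 32-bit interpreter running on a 64-bit operating system may still see a 64-bit machine string while reporting a 32-bit pointer width, which is why the sketch checks both.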
Software mitigations for the Spectre and Meltdown vulnerabilities, disclosed in 2018, combine kernel patches, compiler barriers, and runtime checks to counter speculative execution exploits, and have been implemented across operating systems including Windows and Linux.[97] Emerging extensions like Intel's Advanced Performance Extensions (APX), which expand the general-purpose register file from 16 to 32 registers, require updates to compilers, ABIs, and operating systems before software can use the additional registers and instructions, while leaving existing code unaffected.[98]
Standardized application binary interfaces (ABIs) and debugging tools underpin reliable development. Linux adheres to the System V ABI for x86-64, which defines calling conventions, stack alignment, and data passing (the first six integer arguments travel in RDI, RSI, RDX, RCX, R8, and R9, with the stack aligned to 16 bytes at calls) to ensure portability across distributions.[99] Windows employs the Microsoft x64 ABI, which passes the first four integer arguments in RCX, RDX, R8, and R9 and reserves 32 bytes of shadow space for each call, promoting interoperability in mixed-language environments.[100] The GNU Debugger (GDB) supports x86 debugging with features like breakpoints, register inspection, and disassembly, aiding developers in troubleshooting assembly and high-level code.[101]
References
- https://en.wikichip.org/wiki/amd/am8086
- https://en.wikichip.org/wiki/amd/microarchitectures/zen
