X86-64
from Wikipedia

AMD Opteron, the first CPU to introduce the x86-64 extensions in April 2003
The five-volume set of the x86-64 Architecture Programmer's Manual, as published and distributed by AMD in 2002

x86-64 (also known as x64, x86_64, AMD64, and Intel 64)[note 1] is a 64-bit extension of the x86 instruction set. It was announced in 1999 and first available in the AMD Opteron family in 2003. It introduces two new operating modes: 64-bit mode and compatibility mode, along with a new four-level paging mechanism.

In 64-bit mode, x86-64 supports significantly larger amounts of virtual memory and physical memory compared to its 32-bit predecessors, allowing programs to utilize more memory for data storage. The architecture expands the number of general-purpose registers from 8 to 16, all fully general-purpose, and extends their width to 64 bits.

Floating-point arithmetic is supported through mandatory SSE2 instructions in 64-bit mode. While the older x87 FPU and MMX registers are still available, they are generally superseded by a set of sixteen 128-bit vector registers (XMM registers). Each of these vector registers can store one or two double-precision floating-point numbers, up to four single-precision floating-point numbers, or various integer formats.

In 64-bit mode, instructions are modified to support 64-bit operands and 64-bit addressing mode.

The x86-64 architecture defines a compatibility mode that allows 16-bit and 32-bit user applications to run unmodified alongside 64-bit applications, provided the 64-bit operating system supports them.[11][note 2] Since the full x86-32 instruction sets remain implemented in hardware without the need for emulation, these older executables can run with little or no performance penalty,[13] while newer or modified applications can take advantage of new features of the processor design to achieve performance improvements. Also, processors supporting x86-64 still power on in real mode to maintain backward compatibility with the original 8086 processor, as has been the case with x86 processors since the introduction of protected mode with the 80286.

The original specification, created by AMD and released in 2000, has been implemented by AMD, Intel, and VIA. The AMD K8 microarchitecture, in the Opteron and Athlon 64 processors, was the first to implement it. This was the first significant addition to the x86 architecture designed by a company other than Intel. Intel was forced to follow suit and introduced a modified NetBurst family which was software-compatible with AMD's specification. VIA Technologies introduced x86-64 in their VIA Isaiah architecture, with the VIA Nano.

The x86-64 architecture was quickly adopted for desktop and laptop personal computers and servers which were commonly configured for 16 GiB (gibibytes) of memory or more. It has effectively replaced the discontinued Intel Itanium architecture (formerly IA-64), which was originally intended to replace the x86 architecture. x86-64 and Itanium are not compatible on the native instruction set level, and operating systems and applications compiled for one architecture cannot be run on the other natively.

AMD64


History


AMD64 (also variously referred to by AMD in their literature and documentation as "AMD 64-bit Technology" and "AMD x86-64 Architecture") was created as an alternative to the radically different IA-64 architecture designed by Intel and Hewlett-Packard, which was backward-incompatible with IA-32, the 32-bit version of the x86 architecture. AMD originally announced AMD64 in 1999[14] with a full specification available in August 2000.[15] As AMD was never invited to be a contributing party for the IA-64 architecture and any kind of licensing seemed unlikely, the AMD64 architecture was positioned by AMD from the beginning as an evolutionary way to add 64-bit computing capabilities to the existing x86 architecture while supporting legacy 32-bit x86 code, as opposed to Intel's approach of creating an entirely new, completely x86-incompatible 64-bit architecture with IA-64.

The first AMD64-based processor, the Opteron, was released in April 2003.

Implementations


AMD's processors implementing the AMD64 architecture include Opteron, Athlon 64, Athlon 64 X2, Athlon 64 FX, Athlon II (followed by "X2", "X3", or "X4" to indicate the number of cores, and XLT models), Turion 64, Turion 64 X2, Sempron ("Palermo" E6 stepping and all "Manila" models), Phenom (followed by "X3" or "X4" to indicate the number of cores), Phenom II (followed by "X2", "X3", "X4" or "X6" to indicate the number of cores), FX, Fusion/APU and Ryzen/Epyc.

Architectural features


The primary defining characteristic of AMD64 is the availability of 64-bit general-purpose processor registers (for example, rax), 64-bit integer arithmetic and logical operations, and 64-bit virtual addresses.[16] The designers took the opportunity to make other improvements as well.

Notable changes in the 64-bit extensions include:

64-bit integer capability
All general-purpose registers (GPRs) are expanded from 32 bits to 64 bits, and all arithmetic and logical operations, memory-to-register and register-to-memory operations, etc., can operate directly on 64-bit integers. Pushes and pops on the stack default to 8-byte strides, and pointers are 8 bytes wide.
Additional registers
In addition to increasing the size of the general-purpose registers, the number of named general-purpose registers is increased from eight (i.e. eax, ebx, ecx, edx, esi, edi, esp, ebp) in x86 to 16 (i.e. rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp, r8, r9, r10, r11, r12, r13, r14, r15). It is therefore possible to keep more local variables in registers rather than on the stack, and to let registers hold frequently accessed constants; arguments for small and fast subroutines may also be passed in registers to a greater extent.
AMD64 still has fewer registers than many RISC instruction sets (e.g. Power ISA has 32 GPRs; 64-bit ARM, RISC-V I, SPARC, Alpha, MIPS, and PA-RISC have 31) or VLIW-like machines such as the IA-64 (which has 128 registers). However, an AMD64 implementation may have far more internal registers than the number of architectural registers exposed by the instruction set (see register renaming). (For example, AMD Zen cores have 168 64-bit integer and 160 128-bit vector floating-point physical internal registers.)
Additional XMM (SSE) registers
Similarly, the number of 128-bit XMM registers (used for Streaming SIMD instructions) is also increased from 8 to 16.
Unlike the XMM registers used by SSE2, the traditional x87 FPU register stack was not enlarged in 64-bit mode. The x87 register stack is not a simple register file, although low-cost exchange operations do allow direct access to individual registers.
Larger virtual address space
The AMD64 architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations.[11]: 120  This allows up to 256 TiB (2^48 bytes) of virtual address space. The architecture definition allows this limit to be raised in future implementations to the full 64 bits,[11]: 2 : 3 : 13 : 117 : 120  extending the virtual address space to 16 EiB (2^64 bytes).[17] This is compared to just 4 GiB (2^32 bytes) for the x86.[18]
This means that very large files can be operated on by mapping the entire file into the process's address space (which is often much faster than working with file read/write calls), rather than having to map regions of the file into and out of the address space.
Larger physical address space
The original implementation of the AMD64 architecture implemented 40-bit physical addresses and so could address up to 1 TiB (2^40 bytes) of RAM.[11]: 24  Current implementations of the AMD64 architecture (starting from AMD 10h microarchitecture) extend this to 48-bit physical addresses[19] and therefore can address up to 256 TiB (2^48 bytes) of RAM. The architecture permits extending this to 52 bits in the future[11]: 24 [20] (limited by the page table entry format);[11]: 131  this would allow addressing of up to 4 PiB of RAM. For comparison, 32-bit x86 processors are limited to 64 GiB of RAM in Physical Address Extension (PAE) mode,[21] or 4 GiB of RAM without PAE mode.[11]: 4 
Larger physical address space in legacy mode
When operating in legacy mode the AMD64 architecture supports Physical Address Extension (PAE) mode, as do most current x86 processors, but AMD64 extends PAE from 36 bits to an architectural limit of 52 bits of physical address. Any implementation, therefore, allows the same physical address limit as under long mode.[11]: 24 
Instruction pointer relative data access
Instructions can now reference data relative to the instruction pointer (RIP register). This makes position-independent code, as is often used in shared libraries and code loaded at run time, more efficient.
SSE instructions
The original AMD64 architecture adopted Intel's SSE and SSE2 as core instructions. These instruction sets provide a vector supplement to the scalar x87 FPU, for the single-precision and double-precision data types. SSE2 also offers integer vector operations, for data types ranging from 8-bit to 64-bit precision. This makes the vector capabilities of the architecture on par with those of the most advanced x86 processors of its time. These instructions can also be used in 32-bit mode. The proliferation of 64-bit processors has made these vector capabilities ubiquitous in home computers, which lets 32-bit applications assume them as a baseline. The 32-bit edition of Windows 8, for example, requires the presence of SSE2 instructions.[22] SSE3 instructions and later Streaming SIMD Extensions instruction sets are not standard features of the architecture.
No-Execute bit
The No-Execute bit or NX bit (bit 63 of the page table entry) allows the operating system to specify which pages of virtual address space can contain executable code and which cannot. An attempt to execute code from a page tagged "no execute" will result in a memory access violation, similar to an attempt to write to a read-only page. This should make it more difficult for malicious code to take control of the system via "buffer overrun" or "unchecked buffer" attacks. A similar feature has been available on x86 processors since the 80286 as an attribute of segment descriptors; however, this works only on an entire segment at a time.
Segmented addressing has long been considered an obsolete mode of operation, and all current PC operating systems in effect bypass it, setting all segments to a base address of zero and (in their 32-bit implementation) a size of 4 GiB. AMD was the first x86-family vendor to implement no-execute in linear addressing mode. The feature is also available in legacy mode on AMD64 processors, and recent Intel x86 processors, when PAE is used.
Removal of older features
A few "system programming" features of the x86 architecture were either unused or underused in modern operating systems and are either not available on AMD64 in long (64-bit and compatibility) mode, or exist only in limited form. These include segmented addressing (although the FS and GS segments are retained in vestigial form for use as extra-base pointers to operating system structures),[11]: 70  the task state switch mechanism, and virtual 8086 mode. These features remain fully implemented in "legacy mode", allowing these processors to run 32-bit and 16-bit operating systems without modifications. Some instructions that proved to be rarely useful are not supported in 64-bit mode, including saving/restoring of segment registers on the stack, saving/restoring of all registers (PUSHA/POPA), decimal arithmetic, BOUND and INTO instructions, and "far" jumps and calls with immediate operands.
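The address-space sizes quoted in the list above follow directly from the address widths. As a minimal sketch (the helper name `addressable` is hypothetical and not part of any specification), the arithmetic can be checked like this:

```python
# Binary unit prefixes used in the text, largest first.
UNITS = [("EiB", 2**60), ("PiB", 2**50), ("TiB", 2**40), ("GiB", 2**30)]


def addressable(bits: int) -> str:
    """Return the size of a 2**bits-byte address space in binary units."""
    size = 2**bits
    for name, factor in UNITS:
        if size >= factor:
            return f"{size // factor} {name}"
    return f"{size} bytes"


# Address widths mentioned above: x86 without PAE, 36-bit PAE, the first
# AMD64 physical width, the current virtual/physical width, the architectural
# physical limit, and the full 64-bit virtual format.
for bits in (32, 36, 40, 48, 52, 64):
    print(f"{bits}-bit addresses: {addressable(bits)}")
```

The loop covers only the widths named in the text; any other width that is at least 30 bits works the same way.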

Virtual address space details


Canonical form addresses

Figure: canonical address space implementations (diagrams not to scale), showing the current 48-bit implementation, a 57-bit implementation, and a 64-bit implementation.

Although virtual addresses are 64 bits wide in 64-bit mode, current implementations (and all chips that are known to be in the planning stages) do not allow the entire virtual address space of 2^64 bytes (16 EiB) to be used. This would be approximately four billion times the size of the virtual address space on 32-bit machines. Most operating systems and applications will not need such a large address space for the foreseeable future, so implementing such wide virtual addresses would simply increase the complexity and cost of address translation with no real benefit. AMD, therefore, decided that, in the first implementations of the architecture, only the least significant 48 bits of a virtual address would actually be used in address translation (page table lookup).[11]: 120 

In addition, the AMD specification requires that the most significant 16 bits of any virtual address, bits 48 through 63, must be copies of bit 47 (in a manner akin to sign extension). If this requirement is not met, the processor will raise an exception.[11]: 131  Addresses complying with this rule are referred to as "canonical form."[11]: 130  Canonical form addresses run from 0 through 00007FFF'FFFFFFFF, and from FFFF8000'00000000 through FFFFFFFF'FFFFFFFF, for a total of 256 TiB of usable virtual address space. This is still 65,536 times larger than the virtual 4 GiB address space of 32-bit machines.
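The canonical-form rule can be expressed as a few bit operations. The following is an illustrative sketch only: the function names and the `va_bits` parameter are assumptions introduced here, and the code models the rule as described in the text rather than any particular processor's behaviour.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits va_bits..63 of a 64-bit address all equal bit va_bits-1."""
    if not 0 <= addr < 2**64:
        raise ValueError("address must fit in 64 bits")
    upper = addr >> (va_bits - 1)              # bit va_bits-1 and all bits above it
    all_ones = (1 << (64 - va_bits + 1)) - 1   # the same field with every bit set
    return upper == 0 or upper == all_ones


def canonicalize(addr: int, va_bits: int = 48) -> int:
    """Sign-extend the low va_bits bits into a canonical 64-bit address."""
    low = addr & ((1 << va_bits) - 1)
    if low >> (va_bits - 1):                   # top implemented bit is set
        low |= ((1 << 64) - 1) ^ ((1 << va_bits) - 1)
    return low


# The boundaries quoted in the text for the 48-bit implementation.
print(is_canonical(0x00007FFFFFFFFFFF))        # top of the lower half
print(is_canonical(0x0000800000000000))        # first non-canonical address
print(is_canonical(0xFFFF800000000000))        # bottom of the higher half
```

Passing `va_bits=57` applies the same rule to the wider address format mentioned elsewhere in the article.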

This feature eases later scalability to true 64-bit addressing. Many operating systems (including, but not limited to, the Windows NT family) take the higher-addressed half of the address space (named kernel space) for themselves and leave the lower-addressed half (user space) for application code, user mode stacks, heaps, and other data regions.[23] The "canonical address" design ensures that every AMD64 compliant implementation has, in effect, two memory halves: the lower half starts at 00000000'00000000 and "grows upwards" as more virtual address bits become available, while the higher half is "docked" to the top of the address space and grows downwards. Also, enforcing the "canonical form" of addresses by checking the unused address bits prevents their use by the operating system in tagged pointers as flags, privilege markers, etc., as such use could become problematic when the architecture is extended to implement more virtual address bits.

The first versions of Windows for x64 did not even use the full 256 TiB; they were restricted to just 8 TiB of user space and 8 TiB of kernel space.[23] Windows did not support the entire 48-bit address space until Windows 8.1, which was released in October 2013.[23]

Page table structure


The 64-bit addressing mode ("long mode") is a superset of Physical Address Extension (PAE); because of this, page sizes may be 4 KiB (2^12 bytes) or 2 MiB (2^21 bytes).[11]: 120  Long mode also supports page sizes of 1 GiB (2^30 bytes).[11]: 120  Rather than the three-level page table system used by systems in PAE mode, systems running in long mode use four levels of page table: PAE's Page-Directory Pointer Table is extended from four entries to 512, and an additional Page-Map Level 4 (PML4) Table is added, containing 512 entries in 48-bit implementations.[11]: 131  A full mapping hierarchy of 4 KiB pages for the whole 48-bit space would take a bit more than 512 GiB of memory (about 0.195% of the 256 TiB virtual space).
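To make the four-level scheme concrete, the sketch below splits a 48-bit virtual address into four 9-bit table indices plus a 12-bit page offset, and recomputes the "a bit more than 512 GiB" figure. It assumes 4 KiB pages and 8-byte entries; the abbreviations PDPT, PD and PT, and both function names, are labels chosen for this example.

```python
PAGE_SHIFT = 12                       # 4 KiB pages: low 12 bits are the offset
INDEX_BITS = 9                        # 512 entries per table
LEVELS = ("PML4", "PDPT", "PD", "PT") # top level first


def split_virtual_address(vaddr: int) -> dict:
    """Split a 48-bit virtual address into table indices and a page offset."""
    parts = {"offset": vaddr & ((1 << PAGE_SHIFT) - 1)}
    for depth, name in enumerate(LEVELS):
        shift = PAGE_SHIFT + INDEX_BITS * (len(LEVELS) - 1 - depth)
        parts[name] = (vaddr >> shift) & ((1 << INDEX_BITS) - 1)
    return parts


def full_hierarchy_bytes() -> int:
    """Bytes of page tables needed to map all 2**48 bytes with 4 KiB pages."""
    table_size = 8 * 2**INDEX_BITS    # 512 entries of 8 bytes each = 4 KiB
    # 1 PML4 table, 512 PDPTs, 512**2 PDs and 512**3 PTs.
    tables = sum(2 ** (INDEX_BITS * level) for level in range(len(LEVELS)))
    return tables * table_size


total = full_hierarchy_bytes()
print(split_virtual_address(0x00007FFFFFFFFFFF))
print(f"{total / 2**30:.3f} GiB, {100 * total / 2**48:.3f}% of 2**48 bytes")
```

The lowest-level tables dominate the total: they alone take exactly 512 GiB, and the three upper levels add roughly 1 GiB more.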

64-bit page table entry
  • Bit 63: NX
  • Bits 62 to 52: reserved
  • Bits 51 to 12: bits 51 to 12 of the base address
  • Bits 11 to 9: ignored
  • Bit 8: G
  • Bit 7: PAT
  • Bit 6: D
  • Bit 5: A
  • Bit 4: PCD
  • Bit 3: PWT
  • Bit 2: U/S
  • Bit 1: R/W
  • Bit 0: P
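The entry layout above can be decoded with simple masks. This is a sketch of the bit positions as listed, not an operating-system API; `decode_pte`, `FLAG_BITS` and `BASE_MASK` are hypothetical names introduced for illustration.

```python
# Flag name -> bit position, taken from the entry layout in the text.
FLAG_BITS = {
    "P": 0, "R/W": 1, "U/S": 2, "PWT": 3, "PCD": 4,
    "A": 5, "D": 6, "PAT": 7, "G": 8, "NX": 63,
}

# Bits 51..12 hold the base address of the mapped 4 KiB page.
BASE_MASK = ((1 << 52) - 1) & ~((1 << 12) - 1)


def decode_pte(entry: int) -> dict:
    """Decode a 64-bit page table entry into its base address and flags."""
    flags = {name: bool((entry >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return {"base": entry & BASE_MASK, "flags": flags}


# Example: a present, writable, no-execute page at physical base 0x1234000.
example = (1 << 63) | 0x1234000 | 0b11
decoded = decode_pte(example)
print(hex(decoded["base"]), decoded["flags"])
```

Bits 62 to 52 (reserved) and 11 to 9 (ignored) are deliberately left out of the result.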

Intel has implemented a scheme with a 5-level page table, which allows Intel 64 processors to support 57-bit addresses, and in turn, a 128 PiB virtual address space.[24] Further extensions may allow full 64-bit virtual address space and physical memory with 12-bit page table descriptors and 16- or 21-bit memory offsets for 64 KiB and 2 MiB page allocation sizes; the page table entry would be expanded to 128 bits to support additional hardware flags for page size and virtual address space size.[25]

Operating system limits


The operating system can also limit the virtual address space. Details, where applicable, are given in the "Operating system compatibility and characteristics" section.

Physical address space details


Current AMD64 processors support a physical address space of up to 2^48 bytes of RAM, or 256 TiB.[19] However, as of 2020, there were no known x86-64 motherboards that support 256 TiB of RAM.[26][27][28][29] The operating system may place additional limits on the amount of RAM that is usable or supported. Details on this point are given in the "Operating system compatibility and characteristics" section of this article.

Operating modes


The architecture has two primary modes of operation: long mode and legacy mode.

Operating modes (sizes in bits; GPRs = number of general-purpose registers)
  • Long mode
    • 64-bit mode. Operating system required: 64-bit OS, 64-bit UEFI firmware, or the previous two interacting via a 64-bit firmware's UEFI interface. Type of code being run: 64-bit. Addresses: 64. Operands: 8, 16, 32, 64. GPRs: 16.
    • Compatibility mode. Operating system required: bootloader or 64-bit OS.
      • Type of code being run: 32-bit. Addresses: 32. Operands: 8, 16, 32. GPRs: 8.
      • Type of code being run: 16-bit protected mode. Addresses: 16. Operands: 8, 16, 32. GPRs: 8.
  • Legacy mode
    • Protected mode.
      • Operating system required: bootloader, 32-bit OS, 32-bit UEFI firmware, or the latter two interacting via the firmware's UEFI interface. Type of code being run: 32-bit. Addresses: 32. Operands: 8, 16, 32. GPRs: 8.
      • Operating system required: 16-bit protected mode OS. Type of code being run: 16-bit protected mode. Addresses: 16. Operands: 8, 16, 32[m 1]. GPRs: 8.
    • Virtual 8086 mode. Operating system required: 16-bit protected mode or 32-bit OS. Type of code being run: subset of real mode. Addresses: 16. Operands: 8, 16, 32[m 1]. GPRs: 8.
    • Unreal mode. Operating system required: bootloader or real mode OS. Type of code being run: real mode. Addresses: 16, 20, 32. Operands: 8, 16, 32[m 1]. GPRs: 8.
    • Real mode. Operating system required: bootloader, real mode OS, or any OS interfacing with a firmware's BIOS interface.[30] Type of code being run: real mode. Addresses: 16, 20, 21. Operands: 8, 16, 32[m 1]. GPRs: 8.
[m 1] Note that 16-bit code written for the 80286 and below does not use 32-bit operand instructions. Code written for the 80386 and above can use the operand-size override prefix (0x66). Normally this prefix is used by protected and long mode code for the purpose of using 16-bit operands, as that code would be running in a code segment with a default operand size of 32 bits. In real mode, the default operand size is 16 bits, so the 0x66 prefix is interpreted differently, changing operand size to 32 bits.
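The note on the 0x66 prefix describes a toggle: the prefix selects whichever of the 16-bit and 32-bit operand sizes is not the current default. A minimal sketch of that rule follows; the function name is hypothetical, and the model deliberately ignores byte-sized opcodes and the 64-bit operand size available in 64-bit mode.

```python
def effective_operand_size(default_bits: int, has_66_prefix: bool) -> int:
    """Operand size in bits for a code segment default and an optional 0x66 prefix."""
    if default_bits not in (16, 32):
        raise ValueError("default operand size must be 16 or 32 bits")
    if not has_66_prefix:
        return default_bits
    # The prefix flips between the two sizes rather than naming one of them.
    return 32 if default_bits == 16 else 16


# Real mode defaults to 16-bit operands, so 0x66 selects 32-bit operands.
print(effective_operand_size(16, has_66_prefix=True))
# A 32-bit protected mode code segment defaults to 32, so 0x66 selects 16.
print(effective_operand_size(32, has_66_prefix=True))
```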
State diagram of the x86-64 operating modes

Long mode


Long mode is the architecture's intended primary mode of operation; it is a combination of the processor's native 64-bit mode and a combined 32-bit and 16-bit compatibility mode. It is used by 64-bit operating systems. Under a 64-bit operating system, 64-bit programs run under 64-bit mode, and 32-bit and 16-bit protected mode applications (that do not need to use either real mode or virtual 8086 mode in order to execute at any time) run under compatibility mode. Real-mode programs and programs that use virtual 8086 mode at any time cannot be run in long mode unless those modes are emulated in software.[11]: 11  However, such programs may be started from an operating system running in long mode on processors supporting VT-x or AMD-V by creating a virtual processor running in the desired mode.

Since the basic instruction set is the same, there is almost no performance penalty for executing protected mode x86 code. This is unlike Intel's IA-64, where differences in the underlying instruction set mean that running 32-bit code must be done either in emulation of x86 (making the process slower) or with a dedicated x86 coprocessor. On the x86-64 platform, however, many x86 applications could benefit from a 64-bit recompile, due to the additional registers in 64-bit code and guaranteed SSE2-based FPU support, which a compiler can use for optimization. Applications that regularly handle integers wider than 32 bits, such as cryptographic algorithms, will additionally need a rewrite of the code handling the huge integers in order to take advantage of the 64-bit registers.

Legacy mode


Legacy mode is the mode that the processor is in when it is not in long mode.[11]: 14  In this mode, the processor acts like an older x86 processor, and only 16-bit and 32-bit code can be executed. Legacy mode allows for a maximum of 32-bit virtual addressing, which limits the virtual address space to 4 GiB.[11]: 14 : 24 : 118  64-bit programs cannot be run from legacy mode.

Protected mode


Protected mode is made into a submode of legacy mode.[11]: 14  It is the submode that 32-bit operating systems and 16-bit protected mode operating systems operate in when running on an x86-64 CPU.[11]: 14 

Real mode


Real mode is the initial mode of operation when the processor is initialized, and is a submode of legacy mode. It is backwards compatible with the original Intel 8086 and Intel 8088 processors. Real mode is primarily used today by operating system bootloaders, which are required by the architecture to configure virtual memory details before transitioning to higher modes. This mode is also used by any operating system that needs to communicate with the system firmware with a traditional BIOS-style interface.[30]

Intel 64


Intel 64 is Intel's implementation of x86-64, used and implemented in various processors made by Intel.

History


Historically, AMD has developed and produced processors with instruction sets patterned after Intel's original designs, but with x86-64, roles were reversed: Intel found itself in the position of adopting the ISA that AMD created as an extension to Intel's own x86 processor line.

Intel's project was originally codenamed Yamhill[31] (after the Yamhill River in Oregon's Willamette Valley). After several years of denying its existence, Intel announced at the February 2004 IDF that the project was indeed underway. Intel's chairman at the time, Craig Barrett, admitted that this was one of their worst-kept secrets.[32][33]

Intel's name for this instruction set has changed several times. The name used at the IDF was CT[34] (presumably for Clackamas Technology, another codename from an Oregon river); within weeks they began referring to it as IA-32e (for IA-32 extensions) and in March 2004 unveiled the "official" name EM64T (Extended Memory 64 Technology). In late 2006 Intel began instead using the name Intel 64 for its implementation, paralleling AMD's use of the name AMD64.[35]

The first processor to implement Intel 64 was the multi-socket Xeon processor code-named Nocona, in June 2004. In contrast, the initial Prescott chips (February 2004) did not enable this feature. Intel subsequently began selling Intel 64-enabled Pentium 4s using the E0 revision of the Prescott core, sold on the OEM market as the Pentium 4, model F. The E0 revision also adds eXecute Disable (XD) (Intel's name for the NX bit) to Intel 64, and was also included in the then-current Xeon code-named Irwindale. Intel's official launch of Intel 64 (under the name EM64T at that time) in mainstream desktop processors was the N0 stepping Prescott-2M in February 2005.[36]

The first Intel mobile processor implementing Intel 64 is the Merom version of the Core 2 processor, which was released on July 27, 2006. None of Intel's earlier notebook CPUs (Core Duo, Pentium M, Celeron M, Mobile Pentium 4) implement Intel 64.

Implementations


Intel's processors implementing the Intel 64 architecture include the Pentium 4 F-series/5x1 series, 506, and 516, Celeron D models 3x1, 3x6, 355, 347, 352, 360, and 365 and all later Celerons, all models of Xeon since "Nocona", all models of Pentium Dual-Core processors since "Merom-2M", the Atom 230, 330, D410, D425, D510, D525, N450, N455, N470, N475, N550, N570, N2600 and N2800, all versions of the Pentium D, Pentium Extreme Edition, Core 2, Core i9, Core i7, Core i5, and Core i3 processors, and the Xeon Phi 7200 series processors.

VIA's x86-64 implementation


VIA Technologies introduced their first implementation of the x86-64 architecture in 2008 after five years of development by its CPU division, Centaur Technology.[37] Codenamed "Isaiah", the 64-bit architecture was unveiled on January 24, 2008,[38] and launched on May 29 under the VIA Nano brand name.[39]

The processor supports a number of VIA-specific x86 extensions designed to boost efficiency in low-power appliances. It is expected that the Isaiah architecture will be twice as fast in integer performance and four times as fast in floating-point performance as the previous-generation VIA Esther at an equivalent clock speed. Power consumption is also expected to be on par with the previous-generation VIA CPUs, with thermal design power ranging from 5 W to 25 W.[40] Being a completely new design, the Isaiah architecture was built with support for features like the x86-64 instruction set and x86 virtualization which were unavailable on its predecessors, the VIA C7 line, while retaining their encryption extensions.

Microarchitecture levels


In 2020, through a collaboration between AMD, Intel, Red Hat, and SUSE, three microarchitecture levels (or feature levels) on top of the x86-64 baseline were defined: x86-64-v2, x86-64-v3, and x86-64-v4.[41][42] These levels define specific features that can be targeted by programmers to provide compile-time optimizations. The features exposed by each level are as follows:[43]

CPU microarchitecture levels (each CPU feature is followed by an example instruction in parentheses)
  • x86-64-v1
    • CPU features: CMOV (cmov), CX8 (cmpxchg8b), FPU (fld), FXSR (fxsave), MMX (emms), OSFXSR (fxsave), SCE (syscall), SSE (cvtss2si), SSE2 (cvtpi2pd)
    • Supported processors: Baseline for all x86-64 CPUs. Features match the common capabilities between the 2003 AMD AMD64 and the 2004 Intel EM64T initial implementations in the AMD K8 and the Intel Prescott processor families.
  • x86-64-v2
    • CPU features: CMPXCHG16B (cmpxchg16b), LAHF-SAHF (lahf), POPCNT (popcnt), SSE3 (addsubpd), SSE4_1 (blendpd), SSE4_2 (pcmpestri), SSSE3 (pshufb)
    • Supported processors: Features match the 2008 Intel Nehalem architecture, excluding Intel-specific instructions.
  • x86-64-v3
    • CPU features: AVX (vzeroall), AVX2 (vpermd), BMI1 (andn), BMI2 (bzhi), F16C (vcvtph2ps), FMA (vfmadd132pd), LZCNT (lzcnt), MOVBE (movbe), OSXSAVE (xgetbv)
    • Supported processors: Features match the 2013 Intel Haswell architecture, excluding Intel-specific instructions.
      • Intel
        • Low-power Gracemont processors (SSE4.1, SSE4.2, POPCNT and LZCNT supported)
        • Haswell and Broadwell processors (SSE4.1, SSE4.2, POPCNT and LZCNT supported)
      • AMD
        • Excavator processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
        • Zen, Zen+, Zen 2, and Zen 3 processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
      • Zhaoxin
        • YongFeng and Shijidadao processors (SSE4.2, AVX2 and FMA3 supported)
  • x86-64-v4
    • CPU features: AVX512F (kmovw), AVX512BW (vdbpsadbw), AVX512CD (vplzcntd), AVX512DQ (vpmullq), AVX512VL (no example instruction listed)
    • Supported processors: Features match the 2017 Intel Skylake-X architecture, excluding Intel-specific instructions.
      • Intel
        • Skylake processors and newer (SSE4.1, SSE4.2, POPCNT and LZCNT supported)
      • AMD
        • Zen4 processors and newer (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)

The x86-64 microarchitecture feature levels can also be found as AMD64-v1, AMD64-v2, and so on, or as AMD64_v1 and so on, in settings where the "AMD64" nomenclature is used. These names are synonyms for the x86-64-vX nomenclature and are functionally identical. Examples include the Go language documentation and the Fedora Linux distribution.

All levels include features found in the previous levels. Instruction set extensions not concerned with general-purpose computation, including AES-NI and RDRAND, are excluded from the level requirements.
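Because each level includes every feature of the levels below it, the highest level a CPU satisfies can be found by walking the levels in order and stopping at the first one with a missing feature. The sketch below uses the feature names exactly as they appear in the table above. These are not necessarily the flag spellings an operating system reports, and `highest_level` is a hypothetical helper.

```python
# Features added at each level, in ascending order, as listed in the table.
LEVEL_FEATURES = [
    ("x86-64-v1", {"CMOV", "CX8", "FPU", "FXSR", "MMX", "OSFXSR",
                   "SCE", "SSE", "SSE2"}),
    ("x86-64-v2", {"CMPXCHG16B", "LAHF-SAHF", "POPCNT", "SSE3",
                   "SSE4_1", "SSE4_2", "SSSE3"}),
    ("x86-64-v3", {"AVX", "AVX2", "BMI1", "BMI2", "F16C", "FMA",
                   "LZCNT", "MOVBE", "OSXSAVE"}),
    ("x86-64-v4", {"AVX512F", "AVX512BW", "AVX512CD", "AVX512DQ",
                   "AVX512VL"}),
]


def highest_level(cpu_features):
    """Return the highest level whose cumulative feature set is satisfied."""
    available = set(cpu_features)
    best = None
    for name, required in LEVEL_FEATURES:
        if not required <= available:   # a feature of this level is missing
            break                       # higher levels cannot qualify either
        best = name
    return best


# A CPU with every v1, v2 and v3 feature but no AVX-512 support.
features = set().union(*(req for _, req in LEVEL_FEATURES[:3]))
print(highest_level(features))
```

Extensions outside the level definitions, such as AES-NI or RDRAND, can be present in the input set without affecting the result.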

On most recent x86_64 Linux distributions, all x86_64 feature levels supported by a CPU can be verified using the command /lib64/ld-linux-x86-64.so.2 --help (available since glibc 2.33[44]). The result appears at the end of the command's output:

Subdirectories of glibc-hwcaps directories, in priority order:
  x86-64-v4
  x86-64-v3 (supported, searched)
  x86-64-v2 (supported, searched)

Here the x86-64-v4 feature level is not supported by the CPU, but x86-64-v3 and x86-64-v2 are, which means this CPU does not support the AVX-512 features required at the v4 level.
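The same check can be scripted by scanning the loader's help text for level names marked as supported. The parser below is a sketch that assumes the output format matches the sample shown above, which may differ between glibc versions. It works on a captured string instead of invoking the loader, and `supported_levels` is a hypothetical name.

```python
import re

# Sample text copied from the output shown above.
SAMPLE = """Subdirectories of glibc-hwcaps directories, in priority order:
  x86-64-v4
  x86-64-v3 (supported, searched)
  x86-64-v2 (supported, searched)
"""


def supported_levels(help_output: str) -> list:
    """Return the x86-64-vN levels that the loader marks as supported."""
    levels = []
    for line in help_output.splitlines():
        match = re.match(r"\s*(x86-64-v\d+)\s*\((.*)\)", line)
        # Lines without a parenthesised status, like x86-64-v4 above, are skipped.
        if match and "supported" in match.group(2).split(", "):
            levels.append(match.group(1))
    return levels


print(supported_levels(SAMPLE))
```

On a real system, the text passed to the function would be the captured output of the command given above.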

Differences between AMD64 and Intel 64


Although the two instruction sets are nearly identical, they differ in the semantics of a few seldom-used machine instructions (or situations), which are mainly used for system programming.[45] Unless instructed otherwise via -march settings, compilers generally produce executables (i.e. machine code) that avoid any differences, at least for ordinary application programs. These differences are therefore of interest mainly to developers of compilers, operating systems and similar software, which must deal with individual and special system instructions.

Recent implementations

  • Intel 64 allows SYSCALL/SYSRET only in 64-bit mode (not in compatibility mode),[46] and allows SYSENTER/SYSEXIT in both modes.[47] AMD64 lacks SYSENTER/SYSEXIT in both sub-modes of long mode.[11]: 33 
  • When returning to a non-canonical address using SYSRET, AMD64 processors execute the general protection fault handler in privilege level 3,[48] while on Intel 64 processors it is executed in privilege level 0.[49]
  • The SYSRET instruction will load a set of fixed values into the hidden part of the SS segment register (base-address, limit, attributes) on Intel 64 but leave the hidden part of SS unchanged on AMD64.[50]
  • On Intel 64, the SYSRET instruction unconditionally sets the privilege level (RPL) of the SS segment register to 3 (as the instruction causes a return to privilege level 3).[51] On AMD64, the RPL is set to the corresponding bits in the STAR MSR (model-specific register), that is, bits 49 and 48.[52]
  • AMD64 requires a different microcode update format and control MSRs, while Intel 64 implements microcode update unchanged from their 32-bit only processors.
  • Intel 64 lacks some MSRs that are considered architectural in AMD64. These include SYSCFG, TOP_MEM, and TOP_MEM2.
  • Intel 64 lacks the ability to save and restore a reduced (and thus faster) version of the floating-point state (involving the FXSAVE and FXRSTOR instructions).
  • On AMD64, the FXSAVE/FXRSTOR instructions will only save/restore x87 exception pointers (FCS/FIP, FDS/FDP, FOP) when an unmasked pending x87 exception is present.[53][54] On Intel 64, these pointers are always saved and restored regardless of x87 exception status.
    • This Intel/AMD difference also applies to all variants of the XSAVE*/XRSTOR* instructions as well, but not to the older FNSAVE/FRSTOR instructions (which always save/restore these pointers).
  • In 64-bit mode, near branches with the 66H (operand size override) prefix behave differently. Intel 64 ignores this prefix: the instruction has a 32-bit sign extended offset, and instruction pointer is not truncated. AMD64 uses a 16-bit offset field in the instruction, and clears the top 48 bits of instruction pointer.
  • On Intel 64 but not AMD64, the REX.W prefix can be used with the far-pointer instructions (LFS, LGS, LSS, JMP FAR, CALL FAR) to increase the size of their far pointer argument to 80 bits (64-bit offset + 16-bit segment).
  • When the MOVSXD instruction is executed with a memory source operand and an operand-size of 16 bits, the memory operand will be accessed with a 16-bit read on Intel 64, but a 32-bit read on AMD64.
  • When the PUSH instruction is used with a segment register and an operand-size of 32 bits in legacy/compatibility mode, AMD64 will zero-extend the register from 2 to 4 bytes and push that 4-byte value onto the stack. Intel 64 will also decrement the stack pointer by 4 but will just write 2 bytes, leaving a 2-byte hole that's not written.[55][56]
  • When the POP instruction is used with a segment register and an operand-size of 32 or 64 bits, AMD64 will perform a memory read from the stack that is as wide as the operand-size, while Intel 64 will perform a 16-bit memory read (but still increment the stack pointer according to the operand size).[56]
  • The FCOMI/FCOMIP/FUCOMI/FUCOMIP (x87 floating-point compare) instructions will clear the OF, SF and AF bits of EFLAGS on Intel 64, but leave these flag bits unmodified on AMD64.
  • For the VMASKMOVPS/VMASKMOVPD/VPMASKMOVD/VPMASKMOVQ (AVX/AVX2 masked move to/from memory) instructions, Intel 64 architecturally guarantees that the instructions will not cause memory faults (e.g. page-faults and segmentation-faults) for any zero-masked lanes, while AMD64 does not provide such a guarantee.
  • If the RDRAND instruction fails to obtain a random number (as indicated by EFLAGS.CF=0), the destination register is architecturally guaranteed to be set to 0 on Intel 64 but not AMD64.
  • For the VPINSRD and VPEXTRD (AVX vector lane insert/extract) instructions outside 64-bit mode, AMD64 requires the instructions to be encoded with VEX.W=0, while Intel 64 also accepts encodings with VEX.W=1. (In 64-bit mode, both AMD64 and Intel 64 require VEX.W=0.)
  • When alignment checking is enabled (EFLAGS.AC=1), AVX instructions with misaligned 128-bit or 256-bit memory operands, and the SSE4.2 PCMP*STR* instructions with misaligned 128-bit memory operands, will cause #AC (alignment check) exceptions on AMD64[57] but not Intel 64.[58]
  • The 0F 0D /r opcode with the ModR/M byte's Mod field set to 11b is a Reserved-NOP on Intel 64[59] but will cause #UD (invalid-opcode exception) on AMD64.[60]
  • The ordering guarantees provided by some memory ordering instructions such as LFENCE and MFENCE differ between Intel 64 and AMD64:
    • LFENCE is dispatch-serializing (enabling it to be used as a speculation fence) on Intel 64 but is not architecturally guaranteed to be dispatch-serializing on AMD64.[61]
    • MFENCE is a fully serializing instruction (including instruction fetch serialization) on AMD64 but not Intel 64.
    • The MOV to CR8 and INVPCID instructions are serializing on AMD64 but not Intel 64.
    • The LMSW instruction is serializing on Intel 64 but not AMD64.
    • WRMSR to the x2APIC ICR (Interrupt Command Register; MSR 830h) is commonly used to produce an IPI (inter-processor interrupt). On Intel 64[62] but not AMD64[63] CPUs, such an IPI can be reordered before an older memory store.
    • On recent AMD64 processors (Zen 4 and later), WRMSR to the FS_BASE, GS_BASE and KernelGSBase MSRs is non-serializing.[64] On Intel 64 processors as well as older AMD64 processors, WRMSR to these MSRs is serializing.
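The 66H near-branch difference listed above can be made concrete with a small model. The following is a minimal sketch, not vendor reference code: the function names are hypothetical, and `next_rip` is assumed to be the address of the instruction following the branch (which itself differs between the two vendors, because the offset field has a different length).

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def branch_target_intel64(next_rip, offset_field):
    # Intel 64 as described above: the 66H prefix is ignored, the offset
    # field is 32 bits wide, and the instruction pointer is not truncated.
    return (next_rip + sign_extend(offset_field, 32)) & MASK64


def branch_target_amd64(next_rip, offset_field):
    # AMD64 as described above: the offset field is 16 bits wide and the
    # top 48 bits of the instruction pointer are cleared.
    return (next_rip + sign_extend(offset_field, 16)) & 0xFFFF


if __name__ == "__main__":
    rip = 0x00007FFF12345678
    print(hex(branch_target_intel64(rip, 0xFFFFFFF0)))  # backward jump by 16
    print(hex(branch_target_amd64(rip, 0xFFF0)))        # same jump, truncated
```

In this model the same backward jump of 16 bytes lands near the original code on Intel 64 but in the lowest 64 KiB of the address space on AMD64, which is why the prefix is normally avoided on near branches in 64-bit code.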

Older implementations

  • The AMD64 processors prior to Revision F[65] (distinguished by the switch from DDR to DDR2 memory and new sockets AM2, F and S1) of 2006 lacked the CMPXCHG16B instruction, which is an extension of the CMPXCHG8B instruction present on most post-80486 processors. Similar to CMPXCHG8B, CMPXCHG16B allows for atomic operations on octa-words (128-bit values). This is useful for parallel algorithms that use compare and swap on data larger than the size of a pointer, common in lock-free and wait-free algorithms. Without CMPXCHG16B one must use workarounds, such as a critical section or alternative lock-free approaches.[66] Its absence also prevents 64-bit Windows prior to Windows 8.1 from having a user-mode address space larger than 8 TiB.[67] The 64-bit version of Windows 8.1 requires the instruction.[68]
  • Early AMD64 and Intel 64 CPUs lacked LAHF and SAHF instructions in 64-bit mode. AMD introduced these instructions (also in 64-bit mode) with their 90 nm (revision D) processors, starting with Athlon 64 in October 2004.[69][70] Intel introduced the instructions in October 2005 with the 0F47h and later revisions of NetBurst.[76] The 64-bit version of Windows 8.1 requires this feature.[68]
  • Early Intel CPUs with Intel 64 also lack the NX bit of the AMD64 architecture. It was added in the stepping E0 (0F41h) Pentium 4 in October 2004.[77] This feature is required by all versions of Windows 8.
  • Early Intel 64 implementations had a 36-bit (64 GiB) physical addressing of memory while original AMD64 implementations had a 40-bit (1 TiB) physical addressing. Intel used the 40-bit physical addressing first on Xeon MP (Potomac), launched on 29 March 2005.[78] The difference is not a difference of the user-visible ISAs. In 2007 AMD 10h-based Opteron was the first to provide a 48-bit (256 TiB) physical address space.[79][80] Intel 64's physical addressing was extended to 44 bits (16 TiB) in Nehalem-EX in 2010[81] and to 46 bits (64 TiB) in Sandy Bridge E in 2011.[82][83] With the Ice Lake 3rd gen Xeon Scalable processors, Intel increased the virtual addressing to 57 bits (128 PiB) and physical to 52 bits (4 PiB) in 2021, necessitating a 5-level paging.[84] The following year AMD64 added the same in 4th generation EPYC (Genoa).[85] Non-server CPUs retain smaller address spaces for longer.
  • On all AMD64 processors, the BSF and BSR instructions will, when given a source value of 0, leave their destination register unmodified. This is mostly the case on Intel 64 processors as well, except that on some older Intel 64 CPUs, executing these instructions with an operand size of 32 bits will clear the top 32 bits of their destination register even with a source value of 0 (with the low 32 bits kept unchanged.)[86]
  • AMD64 processors since Opteron Rev. E and Athlon 64 Rev. D reintroduced limited support for segmentation, via the Long Mode Segment Limit Enable (LMSLE) bit, to ease virtualization of 64-bit guests.[87][88] LMSLE support was removed in the Zen 3 processor.[89]
  • On all Intel 64 processors, CLFLUSH is ordered with respect to SFENCE - this is also the case on newer AMD64 processors (Zen 1 and later). On older AMD64 processors, imposing ordering on the CLFLUSH instruction instead required MFENCE.
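The physical and virtual address widths quoted in the list above map to capacities by simple powers of two. As a minimal sketch (the helper name `address_space_size` is illustrative, not taken from any vendor documentation), the figures can be checked like this:

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def address_space_size(bits):
    """Return the size of a 2**bits-byte address space as a readable string."""
    size = 1 << bits
    unit = 0
    # Divide by 1024 until the value fits the largest unit that keeps it >= 1.
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"


if __name__ == "__main__":
    for bits in (36, 40, 44, 46, 48, 52, 57):
        print(f"{bits}-bit addressing: {address_space_size(bits)}")
```

Running the loop reproduces the pairs given in the text, for example 36 bits for 64 GiB and 57 bits for 128 PiB.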

Adoption

An area chart showing the representation of different families of microprocessors in the TOP500 supercomputer ranking list, from 1993 to 2020[90]

In supercomputers tracked by TOP500, the appearance of 64-bit extensions for the x86 architecture enabled 64-bit x86 processors by AMD and Intel to replace most RISC processor architectures previously used in such systems (including PA-RISC, SPARC, Alpha and others), as well as 32-bit x86, even though Intel itself initially tried unsuccessfully to replace x86 with a new incompatible 64-bit architecture in the Itanium processor.

As of 2023, an HPE EPYC-based supercomputer called Frontier is number one. The first ARM-based supercomputer appeared on the list in 2018[91] and, in recent years, non-CPU architecture co-processors (GPGPU) have also played a big role in performance. Intel's Xeon Phi "Knights Corner" coprocessors, which implement a subset of x86-64 with some vector extensions,[92] are also used, along with x86-64 processors, in the Tianhe-2 supercomputer.[93]

Operating system compatibility and characteristics

The following operating systems and releases support the x86-64 architecture in long mode.

BSD

DragonFly BSD

Preliminary infrastructure work was started in February 2004 for an x86-64 port.[94] This development later stalled. Development started again during July 2007[95] and continued during Google Summer of Code 2008 and SoC 2009.[96][97] The first official release to contain x86-64 support was version 2.4.[98]

FreeBSD

FreeBSD first added x86-64 support under the name "amd64" as an experimental architecture in 5.1-RELEASE in June 2003. It was included as a standard distribution architecture as of 5.2-RELEASE in January 2004. Since then, FreeBSD has designated it as a Tier 1 platform. The 6.0-RELEASE version cleaned up some quirks with running x86 executables under amd64, and most drivers work just as they do on the x86 architecture. Work is currently being done to integrate more fully the x86 application binary interface (ABI), in the same manner as the Linux 32-bit ABI compatibility currently works.

NetBSD

x86-64 architecture support was first committed to the NetBSD source tree on June 19, 2001. As of NetBSD 2.0, released on December 9, 2004, NetBSD/amd64 is a fully integrated and supported port. 32-bit code is still supported in 64-bit mode, with a netbsd-32 kernel compatibility layer for 32-bit syscalls. The NX bit is used to provide non-executable stack and heap with per-page granularity (segment granularity being used on 32-bit x86).

OpenBSD

OpenBSD has supported AMD64 since OpenBSD 3.5, released on May 1, 2004. Complete in-tree implementation of AMD64 support was achieved prior to the hardware's initial release because AMD had loaned several machines for the project's hackathon that year. OpenBSD developers have taken to the platform because of its support for the NX bit, which allowed for an easy implementation of the W^X feature.

The code for the AMD64 port of OpenBSD also runs on Intel 64 processors, which implement a clone of the AMD64 extensions. Because Intel left out the page table NX bit in early Intel 64 processors, there is no W^X capability on those Intel CPUs; later Intel 64 processors added the NX bit under the name "XD bit". Symmetric multiprocessing (SMP) works on OpenBSD's AMD64 port, starting with release 3.6 on November 1, 2004.

DOS

It is possible to enter long mode under DOS without a DOS extender,[99] but the user must return to real mode in order to call BIOS or DOS interrupts.

It may also be possible to enter long mode with a DOS extender similar to DOS/4GW, but more complex since x86-64 lacks virtual 8086 mode. DOS itself is not aware of that, and no benefits should be expected unless running DOS in an emulation with an adequate virtualization driver backend, for example: the mass storage interface.

Linux

Linux was the first operating system kernel to run the x86-64 architecture in long mode, starting with the 2.4 version in 2001 (preceding the hardware's availability).[100][101] Linux also provides backward compatibility for running 32-bit executables. This permits programs to be recompiled into long mode while retaining the use of 32-bit programs. Current Linux distributions ship with x86-64-native kernels and userlands. Some, such as Arch Linux,[102] SUSE, Mandriva, and Debian, allow users to install a set of 32-bit components and libraries when installing off a 64-bit distribution medium, thus allowing most existing 32-bit applications to run alongside the 64-bit OS.

The x32 ABI (Application Binary Interface), introduced in Linux 3.4, allows programs compiled for it to run in the 64-bit mode of x86-64 while using only 32-bit pointers and data fields.[103][104][105] Though this limits the program to a virtual address space of 4 GiB, it also decreases the memory footprint of the program and in some cases can allow it to run faster.[103][104][105]
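The footprint saving from 32-bit pointers can be estimated with a simple layout calculation. The sketch below assumes natural alignment (each field aligned to its own size) and uses a hypothetical doubly linked list node with two pointers and one 32-bit integer; neither the node nor the helper names come from the x32 specification.

```python
def struct_size(fields):
    """Size of a C-style struct given (size, alignment) pairs for its fields."""
    offset = 0
    max_align = 1
    for size, align in fields:
        offset = (offset + align - 1) // align * align  # pad to field alignment
        offset += size
        max_align = max(max_align, align)
    # Tail padding so that arrays of the struct keep every element aligned.
    return (offset + max_align - 1) // max_align * max_align


def list_node_size(pointer_size):
    # Hypothetical node: struct { node *next; node *prev; int32_t value; }
    pointer = (pointer_size, pointer_size)
    return struct_size([pointer, pointer, (4, 4)])


if __name__ == "__main__":
    print("64-bit pointers:", list_node_size(8), "bytes per node")
    print("32-bit pointers:", list_node_size(4), "bytes per node")
```

Under these assumptions the node shrinks from 24 to 12 bytes, so a pointer-heavy data structure occupies roughly half the cache and memory when built with 32-bit pointers.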

64-bit Linux allows up to 128 TiB of virtual address space for individual processes, and can address approximately 64 TiB of physical memory, subject to processor and system limitations,[106] or up to 128 PiB (virtual) and 4 PiB (physical) with 5-level paging enabled.[107]
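The virtual address figures above follow from the canonical-address rule of x86-64: with N implemented virtual address bits, bits 63 down to N-1 of an address must all be equal. The sketch below models that rule; the assumption that a process receives (roughly) the lower canonical half is an illustration of the 128 TiB figure, not a statement about exact kernel limits, and the function names are hypothetical.

```python
def is_canonical(addr, va_bits=48):
    """True if bits 63..va_bits-1 of a 64-bit address are all zero or all one."""
    upper = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones


def lower_half_size(va_bits=48):
    # Size in bytes of the lower canonical half (assumed here to be the
    # region available to a user process).
    return 1 << (va_bits - 1)


if __name__ == "__main__":
    TiB = 1 << 40
    print(lower_half_size(48) // TiB, "TiB with 4-level paging")
    print(is_canonical(0x00007FFFFFFFFFFF))  # top of the lower half
    print(is_canonical(0x0000800000000000))  # first non-canonical address
```

With 48 bits the lower half is 128 TiB; with 57 bits (5-level paging) the two halves together span 128 PiB.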

macOS

Mac OS X 10.4.7 and higher versions of Mac OS X 10.4 run 64-bit command-line tools using the POSIX and math libraries on 64-bit Intel-based machines, just as all versions of Mac OS X 10.4 and 10.5 run them on 64-bit PowerPC machines. No other libraries or frameworks work with 64-bit applications in Mac OS X 10.4.[108] The kernel, and all kernel extensions, are 32-bit only.

Mac OS X 10.5 supports 64-bit GUI applications using Cocoa, Quartz, OpenGL, and X11 on 64-bit Intel-based machines, as well as on 64-bit PowerPC machines.[109] All non-GUI libraries and frameworks also support 64-bit applications on those platforms. The kernel, and all kernel extensions, are 32-bit only.

Mac OS X 10.6 is the first version of macOS that supports a 64-bit kernel. However, not all 64-bit computers can run the 64-bit kernel, and not all 64-bit computers that can run the 64-bit kernel will do so by default.[110] The 64-bit kernel, like the 32-bit kernel, supports 32-bit applications; both kernels also support 64-bit applications. 32-bit applications have a virtual address space limit of 4 GiB under either kernel.[111][112] The 64-bit kernel does not support 32-bit kernel extensions, and the 32-bit kernel does not support 64-bit kernel extensions.

OS X 10.8 includes only the 64-bit kernel, but continues to support 32-bit applications; it does not support 32-bit kernel extensions, however.

macOS 10.15 includes only the 64-bit kernel and no longer supports 32-bit applications. This removal of support has presented a problem for Wine (and the commercial version CrossOver), as it needs to still be able to run 32-bit Windows applications. The solution, termed wine32on64, was to add thunks that bring the CPU in and out of 32-bit compatibility mode in the nominally 64-bit application.[113][114]

macOS uses the universal binary format to package 32- and 64-bit versions of application and library code into a single file; the most appropriate version is automatically selected at load time. In Mac OS X 10.6, the universal binary format is also used for the kernel and for those kernel extensions that support both 32-bit and 64-bit kernels.

Solaris

Solaris 10 and later releases support the x86-64 architecture.

For Solaris 10, just as with the SPARC architecture, there is only one operating system image, which contains a 32-bit kernel and a 64-bit kernel; this is labeled as the "x64/x86" DVD-ROM image. The default behavior is to boot a 64-bit kernel, allowing both 64-bit and existing or new 32-bit executables to be run. A 32-bit kernel can also be manually selected, in which case only 32-bit executables will run. The isainfo command can be used to determine if a system is running a 64-bit kernel.

For Solaris 11, only the 64-bit kernel is provided. However, the 64-bit kernel supports both 32- and 64-bit executables, libraries, and system calls.

Windows

x64 editions of Microsoft Windows client and server, Windows XP Professional x64 Edition and Windows Server 2003 x64 Edition, were released in March 2005.[115] Internally they are actually the same build (5.2.3790.1830 SP1),[116][117] as they share the same source base and operating system binaries, so even system updates are released in unified packages, much in the same manner as Windows 2000 Professional and Server editions for x86. Windows Vista, which also has many different editions, was released in January 2007. Windows 7 was released in July 2009. Windows Server 2008 R2 was sold in only x64 and Itanium editions; later versions of Windows Server only offer an x64 edition.

Versions of Windows for x64 prior to Windows 8.1 and Windows Server 2012 R2 offer the following:

  • 8 TiB of virtual address space per process, accessible from both user mode and kernel mode, referred to as the user mode address space. An x64 program can use all of this, subject to backing store limits on the system, and provided it is linked with the "large address aware" option, which is present by default.[118] This is a 4096-fold increase over the default 2 GiB user-mode virtual address space offered by 32-bit Windows.[119][120]
  • 8 TiB of kernel mode virtual address space for the operating system.[119] As with the user mode address space, this is a 4096-fold increase over 32-bit Windows versions. The increased space primarily benefits the file system cache and kernel mode "heaps" (non-paged pool and paged pool). Windows only uses a total of 16 TiB out of the 256 TiB implemented by the processors because early AMD64 processors lacked a CMPXCHG16B instruction.[121]

Under Windows 8.1 and Windows Server 2012 R2, both user mode and kernel mode virtual address spaces have been extended to 128 TiB.[23] These versions of Windows will not install on processors that lack the CMPXCHG16B instruction.
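The address-space figures in this section can be cross-checked with a few lines of arithmetic. This is only a sanity check of the numbers quoted above; the variable names are illustrative.

```python
GiB = 1 << 30
TiB = 1 << 40

default_32bit_user_space = 2 * GiB  # default user-mode space on 32-bit Windows
old_x64_user_space = 8 * TiB        # x64 Windows before 8.1 / Server 2012 R2
new_x64_user_space = 128 * TiB      # Windows 8.1 / Server 2012 R2 and later
implemented_space = 1 << 48         # 48-bit virtual addresses: 256 TiB

# How many times larger the old x64 user space is than the 32-bit default.
growth_factor = old_x64_user_space // default_32bit_user_space

# Number of address bits needed to cover each user-mode region.
old_user_bits = old_x64_user_space.bit_length() - 1
new_user_bits = new_x64_user_space.bit_length() - 1

if __name__ == "__main__":
    print("growth factor:", growth_factor)
    print("user-mode bits before/after 8.1:", old_user_bits, new_user_bits)
    print("old total / implemented:",
          (2 * old_x64_user_space) / implemented_space)
```

The 8 TiB user and 8 TiB kernel regions together cover one sixteenth of the 256 TiB implemented by the processor, whereas two 128 TiB regions cover all of it.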

The following additional characteristics apply to all x64 versions of Windows:

  • Ability to run existing 32-bit applications (.exe programs) and dynamic link libraries (.dlls) using WoW64 if WoW64 is supported on that version. Furthermore, a 32-bit program, if it was linked with the "large address aware" option,[118] can use up to 4 GiB of virtual address space in 64-bit Windows, instead of the default 2 GiB (optional 3 GiB with /3GB boot option and "large address aware" link option) offered by 32-bit Windows.[122] Unlike the use of the /3GB boot option on x86, this does not reduce the kernel mode virtual address space available to the operating system. 32-bit applications can, therefore, benefit from running on x64 Windows even if they are not recompiled for x86-64.
  • Both 32- and 64-bit applications, if not linked with "large address aware", are limited to 2 GiB of virtual address space.
  • Ability to use up to 128 GiB (Windows XP/Vista), 192 GiB (Windows 7), 512 GiB (Windows 8), 1 TiB (Windows Server 2003), 2 TiB (Windows Server 2008/Windows 10), 4 TiB (Windows Server 2012), or 24 TiB (Windows Server 2016/2019) of physical random access memory (RAM).[123]
  • LLP64 data model: in C/C++, "int" and "long" types are 32 bits wide, "long long" is 64 bits, while pointers and types derived from pointers are 64 bits wide.
  • Kernel mode device drivers must be 64-bit versions; there is no way to run 32-bit kernel mode executables within the 64-bit operating system. User mode device drivers can be either 32-bit or 64-bit.
  • 16-bit Windows (Win16) and DOS applications will not run on x86-64 versions of Windows due to the removal of the virtual DOS machine subsystem (NTVDM) which relied upon the ability to use virtual 8086 mode. Virtual 8086 mode cannot be entered while running in long mode.
  • Full implementation of the NX (No Execute) page protection feature. This is also implemented on recent 32-bit versions of Windows when they are started in PAE mode.
  • Instead of FS segment descriptor on x86 versions of the Windows NT family, GS segment descriptor is used to point to two operating system defined structures: Thread Information Block (NT_TIB) in user mode and Processor Control Region (KPCR) in kernel mode. Thus, for example, in user mode GS:0 is the address of the first member of the Thread Information Block. Maintaining this convention made the x86-64 port easier, but required AMD to retain the function of the FS and GS segments in long mode – even though segmented addressing per se is not really used by any modern operating system.[119]
  • Early reports claimed that the operating system scheduler would not save and restore the x87 FPU machine state across thread context switches. Observed behavior shows that this is not the case: the x87 state is saved and restored, except for kernel mode-only threads (a limitation that exists in the 32-bit version as well). The most recent documentation available from Microsoft states that the x87/MMX/3DNow! instructions may be used in long mode, but that they are deprecated and may cause compatibility problems in the future.[122] (3DNow! is no longer available on AMD processors, with the exception of the PREFETCH and PREFETCHW instructions,[124] which are also supported on Intel processors as of Broadwell.)
  • Some components like Jet Database Engine and Data Access Objects will not be ported to 64-bit architectures such as x86-64 and IA-64.[125][126][127]
  • Microsoft Visual Studio can compile native applications to target either the x86-64 architecture, which can run only on 64-bit Microsoft Windows, or the IA-32 architecture, which can run as a 32-bit application on 32-bit Microsoft Windows or 64-bit Microsoft Windows in WoW64 emulation mode. Managed applications can be compiled in IA-32, x86-64, or AnyCPU mode. Software created in the first two modes behaves like its IA-32 or x86-64 native code counterpart, respectively; when using the AnyCPU mode, however, applications run as 32-bit applications in 32-bit versions of Microsoft Windows and as 64-bit applications in 64-bit editions of Microsoft Windows.
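The LLP64 data model mentioned in the list above is easiest to appreciate next to LP64, the model used by most Unix-like x86-64 systems (LP64 is added here for comparison and is not part of the list). The sketch below tabulates type sizes in bytes and simulates the classic porting bug of storing a pointer in a `long`; the function names are hypothetical.

```python
DATA_MODELS = {
    "LLP64": {"int": 4, "long": 4, "long long": 8, "pointer": 8},  # x64 Windows
    "LP64": {"int": 4, "long": 8, "long long": 8, "pointer": 8},   # e.g. Linux
}


def pointer_fits_in_long(model):
    sizes = DATA_MODELS[model]
    return sizes["long"] >= sizes["pointer"]


def store_pointer_in_long(address, model):
    """Simulate casting a pointer to `long`: the value is cut to long's width."""
    bits = DATA_MODELS[model]["long"] * 8
    return address & ((1 << bits) - 1)


if __name__ == "__main__":
    address = 0x00007FF612345678
    for model in DATA_MODELS:
        kept = store_pointer_in_long(address, model)
        print(model, pointer_fits_in_long(model), hex(kept))
```

Code that assumes a `long` can hold a pointer therefore silently loses the upper 32 bits under LLP64, which is why portable code uses `intptr_t` or `uintptr_t` for that purpose.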

Video game consoles

The PlayStation 4 and Xbox One use AMD x86-64 processors based on the Jaguar microarchitecture.[128][129] Firmware and games are written in x86-64 code; no legacy x86 code is involved. The PlayStation 5 and Xbox Series X/S use AMD x86-64 processors based on the Zen 2 microarchitecture.[130][131] The Steam Deck uses a custom AMD x86-64 accelerated processing unit (APU) based on the Zen 2 microarchitecture.[132]

Industry naming conventions

Since AMD64 and Intel 64 are substantially similar, many software and hardware products use one vendor-neutral term to indicate their compatibility with both implementations. AMD's original designation for this processor architecture, "x86-64", is still used for this purpose,[2] as is the variant "x86_64".[3][4] Other companies, such as Microsoft[6] and Sun Microsystems/Oracle Corporation,[5] use the contraction "x64" in marketing material.

The term IA-64 refers to the Itanium processor, and should not be confused with x86-64, as it is a completely different instruction set.

Many operating systems and products, especially those that introduced x86-64 support prior to Intel's entry into the market, use the term "AMD64" or "amd64" to refer to both AMD64 and Intel 64.

  • amd64
    • Most BSD systems such as FreeBSD, MidnightBSD, NetBSD and OpenBSD refer to both AMD64 and Intel 64 under the architecture name "amd64".
    • Some Linux distributions such as Debian, Ubuntu, Gentoo Linux refer to both AMD64 and Intel 64 under the architecture name "amd64".
    • Microsoft Windows's x64 versions use the AMD64 moniker internally to designate various components which use or are compatible with this architecture. For example, the environment variable PROCESSOR_ARCHITECTURE is assigned the value "AMD64" as opposed to "x86" in 32-bit versions, and the system directory on a Windows x64 Edition installation CD-ROM is named "AMD64", in contrast to "i386" in 32-bit versions.[133]
    • Sun's Solaris's isalist command identifies both AMD64- and Intel 64-based systems as "amd64".
    • Java Development Kit (JDK): the name "amd64" is used in directory names containing x86-64 files.
  • x86_64
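Because the same architecture is reported under several names, scripts that branch on the machine type often normalize the name first. The following is a minimal sketch: the alias set is assembled from the names mentioned in this article, `is_x86_64` and `host_is_x86_64` are hypothetical helpers, and `platform.machine()` is the standard-library call that typically returns "x86_64" on Linux and "AMD64" on Windows.

```python
import platform

# Names used for the architecture in this article, lower-cased, no spaces.
X86_64_ALIASES = {"x86-64", "x86_64", "x64", "amd64", "intel64", "em64t"}


def is_x86_64(name):
    """True if `name` is one of the known spellings of x86-64."""
    return name.replace(" ", "").lower() in X86_64_ALIASES


def host_is_x86_64():
    return is_x86_64(platform.machine())


if __name__ == "__main__":
    print(platform.machine(), "->", host_is_x86_64())
```

Note that "ia64" is deliberately absent from the alias set, since IA-64 (Itanium) is a different instruction set.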

Licensing

x86-64/AMD64 was solely developed by AMD. Until April 2021 when the relevant patents expired, AMD held patents on techniques used in AMD64;[135][136][137] those patents had to be licensed from AMD in order to implement AMD64. Intel entered into a cross-licensing agreement with AMD, licensing to AMD their patents on existing x86 techniques, and licensing from AMD their patents on techniques used in x86-64.[138] In 2009, AMD and Intel settled several lawsuits and cross-licensing disagreements, extending their cross-licensing agreements.[139][140][141]

from Grokipedia
x86-64, also known as AMD64 and Intel 64, is a 64-bit extension of the x86 instruction set architecture (ISA) that maintains backward compatibility with 32-bit x86 software while enabling larger memory addressing and enhanced performance for 64-bit applications. Developed by AMD in the late 1990s as a response to the limitations of 32-bit x86, the architecture was first publicly detailed in 1999 and commercially introduced in April 2003 with the Opteron processor family. Intel initially pursued a separate 64-bit path with IA-64 (Itanium) but later adopted a compatible version called Extended Memory 64 Technology (EM64T), rebranded as Intel 64, which debuted in June 2004 with the Nocona-based Xeon processors. The x86-64 ISA has since become the dominant standard, powering most personal computers, servers, and workstations from AMD, Intel, and other vendors.

Key features of x86-64 include an expanded register set with sixteen 64-bit general-purpose registers (extending the original eight from x86 and adding eight new ones, R8–R15), a 64-bit virtual address format (of which implementations use 48 bits, or 57 bits with five-level paging, with physical addresses of up to 52 bits), and new instructions for improved efficiency in memory access and computation. The architecture operates in multiple modes: 64-bit mode for native 64-bit execution, compatibility mode for running unmodified 32-bit x86 applications, and legacy modes, including 16-bit real mode, for older software, so that existing code runs without emulation overhead. Additional enhancements include RIP-relative addressing for position-independent code, larger page sizes (up to 1 GB), and support for vector instruction sets, which have evolved from SSE and SSE2 through AVX and beyond. This design has facilitated the widespread adoption of 64-bit operating systems like Windows, Linux, and macOS, significantly boosting computational capabilities in fields ranging from general computing to high-performance scientific simulations.

History and Development

Origins and AMD's Role

In the late 1990s, AMD recognized key limitations in the prevailing 32-bit x86 architecture, including a 4 GB virtual address space and only eight general-purpose registers, which constrained performance in demanding server and workstation applications. To address these issues, AMD developed a backward-compatible 64-bit extension to x86, initially codenamed "Hammer" and later branded as AMD64, enabling vastly expanded memory addressing up to 2^64 bytes and doubling the register count to 16 for improved efficiency. This initiative marked AMD's strategic push to compete in the 64-bit computing market, where Intel was promoting its incompatible IA-64 architecture.

AMD first publicly announced the AMD64 architecture in October 1999 at the Microprocessor Forum, positioning it as a practical evolution of x86 rather than a complete overhaul. The company followed this with the release of a detailed architectural specification in August 2000, outlining extensions such as new 64-bit registers (R8-R15), enhanced addressing modes, and support for larger data types while maintaining full compatibility with existing 32-bit software. This specification served as the foundation for subsequent implementations and was made available to developers to facilitate early software preparation.

The first hardware realization of AMD64 came with the launch of the Opteron processor family on April 22, 2003, targeting server and workstation markets with its integrated memory controller and support for up to 1 TB of physical memory per system. Building on this momentum, AMD forged key partnerships to ensure ecosystem support; notably, in close collaboration with Microsoft, the company enabled the release of Windows XP Professional x64 Edition on April 25, 2005, which provided native 64-bit application execution on AMD64 hardware. Intel later adopted a compatible version of the architecture in its processors starting in 2004.

Intel's Adoption and Evolution

Intel initially resisted extending the x86 architecture to 64 bits, favoring its proprietary IA-64 (Itanium) architecture as the future of 64-bit computing instead of carrying forward the complexities of backward compatibility with legacy software. However, the commercial success of AMD's AMD64 architecture prompted Intel to adopt a compatible extension, implementing it as Extended Memory 64 Technology (EM64T) in its processors. This adoption was driven by market pressures to compete in server segments where larger memory addressing was increasingly demanded.

The first Intel processors supporting EM64T were the Nocona-based Xeon processors, released on June 28, 2004, which integrated the 64-bit extensions into the NetBurst microarchitecture derived from the Prescott core. These chips enabled 64-bit operation while maintaining full compatibility with 32-bit x86 software, building on AMD's prior specification as the basis for the architecture. In late 2006, Intel rebranded EM64T as Intel 64 to better align with its marketing strategy and emphasize broad platform support across consumer and enterprise products.

Subsequent evolutions integrated Intel 64 into the Core microarchitecture, debuting in 2006 with processors like the Core 2 Duo (Conroe), which shifted from NetBurst's high-clock, long-pipeline design to a more efficient, shorter-pipeline approach. This transition supported larger on-chip caches (up to 6 MB in early Core 2 models) and facilitated multi-core configurations, enhancing performance in multi-threaded workloads while preserving 64-bit capabilities. Intel also contributed to the standardization of floating-point support in 64-bit mode by fully integrating the x87 FPU into the architecture, allowing legacy x87 instructions to operate alongside the mandatory SSE2 instructions used for modern scalar and vector floating-point operations. This ensured seamless handling of extended-precision (80-bit) formats in long mode without requiring separate coprocessors, as detailed in Intel's architectural specifications.

Key Milestones and Standards

The AMD64 architecture, also known as x86-64, was formalized by AMD through the publication of the initial AMD64 Architecture Programmer's Manual in April 2003, coinciding with the launch of the first compliant server processor, the Opteron, on April 22, 2003. This marked the official ratification of the 64-bit extension to the x86 instruction set, designed for backward compatibility with 32-bit software while expanding addressing capabilities to 64 bits. Shortly thereafter, AMD released the consumer-oriented Athlon 64 processor on September 23, 2003, bringing x86-64 to desktop computing.

AMD licensed the AMD64 technology to Intel under their existing cross-licensing agreement, enabling Intel to implement it as EM64T (later rebranded Intel 64). Intel's first x86-64 processor, the Nocona Xeon, launched on June 28, 2004, followed by integration into the desktop Pentium 4 line in February 2005. To promote interoperability, the System V Application Binary Interface (ABI) for the AMD64 architecture was standardized in December 2003, defining conventions for software portability across implementations. AMD and Intel further aligned their approaches in 2004, achieving near-complete compatibility between AMD64 and EM64T, with only minor differences resolvable through future revisions or software.

Post-2010 developments focused on extensions to enhance performance and security. In 2011, Intel introduced Advanced Vector Extensions (AVX), a 256-bit SIMD instruction set extension that AMD later adopted, standardizing wider vector operations for compute-intensive workloads. In the late 2010s and early 2020s, security features received significant updates, including enhancements to Secure Memory Encryption (SME), initially proposed by AMD in 2016, with the addition of Secure Encrypted Virtualization-Encrypted State (SEV-ES) in 2019 and Secure Encrypted Virtualization-Secure Nested Paging (SEV-SNP) in 2021, providing hardware-based memory isolation for virtual machines against hypervisor attacks.
In October 2024, Intel and AMD established the x86 Ecosystem Advisory Group to guide the future development of the x86 architecture, with a focus on improving compatibility across platforms and simplifying software development. In December 2024, Intel terminated its x86S project, an experimental effort to create a simplified 64-bit-only variant of the ISA, in favor of collaborative enhancements to the standard x86-64. As of 2025, the advisory group has detailed new x86 instructions aimed at bolstering memory-safety features, such as memory labeling to detect common errors like buffer overflows, and performance optimizations to maintain the relevance of the instruction set.

Architectural Foundations

Core Extensions from x86

The x86-64 architecture, also known as AMD64, fundamentally extends the 32-bit x86 instruction set by widening the general-purpose registers to 64 bits and introducing additional registers to support larger address spaces and improved performance. The original eight general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP) are extended to 64 bits and named RAX, RBX, RCX, RDX, RSI, RDI, RBP, and RSP, respectively, allowing direct manipulation of 64-bit integers and addresses. Furthermore, eight new 64-bit registers, R8 through R15, are added, doubling the total to 16 general-purpose registers and providing more flexibility for function calls, loops, and data processing in 64-bit applications. These extensions enable 64-bit operation while maintaining compatibility with legacy code.

A key innovation in x86-64 is the introduction of RIP-relative addressing, which facilitates position-independent code (PIC), essential for shared libraries. In 64-bit mode, instructions can reference memory locations relative to the current value of the 64-bit instruction pointer (RIP), using a signed 32-bit displacement to compute the effective address. The ModR/M encoding that selects a bare 32-bit displacement in 32-bit x86 is reinterpreted as RIP-relative in 64-bit mode, reducing the need for runtime relocations and improving code density compared to absolute addressing. For example, a load instruction like mov rax, [rip + offset] allows direct access to data relative to the instruction's position, enhancing portability across different load addresses.

x86-64 mandates backward compatibility with 32-bit x86 code, ensuring that existing software can execute without recompilation through a dedicated compatibility sub-mode within long mode. In this mode, the processor executes 32-bit code natively in the 32-bit environment, using 32-bit registers, addressing, and segment semantics, while allowing transitions to 64-bit code via system calls or far jumps.
This design choice preserves the vast x86 software ecosystem, with operating systems like Windows and supporting mixed 32-bit and 64-bit execution environments. In 64-bit mode, x86-64 simplifies the model by effectively removing segment limits, promoting a flat addressing scheme that eliminates the complexities of segmented from 32-bit x86. Segment registers such as CS, DS, ES, and SS are treated with a base of 0, and their limit and attribute fields are ignored during translation, allowing the full 64-bit linear to be used without bounds checking on segments. FS and GS retain limited utility for base overrides, often used for , but overall, this change streamlines programming by assuming a single, contiguous up to 2^48 bytes virtually.
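The RIP-relative calculation described above can be sketched in a few lines of Python. This is an illustrative model rather than a disassembler: the function names, the example load address 0x401000, and the 7-byte instruction length (the usual encoding of a REX.W-prefixed mov with a 32-bit displacement) are assumptions chosen for the example.

```python
MASK64 = (1 << 64) - 1


def sign_extend_32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def rip_relative_target(instr_addr: int, instr_len: int, disp32: int) -> int:
    """Effective address of a RIP-relative operand.

    RIP already points at the *next* instruction when the displacement
    is applied, so the instruction length is part of the calculation.
    """
    next_rip = instr_addr + instr_len
    return (next_rip + sign_extend_32(disp32)) & MASK64


if __name__ == "__main__":
    # Hypothetical 7-byte load at 0x401000 with displacement +0x10.
    print(hex(rip_relative_target(0x401000, 7, 0x10)))        # 0x401017
    # A negative displacement (0xFFFFFFF9 is -7) points back at the instruction.
    print(hex(rip_relative_target(0x401000, 7, 0xFFFFFFF9)))  # 0x401000
```

Because the result depends only on the distance between the instruction and its data, the same bytes produce a correct address wherever the image is loaded.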

Instruction Set and Registers

The x86-64 architecture expands the general-purpose register (GPR) set to sixteen 64-bit registers: RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, and R8 through R15. The larger register file allows more computation without memory accesses than the eight 32-bit registers of the original x86. Each register supports full 64-bit operations in 64-bit mode, with the lower 32 bits accessible as EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, and R8D through R15D for compatibility with 32-bit code. The lower 16 and 8 bits are similarly aliased: AX, BX, CX, DX, SI, DI, BP, SP, and R8W through R15W for 16-bit access, and AL, BL, CL, DL, SIL, DIL, BPL, SPL, and R8B through R15B for 8-bit access, with AH, BH, CH, and DH still addressing bits 15:8 of the first four registers in instructions that do not use a REX prefix. Writes to a 32-bit subregister zero-extend the result into the upper 32 bits of the 64-bit register, which avoids false dependencies on the previous register contents; writes to 8-bit and 16-bit subregisters leave the remaining bits unchanged.

To support 64-bit integer handling, x86-64 introduces MOVSXD, which sign-extends a 32-bit source (a register or memory operand) into a 64-bit destination register, promoting signed 32-bit values to 64 bits without additional masking. Conditional move instructions (CMOVcc, where cc denotes a condition code such as E for equal or NE for not equal) copy the source to the destination only if the flags in RFLAGS satisfy the condition, which avoids branch mispredictions in short conditional sequences. These instructions operate on 16-, 32-, and 64-bit registers, with the REX.W prefix bit selecting the 64-bit form.

Stack management uses RSP as the stack pointer, which points to the top of the stack and is always 64 bits wide in 64-bit mode. PUSH, POP, CALL, and RET default to a 64-bit operand size in 64-bit mode and adjust RSP by 8 bytes; 32-bit pushes and pops cannot be encoded. The System V ABI for x86-64 requires RSP to be aligned to a 16-byte boundary immediately before each CALL instruction, so that on function entry, after the 8-byte return address has been pushed, RSP + 8 is a multiple of 16. This guarantees correct alignment for vector operations: aligned SSE loads and stores such as MOVAPS raise a general-protection exception (#GP) on a misaligned stack slot, and ordinary accesses raise an alignment-check exception (#AC) if alignment checking is enabled. Code that cannot assume an aligned stack can restore alignment with an operation such as AND RSP, 0xFFFFFFFFFFFFFFF0 before making calls.

Floating-point and vector processing combine the legacy x87 FPU registers with the SIMD extensions. x86-64 provides sixteen 128-bit XMM registers (XMM0 through XMM15), double the eight available in 32-bit mode, each holding, for example, four single-precision or two double-precision floating-point values. AVX widens these to 256-bit YMM registers, whose lower 128 bits alias the corresponding XMM registers, and AVX-512 widens them further to 512-bit ZMM registers on implementations that support it. These registers handle both scalar floating-point and packed vector data, with instructions such as MOVAPS for aligned moves and arithmetic operations available in both SSE and AVX encodings.
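The sub-register write rules, the MOVSXD sign extension, and the stack-alignment mask mentioned above can be modeled with plain integer arithmetic. The following Python sketch is an illustration only; the helper names (write32, write16, movsxd, align_stack_16) are hypothetical and do not correspond to any library.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def write32(old64: int, value: int) -> int:
    """Model a write to a 32-bit sub-register (e.g. EAX): upper 32 bits become zero."""
    return value & MASK32


def write16(old64: int, value: int) -> int:
    """Model a write to a 16-bit sub-register (e.g. AX): upper 48 bits are preserved."""
    return (old64 & (MASK64 ^ 0xFFFF)) | (value & 0xFFFF)


def movsxd(value32: int) -> int:
    """Model MOVSXD: sign-extend a 32-bit value into a 64-bit register."""
    value32 &= MASK32
    if value32 & 0x80000000:
        return value32 | (MASK64 ^ MASK32)
    return value32


def align_stack_16(rsp: int) -> int:
    """Model AND RSP, 0xFFFFFFFFFFFFFFF0: round RSP down to a 16-byte boundary."""
    return rsp & 0xFFFFFFFFFFFFFFF0


if __name__ == "__main__":
    full = 0xFFFFFFFFFFFFFFFF
    print(hex(write32(full, 1)))               # 0x1
    print(hex(write16(full, 1)))               # 0xffffffffffff0001
    print(hex(movsxd(0x80000000)))             # 0xffffffff80000000
    print(hex(align_stack_16(0x7FFFFFFFE3B8)))  # 0x7fffffffe3b0
```

The contrast between write32 and write16 is the point of the example: only 32-bit writes clear the rest of the register.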

Data Types and Operations

The x86-64 architecture introduces native support for 64-bit data types, extending the original x86 instruction set to handle larger integers and addresses efficiently in 64-bit mode. Quadword (64-bit) integers are held in the general-purpose registers RAX through R15 and range from -2^63 to 2^63-1 for signed values and from 0 to 2^64-1 for unsigned values. Double-precision floating-point numbers, which follow the IEEE 754 standard with a 53-bit significand and an 11-bit exponent, are supported natively by the x87 FPU and SSE2 instructions, with normal values ranging in magnitude from approximately 2.23 × 10^{-308} to 1.79 × 10^{308}. Pointers are 64-bit virtual addresses, giving a theoretical address space of 2^64 bytes, though implementations typically use 48 bits and require the upper bits to be a sign extension of bit 47 (canonical form) to prevent addressing ambiguities.

For wider data, x86-64 supports 128-bit packed integers through SSE2, allowing two 64-bit integers to be processed at once in a 128-bit XMM register. This enables vectorized operations on packed quadwords, such as addition and subtraction, in a single instruction. These data types form the foundation for 64-bit application development, where pointers and integers align naturally with operating system abstractions such as large address spaces.

Arithmetic operations extend to full 64-bit precision. Multiplication (MUL and the one-operand form of IMUL) produces a 128-bit result in RDX:RAX, while division (DIV and IDIV) treats RDX:RAX as a 128-bit dividend and yields a 64-bit quotient in RAX and a remainder in RDX; a quotient that does not fit in 64 bits raises a divide error (#DE). Overflow is reported through the RFLAGS register: after addition or subtraction, the carry flag (CF) signals unsigned overflow (a carry or borrow out of bit 63), and the overflow flag (OF) signals signed overflow, where the sign bit (bit 63) of the result is inconsistent with the signs of the operands. These flags allow software to detect overflow and to chain multi-word arithmetic.

Bit manipulation instructions add versatility. BSWAP reverses the byte order of a 64-bit register, which facilitates conversion between little-endian x86-64 data and big-endian formats. LZCNT counts the leading zeros in a 64-bit operand, which helps with normalization and bit-position encoding; it is an optional instruction, introduced with AMD's ABM extension and by Intel alongside BMI1, and its presence is reported by CPUID (leaf 80000001h, ECX bit 5). Such instructions speed up low-level operations in cryptography, compression, and network protocols.

While 64-bit operation provides addressing for terabyte-scale memory and removes most segmentation overhead, it carries trade-offs. Instructions with 64-bit operands or the new registers need a REX prefix, which can increase code size by up to 20-30% compared to 32-bit equivalents. Misaligned 64-bit accesses may also cost additional cycles on some implementations, though aligned operations benefit from wider data paths. Overall, these enhancements favor data-intensive applications over the compact code of 32-bit modes.
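The widening multiply and the 128-bit-by-64-bit divide described above can be illustrated with Python's arbitrary-precision integers. This is a sketch of the unsigned forms (MUL and DIV) only; the function names are hypothetical, and Python exceptions stand in for the processor's divide-error exception (#DE).

```python
MASK64 = (1 << 64) - 1


def mul64(rax: int, src: int):
    """Model unsigned MUL: returns (rdx, rax, carry) for the 128-bit product.

    CF and OF are both set when the upper half (RDX) is non-zero.
    """
    product = (rax & MASK64) * (src & MASK64)
    low = product & MASK64
    high = product >> 64
    return high, low, high != 0


def div64(rdx: int, rax: int, src: int):
    """Model unsigned DIV: divides RDX:RAX by src, returns (quotient, remainder)."""
    src &= MASK64
    if src == 0:
        raise ZeroDivisionError("#DE: division by zero")
    dividend = ((rdx & MASK64) << 64) | (rax & MASK64)
    quotient, remainder = divmod(dividend, src)
    if quotient > MASK64:
        raise OverflowError("#DE: quotient does not fit in 64 bits")
    return quotient, remainder


if __name__ == "__main__":
    print(mul64(2**63, 4))   # (2, 0, True): the product 2**65 spills into RDX
    print(div64(2, 0, 4))    # (9223372036854775808, 0): 2**65 // 4 == 2**63
```

A practical consequence visible in div64 is that RDX must be set up (usually zeroed, or sign-extended for IDIV) before dividing, since a stale value there becomes part of the dividend.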

Operating Modes and Compatibility

Long Mode

Long mode, also known as IA-32e mode, is the primary 64-bit execution environment in the x86-64 architecture, enabling extended addressing while providing mechanisms for backward compatibility. It is activated from protected mode with paging disabled: software first enables Physical Address Extension (PAE) by setting CR4.PAE (bit 5), loads CR3 with the physical address of a PML4 table, and sets the Long Mode Enable (LME) bit (bit 8) in the Extended Feature Enable Register (EFER, MSR C000_0080h). Paging is then enabled by setting CR0.PG (bit 31), at which point the processor sets the Long Mode Active (LMA) bit (bit 10) in EFER, confirming the transition. Software must also load the global descriptor table register (GDTR) and interrupt descriptor table register (IDTR) with 64-bit base addresses to support the new mode's descriptor formats.

Once activated, long mode operates in one of two submodes determined by the code segment descriptor. The 64-bit submode, selected when the L (long) bit in the code segment (CS) descriptor is 1, provides native 64-bit execution with a flat memory model that largely eliminates legacy segmentation: the base and limit of CS, DS, ES, and SS are ignored, while FS and GS keep their base addresses and the descriptors still supply privilege levels and other attributes. The compatibility submode, selected when CS.L is 0, runs 16-bit and 32-bit applications under a 64-bit operating system and preserves their protected-mode behavior, including segmentation, so that legacy software runs without modification.

In 64-bit submode, the default address size is 64 bits and the instruction pointer is 64 bits wide, which enables RIP-relative addressing for position-independent code. The default operand size for most instructions remains 32 bits; a REX.W prefix selects 64-bit operands, while stack and near-branch instructions default to 64 bits. Immediate values and displacements are mostly limited to 32 bits, sign-extended to 64, to keep instruction encodings compact. The CS.L bit thus selects which set of defaults applies.

Interrupt handling in 64-bit submode requires a long-mode interrupt descriptor table (IDT), located through IDTR, to be set up before interrupts are enabled; without it, interrupts cannot be vectored. Interrupts are normally delivered through the Advanced Programmable Interrupt Controller (APIC) rather than the legacy 8259 PIC. This design contrasts with legacy 32-bit modes, where segmentation and interrupt handling follow traditional x86 conventions.
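The control bits involved in entering long mode, and the way the code segment's L bit selects a submode, can be summarized in a small Python model. It only manipulates integers standing in for CR0, CR4, and EFER; it is not boot code, the function names are hypothetical, and it omits the page-table setup that real software must perform first. The CS.D bit used to distinguish 16-bit from 32-bit compatibility-mode code follows the protected-mode convention for code segment descriptors.

```python
CR0_PE = 1 << 0     # protected mode enable
CR0_PG = 1 << 31    # paging enable
CR4_PAE = 1 << 5    # physical address extension
EFER_LME = 1 << 8   # long mode enable (set by software)
EFER_LMA = 1 << 10  # long mode active (set by the processor)


def enter_long_mode(cr0: int, cr4: int, efer: int):
    """Apply the documented sequence: CR4.PAE, EFER.LME, then CR0.PG."""
    cr4 |= CR4_PAE
    efer |= EFER_LME
    cr0 |= CR0_PG
    # The processor, not software, sets LMA once paging is on with LME=1.
    if (cr0 & CR0_PG) and (efer & EFER_LME):
        efer |= EFER_LMA
    return cr0, cr4, efer


def submode(efer: int, cs_l: int, cs_d: int) -> str:
    """Classify the execution mode from EFER.LMA and the CS.L / CS.D bits."""
    if not efer & EFER_LMA:
        return "legacy mode"
    if cs_l:
        return "64-bit mode"
    return "compatibility mode (32-bit)" if cs_d else "compatibility mode (16-bit)"


if __name__ == "__main__":
    cr0, cr4, efer = enter_long_mode(CR0_PE, 0, 0)
    print(hex(cr0), hex(cr4), hex(efer))   # 0x80000001 0x20 0x500
    print(submode(efer, cs_l=1, cs_d=0))   # 64-bit mode
    print(submode(efer, cs_l=0, cs_d=1))   # compatibility mode (32-bit)
```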

Legacy and Compatibility Modes

In the x86-64 architecture, compatibility mode is a submode of long mode that enables legacy 32-bit and 16-bit protected-mode applications to run alongside native 64-bit programs on a 64-bit operating system without recompilation. The processor is in compatibility mode when long mode is active (EFER.LMA=1 and CR0.PG=1) and the L bit of the current code segment descriptor (CS.L) is 0. Code then sees legacy protected-mode semantics, while the processor continues to use long-mode paging and system structures for address translation and privilege management.

The operand and address sizes in compatibility mode are determined by the D bit in the code segment descriptor (CS.D): 1 selects 32-bit operation (32-bit default operands and addresses), and 0 selects 16-bit operation, as in traditional x86 protected mode. Protected-mode features such as segmentation and privilege-level checks continue to work, and paging uses the long-mode page tables, so a legacy application can be mapped to physical memory above 4 GB even though its own addresses are 32 bits wide. Real mode and virtual-8086 mode are not available within long mode; operating systems that need to run such code typically rely on emulation or virtualization. Real mode is still used at boot: the processor starts in real mode, and firmware or a boot loader prepares the global descriptor table (GDT) and other structures before switching to protected mode and then to long mode, entering 64-bit code with a far jump.

Interrupts and exceptions that occur in compatibility mode are dispatched through the long-mode IDT, which uses 16-byte gate descriptors; the handlers run in 64-bit mode, with the stack pointer aligned to a 16-byte boundary on entry and a stack switch on privilege-level changes. The RFLAGS.IF bit (and VIF for virtual interrupts) controls interrupt enabling as in legacy modes. I/O port access uses the legacy IN, OUT, INS, and OUTS instructions, protected by the I/O privilege level (IOPL) in the flags register and by the I/O-permission bitmap, which covers up to 65,536 ports and is located through the 64-bit task state segment (TSS).

A key limitation of compatibility mode is that 64-bit instructions and addressing are unavailable: execution is confined to the lower 4 GB of the virtual address space, and only the 32-bit forms of the first eight general-purpose registers are accessible. Hardware task switching is not supported anywhere in long mode. Instructions that are invalid in 64-bit mode, such as BOUND and the decimal-adjust instructions (AAA, DAA, and related), remain usable in compatibility mode, because the mode is designed to run existing protected-mode applications rather than to reproduce every legacy operating mode.

Memory Management

Virtual Address Space

In the x86-64 architecture, virtual addresses are 64 bits wide, but the usable space is restricted to 48 bits through canonical addressing, allowing up to 256 TiB of virtual address space. A canonical address has bits 63 through 48 equal to bit 47, that is, it is the sign extension of its low 48 bits; a memory reference that uses a non-canonical address raises a general-protection exception (#GP), or a stack fault (#SS) for stack references. Because software cannot store data in the unused upper bits, later processors can widen the implemented address range without breaking existing programs.

With 4-level paging, the 48-bit address space is typically divided into user and kernel halves of 128 TiB each. User-space addresses range from 0x0000000000000000 to 0x00007FFFFFFFFFFF, while kernel-space addresses occupy 0xFFFF800000000000 to 0xFFFFFFFFFFFFFFFF, which isolates application code from operating system code. The split follows directly from the sign-extended upper bits and needs no additional hardware.

x86-64 supports address space layout randomization (ASLR) by giving operating systems a large virtual space in which to randomize base addresses, complicating exploitation of memory vulnerabilities. In Linux, for instance, the kernel options CONFIG_RANDOMIZE_BASE and CONFIG_RANDOMIZE_MEMORY randomize the location of the kernel image and of regions such as the direct mapping and the vmalloc area at boot, drawing on the 48-bit (or larger) range for entropy. This enhances security without altering the core addressing rules.

An extension to 57-bit virtual addressing, specified in 2017, uses 5-level paging to expand the total space to 128 PiB while keeping the canonical form, now with bits 63 through 57 mirroring bit 56. This allows 64 PiB each for user and kernel space, addressing demand for massive datasets in servers and high-performance computing. It requires CPU support, which is available in Intel processors from Ice Lake (2019) onward and in AMD processors from Zen 4 (2022) onward.
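The canonical-address rule can be checked with a few bit operations. The Python sketch below is illustrative: the function names are hypothetical, and the va_bits parameter (48 for 4-level paging, 57 for 5-level paging) is an assumption of the example rather than something software normally passes around.

```python
MASK64 = (1 << 64) - 1


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63 down to va_bits all equal bit va_bits - 1."""
    addr &= MASK64
    upper = addr >> (va_bits - 1)            # the sign bit and everything above it
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones


def canonicalize(addr: int, va_bits: int = 48) -> int:
    """Sign-extend the low va_bits of addr to a full 64-bit canonical address."""
    low_mask = (1 << va_bits) - 1
    low = addr & low_mask
    if low >> (va_bits - 1):
        low |= MASK64 ^ low_mask
    return low


if __name__ == "__main__":
    print(is_canonical(0x00007FFFFFFFFFFF))       # True  (top of the lower half)
    print(is_canonical(0xFFFF800000000000))       # True  (bottom of the upper half)
    print(is_canonical(0x0000800000000000))       # False (inside the non-canonical gap)
    print(is_canonical(0x0000800000000000, 57))   # True  (valid with 57-bit addressing)
    print(hex(canonicalize(0x0000800000000000)))  # 0xffff800000000000
```

The fourth call shows why widening to 57 bits is backward compatible: every 48-bit canonical address is also canonical at 57 bits, while some addresses in the old gap become valid.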

Physical Address Space

The original x86-64 implementation provided a 40-bit physical address space, supporting up to 1 TiB (2^40 bytes) of memory. This limit was established in the first AMD64 processors, such as the Opteron series, to balance compatibility with existing x86 systems against the need to address more memory than 32-bit designs allowed. Early Intel 64 implementations supported only 36 bits of physical addressing, limiting the addressable space to 64 GiB (2^36 bytes), before matching the 40-bit baseline in subsequent generations.

The physical address width is implementation-dependent and is reported by the CPUID instruction: function 80000008h returns the number of physical address bits in EAX bits 7:0 (and the number of linear address bits in bits 15:8). Modern x86-64 processors extend the width to 48 bits (256 TiB, or 2^48 bytes) or to the architectural maximum of 52 bits (4 PiB, or 2^52 bytes) permitted by the page-table entry format. For instance, AMD EPYC server processors support 52-bit physical addressing, which facilitates large non-uniform memory access (NUMA) configurations with multi-terabyte capacities spread across multiple nodes. These extensions improve scalability for applications that require very large physical memory footprints.
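As a small illustration of the CPUID field layout described above, the following Python sketch decodes a raw EAX value from function 80000008h. It does not execute CPUID itself; the function name and the sample value 0x3028 (48 linear bits, 40 physical bits, as on an early implementation) are assumptions for the example. The linear-address field in bits 15:8 comes from the vendor manuals rather than from the text above.

```python
def decode_address_sizes(eax: int) -> dict:
    """Decode EAX from CPUID function 80000008h.

    Bits 7:0 hold the physical address width, bits 15:8 the linear
    (virtual) address width.
    """
    physical_bits = eax & 0xFF
    linear_bits = (eax >> 8) & 0xFF
    return {
        "physical_bits": physical_bits,
        "linear_bits": linear_bits,
        "max_physical_bytes": 1 << physical_bits,
        "max_physical_tib": (1 << physical_bits) / (1 << 40),
    }


if __name__ == "__main__":
    info = decode_address_sizes(0x3028)  # hypothetical raw register value
    print(info["physical_bits"], info["linear_bits"])  # 40 48
    print(info["max_physical_tib"])                     # 1.0
```

On Linux, the same two numbers appear on the "address sizes" line of /proc/cpuinfo, which avoids the need to issue CPUID directly.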

Page Tables and Translation

In x86-64, address translation in long mode uses a four-level hierarchy of tables: the Page Map Level-4 (PML4) table, the Page Directory Pointer Table (PDPT), the Page Directory (PD), and the Page Table (PT). Each table contains 512 eight-byte entries and is indexed by 9 bits of the virtual address; the physical base address of the PML4 table is stored in the CR3 control register. This structure maps a 48-bit virtual address to a physical address while carrying access permissions and caching attributes.

Translation begins with bits 47:39 of the virtual address, which index the PML4 table to select an entry pointing to a PDPT. Bits 38:30 index the PDPT to locate a PD, bits 29:21 index the PD to reach a PT, and bits 20:12 index the PT to obtain the physical page base address. The remaining bits 11:0 are the offset within the page and complete the physical address. Each entry includes a present bit (bit 0) that indicates whether the mapping is valid (an access through an entry with the bit clear triggers a page fault), along with permission bits such as read/write (bit 1), user/supervisor (bit 2), and execute-disable (bit 63, if enabled via EFER.NXE).

The standard page size is 4 KiB, mapped at the PT level, but larger pages are supported for better TLB efficiency: 2 MiB pages when the page-size bit (PS, bit 7) is set in a PD entry, and 1 GiB pages when PS is set in a PDPT entry, on processors that support them. These large pages shorten the page walk and let each TLB entry cover more memory, which reduces TLB misses in workloads with large memory footprints.

Five-level paging, specified by Intel in 2017, inserts a PML5 table above the PML4 to support 57-bit virtual addresses. In this mode, enabled by setting CR4.LA57, CR3 points to the PML5 table and bits 56:48 of the virtual address index it; the lower four levels work as in four-level paging. AMD introduced support for five-level paging in 2022 with the Zen 4 microarchitecture.
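The index fields used during a page walk can be extracted with shifts and masks. The following Python sketch splits a virtual address into its per-level indices and decodes the entry flag bits named above; the function names and the sample values are hypothetical, and no actual page tables are read.

```python
LEVEL_NAMES = ("PML5", "PML4", "PDPT", "PD", "PT")


def split_virtual_address(va: int, levels: int = 4) -> dict:
    """Split a virtual address into 9-bit table indices and a 12-bit page offset."""
    names = LEVEL_NAMES[-levels:]          # 4 levels start at PML4, 5 at PML5
    fields = {}
    for i, name in enumerate(names):
        shift = 12 + 9 * (levels - 1 - i)  # PML4 index starts at bit 39, PT at bit 12
        fields[name] = (va >> shift) & 0x1FF
    fields["offset"] = va & 0xFFF
    return fields


def decode_entry_flags(entry: int) -> dict:
    """Decode the flag bits of a page-table entry that the text describes."""
    return {
        "present": bool(entry & (1 << 0)),
        "writable": bool(entry & (1 << 1)),
        "user": bool(entry & (1 << 2)),
        "large_page": bool(entry & (1 << 7)),   # PS bit, meaningful in PD/PDPT entries
        "no_execute": bool(entry & (1 << 63)),  # honored only if EFER.NXE is set
    }


if __name__ == "__main__":
    va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x567
    print(split_virtual_address(va))
    # {'PML4': 1, 'PDPT': 2, 'PD': 3, 'PT': 4, 'offset': 1383}
    print(decode_entry_flags((1 << 63) | 0x83))
```

With levels=5 the same function yields an additional PML5 index taken from bits 56:48, matching the five-level layout.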

Implementations by Vendor

AMD64

AMD64, the 64-bit extension to the x86 architecture developed by AMD, was first implemented in the Opteron and Athlon 64 processors in 2003 using the K8 microarchitecture. This initial design extended the physical address space to 40 bits, supporting up to 1 TiB of physical memory, while maintaining compatibility with 32-bit x86 software through legacy and compatibility modes. The K8's integrated memory controller and HyperTransport interconnect further improved performance for both desktop and server workloads, marking a pivotal shift toward 64-bit computing in consumer and enterprise systems.

The architecture continued to evolve with the Zen microarchitecture family, introduced in 2017 with Zen 1 in Ryzen and EPYC processors, which supported 48-bit physical addressing for up to 256 TiB of memory. Subsequent generations expanded on this. Zen 3 (2020) introduced Secure Encrypted Virtualization with Secure Nested Paging (SEV-SNP) in the 3rd Gen EPYC processors, launched in March 2021, adding memory integrity protection against attacks such as replay and data corruption in virtualized environments. Zen 4 (2022) increased physical addressing to 52 bits, enabling up to 4 PiB of addressable memory, and added 5-level paging to extend virtual addressing to 57 bits for larger-scale data-center applications.

AMD-specific model-specific registers (MSRs) provide fine-grained control over hardware features unique to AMD implementations. For instance, the SYSCFG MSR (address 0xC0010010) holds system configuration controls, including the bit that enables Secure Memory Encryption (SME), on which SEV depends. Other MSRs, such as MSRC001_001F (Northbridge Configuration 1), hold topology-related settings and hardware feature toggles.

In server-oriented EPYC processors, AMD64 implementations emphasize scalability, supporting up to 128 PCIe lanes per socket across generations, from PCIe 3.0 in the first generation to PCIe 5.0 in the 5th generation (2024), to accommodate high-density I/O for storage and networking in enterprise environments. This focus on robust interconnects has positioned AMD64 as a cornerstone of data-center and HPC deployments.

Intel 64

Intel 64, Intel's branding for its implementation of the x86-64 architecture, was initially introduced as Extended Memory 64 Technology (EM64T) with the Nocona-based Xeon processors in June 2004, enabling 64-bit computing on Intel platforms while maintaining backward compatibility with 32-bit x86 software. The technology supports a virtual address space of up to 2^48 bytes (256 TiB), with physical addressing limited to 36 bits (64 GiB) in early implementations. Intel 64 then evolved through successive microarchitectures. The Nehalem family (2008), which included the first-generation Core i7 and began the Core i-series of client processors, increased physical addressing to 40 bits (1 TiB) generally and to 44 bits (16 TiB) in server variants such as Nehalem-EX. By 2019, later generations supported 46-bit physical addressing (64 TiB), improving scalability for memory-intensive workloads in data centers and desktops. As of 2024, the Xeon 6 family supports 52-bit physical addressing (4 PiB).

Key advancements arrived with the Ice Lake microarchitecture in 2019, which introduced 5-level paging to expand the virtual address space to 57 bits (128 PiB), lifting the 48-bit limit on linear addresses imposed by 4-level paging. Ice Lake first shipped in 10th-generation Core processors for mobile platforms. The feature adds a fifth level to the page-table hierarchy and benefits applications that map very large amounts of memory. Ice Lake also brought Software Guard Extensions (SGX) to server processors such as the 3rd-generation Xeon Scalable family, enabling enclave-based trusted execution environments for confidential computing with up to 1 TiB of protected memory, building on earlier client-only SGX implementations.

Intel 64 includes several vendor-specific features for virtualization, security, and concurrency. Intel Virtualization Technology (VT-x), introduced in 2005 with select Pentium 4 processors, provides hardware-assisted virtualization through virtual machine extensions (VMX), including VM entry and exit controls; extended page tables (EPT), added with Nehalem, provide efficient address translation for guests. Intel Transactional Synchronization Extensions (TSX), which debuted in 2013 with the Haswell microarchitecture, allow speculative execution of critical sections using hardware lock elision (HLE) and restricted transactional memory (RTM) instructions to simplify parallel programming and reduce lock contention. However, because of side-channel vulnerabilities and reliability issues, TSX was deprecated in later generations, with support disabled through microcode updates starting in 2019 on affected 8th- and 10th-generation Core processors.

Software detects Intel 64 support with the CPUID instruction: in extended leaf 80000001H, bit 29 of the EDX register (the LM flag) indicates long-mode capability, confirming that the processor can execute 64-bit code. Software should first confirm that this extended leaf exists by checking the maximum extended leaf number returned by CPUID leaf 80000000H, which ensures reliable compatibility checks for operating systems and applications targeting Intel 64 environments.

Other Implementations

The VIA Nano, released in 2008, was the first 64-bit x86 processor from a vendor other than AMD and Intel, implementing the x86-64 architecture through Centaur Technology's Isaiah core. This low-power design featured an out-of-order execution pipeline and full compatibility with 64-bit operating systems and applications. VIA's ability to produce x86-64 processors stemmed from x86 licenses acquired through its earlier purchases of Cyrix and Centaur Technology, supplemented by cross-licensing agreements covering the 64-bit extensions.

Zhaoxin, a Chinese firm established as a joint venture between VIA Technologies and the Shanghai Municipal Government, began producing x86-64 processors in 2017 with its initial ZhangJiang cores, derived from VIA's designs. These early implementations adhered to the AMD64 specification and incorporated features such as AVX instructions, while introducing microarchitectures tailored to domestic computing needs. Subsequent generations, such as the WuDaoKou and LuJiaZui architectures in the KX-5000 and KX-6000 series, evolved into independent superscalar out-of-order designs, maintaining x86-64 compatibility through the licenses inherited from VIA.

The licensing model for x86-64 originated with AMD's public specification of the architecture, announced in 1999, which allows third-party implementations under agreements that extend from the broader x86 cross-licenses between AMD and Intel. This framework enabled vendors like VIA to have compatible processors fabricated at third-party foundries, fostering niche markets beyond the dominant AMD and Intel ecosystems.

In 2023, Intel proposed x86S, a simplified variant of the x86-64 architecture with a 64-bit-only design that eliminates legacy modes such as real mode and 16-bit support to reduce complexity. Its key changes included resetting directly into 64-bit mode, streamlined segmentation, and support for 5-level paging without transitional legacy features. However, following ecosystem feedback and the formation of the x86 Ecosystem Advisory Group in 2024, Intel terminated the x86S initiative in December 2024, opting instead for collaborative evolution of the standard x86-64 ISA.

Extensions and Microarchitectures

Performance Extensions

The x86-64 architecture mandates support for Streaming SIMD Extensions (SSE), which provide 128-bit vector operations on single-precision floating-point data, enabling parallel processing of multiple elements with a single instruction. SSE2, introduced in 2000 with the Pentium 4 processor, extends SSE with double-precision floating-point operations and 128-bit integer operations, and it is a required baseline for all x86-64 implementations to ensure compatibility and performance in 64-bit mode.

Advanced Vector Extensions (AVX), launched in 2011 with the Sandy Bridge microarchitecture, double the vector width to 256 bits using YMM registers for floating-point workloads and introduce a three-operand syntax that reduces register pressure. AVX2, released in 2013 with the Haswell microarchitecture, extends 256-bit operation to most integer instructions. Haswell also added fused multiply-add (FMA) instructions, which combine a multiplication and an addition in a single instruction with a single rounding, improving precision and throughput in floating-point computations.

AVX-512, introduced in 2016 with the Knights Landing microarchitecture in Xeon Phi processors, extends vectors to 512 bits using ZMM registers. It adds opmask registers for per-element conditional execution (masking) and conflict detection instructions such as VPCONFLICTD, which identify duplicate elements within a vector and benefit algorithms such as sorting and hashing. A notable subset, AVX-512 FP16, announced in 2021 and shipped in 2023 with Sapphire Rapids-based processors, supports half-precision (16-bit) floating-point arithmetic natively, including efficient handling of denormal numbers, and accelerates machine-learning workloads.

These extensions significantly boost computational bandwidth in vectorized code. For instance, AVX-512 can deliver up to twice the operations per cycle of AVX2 in SIMD-heavy tasks such as video encoding, because each instruction processes twice as many elements, while masking avoids work on unused elements.

Security and Advanced Features

Intel Software Guard Extensions (SGX), introduced in 2015, provide hardware-based isolation for sensitive code and data through enclaves, protected regions of memory that are inaccessible even to higher-privilege software such as the operating system or hypervisor. Enclaves enable trusted execution environments in which data in use is safeguarded by memory encryption and integrity checks, preserving confidentiality even against privileged attackers. An application is partitioned into untrusted and trusted portions, and the trusted portion runs inside the enclave in a CPU mode that prevents external interference.

AMD Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV), introduced in 2017, offer page-level memory encryption to protect against physical memory attacks, including in virtualized environments. SME uses a system-wide key generated by the AMD Secure Processor to encrypt system memory pages, mitigating threats such as cold-boot attacks with little or no change to the operating system. SEV extends this by assigning a unique encryption key to each virtual machine (VM), isolating guest memory from the hypervisor and from other VMs to prevent unauthorized access or data leakage in cloud computing scenarios. Later variants extend the protection: SEV-ES encrypts the guest's CPU register state when control transfers to the hypervisor, and SEV-SNP adds integrity protection to counter replay and remapping attacks.

Intel Control-flow Enforcement Technology (CET), introduced in 2020 with the Tiger Lake microarchitecture, defends against return-oriented programming (ROP) and jump-oriented programming (JOP) attacks by enforcing valid control-flow transfers in hardware. CET provides shadow stacks, separate CPU-managed stacks that hold copies of return addresses and cannot be modified by ordinary application code, so that each return can be checked against the address recorded at the call; indirect branch tracking (IBT) complements this by requiring indirect jumps and calls to land on ENDBR instructions. When a return address on the data stack does not match the shadow stack, CET raises a control-protection exception (#CP), mitigating the control-flow hijacking commonly used in exploits. Operating systems can enable shadow stacks per thread or process, enhancing software security without significant overhead on valid paths.

Intel Memory Protection Extensions (MPX), announced in 2013, aimed to simplify buffer overflow detection by providing hardware support for bounds checking of pointer accesses. MPX added bounds registers and in-memory bounds tables, allowing compilers to insert checks that compare pointer values against their limits and raise an exception on violations, preventing exploits such as stack smashing. However, because of its performance impact and limited adoption, MPX was deprecated starting in 2018 and is no longer supported in processors from the 12th generation onward.

Linear Address Masking (LAM), introduced in 2023, takes a different approach to pointer safety: it lets software store metadata in otherwise unused upper bits of 64-bit linear addresses, which memory-safety tools can use to tag pointers without separate hardware tables. LAM modifies the canonical-address check so that the metadata bits (62:48 with 48-bit addressing or 62:57 with 57-bit addressing) are ignored and the address is formed by sign-extending from bit 47 or bit 56, so tagged pointers can be dereferenced directly while remaining compatible with existing paging. This avoids the runtime overhead of MPX's bounds tables, works with 4- or 5-level paging, and is enumerated through CPUID on supporting processors.
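The pointer-tagging idea behind Linear Address Masking can be sketched as follows. This Python model follows the description above (metadata in bits 62:48 or 62:57, with the address recovered by sign extension) and additionally assumes that bit 63 must match the sign bit of the untagged address; the function names are hypothetical, and the details should be checked against Intel's specification before relying on them.

```python
MASK64 = (1 << 64) - 1


def lam_tag(ptr: int, tag: int, va_bits: int = 48) -> int:
    """Place tag in the metadata bits (62:va_bits) of a canonical pointer."""
    width = 63 - va_bits                    # 15 bits for LAM48, 6 bits for LAM57
    if not 0 <= tag < (1 << width):
        raise ValueError("tag does not fit in the metadata bits")
    field = ((1 << width) - 1) << va_bits
    return ((ptr & MASK64) & (MASK64 ^ field)) | (tag << va_bits)


def lam_untag(ptr: int, va_bits: int = 48) -> int:
    """Recover the address the hardware would translate for a tagged pointer."""
    ptr &= MASK64
    sign = (ptr >> (va_bits - 1)) & 1
    if (ptr >> 63) != sign:
        raise ValueError("bit 63 does not match the address sign bit")
    low_mask = (1 << va_bits) - 1
    low = ptr & low_mask
    return low | (MASK64 ^ low_mask) if sign else low


if __name__ == "__main__":
    user_ptr = 0x00007F0012345678            # hypothetical user-space pointer
    tagged = lam_tag(user_ptr, 0x5A)
    print(hex(tagged))                       # 0x5a7f0012345678
    print(hex(lam_untag(tagged)))            # 0x7f0012345678
```

In practice a sanitizer or allocator would choose the tag values; the hardware's role is only to ignore them during address translation.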

Recent Extensions (2023–2025)

In July 2023, Intel announced AVX10 as the successor to AVX-512, introducing a versioned instruction set to simplify detection of supported vector instructions across implementations, with initial support in Granite Rapids and later processors. The extension retains 512-bit vector capabilities while streamlining compatibility for AI and other vector workloads. Intel's Advanced Performance Extensions (APX), detailed in 2023 and planned for future microarchitectures starting around 2024–2025, expand the register file to 32 general-purpose registers (adding 16 to the existing 16) and introduce new instructions for improved code density and fewer register spills in complex applications, marking the largest update to the x86 ISA since its 64-bit extension. In October 2025, Intel and AMD jointly detailed enhancements to the x86-64 ISA, including ChkTag, a memory-tagging instruction set for detecting common issues such as buffer overflows and use-after-free errors through hardware-accelerated pointer validation, aimed at bolstering software security in modern systems.

Microarchitecture Levels

The x86-64 has evolved through successive generations of processor s, prompting the definition of standardized levels in the x86-64 psABI supplement to guide software compilation targets and ensure portability across hardware generations. These levels—x86-64-v1, v2, v3, and v4—form a hierarchy of cumulative feature sets, allowing developers to optimize code for specific eras of processors while maintaining when necessary. By targeting a particular level via compiler flags like -march=x86-64-vN in GCC, software can leverage hardware-specific instructions without requiring runtime detection for basic execution, though dynamic dispatching enhances performance on varied systems. The foundational x86-64-v1 level encompasses the core 64-bit extensions introduced with the architecture in 2003, including eight 64-bit general-purpose registers (extendable to 16), MMX for integer SIMD, SSE and for single- and double-precision floating-point SIMD up to 128 bits, and FXSR for efficient state management. This baseline is universally supported by x86-64 processors, starting with AMD's K8 family (, ) and Intel's NetBurst-based Nocona . It enables 64-bit addressing and operations essential for large memory workloads but lacks later optimizations for vector processing and . Building on v1, x86-64-v2 incorporates enhancements for improved scalar and SIMD efficiency, adding CMPXCHG16B for 128-bit compare-and-swap atomics, LAHF/SAHF for legacy flag handling in 64-bit mode, POPCNT for fast bit population counting, SSE3 for horizontal SIMD adds and loads, SSSE3 for permute and absolute value operations, and SSE4.1/SSE4.2 for string manipulation, CRC32 computation, and enhanced integer SIMD. Hardware compatibility begins with Intel's Nehalem microarchitecture (Core i7, 2008) and AMD's Bulldozer family (2011), covering the majority of systems deployed since the early 2010s and enabling better performance in data processing tasks like compression and hashing. 
The x86-64-v3 level advances vector and arithmetic capabilities atop v2, introducing AVX and AVX2 for 256-bit SIMD with integer and floating-point support, FMA for fused multiply-add with a single rounding step, BMI1/BMI2 for bit-manipulation operations such as bit-field extraction and flagless shifts, F16C for IEEE half-precision conversions, LZCNT for leading zero counting, MOVBE for endian-swapped moves, and OSXSAVE for operating-system-managed extended state. It is supported by Intel's Haswell microarchitecture (2013) and successors like Broadwell and Skylake, as well as AMD's Excavator (2015), the first AMD core with AVX2 and BMI2, and the later Zen families. This level significantly boosts throughput in parallelizable workloads such as matrix operations and encryption, though it excludes processors that predate these families.

x86-64-v4 extends v3 with 512-bit vector processing via AVX512F (foundation, with masking and gathers), AVX512BW (byte and word operations), AVX512CD (conflict detection), AVX512DQ (doubleword and quadword operations), and AVX512VL (the same instructions on 128- and 256-bit vector lengths). Adoption is more limited, primarily in Intel's Skylake-SP and Skylake-X (2017) and Cascade Lake (2019) lines, with AMD's Zen 4 (2022) also implementing these subsets. It excels in scenarios like AI training and simulations but highlights gaps in other hardware, such as the absence of AVX-512 in most Intel consumer processors and in AMD chips before Zen 4.

AMD's processor families follow the same progression, with the 2003 baseline matching v1 and the 2015 updates (e.g., Carrizo) aligning with v3 via BMI2 and AVX2 support.

Detection of supported levels relies on the CPUID instruction, which exposes feature bits via specific leaves: leaf 1 for base SSE support (e.g., EDX bit 26 for SSE2), and leaf 7 subleaf 0 for advanced extensions (e.g., EBX bit 5 for AVX2, EBX bits 3 and 8 for BMI1 and BMI2, EBX bit 16 for AVX512F). On Linux, the lscpu utility from util-linux parses /proc/cpuinfo to list flags like avx2, bmi2, and avx512f, enabling runtime queries for code selection.
This approach supports software portability by allowing multi-versioned binaries: for example, glibc's hardware capability (glibc-hwcaps) mechanism loads optimized library variants based on detected features, preventing crashes on unsupported hardware while maximizing efficiency. Compiling exclusively for higher levels risks incompatibility; for instance, v4-targeted code faults with an illegal-instruction error on hardware lacking AVX-512, such as pre-2017 Intel processors, underscoring the need for level-aware deployment strategies in heterogeneous environments.
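The level hierarchy can be checked against a list of CPU flags. The following is a minimal sketch rather than an official tool: it assumes Linux /proc/cpuinfo flag spellings (for example `pni` for SSE3, `abm` for LZCNT, `lahf_lm` for LAHF/SAHF), takes the per-level feature lists from the psABI level definitions, and uses the hypothetical helper names `highest_level` and `read_cpu_flags`.

```python
# Flag names follow Linux /proc/cpuinfo spelling (an assumption of this sketch).
# Each level is cumulative: it only counts if every lower level is satisfied too.
LEVEL_FLAGS = [
    ("x86-64-v1", {"cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"}),
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def highest_level(flags):
    """Return the highest level fully covered by `flags`, or None if v1 is not met."""
    present = set(flags)
    best = None
    for name, required in LEVEL_FLAGS:
        if not required <= present:
            break  # a missing feature at this level rules out all higher levels
        best = name
    return best


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flag list of the first CPU entry (Linux only; hypothetical helper)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return line.split(":", 1)[1].split()
    return []
```

On a Linux system, `highest_level(read_cpu_flags())` would report the level of the running machine; on other platforms the flag list has to come from another source.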

Differences Across Implementations

AMD vs. Intel Specifics

While both AMD64 and Intel 64 share the foundational x86-64 architecture originally defined by AMD, they exhibit notable differences in baseline features and implementation details that affect compatibility and system design. These variances stem from AMD's pioneering role in developing the 64-bit extension, which Intel later adopted and extended under the Intel 64 branding, leading to divergences in hardware capabilities and control mechanisms.

One key distinction lies in physical address space support. AMD's initial AMD64 implementations, starting with the K8 microarchitecture in the 2003 Opteron processors, provided 40-bit physical addressing, enabling up to 1 terabyte of physical memory, with the architecture specification allowing for expansion to 52 bits. In contrast, Intel's early Intel 64 implementations, introduced in 2004 with the Prescott-based Pentium 4 processors, limited physical addressing to 36 bits, supporting a maximum of 64 gigabytes of physical memory. This difference reflected AMD's more forward-looking design for larger memory configurations in server environments, while Intel's initial rollout prioritized compatibility with existing 32-bit systems.

Instruction set specifics further highlight these differences. Early processors from both vendors lacked the LAHF (Load AH from Flags) and SAHF (Store AH into Flags) instructions in 64-bit mode. AMD added them with the revision D steppings of its K8 processors (Athlon 64 and Opteron), as indicated by the LahfSahf bit (ECX bit 0) of CPUID function 8000_0001H. Intel added this support later, reporting it through the same bit, so that 64-bit code can handle flags without requiring emulation. Additionally, AMD retained legacy support for its proprietary 3DNow! SIMD instructions in its early AMD64 processors; these instructions extend MMX with floating-point operations and were integrated into the 64-bit media instructions, allowing backward compatibility for older multimedia applications.
Intel 64 implementations do not include 3DNow!, relying instead on standard SSE extensions for similar functionality.

Model-specific registers (MSRs) also diverge to accommodate vendor-specific hardware. AMD uses MSRs in the range C001_0010 to C001_001F for system and northbridge configuration, including link parameters and related settings, as these components were integrated differently in AMD's chipsets during the early AMD64 era. Intel, on the other hand, documents standard MTRRs (Memory Type Range Registers) such as IA32_MTRR_PHYSBASEn and IA32_MTRR_PHYSMASKn to define caching attributes for specific physical ranges, a feature inherited from IA-32 and extended to Intel 64 for fine-grained control of memory types.

Power management approaches reflect proprietary optimizations. AMD introduced Cool'n'Quiet with its Athlon 64 processors in 2004, a technology that dynamically adjusts CPU clock speed, voltage, and core states based on load to reduce power consumption and heat, controlled via MSRs like HWCR and the P-state registers. Intel's counterpart, Enhanced Intel SpeedStep Technology, was adapted for Intel 64 processors, enabling OS-directed frequency and voltage scaling through P-states for similar efficiency gains. These mechanisms, while conceptually aligned, use distinct hardware interfaces and implementations tailored to each vendor's designs.
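The memory limits quoted for different physical address widths follow directly from powers of two. A minimal sketch of that arithmetic is shown here; the function names are hypothetical, and the CPUID helper assumes the conventional layout in which leaf 80000008h returns the physical address width in EAX bits 7:0.

```python
def max_physical_bytes(address_bits):
    """Bytes addressable with the given number of physical address bits."""
    return 1 << address_bits


def format_size(num_bytes):
    """Render a byte count using binary units, dividing only while it stays exact."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    index = 0
    while num_bytes >= 1024 and num_bytes % 1024 == 0 and index < len(units) - 1:
        num_bytes //= 1024
        index += 1
    return f"{num_bytes} {units[index]}"


def physical_bits_from_cpuid(eax_80000008):
    """Extract the physical address width from an EAX value of CPUID leaf 80000008h.

    The EAX value must be supplied by the caller; Python's standard library
    has no CPUID access.
    """
    return eax_80000008 & 0xFF


for bits in (36, 40, 52):
    print(bits, "bits ->", format_size(max_physical_bytes(bits)))
```

The loop covers the 36-bit and 40-bit widths of the early implementations and the 52-bit architectural maximum.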

Compatibility and Extensions

x86-64 implementations maintain compatibility through standardized mechanisms that allow software to detect and use core features as well as optional extensions. The CPUID instruction serves as the primary method for feature enumeration, enabling operating systems and applications to query processor capabilities at runtime. For instance, support for long mode (the 64-bit operating mode) is indicated by bit 29 (LM) in the EDX register when executing CPUID with EAX set to 80000001h. This bit check ensures that software only attempts to enter long mode on capable processors, preventing incompatible execution across x86-64 vendors like AMD and Intel.

Optional extensions enhance performance for specific workloads but are not universally required for basic x86-64 compatibility. Intel introduced the Advanced Encryption Standard New Instructions (AES-NI) in 2010 with its Westmere microarchitecture, providing hardware acceleration for AES encryption and decryption operations. AMD followed with equivalent AES instructions announced in 2010 for its Bulldozer architecture, released in 2011, so that cryptographic software can detect and use these features via CPUID bits (e.g., bit 25 in ECX for function 00000001h). These extensions are enumerated separately, allowing binaries to run on processors lacking them by falling back to software implementations.

Binary compatibility between AMD and Intel x86-64 processors is further ensured by the System V AMD64 Architecture Processor Supplement, a standardized application binary interface (ABI) that defines calling conventions, data types, and object file formats for Unix-like systems. This ABI specifies register usage for parameter passing (e.g., RDI and RSI for the first two integer arguments) and stack alignment rules, enabling executables compiled for one vendor to run unmodified on the other without recompilation. It promotes interoperability in the broader ecosystem while accommodating vendor-specific extensions through runtime detection.

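The register-based parameter passing of the System V ABI can be illustrated with a small model. This sketch covers only integer and pointer arguments and assumes the ABI's register order RDI, RSI, RDX, RCX, R8, R9, with remaining arguments passed on the stack; floating-point and aggregate classification is left out, and `assign_integer_args` is a hypothetical name.

```python
# Register order for integer/pointer arguments in the System V AMD64 ABI.
INTEGER_ARG_REGISTERS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def assign_integer_args(args):
    """Map integer-class arguments to registers; arguments past the sixth go to the stack."""
    args = list(args)
    in_registers = dict(zip(INTEGER_ARG_REGISTERS, args))
    on_stack = args[len(INTEGER_ARG_REGISTERS):]
    return in_registers, on_stack


registers, stack = assign_integer_args([10, 20, 30])
print(registers)  # first three arguments land in rdi, rsi, rdx
print(stack)      # nothing spills to the stack
```

Because both vendors' processors execute the same convention, a compiler only needs this one mapping per operating-system ABI rather than one per CPU vendor.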
Deprecations and mitigations for security vulnerabilities also impact compatibility, often requiring hardware or firmware updates. Starting in 2019, Intel issued microcode updates that allow Transactional Synchronization Extensions (TSX) to be disabled, and that disable it by default on some affected processors, to address TSX Asynchronous Abort (CVE-2019-11135), a side-channel issue related to the Microarchitectural Data Sampling (MDS) vulnerabilities (CVE-2018-12130 et al.) that exposed sensitive data. This disablement, controlled through the IA32_TSX_CTRL MSR and reflected in CPUID enumeration, prevents exploitation but may degrade performance in TSX-reliant applications; software is advised to check bit 11 (RTM) and bit 4 (HLE) in EBX (function 00000007h, subleaf 0) for availability. Such measures highlight the ongoing evolution of x86-64 to balance security and performance.
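The feature checks described in this section all reduce to testing one bit of one CPUID output register. A minimal sketch of such a lookup follows. It assumes the caller has already obtained the raw register values (Python's standard library cannot execute CPUID), and it uses the documented positions LM at EDX bit 29 of leaf 80000001h, AES at ECX bit 25 of leaf 1, and HLE and RTM at EBX bits 4 and 11 of leaf 7 subleaf 0. The table and function names are hypothetical.

```python
# feature name -> (leaf, subleaf, output register, bit position)
FEATURE_BITS = {
    "long_mode": (0x80000001, 0, "edx", 29),
    "aes": (0x00000001, 0, "ecx", 25),
    "hle": (0x00000007, 0, "ebx", 4),
    "rtm": (0x00000007, 0, "ebx", 11),
}


def has_feature(name, cpuid_results):
    """Test one feature bit.

    `cpuid_results` maps (leaf, subleaf) to a dict of register values,
    e.g. {(1, 0): {"eax": ..., "ebx": ..., "ecx": ..., "edx": ...}}.
    A leaf that was not queried is treated as "feature absent".
    """
    leaf, subleaf, register, bit = FEATURE_BITS[name]
    registers = cpuid_results.get((leaf, subleaf))
    if registers is None:
        return False
    return bool((registers[register] >> bit) & 1)


# Synthetic register values for illustration only, not from a real processor.
sample = {
    (0x80000001, 0): {"eax": 0, "ebx": 0, "ecx": 0, "edx": 1 << 29},
    (0x00000001, 0): {"eax": 0, "ebx": 0, "ecx": 1 << 25, "edx": 0},
    (0x00000007, 0): {"eax": 0, "ebx": 0, "ecx": 0, "edx": 0},
}
print({name: has_feature(name, sample) for name in FEATURE_BITS})
```

Treating a missing leaf as "absent" mirrors the fallback behavior described above: software takes the generic path unless the feature is positively enumerated.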

Adoption and Ecosystem

Operating System Support

The Linux kernel first gained x86-64 support during the 2.4 series, before hardware was generally available, and version 2.6.0, released in December 2003, carried the x86_64 port, derived from the i386 codebase, as part of a stable mainline release. This support included native 64-bit execution, expanded register usage, and compatibility modes for 32-bit applications, enabling the kernel to address far larger memory spaces than 32-bit addressing allows. By early 2004, distributions based on Linux 2.6 began widely adopting x86-64, providing features like larger virtual address spaces and improved performance for compute-intensive workloads. Kernel limitations at the time included experimental support for certain hardware features, but the architecture quickly became the default for 64-bit Linux systems.

Microsoft released the first x86-64 editions of Windows (Windows XP Professional x64 Edition and Windows Server 2003 x64 Edition) in April 2005, supporting the x86-64 architecture on AMD Opteron and Intel Xeon processors. The 64-bit kernel natively handled large memory configurations, and the boot loader enabled Physical Address Extension (PAE) paging as part of the transition into long mode. User-mode processes in these editions were limited to 8 TB of virtual address space, a significant expansion over 32-bit constraints, with a further 8 TB reserved for kernel mode. This release laid the foundation for enterprise adoption of x86-64 in Windows, with subsequent service packs enhancing stability and driver support.

Apple's move to x86-64 began with Mac OS X 10.4 Tiger, initially released for PowerPC in April 2005 and made available for Intel processors starting with version 10.4.4 in January 2006, following the company's announcement of the Intel transition in June 2005. The first Intel Macs used 32-bit Core Duo processors, with 64-bit Intel support following on Core 2-based models; PowerPC support continued through Mac OS X 10.5 Leopard. Tiger provided hybrid 32/64-bit capabilities, combining a 32-bit kernel with support for 64-bit command-line applications on compatible hardware.
On Intel-based systems, the amount of RAM Tiger could use depended on the Mac model, enabling advanced multimedia and development workloads while maintaining backward compatibility through Rosetta for PowerPC binaries. Limitations included partial 64-bit optimization in some system components until later updates.

BSD variants were among the early adopters of x86-64. FreeBSD 5.2-RELEASE, issued in January 2004, included the amd64 architecture port with full 64-bit kernel support, allowing access to memory and registers beyond 32-bit limits. For systems exceeding 4 GB of RAM on the 32-bit variant, PAE was required to enable larger physical memory addressing, though the amd64 port handled this natively without such extensions. Similarly, OpenBSD 3.5, released in May 2004, provided official amd64 support, emphasizing security features adapted for 64-bit execution. Both variants focused on stability and portability, with FreeBSD offering robust networking and OpenBSD prioritizing audited codebases for x86-64 deployments.

Hardware Platforms and Consoles

The x86-64 architecture has dominated server hardware since its introduction with AMD's Opteron processors in 2003, followed closely by Intel's Xeon lineup, establishing a duopoly that persists into 2025. AMD's EPYC processors, launched in 2017, have significantly eroded Intel's lead through higher core counts and efficiency in multi-threaded workloads, leading to AMD capturing approximately 28% of the server CPU market as of Q3 2025. This balance reflects AMD's focus on high-density computing for data centers, where EPYC's chiplet-based design enables scalable performance, while Intel's Xeon maintains advantages in single-threaded tasks and legacy compatibility.

In client personal computers, x86-64 became the universal standard by 2008, as virtually all new desktop and laptop processors from AMD and Intel transitioned from 32-bit x86 to 64-bit variants, enabling access to larger memory addressing and improved application performance. This widespread adoption was driven by software ecosystem maturity, with operating systems like Windows fully supporting 64-bit modes, making x86-64 the default for consumer PCs. However, the 2020s have seen rising competition from ARM-based architectures, particularly in laptops, where Apple's M-series chips and Qualcomm's Snapdragon X Elite have captured over 13% market share by 2025, appealing to users prioritizing battery life and AI acceleration over raw x86 compatibility. Despite this, x86-64 remains dominant in desktops and high-performance clients due to its entrenched software base.

Gaming consoles marked a significant expansion of x86-64 into consumer entertainment with the eighth-generation systems in 2013. The PlayStation 4 and Xbox One both employed AMD's Jaguar microarchitecture, featuring eight x86-64 cores clocked at 1.6–1.75 GHz, optimized for cost-effective multitasking in gaming and media applications. This shift from the PowerPC-based designs of prior consoles facilitated easier porting of PC games and unified developer tools.
The ninth-generation consoles, launched in 2020, advanced to AMD's Zen 2 architecture: the PlayStation 5 uses an eight-core Zen 2 CPU at up to 3.5 GHz, while the Xbox Series X employs a similar custom eight-core design running at up to 3.8 GHz (3.66 GHz with simultaneous multithreading enabled), delivering substantial gains in CPU-bound scenarios like open-world simulations.

For embedded and IoT applications, x86-64's low-power implementations include Intel's Atom processors, which since the 2013 generation have provided 64-bit support in compact, energy-efficient SoCs for devices like gateways and sensors. Intel's Quark series, introduced in 2013, targeted ultra-low-power IoT nodes with x86 cores, though these were 32-bit; later Atom variants extended 64-bit capabilities to a wider range of embedded designs. These platforms enable x86-64 compatibility in resource-constrained environments, bridging IoT data to servers without architectural translation overhead.

Industry Naming and Licensing

The x86-64 architecture is known by several industry terms, reflecting its origins and vendor-specific branding. AMD, which developed the initial 64-bit extension to the x86 instruction set, officially brands it as AMD64. Intel, adopting the architecture under license, refers to its implementation as Intel 64. In neutral and technical contexts, the term x86-64 is widely used to denote the architecture generically, while x64 serves as a common shorthand in software and operating system documentation.

Licensing for x86-64 stems from AMD's original patent portfolio, which it made available royalty-free to Intel and other parties through cross-licensing agreements starting around the architecture's 2003 debut with the Opteron processor. The 2009 patent cross-license agreement between AMD and Intel explicitly grants each company non-exclusive, fully paid-up (royalty-free) worldwide rights to the other's patents, including those covering x86-64 processor families, enabling broad implementation without ongoing fees. This model has facilitated compatibility across vendors, with additional licensees also accessing the technology under similar terms.

Trademarks associated with x86-64 vary by vendor to protect branding. Intel holds trademarks related to "Intel 64" in the context of its processor technologies, while AMD uses "AMD 64-bit Technology" for its implementations. In contrast, x64 has become a generic, non-trademarked term in software ecosystems, originating from Microsoft's early x64 Edition branding for Windows and now used freely in programming tools and binaries without proprietary restrictions.

Over time, the nomenclature has evolved toward the vendor-neutral "x86-64" in standards bodies and open-source projects, emphasizing the architecture's shared ecosystem. This shift reduces reliance on company-specific labels in documentation and development.
