X86 assembly language
from Wikipedia

x86 assembly language is a family of low-level programming languages that are used to produce object code for the x86 class of processors. These languages provide backward compatibility with CPUs dating back to the Intel 8008 microprocessor, introduced in April 1972.[1][2] As assembly languages, they are closely tied to the architecture's machine code instructions, allowing for precise control over hardware.

In x86 assembly languages, mnemonics are used to represent fundamental CPU instructions, making the code more human-readable compared to raw machine code. Each machine code instruction is an opcode which, in assembly, is replaced with a mnemonic.[3] Each mnemonic corresponds to a basic operation performed by the processor, such as arithmetic calculations, data movement, or control flow decisions. Assembly languages are most commonly used in applications where performance and efficiency are critical. This includes real-time embedded systems, operating-system kernels, and device drivers, all of which may require direct manipulation of hardware resources.

Additionally, compilers for high-level programming languages sometimes generate assembly code as an intermediate step during the compilation process. This allows for optimization at the assembly level before producing the final machine code that the processor executes.

Mnemonics and opcodes

Each instruction in the x86 assembly language is represented by a mnemonic which often combines with one or more operands to translate into one or more bytes known as an opcode. For example, the NOP instruction translates to the opcode 0x90, and the HLT instruction translates to 0xF4.[3] There are potential opcodes without documented mnemonics, which different processors may interpret differently. Using such opcodes can cause a program to behave inconsistently or even generate exceptions on some processors.

Syntax

x86 assembly language has two primary syntax branches: Intel syntax and AT&T syntax.[4] Intel syntax is dominant in the DOS and Windows environments, while AT&T syntax is dominant in Unix-like systems, as Unix was originally developed at AT&T Bell Labs.[5] Below is a summary of the main differences between Intel syntax and AT&T syntax:

Parameter order
  • AT&T: source before the destination.
      movl $5, %eax
  • Intel: destination before source.
      mov eax, 5

Parameter size
  • AT&T: mnemonics are suffixed with a letter indicating the size of the operands: q for qword (64 bits), l for long (dword, 32 bits), w for word (16 bits), and b for byte (8 bits).[4]
      addl $0x24, %esp
      movslq %ecx, %rax
      paddd %xmm1, %xmm2
  • Intel: derived from the name of the register that is used (e.g. rax, eax, ax, al imply q, l, w, b, respectively).
      add esp, 24h
      movsxd rax, ecx
      paddd xmm2, xmm1

Width-based names may still appear in instructions when they define a different operation.

  • MOVSXD refers to sign extension with dword input, unlike MOVSX.
  • SIMD registers have width-named instructions that determine how to split up the register. AT&T tends to keep the names unchanged, so PADDD is not renamed to "paddl".

Sigils
  • AT&T: immediate values prefixed with a "$", registers prefixed with a "%".[4]
  • Intel: the assembler automatically detects the type of symbols; i.e., whether they are registers, constants or something else.

Effective addresses
  • AT&T: general syntax of displacement(base, index, scale).
      movl offset(%ebx, %ecx, 4), %eax
  • Intel: arithmetic expressions in square brackets; additionally, size keywords like byte, word, or dword have to be used if the size cannot be determined from the operands.[4]
      mov eax, [ebx + ecx*4 + offset]

Many x86 assemblers use Intel syntax, including FASM, MASM, NASM, TASM, and YASM. The GNU Assembler, which originally used AT&T syntax, has supported both syntaxes since version 2.10 via the .intel_syntax directive.[4][6][7] A quirk in the AT&T syntax for x86 is that x87 floating-point operands are reversed, an inherited bug from the original AT&T assembler.[8]

The AT&T syntax is nearly universal across other architectures (retaining the same operand order for the mov instruction); it was originally designed for PDP-11 assembly and was inherited onto Unix-like systems. In contrast, the Intel syntax is specific to the x86 architecture and is the one used in the x86 platform's official documentation. The Intel 8080, which predates the x86 architecture, also uses the "destination-first" order for mov instruction.[9]

Reserved words

In most x86 assembly languages, the reserved words consist of two parts: mnemonics that translate to opcodes, and directives (or "pseudo-ops") that access features in the assembler program beyond the simple translation of opcodes. For a list of the former part, see x86 instruction listings. The latter part is highly assembler-dependent, with no such thing as a standard among Intel-syntax assemblers.[10] AT&T-syntax assemblers share a common way of naming directives (all directives start with a dot, like .ascii),[11] and a number of basic directives such as .ascii and .string are broadly supported.[12][13]

Registers

x86 processors feature a set of registers that serve as storage for binary data and addresses during program execution. These registers are categorized into general-purpose registers, segment registers, the instruction pointer, the FLAGS register, and various extension registers introduced in later processor models. Each register has specific functions in addition to its general capabilities:[3]

General-purpose registers

These registers have conventional roles, but usage is not strictly enforced. Programs are generally free to use them for other purposes.

  • AX (Accumulator register): Primarily used in arithmetic, logic, and data transfer operations. It is favored by instructions that perform multiplication and division, and by string load and store operations. Immediate ALU operations and exchanges with AX can be encoded more compactly.
  • BX (Base register): Base pointer for memory access. It can hold the base address of data structures and is useful in indexed addressing modes. It is used with XLAT.
  • CX (Count register): Serves as a counter in loop, string, and shift/rotate instructions. Iterative operations often use CX to determine the number of times a loop or operation should execute.
  • DX (Data register): Used in conjunction with AX for multiplication and division operations that produce results larger than 16 bits. It also holds I/O port addresses for IN and OUT instructions.
  • SP (Stack pointer): Points to the top of stack in memory. It is automatically updated during PUSH and POP operations.
  • BP (Base Pointer): Points to the base of the current stack frame within the call stack. It is primarily used to access function parameters and local variables within the call stack.
  • SI (Source Index): Used as a pointer to the source in string and memory array operations. Instructions like MOVS (move string) use SI to read data from memory. Like BX, it can be used for indexing. It can be added to BP or BX for double indexing.
  • DI (Destination Index): Serves as a pointer to the destination in string and memory array operations. It works alongside SI in instructions that copy or compare data, writing results to memory. Like BX, it can be used for indexing. It can be added to BP or BX for double indexing.

In addition to the general registers, there are:

  • Instruction Pointer (IP): Holds the offset address of the next instruction to be executed within the code segment (CS). It points to the first byte of the next instruction. While the IP register cannot be read directly by programmers, its value changes through control flow instructions such as jumps, calls, and interrupts, which alter the flow of execution.
  • FLAGS register: Contains a set of status, control, and system flags that reflect the outcome of operations and control the processor's operations.
  • Segment registers (CS, DS, ES, SS): Determine where a 64 KiB segment starts (FS and GS were added in the 80386 and later).
  • Extra extension registers (MMX, 3DNow!, SSE, etc.) (Pentium and later only).

The x86 registers can be used by most instructions. For example, in Intel syntax:

mov ax, 1234h ; copies the value 1234hex (4660d) into register AX
mov bx, ax    ; copies the value of the AX register into the BX register

Segmented addressing

The x86 architecture in real and virtual 8086 mode uses a process known as segmentation to address memory, not the flat memory model used in many other environments. Segmentation involves composing a memory address from two parts, a segment and an offset; the segment points to the beginning of a 64 KiB (64×2^10 bytes) group of addresses and the offset determines how far from this beginning address the desired address is. In segmented addressing, two registers are required for a complete memory address: one to hold the segment, the other to hold the offset. To translate back into a flat address, the segment value is shifted four bits left (equivalent to multiplication by 2^4, or 16) and then added to the offset to form the full address. This allows breaking the 64 KiB barrier through clever choice of addresses, though it makes programming considerably more complex.

In real mode, for example, if DS contains the hexadecimal number 0xDEAD and DX contains the number 0xCAFE, they would together point to the memory address 0xDEAD * 0x10 + 0xCAFE == 0xEB5CE. Combining segment and offset values in this way yields a 20-bit address, so the CPU can address up to 1,048,576 bytes (1 MiB) in real mode.
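The shift-and-add translation can be modelled in a few lines. This is a minimal Python sketch rather than assembler code; the helper name linear_address is hypothetical, and the 20-bit mask assumes 8086-style wraparound at 1 MiB (no HMA).

```python
def linear_address(segment: int, offset: int) -> int:
    """Model real-mode translation: (segment << 4) + offset.

    Assumption: the result wraps at 20 bits, as on an 8086 (no HMA).
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    # The segment:offset pair used in the text.
    print(hex(linear_address(0xDEAD, 0xCAFE)))  # 0xeb5ce
    # Different segment:offset pairs can name the same byte.
    print(linear_address(0x1234, 0x0010) == linear_address(0x1235, 0x0000))  # True
```

The second call illustrates why segmented pointers cannot be compared for equality without first normalising them.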

The original IBM PC restricted programs to 640 KB but an expanded memory specification was used to implement a bank switching scheme that fell out of use when later operating systems, such as Windows, used the larger address ranges of newer processors and implemented their own virtual memory schemes.

Protected mode, starting with the Intel 80286, was utilized by OS/2. Several shortcomings, such as the inability to access the BIOS and the inability to switch back to real mode without resetting the processor, prevented widespread usage.[14] The 80286 was also still limited to addressing memory in 16-bit segments, meaning only 2^16 bytes (64 kilobytes) could be accessed at a time. To access the extended functionality of the 80286, the operating system would set the processor into protected mode, enabling 24-bit addressing and thus 2^24 bytes of memory (16 megabytes).

In protected mode, the segment selector can be broken down into three parts: a 13-bit index, a Table Indicator bit that determines whether the entry is in the GDT or LDT and a 2-bit Requested Privilege Level; see x86 memory segmentation.
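As an illustration of the three fields, the following Python sketch splits a 16-bit selector value. The bit positions (RPL in bits 0 to 1, Table Indicator in bit 2, index in bits 3 to 15) follow the layout given in Intel's architecture manuals; the names Selector and decode_selector are hypothetical.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int   # 13-bit descriptor table index
    table: str   # "GDT" when the Table Indicator bit is 0, "LDT" when it is 1
    rpl: int     # 2-bit Requested Privilege Level


def decode_selector(selector: int) -> Selector:
    """Split a 16-bit protected-mode segment selector into its fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must be a 16-bit value")
    rpl = selector & 0b11
    table = "LDT" if (selector >> 2) & 1 else "GDT"
    index = selector >> 3
    return Selector(index, table, rpl)


if __name__ == "__main__":
    print(decode_selector(0x1B))  # Selector(index=3, table='GDT', rpl=3)
```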

When referring to an address with a segment and an offset the notation of segment:offset is used, so in the above example the flat address 0xEB5CE can be written as 0xDEAD:0xCAFE or as a segment and offset register pair; DS:DX.

There are some special combinations of segment registers and general registers that point to important addresses:

  • CS:IP (CS is Code Segment, IP is Instruction Pointer) points to the address where the processor will fetch the next byte of code.
  • SS:SP (SS is Stack Segment, SP is Stack Pointer) points to the address of the top of the stack, i.e. the most recently pushed byte.
  • SS:BP (SS is Stack Segment, BP is Stack Frame Pointer) points to the address of the top of the stack frame, i.e. the base of the data area in the call stack for the currently active subprogram.
  • DS:SI (DS is Data Segment, SI is Source Index) is often used to point to string data that is about to be copied to ES:DI.
  • ES:DI (ES is Extra Segment, DI is Destination Index) is typically used to point to the destination for a string copy, as mentioned above.

The Intel 80386 featured three operating modes: real mode, protected mode and virtual 8086 mode. The protected mode which debuted in the 80286 was extended to allow the 80386 to address up to 4 GB of memory. The all-new virtual 8086 mode (VM86) made it possible to run one or more real mode programs in a protected environment which largely emulated real mode, though some programs were not compatible (typically as a result of memory addressing tricks or using unspecified op-codes).

The 32-bit flat memory model of the 80386's extended protected mode may be the most important feature change for the x86 processor family until AMD released x86-64 in 2003, as it helped drive large scale adoption of Windows 3.1 (which relied on protected mode) since Windows could now run many applications at once, including DOS applications, by using virtual memory and simple multitasking.

Execution modes

The x86 processors support five modes of operation for x86 code: real mode, protected mode, long mode, virtual 8086 mode, and system management mode (SMM). Some instructions are available in a given mode and others are not. A 16-bit subset of instructions is available on the 16-bit x86 processors, which are the 8086, 8088, 80186, 80188, and 80286. These instructions are available in real mode on all x86 processors, and in 16-bit protected mode (80286 onwards), additional instructions relating to protected mode are available. On the 80386 and later, 32-bit instructions (including later extensions) are also available in all modes, including real mode; on these CPUs, V86 mode and 32-bit protected mode are added, with additional instructions provided in these modes to manage their features. SMM, with some of its own special instructions, is available on some Intel i386SL, i486 and later CPUs. Finally, in long mode (AMD Opteron onwards), 64-bit instructions, and more registers, are also available. The instruction set is similar in each mode but memory addressing and word size vary, requiring different programming strategies.

The modes in which x86 code can be executed are:

  • Real mode (16-bit)
    • 20-bit segmented memory address space (meaning that only 1 MB of memory can be addressed, or slightly more since the 80286 through the HMA), direct software access to peripheral hardware, and no concept of memory protection or multitasking at the hardware level. Computers that use BIOS start up in this mode.
  • Protected mode (16-bit and 32-bit)
    • Expands addressable physical memory to 16 MB and addressable virtual memory to 1 GB. Provides privilege levels and protected memory, which prevents programs from corrupting one another. 16-bit protected mode (used during the end of the DOS era) used a complex, multi-segmented memory model. 32-bit protected mode uses a simple, flat memory model.
  • Long mode (64-bit)
    • Mostly an extension of the 32-bit (protected mode) instruction set, but unlike the 16-to-32-bit transition, many instructions were dropped in the 64-bit mode. Pioneered by AMD.
  • Virtual 8086 mode (16-bit)
    • A special hybrid operating mode that allows real mode programs and operating systems to run while under the control of a protected mode supervisor operating system.
  • System Management Mode (16-bit)
    • Handles system-wide functions like power management, system hardware control, and proprietary OEM designed code. It is intended for use only by system firmware. All normal execution, including the operating system, is suspended. An alternate software system (which usually resides in the computer's firmware, or a hardware-assisted debugger) is then executed with high privileges.

Switching modes

The processor runs in real mode immediately after power on, so an operating system kernel, or other program, must explicitly switch to another mode if it wishes to run in anything but real mode. Switching modes is accomplished by modifying certain bits of the processor's control registers after some preparation, and some additional setup may be required after the switch.

Examples

With a computer running legacy BIOS, the BIOS and the boot loader run in Real mode. The 64-bit operating system kernel checks and switches the CPU into Long mode and then starts new kernel-mode threads running 64-bit code.

With a computer running UEFI, the UEFI firmware (except CSM and legacy Option ROM), the UEFI boot loader and the UEFI operating system kernel all run in Long mode.

Instruction types

In general, the features of the modern x86 instruction set are:

  • A compact encoding
    • Variable length and alignment independent (encoded as little endian, as is all data in the x86 architecture)
    • Mainly one-address and two-address instructions, that is to say, the first operand is also the destination.
    • Memory operands as both source and destination are supported (frequently used to read/write stack elements addressed using small immediate offsets).
    • Both general and implicit register usage; although all seven (counting ebp) general registers in 32-bit mode, and all fifteen (counting rbp) general registers in 64-bit mode, can be freely used as accumulators or for addressing, most of them are also implicitly used by certain (more or less) special instructions; affected registers must therefore be temporarily preserved (normally stacked), if active during such instruction sequences.
  • Produces conditional flags implicitly through most integer ALU instructions.
  • Supports various addressing modes including immediate, offset, and scaled index, but not PC-relative addressing except for jumps (PC-relative data addressing was introduced as an improvement in the x86-64 architecture).
  • Includes floating-point operations on a stack of registers.
  • Contains special support for atomic read-modify-write instructions (xchg, cmpxchg/cmpxchg8b, xadd, and integer instructions which combine with the lock prefix)
  • SIMD instructions (instructions which perform parallel simultaneous single instructions on many operands encoded in adjacent cells of wider registers).

Stack instructions

The x86 architecture has hardware support for an execution stack mechanism. Instructions such as push, pop, call and ret are used with the properly set up stack to pass parameters, to allocate space for local data, and to save and restore call-return points. The ret size instruction is very useful for implementing space efficient (and fast) calling conventions where the callee is responsible for reclaiming stack space occupied by parameters.

When setting up a stack frame to hold local data of a recursive procedure there are several choices; the high level enter instruction (introduced with the 80186) takes a procedure-nesting-depth argument as well as a local size argument, and may be faster than more explicit manipulation of the registers (such as push bp ; mov bp, sp ; sub sp, size). Whether it is faster or slower depends on the particular x86-processor implementation as well as the calling convention used by the compiler, programmer or particular program code; most x86 code is intended to run on x86-processors from several manufacturers and on different technological generations of processors, which implies highly varying microarchitectures and microcode solutions as well as varying gate- and transistor-level design choices.

The full range of addressing modes (including immediate and base+offset), even for instructions such as push and pop, makes direct usage of the stack for integer, floating-point and address data simple, and keeps the ABI specifications and mechanisms relatively simple compared to some RISC architectures, which require more explicit call stack details.

Integer ALU instructions

x86 assembly has the standard mathematical operations add, sub and neg, with imul and idiv (for signed integers) and mul and div (for unsigned integers); the logical operators and, or, xor and not; arithmetic and logical bit shifts, sal/sar (for signed integers) and shl/shr (for unsigned integers); rotates with and without carry, rcl/rcr and rol/ror; and a complement of BCD arithmetic instructions, aaa, aad, daa and others.

Floating-point instructions

x86 assembly language includes instructions for a stack-based floating-point unit (FPU). The FPU was an optional separate coprocessor for the 8086 through the 80386 and an on-chip option for the 80486 series; it has been a standard feature of every Intel x86 CPU since the Pentium. The FPU instructions include addition, subtraction, negation, multiplication, division, remainder, square roots, integer truncation, fraction truncation, and scale by power of two. The operations also include conversion instructions, which can load or store a value from memory in any of the following formats: binary-coded decimal, 32-bit integer, 64-bit integer, 32-bit floating-point, 64-bit floating-point or 80-bit floating-point (upon loading, the value is converted to the currently used floating-point mode). x86 also includes a number of transcendental functions, including sine, cosine, tangent, arctangent, exponentiation with the base 2 and logarithms to bases 2, 10, or e.

The stack register to stack register format of the instructions is usually fop st, st(n) or fop st(n), st, where st is equivalent to st(0), and st(n) is one of the 8 stack registers (st(0), st(1), ..., st(7)). Like the integers, the first operand is both the first source operand and the destination operand. fsubr and fdivr should be singled out as first swapping the source operands before performing the subtraction or division. The addition, subtraction, multiplication, division, store and comparison instructions include instruction modes that pop the top of the stack after their operation is complete. So, for example, faddp st(1), st performs the calculation st(1) = st(1) + st(0), then removes st(0) from the top of stack, thus making what was the result in st(1) the top of the stack in st(0).
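The effect of faddp st(1), st on the register stack can be modelled as follows. This is a minimal Python sketch, assuming ordinary Python floats in place of 80-bit extended precision; the class X87Stack and its method names are hypothetical and only mirror the mnemonics.

```python
class X87Stack:
    """Toy model of the eight-entry x87 register stack."""

    def __init__(self) -> None:
        self._regs: list[float] = []  # the last element is st(0)

    def fld(self, value: float) -> None:
        """Push a value, so that it becomes the new st(0)."""
        if len(self._regs) == 8:
            raise OverflowError("x87 register stack is full")
        self._regs.append(value)

    def st(self, n: int = 0) -> float:
        """Read st(n), counted from the top of the stack."""
        return self._regs[-1 - n]

    def faddp(self, n: int = 1) -> None:
        """st(n) = st(n) + st(0), then pop st(0)."""
        if not 1 <= n < len(self._regs):
            raise IndexError("st(n) is not a live register")
        self._regs[-1 - n] += self._regs[-1]
        self._regs.pop()

    def depth(self) -> int:
        return len(self._regs)


if __name__ == "__main__":
    fpu = X87Stack()
    fpu.fld(3.0)          # st(0) = 3.0
    fpu.fld(2.0)          # st(0) = 2.0, st(1) = 3.0
    fpu.faddp(1)          # st(1) = 5.0, then pop
    print(fpu.st(0), fpu.depth())  # 5.0 1
```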

SIMD instructions

Modern x86 CPUs contain SIMD instructions, which largely perform the same operation in parallel on many values encoded in a wide SIMD register. Various instruction technologies support different operations on different register sets, but taken as complete whole (from MMX to SSE4.2) they include general computations on integer or floating-point arithmetic (addition, subtraction, multiplication, shift, minimization, maximization, comparison, division or square root). So for example, paddw mm0, mm1 performs 4 parallel 16-bit (indicated by the w) integer adds (indicated by the padd) of mm0 values to mm1 and stores the result in mm0. Streaming SIMD Extensions or SSE also includes a floating-point mode in which only the very first value of the registers is actually modified (expanded in SSE2). Some other unusual instructions have been added including a sum of absolute differences (used for motion estimation in video compression, such as is done in MPEG) and a 16-bit multiply accumulation instruction (useful for software-based alpha-blending and digital filtering). SSE (since SSE3) and 3DNow! extensions include addition and subtraction instructions for treating paired floating-point values like complex numbers.
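The lane-wise behaviour of paddw can be sketched in Python by treating a 64-bit MMX register as an integer holding four 16-bit lanes. The function name paddw simply mirrors the mnemonic, and the model assumes the plain wraparound form of the instruction rather than the saturating variants.

```python
def paddw(dst: int, src: int) -> int:
    """Model MMX paddw: four independent 16-bit adds with wraparound."""
    result = 0
    for lane in range(4):
        shift = 16 * lane
        a = (dst >> shift) & 0xFFFF
        b = (src >> shift) & 0xFFFF
        # A carry out of one lane is discarded instead of spilling into the next.
        result |= ((a + b) & 0xFFFF) << shift
    return result


if __name__ == "__main__":
    mm0 = 0x0001_FFFF_0003_0004
    mm1 = 0x0001_0001_0001_0001
    print(f"{paddw(mm0, mm1):016x}")  # 0002000000040005
```

The lane holding 0xFFFF wraps to 0x0000 without disturbing its neighbour, which is the difference from a single 64-bit add.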

These instruction sets also include numerous fixed sub-word instructions for shuffling, inserting and extracting the values around within the registers. In addition there are instructions for moving data between the integer registers and XMM (used in SSE)/FPU (used in MMX) registers.

Memory instructions

The x86 processor also includes complex addressing modes for addressing memory with an immediate offset, a register, a register with an offset, a scaled register with or without an offset, and a register with an optional offset and another scaled register. So for example, one can encode mov eax, [Table + ebx + esi*4] as a single instruction which loads 32 bits of data from the address computed as (Table + ebx + esi * 4) offset from the ds selector, and stores it to the eax register. In general x86 processors can load and use memory matched to the size of any register it is operating on. (The SIMD instructions also include half-load instructions.)

Most 2-operand x86 instructions, including integer ALU instructions, use a standard "addressing mode byte"[15] often called the MOD-REG-R/M byte.[16][17][18] Many 32-bit x86 instructions also have a SIB addressing mode byte that follows the MOD-REG-R/M byte.[19][20][21][22][23]
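To make the two addressing bytes concrete, the Python sketch below splits them into their bit fields. The field layout (two, three and three bits) and the 32-bit register numbering are taken from Intel's instruction format documentation; the function names and the example byte sequence 8B 04 B3, which encodes mov eax, [ebx + esi*4], are chosen here for illustration.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_modrm(byte: int) -> tuple[int, int, int]:
    """Return the (mod, reg, rm) fields of a MOD-REG-R/M byte."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def decode_sib(byte: int) -> tuple[int, int, int]:
    """Return (scale factor, index, base) from a SIB byte."""
    return 1 << ((byte >> 6) & 0b11), (byte >> 3) & 0b111, byte & 0b111


if __name__ == "__main__":
    opcode, modrm, sib = 0x8B, 0x04, 0xB3   # mov eax, [ebx + esi*4]
    mod, reg, rm = decode_modrm(modrm)
    # In 32-bit addressing, rm == 0b100 with mod != 0b11 means a SIB byte follows.
    scale, index, base = decode_sib(sib)
    print(mod, REG32[reg], rm)              # 0 eax 4
    print(scale, REG32[index], REG32[base]) # 4 esi ebx
```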

In principle, because the instruction opcode is separate from the addressing mode byte, those instructions are orthogonal because any of those opcodes can be mixed-and-matched with any addressing mode. However, the x86 instruction set is generally considered non-orthogonal because most dyadic operations cannot operate memory to memory, other opcodes have some fixed addressing mode (they have no addressing mode byte), and every register has a preferred use.[23][24]

The x86 instruction set includes string load, store, move, scan and compare instructions (lods, stos, movs, scas and cmps) which perform each operation to a specified size (b for 8-bit byte, w for 16-bit word, d for 32-bit double word) then increments/decrements (depending on DF, direction flag) the implicit address register (si for lods, di for stos and scas, and both for movs and cmps). For the load, store and scan operations, the implicit target/source/comparison register is in the al, ax or eax register (depending on size). The implicit segment registers used are ds for si and es for di. The cx or ecx register is used as a decrementing counter, and the operation stops when the counter reaches zero or, for scans and comparisons, when equality or inequality is detected. Unfortunately, over the years the performance of some of these instructions became neglected and in certain cases it is possible to get faster results by coding using more elemental instructions. Intel and AMD have refreshed some of the instructions though, and as of 2025 some have very respectable performance.
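The register updates performed by a repeated string move can be modelled as below. This is a simplified Python sketch of rep movsb, assuming a single flat bytearray in place of the ds and es segments; the function name and parameters are hypothetical stand-ins for the si, di, cx registers and the direction flag.

```python
def rep_movsb(memory: bytearray, si: int, di: int, cx: int,
              df: bool = False) -> tuple[int, int, int]:
    """Copy cx bytes from memory[si] to memory[di], one byte per iteration.

    Returns the final (si, di, cx), as the registers would hold afterwards.
    """
    step = -1 if df else 1        # DF set means the addresses are decremented
    while cx != 0:
        memory[di] = memory[si]
        si += step
        di += step
        cx -= 1                   # the counter decrements until it reaches zero
    return si, di, cx


if __name__ == "__main__":
    mem = bytearray(b"hello.....")
    print(rep_movsb(mem, si=0, di=5, cx=5))  # (5, 10, 0)
    print(mem)                               # bytearray(b'hellohello')
```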

The stack is a region of memory and an associated stack pointer, which points to the last item pushed on the stack. The stack pointer is decremented before an item is added (push) and incremented after an item is removed (pop). In 16-bit mode, this implicit stack pointer is addressed as SS:[SP], in 32-bit mode it is SS:[ESP], and in 64-bit mode it is [RSP]. The stack pointer points to the last value that was stored, under the assumption that its size will match the operating mode of the processor (i.e., 16, 32, or 64 bits) to match the default width of the push/pop/call/ret instructions. Also included are the instructions enter and leave which reserve and remove data from the top of the stack while setting up a stack frame pointer in bp/ebp/rbp. However, direct setting, or addition and subtraction to the sp/esp/rsp register is also supported, so the enter/leave instructions are generally unnecessary.
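The pointer movement on push and pop can be sketched in Python for the 64-bit case. The class Stack64 and the 64-byte memory size are hypothetical; the model only shows that the pointer moves down before a store and up after a load, with values stored in little-endian order.

```python
import struct


class Stack64:
    """Toy model of a downward-growing stack of 64-bit values."""

    def __init__(self, size: int = 64) -> None:
        self.memory = bytearray(size)
        self.rsp = size               # empty stack: rsp is just past the region

    def push(self, value: int) -> None:
        if self.rsp < 8:
            raise OverflowError("stack overflow")
        self.rsp -= 8                 # decrement first ...
        struct.pack_into("<Q", self.memory, self.rsp,
                         value & 0xFFFF_FFFF_FFFF_FFFF)  # ... then store

    def pop(self) -> int:
        if self.rsp > len(self.memory) - 8:
            raise IndexError("stack is empty")
        (value,) = struct.unpack_from("<Q", self.memory, self.rsp)  # load first
        self.rsp += 8                 # ... then increment
        return value


if __name__ == "__main__":
    stack = Stack64()
    stack.push(1)
    stack.push(2)
    print(stack.rsp)                  # 48
    print(stack.pop(), stack.pop())   # 2 1
```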

This code is the beginning of a function typical for a high-level language when compiler optimisation is turned off for ease of debugging:

 push    rbp       ; Save the calling function’s stack frame pointer (rbp register)
 mov     rbp, rsp  ; Make a new stack frame below our caller’s stack
 sub     rsp, 32   ; Reserve 32 bytes of stack space for this function’s local variables.
                   ; Local variables will be below rbp and can be referenced relative to rbp,
                   ; again best for ease of debugging, but for best performance rbp will not
                   ; be used at all, and local variables would be referenced relative to rsp
                   ; because, apart from the code saving, rbp then is free for other uses.
                   ; However, if rbp is altered here, its value should be preserved for the caller.
 mov [rbp-8], rdx  ; Example of writing to a local variable (by its memory location) from register rdx

The first three instructions (push, mov and sub) are functionally equivalent to just:

 enter   32, 0

Other instructions for manipulating the stack include pushfd (32-bit) / pushfq (64-bit) and popfd / popfq for storing and retrieving the EFLAGS (32-bit) / RFLAGS (64-bit) register.

Values for a SIMD load or store are assumed to be packed in adjacent positions for the SIMD register, and the instruction arranges them in sequential little-endian order. Some SSE load and store instructions require 16-byte alignment to function properly. The SIMD instruction sets also include "prefetch" instructions which perform the load but do not target any register, used for cache loading. The SSE instruction sets also include non-temporal store instructions which will perform stores straight to memory without performing a cache allocate if the destination is not already cached (otherwise it will behave like a regular store).

Most generic integer and floating-point (but no SIMD) instructions can use one parameter as a complex address as the second source parameter. Integer instructions can also accept one memory parameter as a destination operand.

Program flow

The x86 assembly has an unconditional jump operation, jmp, which can take an immediate address, a register or an indirect address as a parameter (note that most RISC processors only support a link register or short immediate displacement for jumping).

Also supported are several conditional jumps, including jz (jump on zero), jnz (jump on non-zero), jg (jump on greater than, signed), jl (jump on less than, signed), ja (jump on above/greater than, unsigned), jb (jump on below/less than, unsigned). These conditional operations are based on the state of specific bits in the (E)FLAGS register. Many arithmetic and logic operations set, clear or complement these flags depending on their result. The comparison cmp (compare) and test instructions set the flags as if they had performed a subtraction or a bitwise AND operation, respectively, without altering the values of the operands. There are also instructions such as clc (clear carry flag) and cmc (complement carry flag) which work on the flags directly. Floating point comparisons are performed via fcom or ficom instructions which eventually have to be converted to integer flags.
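How cmp feeds the conditional jumps can be modelled for 32-bit operands. This Python sketch computes four of the status flags from a subtraction and evaluates the jump conditions listed above; the flag formulas are the standard definitions for subtraction, while cmp_flags and CONDITIONS are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a: int, b: int) -> dict[str, bool]:
    """Flags as set by `cmp a, b`: computes a - b and discards the result."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "ZF": result == 0,
        "CF": a < b,                                   # unsigned borrow
        "SF": bool(result >> 31),                      # sign bit of the result
        "OF": bool(((a ^ b) & (a ^ result)) >> 31),    # signed overflow
    }


CONDITIONS = {
    "jz":  lambda f: f["ZF"],
    "jnz": lambda f: not f["ZF"],
    "jb":  lambda f: f["CF"],
    "ja":  lambda f: not f["CF"] and not f["ZF"],
    "jl":  lambda f: f["SF"] != f["OF"],
    "jg":  lambda f: not f["ZF"] and f["SF"] == f["OF"],
}

if __name__ == "__main__":
    flags = cmp_flags(0xFFFFFFFF, 1)   # -1 if signed, 4294967295 if unsigned
    print(CONDITIONS["jl"](flags))     # True: -1 < 1 as signed values
    print(CONDITIONS["ja"](flags))     # True: 0xFFFFFFFF > 1 as unsigned values
```

The same bit pattern gives opposite answers for the signed and unsigned conditions, which is why both families of jumps exist.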

Each jump operation has three different forms, depending on the size of the operand. A short jump uses an 8-bit signed operand, which is an offset relative to the address of the instruction that follows the jump. A near jump is similar to a short jump but uses a 16-bit signed operand (in real or protected mode) or a 32-bit signed operand (in 32-bit protected mode only). A far jump is one that uses the full segment base:offset value as an absolute address. There are also indirect and indexed forms of each of these.
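The target of a short jump can be worked out by sign-extending the 8-bit operand and adding it to the address of the instruction that follows the jump. The Python sketch below assumes the two-byte short jmp encoding (opcode plus one displacement byte) and a 16-bit instruction pointer; the function name and default parameters are hypothetical.

```python
def short_jump_target(instr_address: int, rel8: int,
                      instr_length: int = 2, ip_mask: int = 0xFFFF) -> int:
    """Return the destination of a short jump located at instr_address."""
    if not 0 <= rel8 <= 0xFF:
        raise ValueError("rel8 must be a single byte")
    displacement = rel8 - 0x100 if rel8 & 0x80 else rel8   # sign-extend
    return (instr_address + instr_length + displacement) & ip_mask


if __name__ == "__main__":
    # Bytes EB FE at 0x0100: displacement -2 lands back on the jump itself.
    print(hex(short_jump_target(0x0100, 0xFE)))  # 0x100
    print(hex(short_jump_target(0x0100, 0x05)))  # 0x107
```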

In addition to the simple jump operations, there are the call (call a subroutine) and ret (return from subroutine) instructions. Before transferring control to the subroutine, call pushes the offset address of the instruction following the call onto the stack; ret pops this value off the stack and jumps to it, effectively returning the flow of control to that part of the program. In the case of a far call, the segment base is pushed before the offset; far ret pops the offset and then the segment base to return.

There are also two similar instructions, int (interrupt) and iret (return from interrupt). int saves the current (E)FLAGS register value on the stack and then performs a far call, except that instead of an address it uses an interrupt vector, an index into a table of interrupt handler addresses. Typically, the interrupt handler saves all other CPU registers it uses, unless they are used to return the result of an operation to the calling program (in software-called interrupts). The matching return instruction, iret, restores the flags after returning. Soft interrupts of the type described above are used by some operating systems for system calls, and can also be used in debugging hard interrupt handlers. Hard interrupts are triggered by external hardware events, and their handlers must preserve all register values, as the state of the currently executing program is unknown. In protected mode, interrupts may be set up by the OS to trigger a task switch, which will automatically save all registers of the active task.

Examples


The following examples use the so-called Intel-syntax flavor as used by the assemblers Microsoft MASM, NASM and many others. (Note: There is also an alternative AT&T-syntax flavor where the order of source and destination operands is swapped, among many other differences.)[25]

"Hello world!" program for MS-DOS in MASM-style assembly


This program uses the software interrupt 21h to call the MS-DOS operating system for output to the display; other samples use libc's C printf() routine to write to stdout. Note that this first example uses 16-bit mode as on an Intel 8086. The second example is Intel 386 code in 32-bit mode. Modern code will be in 64-bit mode.[26]

.model small
.stack 100h

.data
msg	db	'Hello world!$'

.code
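start:
	mov	ax, @data  ; With the small model, ds does not point at the data segment on
	mov	ds, ax     ; entry to an .EXE program, so load it before using ds:dx below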
	mov	ah, 09h    ; Sets 8-bit register ‘ah’, the high byte of register ax, to 9, to
                   ; select a sub-function number of an MS-DOS routine called below
                   ; via the software interrupt int 21h to display a message
	lea	dx, msg    ; Takes the address of msg, stores the address in 16-bit register dx
	int	21h        ; Various MS-DOS routines are callable by the software interrupt 21h
                   ; Our required sub-function was set in register ah above

	mov	ax, 4C00h  ; Sets register ax to the sub-function number for MS-DOS’s software
                   ; interrupt int 21h for the service ‘terminate program’.
	int	21h        ; Calling this MS-DOS service never returns, as it ends the program.

end start

"Hello world!" program for Windows in MASM and NASM style assembly

The MASM and NASM versions are shown part by part below, each pair followed by its description.

MASM:

; requires /coff switch on 6.15 and earlier versions
.386
.model small,c
.stack 1000h

NASM:

; Image base = 0x00400000
%define RVA(x) (x-0x00400000)

Preamble. MASM requires defining the address model and stack size.

MASM:

.data
msg     db "Hello world!",0

NASM:

section .data
msg db "Hello world!", 0

Data section. We use the db (define byte) pseudo-op to define a string. The trailing 0 is the NUL terminator that printf expects.

MASM:

.code
includelib libcmt.lib
includelib libvcruntime.lib
includelib libucrt.lib
includelib legacy_stdio_definitions.lib

extrn printf:near
extrn exit:near

public main
main proc
        push    offset msg
        call    printf
        push    0
        call    exit
main endp
end

NASM:

section .text
push dword msg
call dword [printf]
push byte +0
call dword [exit]
ret

section .idata
dd RVA(msvcrt_LookupTable)
dd -1
dd 0
dd RVA(msvcrt_string)
dd RVA(msvcrt_imports)
times 5 dd 0 ; ends the descriptor table

msvcrt_string dd "msvcrt.dll", 0
msvcrt_LookupTable:
dd RVA(msvcrt_printf)
dd RVA(msvcrt_exit)
dd 0

msvcrt_imports:
printf dd RVA(msvcrt_printf)
exit dd RVA(msvcrt_exit)
dd 0

msvcrt_printf:
dw 1
dw "printf", 0

msvcrt_exit:
dw 2
dw "exit", 0
dd 0

The code (.text section) and the import table. In NASM the import table is manually constructed, while in the MASM example directives are used to simplify the process.

"Hello world!" program for Linux in AT&T and NASM assembly

Each step is shown first in AT&T (GNU as) syntax, then in Intel (NASM) syntax, followed by a description.

AT&T (GNU as):
.data
Intel (NASM):
section .data
Description: Like in the Windows example, .data is the section for initialized data.

AT&T (GNU as):
str: .ascii "Hello, world!\n"
Intel (NASM):
str:     db 'Hello world!', 0Ah
Description: Define a string of text containing "Hello, world!" and then a new line (\n, which is 0x0A). Bind the label "str" to the address of the defined string.

AT&T (GNU as):
str_len = . - str
Intel (NASM):
str_len: equ $ - str
Description: Calculate the length of str. The symbol . means "here" in gas and $ means the same in nasm. By subtracting "str" from "here", one gets the length of the previously defined string.

AT&T (GNU as):
.text
Intel (NASM):
section .text
Description: Like in the Windows example, .text is the section for program code.

AT&T (GNU as):
.globl _start
Intel (NASM):
global _start
Description: Export the _start symbol to the global scope so that it can be "seen" by the linker.

AT&T (GNU as):
_start:
Intel (NASM):
_start:
Description: Define a label called _start, at which our code begins. The name _start, by Linux convention, defines the entry point.

AT&T (GNU as):
    movl $4, %eax
    movl $1, %ebx
    movl $str, %ecx
    movl $str_len, %edx
Intel (NASM):
	mov	eax, 4
	mov	ebx, 1
	mov	ecx, str
	mov	edx, str_len
Description: Prepare a system call. EAX=4 requests the "sys_write" call on Linux x86. EBX=1 means "stdout" for sys_write. ECX holds the address of the string to write, and EDX holds the number of bytes to write. This is equivalent to the libc-wrapped version write(1, str, str_len).

AT&T (GNU as):
    int $0x80
Intel (NASM):
    int 80h
Description: On 32-bit x86 Linux, the software interrupt 80h is used for invoking a system call according to the values of eax, ebx, ecx, and edx.

AT&T (GNU as):
    movl $1, %eax
    movl $0, %ebx
    int $0x80
Intel (NASM):
	mov	eax, 1
	mov	ebx, 0
	int	80h
Description: Load another system call, then call it with INT 80h: EAX=1 is sys_exit, and EBX for sys_exit holds the return value. A return value of 0 means a normal exit. In C syntax, _exit(0);.

Note for NASM:

; This program runs in 32-bit protected mode.
;  build: nasm -f elf -F stabs name.asm
;  link:  ld -o name name.o
;
; In 64-bit long mode you can use 64-bit registers (e.g. rax instead of eax, rbx instead of ebx, etc.)
; Also change "-f elf " for "-f elf64" in build command.
; For 64-bit long mode, "lea rcx, [str]" would load the address of the message; note the 64-bit register rcx.

"Hello world!" program for Linux in NASM style assembly using the C standard library

;
;  This program runs in 32-bit protected mode.
;  gcc links the standard-C library by default

;  build: nasm -f elf -F stabs name.asm
;  link:  gcc -o name name.o
;
; In 64-bit long mode you can use 64-bit registers (e.g. rax instead of eax, rbx instead of ebx, etc..)
; Also change "-f elf " for "-f elf64" in build command.
;
        global  main                            ; ‘main’ must be defined and exported, as this code is
                                                ; linked against the C Standard Library
        extern  printf                          ; declares the external symbol printf, which is
                                                ; defined in a different object module.
                                                ; The linker resolves this symbol later.

segment .data                                   ; section for initialized data
	string db 'Hello world!', 0Ah, 0            ; message string ending with a newline char (10
                                                ; decimal) and the zero byte ‘NUL’ terminator
                                                ; ‘string’ now refers to the starting address
                                                ; at which 'Hello world!' is stored.

segment .text
main:
        push    string                          ; Push the address of ‘string’ onto the stack.
                                                ; This reduces esp by 4 bytes before storing
                                                ; the 4-byte address ‘string’ into memory at
                                                ; the new esp, the new bottom of the stack.
                                                ; This will be an argument to printf()

        call    printf                          ; calls the C printf() function.
        add     esp, 4                          ; Increases the stack-pointer by 4 to put it back
                                                ; to where it was before the ‘push’, which
                                                ; reduced it by 4 bytes.
        ret                                     ; Return to our caller.

Because the C runtime is used, we define a main() function as the C runtime expects. Instead of calling exit, we simply return from the main function to have the runtime perform the clean-up.

"Hello world!" program for 64-bit mode Linux in NASM style assembly


This example is in modern 64-bit mode.

;  build: nasm -f elf64 -F dwarf hello.asm
;  link:  ld -o hello hello.o

DEFAULT REL			    ; use RIP-relative addressing modes by default, so [foo] = [rel foo]

SECTION .rodata			; read-only data should go in the .rodata section on GNU/Linux, like .rdata on Windows
Hello:		db "Hello world!", 10   ; Ending with a byte 10 = newline (ASCII LF)
len_Hello:	equ $-Hello             ; Get NASM to calculate the length as an assembly-time constant
                                    ; the ‘$’ symbol means ‘here’. write() takes a length so that
                                    ; a zero-terminated C-style string isn't needed.
                                    ; It would be for C puts()

SECTION .text

global _start
_start:
	mov eax, 1				; __NR_write syscall number from Linux asm/unistd_64.h (x86_64)
	mov edi, 1				; int fd = STDOUT_FILENO
	lea rsi, [rel Hello]			; x86-64 uses RIP-relative LEA to put static addresses into regs
	mov rdx, len_Hello		; size_t count = len_Hello
	syscall					; write(1, Hello, len_Hello);  call into the kernel to actually do the system call
     ;; return value in RAX.  RCX and R11 are also overwritten by syscall

	mov eax, 60				; __NR_exit call number (x86_64) is stored in register eax.
	xor edi, edi		    ; This zeros edi and also rdi.
                            ; This xor-self trick is the preferred common idiom for zeroing
                            ; a register, and is generally the most efficient method.
                            ; When a 32-bit value is stored into e.g. edx, the high bits 63:32 are
                            ; automatically zeroed too in every case. This saves you having to set
                            ; the bits with an extra instruction, as this is a case very commonly
                            ; needed, for an entire 64-bit register to be filled with a 32-bit value.
                            ; This sets our routine’s exit status = 0 (exit normally)
	syscall					; _exit(0)

Running it under strace verifies that no extra system calls are made in the process. The printf version would make many more system calls to initialize libc and do dynamic linking. But this is a static executable because we linked using ld without -pie or any shared libraries; the only instructions that run in user-space are the ones you provide.

$ strace ./hello > /dev/null                    # without a redirect, your program's stdout is mixed with strace's logging on stderr.  Which is normally fine
execve("./hello", ["./hello"], 0x7ffc8b0b3570 /* 51 vars */) = 0
write(1, "Hello world!\n", 13)          = 13
exit(0)                                 = ?
+++ exited with 0 +++

Using the flags register


Flags are heavily used for comparisons in the x86 architecture. When a comparison is made between two values, the CPU sets the relevant flag or flags. Following this, conditional jump instructions can be used to check the flags and branch to code that should run, e.g.:

	cmp	eax, ebx
	jne	do_something
	; ...
do_something:
	; do something here

Aside from compare instructions, there are a great many arithmetic and other instructions that set bits in the flags register. Other examples are the instructions sub, test and add, and there are many more. Common combinations such as cmp + conditional jump are internally ‘fused’ (‘macro fusion’) into one single micro-instruction (μ-op) and are fast provided the processor can guess which way the conditional jump will go, jump vs continue.
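To make the relationship between cmp and the conditional jumps concrete, here is a small Python model of the four status flags that cmp derives from the subtraction of its operands. The function names are assumptions for this sketch, and only ZF, CF, SF and OF are modelled.

```python
def cmp_flags(a: int, b: int, bits: int = 32) -> dict:
    """Flags produced by `cmp a, b`, which computes a - b and discards the result."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,                        # the operands were equal
        "CF": a < b,                              # unsigned borrow
        "SF": bool(result & sign),                # sign bit of the result
        # signed overflow: operands differ in sign and the result's sign differs from a's
        "OF": bool((a ^ b) & (a ^ result) & sign),
    }


def jne_taken(flags: dict) -> bool:
    return not flags["ZF"]                        # jne/jnz branch when ZF is clear


def jl_taken(flags: dict) -> bool:
    return flags["SF"] != flags["OF"]             # signed "less than"


print(jne_taken(cmp_flags(5, 5)))    # False
print(jne_taken(cmp_flags(5, 7)))    # True
print(jl_taken(cmp_flags(-1, 1)))    # True
print(cmp_flags(1, 2)["CF"])         # True
```

The same flag values serve both signed and unsigned comparisons; which interpretation applies depends only on the conditional jump chosen afterwards (for example jl for signed, jb for unsigned).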

The flags register is also used in the x86 architecture to turn on and off certain features or execution modes. For example, to disable all maskable interrupts, you can use the instruction:

	cli

The flags register can also be directly accessed. The low 8 bits of the flag register can be loaded into ah using the lahf instruction. The entire flags register can also be moved on and off the stack using the instructions pushfd/pushfq, popfd/popfq, int (including into) and iret.
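The following Python sketch treats the flags register as a plain integer to show what cli and lahf do to it. The bit positions follow the standard FLAGS/EFLAGS layout; the helper names and the sample values 0x246 and 0x287 are assumptions chosen for illustration.

```python
# Bit positions of some flags in the FLAGS/EFLAGS register.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "IF": 9, "DF": 10, "OF": 11}


def decode_flags(flags: int) -> list:
    """Names of the flags that are set in a FLAGS/EFLAGS value."""
    return [name for name, bit in FLAG_BITS.items() if (flags >> bit) & 1]


def cli(flags: int) -> int:
    """Model of cli: clear the interrupt-enable flag, leave everything else alone."""
    return flags & ~(1 << FLAG_BITS["IF"])


def lahf(flags: int) -> int:
    """Model of lahf: the value loaded into ah is the low 8 bits of the flags."""
    return flags & 0xFF


flags = 0x246
print(decode_flags(flags))         # ['PF', 'ZF', 'IF']
print(decode_flags(cli(flags)))    # ['PF', 'ZF']
print(hex(lahf(0x287)))            # 0x87
```

Because lahf only sees bits 0 to 7, flags such as IF, DF and OF are not visible through it; pushfd/pushfq are needed to read those.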

The x87 floating point maths subsystem also has its own independent ‘flags’-type register, the FP status word. In the 1990s it was an awkward and slow procedure to access the flag bits in this register, but on modern processors there are ‘compare two floating point values’ instructions that can be used with the normal conditional jump/branch instructions directly without any intervening steps.

Using the instruction pointer register


The instruction pointer is called ip in 16-bit mode, eip in 32-bit mode, and rip in 64-bit mode. The instruction pointer register points to the address of the next instruction that the processor will attempt to execute. It cannot be directly accessed in 16-bit or 32-bit mode, but a sequence like the following can be written to put the address of next_line into eax (32-bit code):

	call	next_line
next_line:
	pop	eax

Writing to the instruction pointer is simple: a jmp instruction stores the given target address into the instruction pointer, so, for example, a sequence like the following will put the contents of rax into rip (64-bit code):

	jmp	rax

In 64-bit mode, instructions can reference data relative to the instruction pointer, so there is less need to copy the value of the instruction pointer to another register.

from Grokipedia
x86 assembly language is a low-level programming language that provides a symbolic representation of the machine instructions executed by processors implementing the x86 instruction set architecture (ISA), originally developed by Intel for its 8086 microprocessor and subsequently extended by both Intel and AMD. It enables direct manipulation of hardware resources such as registers, memory, and I/O ports, making it essential for tasks requiring fine-grained control over system behavior, including operating system kernels, device drivers, and performance-optimized applications. The language supports multiple addressing modes and a rich set of arithmetic, logical, and control-flow instructions, and has evolved to include extensions like MMX, SSE, and AVX for vector processing and multimedia operations.

The x86 architecture traces its origins to the Intel 8086, a 16-bit microprocessor introduced in 1978 that laid the foundation for the personal computer revolution through its use in the IBM PC. Subsequent processors, such as the 80286 in 1982, introduced protected mode for enhanced memory management and multitasking capabilities, while the 80386 in 1985 extended the architecture to 32-bit operations with virtual memory support. The shift to 64-bit computing came in 2003 when AMD launched the Opteron processor family, introducing the AMD64 extension (also known as x86-64) that added 64-bit registers and addressing while maintaining backward compatibility with 32-bit and 16-bit code. Intel adopted this extension as Intel 64 (formerly EM64T) starting with its Nocona-based Xeon processors in 2004, solidifying x86-64 as the dominant mode for modern computing.

Key features of x86 assembly include its segmented memory model in real mode, the flat memory model commonly used in protected and long modes, and a variety of general-purpose registers (e.g., EAX, EBX in 32-bit; RAX, RBX in 64-bit) alongside specialized ones for floating-point (x87 FPU) and vector operations (XMM, YMM, ZMM). Processors operate in several modes (real mode for legacy 16-bit compatibility, protected mode for 32-bit multitasking with privilege levels, and long mode for 64-bit execution), allowing flexible transitions during boot and runtime.

Assembly code can be written in Intel syntax, which places the destination operand before the source (e.g., mov eax, ebx), as used in official Intel documentation, or AT&T syntax, common in Unix-like systems and GAS (e.g., movl %ebx, %eax), which places the source first, appends operand-size suffixes to mnemonics, and uses percent signs for registers. Despite its complexity due to backward compatibility and irregular instruction encodings, x86 assembly remains vital for embedded systems, reverse engineering, and high-performance computing where higher-level languages fall short.

Overview

History and Evolution

The x86 assembly language originated with the Intel 8086 microprocessor, introduced in 1978 as a 16-bit complex instruction set computing (CISC) architecture designed to support advanced applications and serve as a template for future processors. Developed in just 18 months, the 8086 featured a microcode implementation and became the foundation for the x86 family, powering the IBM PC released in 1981, which used the closely related 8088 variant and established widespread software and hardware compatibility standards. This integration into the IBM PC ecosystem ensured the persistence of x86 despite the rise of reduced instruction set computing (RISC) alternatives, as backward compatibility drove industry adoption and locked in a vast software base.

The architecture evolved significantly with the Intel 80286 in 1982, which introduced protected mode to enable multitasking and memory protection, enhancing system reliability for emerging multi-user environments. This was followed by the Intel 80386 in 1985, marking the shift to 32-bit processing with support for virtual memory and a flat memory model, allowing larger address spaces and improved efficiency for operating systems like Windows. The Pentium series, launched in 1993, advanced the design with superscalar execution for parallel instruction processing, dropping the "86" suffix while maintaining compatibility to sustain the PC market's growth.

A pivotal extension occurred in 2003 with the introduction of 64-bit addressing via AMD's AMD64 architecture, which Intel adopted as Intel 64 in 2004, enabling larger memory capacities and enhanced performance for data-intensive applications without breaking legacy support. Key instruction set extensions further propelled x86's relevance: MMX in 1996 added multimedia acceleration to the Pentium MMX; SSE in 1999 with the Pentium III introduced SIMD for vector operations; AVX in 2011 expanded vector widths to 256 bits for high-performance computing; and AVX-512 in 2016 provided 512-bit vectors optimized for AI and machine learning workloads. In 2023, Intel announced AVX10, a converged instruction set incorporating AVX-512 features to simplify implementation across processors. These developments maintained x86's dominance by balancing innovation with the enduring IBM PC compatibility legacy.

Key Characteristics and Usage

x86 assembly language is rooted in the Complex Instruction Set Computing (CISC) architecture, which supports a diverse array of instructions designed to perform complex operations in a single command, contrasting with the simpler, fixed-length instructions typical of Reduced Instruction Set Computing (RISC) designs. This CISC approach enables x86 instructions to vary in length from 1 to 15 bytes, allowing for flexible encoding that optimizes for both common and specialized tasks while maintaining high code density. A hallmark of the x86 architecture is its strong emphasis on backward compatibility, supporting execution in 16-bit, 32-bit, and 64-bit modes through mechanisms like compatibility mode in long mode, which permits unmodified legacy applications to run alongside modern 64-bit software without requiring emulation.

In practice, x86 assembly is primarily employed in domains demanding precise control and efficiency, such as kernel development, where it facilitates low-level system calls and interrupt handling, device drivers for direct hardware interaction, and embedded systems constrained by resource limitations. It also plays a key role in performance-critical applications like game engines, where optimized routines enhance rendering and physics simulations, and in compiler optimization through inline assembly embedded in higher-level languages like C/C++ to bypass generated code inefficiencies.

Despite these strengths, x86 assembly presents challenges due to its inherent complexity, including variable instruction lengths and intricate addressing modes that can lead to programming errors and difficult debugging. However, it offers significant advantages in code density, reducing program size compared to equivalent RISC implementations, and provides direct hardware control, enabling fine-tuned access to CPU registers, memory, and peripherals for maximal performance.

As of 2025, x86 remains the dominant architecture in desktops and servers, holding the majority of the market with processors from Intel and AMD. Its relevance persists in security research, where assembly-level analysis uncovers vulnerabilities in low-level code, and in just-in-time (JIT) compilers for JavaScript engines such as V8, which generate optimized x86 machine code to accelerate web applications while posing novel attack surfaces studied in defenses against JIT spraying and related exploits.

Syntax and Notation

Syntax Variants

x86 assembly language supports multiple syntax variants, each tailored to different assemblers and development environments, primarily differing in operand ordering, notation for registers and memory, and directive usage. The most prominent variants are Intel syntax, used by assemblers like Microsoft's MASM, and AT&T syntax, employed by the GNU Assembler (GAS).

Intel syntax, as implemented in MASM, places the destination before the source (e.g., mov rax, rbx). Registers are denoted without prefixes (e.g., rax), memory addresses use square brackets (e.g., [rcx + r10 * 2 + 100h]), and data sizes are specified via qualifiers like DWORD PTR when ambiguous (e.g., mov eax, DWORD PTR [ecx]). Directives include .data for initialized data sections and .code for executable code, with EQU for defining constants (e.g., myvar EQU 100). Comments begin with a semicolon (;). This syntax is prevalent in Windows development tools due to its integration with Microsoft ecosystems.

In contrast, AT&T syntax in GAS reverses the operand order, placing sources before destinations (e.g., movl %esi, %ebx), and uses size suffixes on mnemonics (e.g., movl for 32-bit, movb for 8-bit). Registers are prefixed with % (e.g., %eax), immediates with $ (e.g., movb $10, %al), and memory operands use parentheses with an offset-base format (e.g., 4(%esp)). Directives such as .data and .text organize sections, and comments start with #. This variant originated from Unix systems and emphasizes explicitness to avoid ambiguity in operand types.

Other assemblers introduce portable or specialized variants of Intel syntax. NASM employs a clean, portable Intel-like syntax with destination-first ordering (e.g., mov eax, ebx), mandatory square brackets for memory (e.g., [ebx + esi * 4 + 10]), and no register prefixes. It uses section .data and section .text for segments, EQU for constants (e.g., MAX EQU 100), and ; for comments. NASM's design prioritizes cross-platform compatibility and modularity. FASM adopts a flat-model-focused Intel syntax, also destination-first (e.g., mov eax, [ebx]), with square brackets for memory and size operators like dword (e.g., mov eax, dword [100h]). Equates use = (e.g., x = 1), sections are defined via section directives similar to NASM, and comments use ;. FASM emphasizes optimization, supporting multiple passes for code size reduction without high-level MASM constructs like PROC.

Converting between these variants presents challenges, such as reversing operand orders, adding or removing prefixes like % for registers in AT&T syntax, adjusting memory notation from parentheses to brackets, and harmonizing directives (e.g., .data vs. section .data). Tools like syntax converters or manual rewriting are often required, as automated translation can introduce errors in complex addressing or macros.

	; Example in Intel/MASM syntax
	.data
	msg db "Hello", 0
	.code
	mov rax, offset msg     ; Destination first, no % prefix

	# Example in AT&T/GAS syntax (see https://cs61.seas.harvard.edu/site/2018/Asm1/)
	.data
	msg: .ascii "Hello\0"
	.text
	movq $msg, %rax         # Source first, % prefix, $ for immediate

	; Example in NASM syntax
	section .data
	msg db 'Hello', 0
	section .text
	mov rax, msg            ; Square brackets for memory if needed

	; Example in FASM syntax
	format ELF64            ; section directives require an object or executable output format
	section '.data' writeable
	msg db 'Hello',0
	section '.text' executable
	mov rax, msg            ; = for equates, flat model

Mnemonics and Opcodes

In x86 assembly language, mnemonics serve as human-readable symbolic representations of instructions, such as MOV for data movement or ADD for arithmetic addition, which directly correspond to specific binary opcodes executed by the processor. These opcodes are fixed binary values that define the operation, with examples including 0x89 for MOV from a register to a register or memory operand and 0x01 for ADD from a register to a register or memory operand. The mapping ensures that assemblers translate mnemonic-based source code into the processor's native binary format, maintaining compatibility across Intel 64 and IA-32 architectures.

x86 instructions employ a variable-length encoding scheme, ranging from 1 to 15 bytes, comprising optional prefixes, one or more opcode bytes, a ModR/M byte (if required for operand specification), an optional Scale-Index-Base (SIB) byte, displacement fields, and immediate data. The ModR/M byte, an 8-bit field, encodes addressing modes and register selection using three subfields: Mod (2 bits for mode), Reg/Opcode (3 bits for register or opcode extension), and R/M (3 bits for register or memory base). This flexible structure allows efficient encoding of diverse operand types, from register-to-register operations to complex memory accesses.

Opcode organization relies on hierarchical tables: primary opcodes use a single byte (e.g., 0x00 to 0xFF for basic operations like ADD), while secondary opcodes are reached through the escape byte 0x0F, forming two-byte opcodes (e.g., 0F 01 for system instructions). Further extensions include three-byte formats like 0F 38 or 0F 3A for advanced instructions (e.g., 0F 38 01 for packed horizontal addition). Modern extensions differentiate legacy encodings from enhanced ones; for instance, the REX prefix (0x40 to 0x4F) in 64-bit mode selects 64-bit operand size and gives access to the additional registers (R8-R15). RIP-relative addressing, by contrast, is selected through the ModR/M byte in 64-bit mode rather than through a prefix. Similarly, the VEX prefix (2- or 3-byte forms starting with 0xC5 or 0xC4, respectively) supports AVX vector instructions by embedding legacy prefixes and specifying vector lengths.

Prefixes modify instruction behavior and contribute to variable length: the LOCK prefix (0xF0) ensures atomic operations on memory for multiprocessor synchronization, while REP (0xF3) or REPNE (0xF2) repeats string operations until a condition is met. Immediate operands add to the length as well; a simple MOV r32, imm32 occupies 5 bytes, the opcode B8 (plus the register number) followed by the 4-byte immediate value.

Vendor-specific extensions introduce additional opcode spaces; AMD's 3DNow! uses a secondary opcode of 0x0F 0x0F followed by a ModR/M byte and an 8-bit immediate (imm8) to select SIMD floating-point operations, such as 0F 0F /r 9E for packed floating-point addition (PFADD). This format reserves the imm8 for up to 256 unique operations, distinguishing it from Intel's SSE/AVX paths, though AMD now recommends migrating to standard vector extensions for broader compatibility.

Disassembly tools like objdump from the GNU Binutils suite reverse this process, displaying both opcodes and corresponding mnemonics from object files or executables; for example, objdump -d binary outputs lines that pair raw bytes such as 89 c3 with the instruction mov %eax,%ebx. This aids in verifying encodings and debugging low-level code.
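As a minimal sketch of how the ModR/M fields are unpacked, the Python below decodes the two-byte sequence 89 C3 mentioned above. It assumes 32-bit operand size, no prefixes, and the register-direct form only (Mod = 11); the function names are hypothetical, and a real disassembler handles many more cases.

```python
# 32-bit register names in encoding order (register number 0 to 7).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(byte: int) -> tuple:
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def decode_mov_rm32_r32(code: bytes) -> str:
    """Decode opcode 89 /r (MOV r/m32, r32) in Intel syntax, register-direct form only."""
    if len(code) != 2 or code[0] != 0x89:
        raise ValueError("expected 89 /r")
    mod, reg, rm = split_modrm(code[1])
    if mod != 0b11:
        raise NotImplementedError("memory operands are outside this sketch")
    # For opcode 89 the reg field names the source and r/m names the destination.
    return f"mov {REG32[rm]}, {REG32[reg]}"


print(split_modrm(0xC3))                          # (3, 0, 3)
print(decode_mov_rm32_r32(bytes([0x89, 0xC3])))   # mov ebx, eax
```

The Intel-syntax result mov ebx, eax is the same instruction that objdump prints in AT&T syntax as mov %eax,%ebx.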

Reserved Words and Directives

In x86 assembly language, reserved words encompass identifiers that the assembler treats as fixed and that cannot be redefined by the programmer, including register names and certain symbols, to prevent conflicts with the processor's instruction set. These reservations ensure consistent interpretation during assembly, as redefining them can lead to errors or unexpected behavior, such as a failed assembly when attempting to use a register name as a variable.

Register names like EAX, ESP, and their variants (e.g., AH, AL, AX for 8-bit and 16-bit portions) are prime examples of reserved words across assemblers, as they directly map to hardware registers and cannot be reassigned without triggering assembly errors. In the Microsoft Macro Assembler (MASM), the list includes EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, and segment registers like CS, DS, SS, ES, FS, GS, all of which are reserved under all CPU modes to maintain compatibility. Similarly, in the Netwide Assembler (NASM), registers such as RAX (in 64-bit mode) and their low/high byte variants are reserved, with legacy high bytes like AH inaccessible in instructions that carry a REX prefix. Misuse, such as redefining EAX as a label, results in immediate assembly failure, emphasizing the need for programmers to avoid keyword conflicts.

Directives, also known as assembler pseudo-instructions, are non-executable commands that guide the assembly process, such as defining data, managing memory layout, or structuring code, and they vary slightly between assemblers like MASM and NASM. For data definition, common directives include DB (define byte), DW (define word, 2 bytes), and DD (define doubleword, 4 bytes), which allocate and initialize storage with specified values; for example, DB 42 reserves one byte with the value 42, while DD 0x12345678 reserves four bytes for a 32-bit value. These are common to Intel-syntax assemblers such as MASM and NASM and are essential for embedding constants or arrays without runtime overhead.

Segment and layout directives control how code and data are organized in the output file. In MASM, SEGMENT (paired with ENDS) defines a memory segment, while simplified directives such as .DATA group variables, and ASSUME specifies register-segment associations, like ASSUME DS:DATA, to inform the assembler which segment each segment register is expected to address. NASM uses SECTION (or SEGMENT) similarly to switch between sections like .text for code or .bss for uninitialized data, with ORG setting the origin address in flat binary outputs, e.g., ORG 0x1000 to start at a specific offset. External source files are incorporated with the INCLUDE directive in MASM and the %include preprocessor directive in NASM, e.g., %include "macros.inc", to modularize assembly. Improper use, such as mismatched ASSUME declarations, can cause assembly errors or incorrect memory references during execution.

Program structure directives mark the boundaries of code units. In MASM, PROC declares a procedure, e.g., main PROC, paired with ENDP to close it, which allows labels local to the procedure, while END signals the end of the source file and optionally specifies an entry point, like END main. NASM lacks native PROC/ENDP but uses %define for single-line macro definitions, e.g., %define MAX 100, which acts as a text substitution for constants or simple macros without procedure semantics. These directives ensure proper scoping; for instance, omitting ENDP in MASM leads to errors at assembly time. Assembler-specific variations, such as MASM's DUP for repeating data definitions (e.g., array DW 10 DUP(0)), highlight the need to consult variant-specific documentation to avoid portability issues.
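The effect of the data-definition directives can be illustrated by computing the bytes they would place in the output. The Python helpers db, dw and dd below are hypothetical stand-ins that borrow the directive names; they rely on the fact that x86 stores multi-byte values with the least significant byte first.

```python
import struct


def db(*values) -> bytes:
    """DB: one byte per integer; a string contributes one byte per ASCII character."""
    out = bytearray()
    for v in values:
        out += v.encode("ascii") if isinstance(v, str) else struct.pack("<B", v)
    return bytes(out)


def dw(value: int) -> bytes:
    return struct.pack("<H", value)       # DW: 2 bytes, least significant byte first


def dd(value: int) -> bytes:
    return struct.pack("<I", value)       # DD: 4 bytes, least significant byte first


print(db(42).hex())                 # 2a
print(dd(0x12345678).hex(" "))      # 78 56 34 12
print(db("Hello", 0).hex(" "))      # 48 65 6c 6c 6f 00
print(len(dw(0) * 10))              # 20, the size of MASM's `DW 10 DUP(0)`
```

Note how DD 0x12345678 appears in memory as 78 56 34 12, which is why hex dumps of x86 data often look byte-reversed.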

Processor Architecture

Registers

The x86 architecture features a diverse set of registers that are central to assembly programming, enabling efficient data manipulation, memory addressing, and control of processor state across various operating modes. These registers have evolved from the original 16-bit design of the Intel 8086 to support 32-bit and 64-bit extensions, with additional specialized registers introduced through SIMD and other enhancements. The general-purpose, segment, control, and debug registers provide the foundational hardware for assembly programming, while the flags register captures execution status for conditional operations. The x87 floating-point unit (FPU) includes eight 80-bit floating-point registers organized as a stack (ST0 through ST7), along with control (FCW), status (FSW), tag (FTW), instruction pointer (FIP), data pointer (FDP), and opcode (FOP) registers for managing floating-point operations and exceptions.

General-purpose registers (GPRs) serve as the primary storage for operands, addresses, and computation results in x86 assembly. In the original 16-bit 8086 architecture, there are eight 16-bit GPRs: AX, BX, CX, DX, SI, DI, BP, and SP; the first four can also be accessed via 8-bit sub-registers for the high and low bytes (e.g., AH and AL for AX). These were extended to 32-bit registers in the 80386 processor (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP), allowing larger data handling while maintaining backward compatibility through the lower 16- and 8-bit portions. In 64-bit mode (Intel 64), these expand to 64-bit registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP) plus eight additional ones (R8 through R15), requiring the REX prefix for access to the new registers and full 64-bit widths; all GPRs support byte-level subregister access (e.g., AL, R8B), though the REX prefix is required for certain subregisters like SPL, BPL, SIL, DIL and the new registers R8B–R15B. The ESP/RSP register specifically functions as the stack pointer, while EBP/RBP acts as the base pointer for stack frames.

| Register group | 16-bit | 32-bit | 64-bit | Key roles |
| --- | --- | --- | --- | --- |
| Accumulator | AX | EAX | RAX | Arithmetic, I/O operations |
| Base | BX | EBX | RBX | Base addressing, data storage |
| Counter | CX | ECX | RCX | Loop counters, shifts |
| Data | DX | EDX | RDX | I/O port addressing, multiplication/division |
| Source Index | SI | ESI | RSI | String source addressing |
| Destination Index | DI | EDI | RDI | String destination addressing |
| Base Pointer | BP | EBP | RBP | Stack frame base |
| Stack Pointer | SP | ESP | RSP | Stack top management |
| Additional (64-bit only) | - | - | R8–R15 | General data and addressing |

Segment registers manage memory segmentation in real and protected modes, defining the base addresses and attributes for different memory regions. There are six 16-bit segment registers: CS (code segment), DS (data segment), ES (extra segment), FS and GS (general-purpose segments, often used for thread-local storage in modern systems), and SS (stack segment). In real mode a segment register holds a value that is shifted left by 4 bits to form the segment base; in protected mode it holds a selector that indexes into the Global Descriptor Table (GDT) or Local Descriptor Table (LDT) to obtain the segment base, limit, and access rights. In 64-bit mode, segmentation is largely flat: the bases of CS, DS, ES, and SS are treated as zero, while FS and GS retain base functionality via model-specific registers.

The instruction pointer (IP in 16-bit mode, EIP in 32-bit mode, RIP in 64-bit mode) holds the address of the next instruction to execute, facilitating sequential and branched program flow. The flags register, EFLAGS in 32-bit mode and RFLAGS in 64-bit mode (with the upper 32 bits reserved), stores processor status and control information. Key status flags include the zero flag (ZF, bit 6), set when the result of an operation is zero; the carry flag (CF, bit 0), indicating carry or borrow in unsigned arithmetic; and the overflow flag (OF, bit 11), detecting signed arithmetic overflow. These flags influence conditional jumps and other control-flow instructions. Additional bits manage interrupts (IF, bit 9), direction for string operations (DF, bit 10), and other modes.

Control registers oversee operating modes, memory management, and extensions. CR0 controls basic features like protected mode enablement (PE, bit 0) and numeric error handling; CR3 holds the physical base address of the top-level paging structure (the page directory in 32-bit paging); CR4 extends controls for features like SIMD exception handling (the OSXMMEXCPT bit). Debug registers (DR0–DR3 for breakpoint addresses, DR6 for status, DR7 for control) support hardware breakpoints and watchpoints for debugging.

The x86 register set has also grown with SIMD extensions to support vector processing. The MMX extension (1997) introduced eight 64-bit MMX registers (MM0–MM7) aliased onto the x87 FPU registers for packed integers. SSE (1999) added 128-bit XMM registers (XMM0–XMM7, extended to 16 in 64-bit mode), AVX (2011) introduced 256-bit YMM registers (YMM0–YMM7 in 32-bit mode, YMM0–YMM15 in 64-bit mode), and AVX-512 (2016) added 512-bit ZMM registers (ZMM0–ZMM7 in 32-bit mode, ZMM0–ZMM31 in 64-bit mode), enabling wider parallel operations on floating-point and integer data across multiple lanes. These extensions significantly expand the register file for high-performance computing without altering the core GPRs.
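
The flag bit positions given in this section can be illustrated with a short Python model. This is only a sketch of the bit arithmetic, not assembler code; the names `FLAG_BITS` and `decode_flags` are hypothetical and chosen for illustration.

```python
# Bit positions of selected EFLAGS/RFLAGS bits, as listed in this section.
FLAG_BITS = {"CF": 0, "ZF": 6, "IF": 9, "DF": 10, "OF": 11}


def decode_flags(eflags: int) -> dict:
    """Return a mapping of flag name -> 0 or 1 for the bits in FLAG_BITS."""
    return {name: (eflags >> bit) & 1 for name, bit in FLAG_BITS.items()}
```

For example, `decode_flags(0x246)` reports ZF and IF as set and CF, DF, and OF as clear, since 0x246 has bits 1, 2, 6, and 9 set.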

Memory Addressing

In x86 assembly language, addressing modes determine how operands are specified for instructions, allowing access to registers, immediate values, or memory locations. These modes provide flexibility in forming effective addresses, which are offsets within segments, or linear addresses in flat models. The primary modes include immediate, register, direct, register indirect, and more complex forms combining base registers, indices, scales, and displacements.

Immediate addressing embeds a constant value directly in the instruction, used for operations like loading a literal into a register. For example, mov eax, 5 places the value 5 into the EAX register without referencing memory. Register addressing operates solely on processor registers, such as mov eax, ebx, which copies the contents of EBX to EAX. These modes are efficient because they avoid memory access.

Direct addressing specifies a fixed memory address in the instruction, as in mov eax, [100h], where the contents at address 100h are loaded into EAX. Register indirect addressing uses a register to hold the address, for instance mov eax, [ebx], which dereferences the value in EBX as the address. Unlike some other architectures, x86 does not support automatic pre- or post-increment in these indirect modes; increments require separate instructions such as INC.

The most versatile mode is the base-plus-index-plus-scale-plus-displacement form, which computes the effective address as base register + (index register × scale) + displacement. Here, the base and index are general-purpose registers (e.g., EBX and ESI), the scale is 1, 2, 4, or 8, and the displacement is an optional constant. An example is mov eax, [ebx + esi*4 + 10h], useful for traversing data structures like arrays. In 64-bit mode, this form supports 64-bit registers but limits displacements to 32 bits, sign-extended during calculation.

RIP-relative addressing, available only in 64-bit mode, forms addresses relative to the instruction pointer (RIP) plus a 32-bit signed displacement, enabling position-independent code without absolute addresses. For example, mov eax, [rip + offset] (written mov eax, [rel label] in NASM) loads from a location at a fixed distance from the next instruction. This mode simplifies relocation of shared libraries.

When operand sizes are ambiguous, especially for memory references, explicit size specifiers disambiguate the instruction. Specifiers like BYTE PTR for 8-bit, WORD PTR for 16-bit, DWORD PTR for 32-bit, or QWORD PTR for 64-bit ensure correct interpretation, as in mov byte ptr [esi], 5 (NASM omits PTR: mov byte [esi], 5). Failure to specify a size can lead to assembler errors or unintended sizes.

x86 supports both flat and segmented addressing models. In the flat model, prevalent in 64-bit mode, segment bases default to zero, so addresses are linear and refer to a single continuous address space. Segmented addressing, used in IA-32 real or protected modes, combines segment selectors with offsets and is detailed separately; the addressing modes here form the offset component in both cases.
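
The base-plus-index-plus-scale-plus-displacement computation described above can be sketched in Python. This models only the address arithmetic; the function name `effective_address` and its parameters are hypothetical, and the result is truncated to the address width as an assumption about how wraparound is handled.

```python
def effective_address(base=0, index=0, scale=1, disp=0, bits=32):
    """Compute base + index*scale + disp, truncated to `bits` address bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    mask = (1 << bits) - 1
    return (base + index * scale + disp) & mask
```

With EBX = 0x1000 and ESI = 3, the operand `[ebx + esi*4 + 10h]` corresponds to `effective_address(base=0x1000, index=3, scale=4, disp=0x10)`, which gives 0x101C, the address of the fourth doubleword in an array starting 0x10 bytes past EBX.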

Segmented Memory Model

The segmented memory model in x86 divides the memory address space into variable-sized segments to facilitate addressing beyond the limitations of early processors. In real mode, which is the default execution mode upon processor reset and emulates the 8086 environment, memory addressing employs a 20-bit address space calculated using a segment:offset pair. The segment register, such as CS for code or DS for data, holds a 16-bit value that is shifted left by 4 bits (multiplied by 16) and added to a 16-bit offset to yield the physical address, allowing access to up to 1 MB of memory while each segment is limited to 64 KB. For instance, the instruction pointer IP combined with CS forms the address of the next instruction as CS * 16 + IP.

In protected mode, introduced with the 80286 and expanded in subsequent processors, the segmented model evolves to support memory protection, larger address spaces, and multitasking through descriptor tables. The Global Descriptor Table (GDT) provides system-wide segment definitions, while the Local Descriptor Table (LDT) allows task-specific segments; both reside in memory and are located through the GDTR and LDTR registers, respectively. Each segment descriptor is an 8-byte structure containing a 32-bit base address, a 20-bit limit defining the segment size (scaled by the granularity bit to cover up to 4 GB), and access rights including the privilege level (0-3 for ring protection), type (code, data, or system), and attributes like readability or writability.

Segment registers in protected mode hold 16-bit selectors that index into the GDT or LDT to retrieve the corresponding descriptor, enabling dynamic segment relocation and protection checks. A selector comprises a 13-bit index, a 1-bit Table Indicator (TI) to distinguish GDT (TI=0) from LDT (TI=1), and a 2-bit Requested Privilege Level (RPL) for access validation against the descriptor's privilege. Once a selector is loaded, the processor uses the descriptor's base and limit to compute the linear address as base + offset, with violations triggering exceptions like general-protection (#GP) for out-of-limit accesses or privilege mismatches.

In 32-bit and 64-bit modes, operating systems typically adopt a flat memory model that minimizes segmentation's complexity by using a single, continuous address space spanning 0 to 4 GB in 32-bit mode or nominally 0 to 2^64 bytes in 64-bit mode. This is achieved by configuring segment descriptors with a base address of 0 and a limit of 4 GB (limits are not checked in 64-bit mode), effectively ignoring segmentation for most operations while retaining the mechanism for compatibility. Exceptions include the FS and GS segments, which can have non-zero bases to support thread-local storage (TLS) and other OS-specific uses without altering the flat addressing for code, data, and stack.

The segmented model's legacy from real mode introduces challenges, such as wraparound, where an offset incremented past 64 KB wraps back to 0, and overlapping segments, both of which complicate the porting of legacy code. These issues persist for backward compatibility with 8086 software, requiring careful handling in emulators or mode transitions to avoid faults like invalid memory accesses.
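
The selector layout described above (13-bit index, Table Indicator, 2-bit privilege level) can be decoded with a few bit operations. The following Python sketch is illustrative only; `decode_selector` is a hypothetical helper, and the byte offset assumes the 8-byte descriptors mentioned in the text.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit protected-mode segment selector into its fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    index = selector >> 3                      # bits 15..3: descriptor index
    return {
        "index": index,
        "table": "LDT" if selector & 0b100 else "GDT",  # bit 2: Table Indicator
        "rpl": selector & 0b11,                # bits 1..0: requested privilege
        "descriptor_offset": index * 8,        # byte offset into the table
    }
```

For instance, selector 0x08 decodes to index 1 in the GDT with RPL 0, and selector 0x23 decodes to index 4 in the GDT with RPL 3.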

Operating Modes

Real Mode

Real mode, also known as real-address mode, is the default operating mode for x86 processors upon reset or boot, providing backward compatibility with the original 8086 architecture. In this environment, the processor uses a segmented memory model with 16-bit segment registers and 16-bit offsets to form 20-bit physical addresses, limiting the addressable space to 1 MB (from 0x00000 to 0xFFFFF). The physical address is calculated by shifting the 16-bit segment value left by 4 bits (multiplying by 16) and adding the 16-bit offset. No memory protection mechanisms are in place, so all code has unrestricted access to the full address space.

Interrupt handling in real mode relies on the interrupt vector table (IVT), a fixed structure located at 0000:0000 (the first 1 KB of memory), containing 256 four-byte entries that point to interrupt service routines. This setup enables direct invocation of BIOS and DOS services through software interrupts, as seen in traditional DOS programming, where applications interact with hardware via standardized interrupt vectors such as INT 21h for DOS functions and INT 13h for disk operations.

Real mode imposes several key limitations suited to early 16-bit systems. It supports no native multitasking, as there are no privilege rings or task switching mechanisms, and all code executes with equal access to memory and hardware ports. Segments are capped at 64 KB in size and start on 16-byte boundaries, restricting code and data blocks while permitting direct I/O operations without mediation, which facilitates low-level hardware control but risks system instability.

In contemporary systems, real mode persists primarily for compatibility in bootloaders, such as the initial stage of BIOS bootloaders on x86 platforms, which load their core image and modules before transitioning to higher modes. It also remains relevant for embedded applications running under legacy environments, enabling direct hardware manipulation in resource-constrained settings like industrial controllers or vintage software emulation.

To exit real mode and enter protected mode, software must first initialize a Global Descriptor Table (GDT) and then set the Protection Enable (PE) bit in the CR0 register, either with the LMSW (Load Machine Status Word) instruction or, on the 80386 and later, with a MOV to CR0, enabling memory protection and expanded addressing.
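
The real-mode address calculation and the IVT layout can be checked with a small Python model. The function names are hypothetical; the `a20_enabled` parameter models the A20 address line discussed under mode transitions, with the 20-bit wraparound assumed when it is disabled.

```python
def real_mode_address(segment: int, offset: int, a20_enabled: bool = False) -> int:
    """Physical address for segment:offset, i.e. segment * 16 + offset."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    # With A20 disabled, bit 20 is dropped and the address wraps at 1 MB.
    return linear if a20_enabled else linear & 0xFFFFF


def ivt_entry_address(vector: int) -> int:
    """Address of the four-byte IVT entry for an interrupt vector (0-255)."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in 0..255")
    return vector * 4
```

Both 07C0:0000 and 0000:7C00 map to physical address 0x7C00, which shows how different segment:offset pairs can name the same byte. The entry for INT 21h sits at address 0x84.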

Protected Mode

Protected mode is an operational mode of the x86 architecture, introduced in 16-bit form with the 80286 and extended to 32 bits with the 80386, enabling advanced memory management, protection mechanisms, and support for multitasking. It is activated from real mode by setting the Protection Enable (PE) bit (bit 0) in the CR0 control register using a MOV to CR0, followed by a far jump or far return that loads a code segment selector from the Global Descriptor Table (GDT). The GDT, whose location is loaded into the GDTR register via the LGDT instruction, contains segment descriptors that define up to 4 GB of linear address space through base addresses, limits (up to 4 GB per segment with page granularity), and access rights. This segmentation allows logical addresses (segment selector + offset) to be translated into linear addresses, providing a foundation for protected execution.

A key feature of protected mode is its hierarchical protection rings, which enforce privilege levels to isolate code execution and prevent unauthorized access to system resources. There are four rings (0 to 3), with Ring 0 designated for the most privileged kernel-mode code and Ring 3 for least-privileged user-mode applications. The Current Privilege Level (CPL), held in bits 0-1 of the CS and SS segment registers, determines the executing ring, while the Descriptor Privilege Level (DPL) in segment descriptors and the Requested Privilege Level (RPL) in selectors govern access checks. Privilege transitions, such as from Ring 3 to Ring 0, are controlled through mechanisms like call gates, interrupt gates, and task gates, which validate levels before allowing sensitive operations like system calls.

Virtual memory in protected mode is implemented via paging, which maps linear addresses to physical addresses for abstraction and isolation. Paging is enabled by setting the PG bit (bit 31) in CR0, after which the CR3 register points to the base of a page directory containing 1024 entries, each referencing a page table with another 1024 entries for 4 KB pages. A linear address is divided into three parts: a directory index (bits 31-22), a table index (bits 21-12), and a page offset (bits 11-0), enabling up to 4 GB of virtual address space per process. The Translation Lookaside Buffer (TLB), a hardware cache, stores recent address translations to accelerate paging operations and reduce latency.

Multitasking support in protected mode relies on the Task State Segment (TSS) for context switching between tasks and the Interrupt Descriptor Table (IDT) for handling interrupts. The TSS, described by a system-segment descriptor in the GDT, stores the state of a task, including general-purpose registers, segment registers, and stack pointers for privilege levels 0-2; its selector is loaded into the task register via the LTR instruction. Task switches occur via CALL, JMP, IRET, or exception/interrupt mechanisms, saving the current task state to its TSS and loading the new one. The IDT, located through the IDTR register (loaded via LIDT), contains up to 256 interrupt vectors, each a gate descriptor (task, interrupt, or trap gate) that directs control to a handler, often in Ring 0, with privilege checks enforced.

In practice, operating systems such as Windows and Linux use protected mode with a flat memory model, where segments are set to cover the entire 4 GB linear address space (base 0, limit 4 GB), minimizing segmentation overhead while relying on paging for protection and memory management. This approach allows each process to have its own page directory for an isolated virtual address space, enabling secure multitasking without complex segment usage.
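
The three-part split of a 32-bit linear address (bits 31-22, 21-12, and 11-0) can be expressed directly in Python. This is a sketch of the index arithmetic only; `split_linear_address` is a hypothetical name, and no page tables are actually walked.

```python
def split_linear_address(addr: int) -> tuple:
    """Return (directory index, table index, page offset) for 32-bit paging."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    directory = (addr >> 22) & 0x3FF   # bits 31..22: one of 1024 directory entries
    table = (addr >> 12) & 0x3FF       # bits 21..12: one of 1024 table entries
    offset = addr & 0xFFF              # bits 11..0: byte within the 4 KB page
    return directory, table, offset
```

For example, linear address 0x00403025 selects directory entry 1, table entry 3, and byte 0x25 within the page.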

Long Mode

Long mode (called IA-32e mode by Intel) is the operating mode of the x86-64 architecture, the extension introduced by AMD to enable full 64-bit processing on x86 processors and first implemented in the AMD Opteron in 2003. It widens virtual addresses to 64 bits, though most implementations translate only 48 bits and require the upper bits to be a sign extension (canonical form), giving up to 256 terabytes of virtual address space per process. General-purpose registers are widened to 64 bits (e.g., RAX, RBX), and eight additional 64-bit registers (R8 through R15) are provided. RIP-relative addressing permits memory operands to be addressed relative to the instruction pointer (RIP), facilitating position-independent code commonly used in modern shared libraries and executables.

Long mode has two sub-modes that balance new capabilities with legacy support: 64-bit mode for native execution of 64-bit code, and compatibility mode, which allows unmodified 32-bit and 16-bit protected-mode applications to run under a 64-bit operating system with the protected-mode defaults they expect (e.g., a default operand size of 32 or 16 bits). With 48-bit virtual addresses, canonical addressing requires every address to lie in the signed range from -2^47 to 2^47 - 1, so bits 63 through 48 must replicate bit 47; non-canonical addresses trigger a general-protection fault.

In 64-bit mode, the segmented memory model is simplified to a flat address space: the bases and limits of CS, DS, ES, and SS are ignored, and these segments are treated as having base address 0 and limit 2^64 - 1. The exceptions are FS and GS, which remain functional for thread-local storage and can specify 64-bit base addresses loaded via model-specific registers such as FS_BASE (MSR C000_0100h) and GS_BASE (MSR C000_0101h).

Paging is required in long mode and mandates Physical Address Extension (PAE). Four-level page tables map 48-bit virtual addresses, and optional five-level page tables on newer processors map 57-bit virtual addresses, to physical addresses of up to 52 bits, with support for 4 KB, 2 MB, and 1 GB page sizes.

Adoption of long mode accelerated with support from major operating systems: the Linux kernel included x86-64 support in version 2.6.0, released on December 17, 2003, enabling widespread use in distributions by 2004. Microsoft followed with Windows XP Professional x64 Edition, released on April 25, 2005, the first consumer x86-64 version of Windows, built on the Windows Server 2003 code base.

Mode Transitions

Mode transitions in x86 assembly language involve precise sequences of instructions to switch between operating modes, ensuring compatibility with the processor's state and avoiding exceptions. These transitions are critical for bootloaders and operating system kernels, as they enable access to advanced features like protected memory and 64-bit addressing while maintaining backward compatibility. The process typically requires configuring control registers, loading descriptor tables, and executing jumps to update the processor's execution environment.

The transition from real mode to protected mode begins with enabling the A20 address line to access memory above 1 MB, followed by loading the Global Descriptor Table (GDT) register using the LGDT instruction to specify the table's base address and limit. Interrupts are disabled (CLI) to prevent interference, and the protection enable (PE) bit in CR0 is set to 1 via MOV CR0, EAX (with the appropriate value in EAX). A far jump (JMP FAR) or far return (RETF) is then executed to load a valid 32-bit code segment selector into CS from the GDT, flushing the instruction prefetch queue and completing the switch to protected mode. Finally, the other segment registers (DS, SS, ES, FS, GS) are loaded with appropriate selectors, and the Interrupt Descriptor Table (IDT) register is loaded using LIDT. This sequence allows the use of segmented memory and privilege levels.

Switching from protected mode to long mode (IA-32e mode) requires first enabling Physical Address Extension (PAE) by setting the PAE bit in CR4 to 1. The CR3 register is loaded with the physical address of the top-level paging structure, the Page Map Level 4 (PML4) table, whose entries point to Page Directory Pointer Tables (PDPTs). The long mode enable (LME) bit in the Extended Feature Enable Register (EFER) is set to 1 using a WRMSR write. Paging is then enabled by setting the PG bit in CR0 to 1, and a far jump is performed to a 64-bit code segment selector (with the L bit set in the GDT descriptor) to enter the 64-bit sub-mode. These steps establish four-level paging and make RIP-relative addressing available.

Transitioning from 64-bit mode to 32-bit compatibility mode within long mode occurs by loading a code segment descriptor with the L bit cleared (indicating 32-bit operation) via a far return (RETF) or interrupt return (IRET) instruction, using a selector from the GDT or LDT that points to a compatibility-mode code segment. Alternatively, a SYSRET executed with 32-bit operand size returns to compatibility-mode user code. This allows legacy 32-bit code to execute without leaving long mode, preserving the paging and segment structures.

Invalid mode transitions can trigger exceptions, such as a general-protection fault (#GP) from malformed GDT entries or a page fault (#PF) from invalid paging setups, potentially escalating to a double fault (#DF) if the handler fails. A triple fault results when invoking the double-fault handler itself causes an exception (e.g., due to an invalid IDT entry), leading to a processor shutdown that typically resets the system with no software recovery possible. In real-mode transitions, failing to enable the A20 line risks address wraparound, corrupting accesses to memory above 1 MB.

Initial mode handling is managed by firmware: traditional BIOS leaves the processor in real mode, loading the boot sector at 0x7C00 and requiring the bootloader to perform the transitions. UEFI firmware, in contrast, hands control to boot applications with the processor already in long mode on x86-64 systems, providing a PE/COFF loader for applications and handling initial paging and descriptor setup before transferring control.
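
The ordering constraints in the long-mode sequence can be captured as a small Python model of the control bits. This is a simplified sketch, not a description of real hardware behavior in every detail: the function `enable_paging` is hypothetical, the CR0 bit positions come from the text, and the CR4.PAE (bit 5), EFER.LME (bit 8), and EFER.LMA (bit 10) positions are taken from the architecture manuals rather than from this article. A `ValueError` stands in for the general-protection fault a processor would raise.

```python
CR0_PE = 1 << 0     # protection enable
CR0_PG = 1 << 31    # paging enable
CR4_PAE = 1 << 5    # physical address extension
EFER_LME = 1 << 8   # long mode enable (set by software)
EFER_LMA = 1 << 10  # long mode active (set by the processor)


def enable_paging(cr0: int, cr4: int, efer: int) -> tuple:
    """Model setting CR0.PG; return the new (cr0, efer) values."""
    if not cr0 & CR0_PE:
        raise ValueError("paging requires protected mode (CR0.PE = 1)")
    if efer & EFER_LME and not cr4 & CR4_PAE:
        raise ValueError("long mode requires CR4.PAE = 1 before CR0.PG is set")
    cr0 |= CR0_PG
    if efer & EFER_LME:
        efer |= EFER_LMA  # long mode becomes active once paging is on
    return cr0, efer
```

Calling `enable_paging(CR0_PE, CR4_PAE, EFER_LME)` models the successful path described above, while omitting `CR4_PAE` raises an error, mirroring the requirement that PAE be enabled first.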

Instruction Set

Data Movement Instructions

Data movement instructions in x86 assembly language transfer data between registers, memory locations, and immediate values, forming the foundation for data manipulation without performing arithmetic or logical operations. These instructions support various sizes, including bytes, words (16 bits), doublewords (32 bits), and quadwords (64 bits) in 64-bit mode, and use the processor's addressing modes for memory access. They are essential for initializing variables, passing parameters, and managing data flow in programs, and they typically do not affect the processor's flags.

The MOV instruction performs a general-purpose transfer, copying the contents of the source to the destination while leaving the source unchanged. It supports transfers between registers (e.g., MOV EAX, EBX), from memory to registers or vice versa (e.g., MOV EAX, [EBX]), and from immediate values to registers or memory (e.g., MOV EAX, 42), but it cannot move memory to memory, cannot load an immediate directly into a segment register, and cannot use CS as a destination. In 64-bit mode, MOV operates on 64-bit registers like RAX, and it does not affect any flags. For example, MOV ECX, [EAX + 4] loads a 32-bit value from the address EAX + 4 into ECX, using base-plus-displacement addressing. MOV does not accept the LOCK prefix; naturally aligned loads and stores of up to 64 bits are atomic without it on current processors.

PUSH and POP instructions handle stack-based data movement, automatically adjusting the stack pointer (ESP in 32-bit mode or RSP in 64-bit mode). PUSH decrements the stack pointer by the operand size (e.g., 8 bytes for quadwords in 64-bit mode) and stores the source (register, memory, or immediate) at the new top of the stack, as in PUSH EAX, which saves the value of EAX before a subroutine call. Conversely, POP loads the value from the top of the stack into the destination (register or memory) and increments the stack pointer, restoring the saved value with POP EAX after the subroutine returns. These instructions do not affect flags and are crucial for function calls and local variable allocation, with PUSH supporting immediate values of up to 32 bits (sign-extended) even in 64-bit mode. The instructions themselves do not detect stack overflow; protection relies on the stack limits enforced by the operating system.

The XCHG instruction exchanges the contents of two operands, swapping a register with another register or with a memory location, which is particularly useful for implementing locks in multithreaded applications. For instance, XCHG EAX, EBX interchanges the values in EAX and EBX, while XCHG EAX, [MEM] swaps EAX with the value at MEM. It supports byte, word, doubleword, or quadword sizes. When one operand is in memory, the exchange is always performed atomically, whether or not a LOCK prefix is present, preventing other processors from accessing the location during the exchange. XCHG does not affect flags and requires at least one operand to be a register, making it suitable for simple lock operations without additional synchronization primitives. In 64-bit mode, it operates on 64-bit registers like RAX.

LEA (Load Effective Address) computes the effective address of a memory operand and stores it in a register without accessing memory itself, enabling efficient address arithmetic such as scaling and indexing. An example is LEA EAX, [EBX + 4*ECX], which calculates the address EBX + 4*ECX and loads it into EAX, useful for pointer manipulation or array indexing. It supports all addressing modes, including displacement, base, index, and scale, but treats the operand as an expression rather than dereferencing it. LEA does not affect flags and is available in 32-bit and 64-bit modes, where it can produce 64-bit addresses in registers like RAX. It can replace a short sequence of shift and ADD operations with a single instruction, though it cannot load segment registers.

String movement instructions like MOVS and LODS enable efficient block transfers of data using dedicated index registers (ESI for source and EDI for destination, or RSI and RDI in 64-bit mode), with the direction determined by the Direction Flag (DF) in the EFLAGS register. MOVS copies a byte, word, doubleword, or quadword from the source string (at [RSI]) to the destination string (at [RDI]), then increments or decrements both pointers based on DF (forward if DF=0, backward if DF=1), as in MOVS DWORD PTR [EDI], DWORD PTR [ESI]. The REP prefix repeats the operation ECX/RCX times, decrementing the counter until it reaches zero, making it ideal for memcpy-like operations on large buffers. Similarly, LODS loads a string element from [RSI] into AL/AX/EAX/RAX and updates RSI; because each load overwrites the accumulator, LODS is normally used inside a loop rather than with REP. These instructions impose no alignment requirement, and the source segment can be overridden. The element size is given either by a size specifier (e.g., BYTE PTR) or by the suffixed mnemonics MOVSB, MOVSW, MOVSD, and MOVSQ; in 64-bit mode, they handle up to quadwords with 64-bit indices. They do not affect arithmetic flags, focusing purely on data relocation.
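
The behavior of REP MOVS, including the effect of the Direction Flag, can be illustrated with a Python model operating on a `bytearray`. This is a sketch of the semantics described above rather than an emulator; the function name `rep_movs` and its parameters are hypothetical.

```python
def rep_movs(memory: bytearray, rsi: int, rdi: int, rcx: int,
             size: int = 1, df: int = 0) -> tuple:
    """Model REP MOVS: copy rcx elements of `size` bytes from [rsi] to [rdi].

    Returns the final (rsi, rdi, rcx) register values.
    """
    step = -size if df else size       # DF=0 moves forward, DF=1 backward
    while rcx != 0:
        memory[rdi:rdi + size] = memory[rsi:rsi + size]
        rsi += step
        rdi += step
        rcx -= 1
    return rsi, rdi, rcx
```

The model also shows why the direction matters for overlapping buffers: copying "ABCD" one byte upward with DF=0 smears the first byte across the destination, whereas starting at the last element with DF=1 preserves the data.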

Arithmetic and Logic Instructions

The arithmetic and logic instructions in x86 assembly language form the core of computations performed by the arithmetic logic unit (ALU), operating on registers, memory, or immediate values while updating status flags in the EFLAGS register to indicate results such as zero, sign, carry, and overflow. These instructions support both unsigned and signed operations, with flag updates enabling conditional branching for error handling and flow control. Unlike data movement instructions, which merely transfer values, arithmetic and logic operations modify operands to produce new results, often with carry propagation for multi-word values.

Addition instructions include ADD, which adds the source operand to the destination operand and stores the result in the destination, setting the carry flag (CF) if there is a carry out of the most significant bit and the overflow flag (OF) for signed overflow. The ADC variant extends this by also adding the carry flag from a previous operation, facilitating multi-precision arithmetic; for example, in 32-bit mode, ADC EAX, EBX adds EBX and CF to EAX, updating flags including auxiliary carry (AF) for BCD arithmetic. Both instructions set the parity (PF), sign (SF), and zero (ZF) flags based on the result, with operands sized from 8 to 64 bits depending on mode and prefixes.

Subtraction mirrors addition with SUB, subtracting the source from the destination and storing the result in the destination, setting CF for borrow and OF for signed overflow. The SBB form subtracts the source and CF (as borrow) from the destination, essential for chained subtractions; for instance, SBB EAX, EBX computes EAX - EBX - CF, so that a borrow from the low word propagates into the high word in multi-word subtraction. These instructions set all status flags according to the arithmetic outcome and, with a memory destination, support atomic operation via the LOCK prefix in multiprocessor systems.

Multiplication instructions handle unsigned and signed integers using the accumulator registers. MUL performs unsigned multiplication: for byte operands, it multiplies AL by the source and stores the 16-bit result in AX; for word, AX by source into DX:AX; and for doubleword, EAX by source into EDX:EAX, setting CF and OF if the high half is nonzero. The signed counterpart IMUL supports one, two, or three operands. The one-operand form multiplies the accumulator by the source into the accumulator pair, as with MUL; the two-operand form multiplies the destination register by the source (register or memory) and stores the truncated result in the destination. CF and OF are set if the result does not fit in the destination (i.e., the high bits are not a sign extension of the low half). In 64-bit mode, REX.W extends these forms to RAX and RDX:RAX.

Division instructions divide the accumulator by the source, producing a quotient and a remainder and leaving the status flags undefined. DIV is unsigned: for byte, AX divided by the source yields the quotient in AL and the remainder in AH; for word, DX:AX by source into AX (quotient) and DX (remainder); doubleword uses EDX:EAX similarly. A divide-error exception (#DE) is raised on division by zero or when the quotient overflows its register. Signed division via IDIV follows the same register conventions but uses two's-complement arithmetic, also triggering #DE on division by zero or an out-of-range quotient. Division is slower than multiplication because of its iterative algorithms, though modern processors have narrowed the gap.

Shift instructions manipulate bit positions for scaling, alignment, or extraction. SHL (or its synonym SAL) shifts the destination left by a count in CL or an immediate (the count is masked to 5 bits, or 6 bits for 64-bit operands), filling with zeros and setting CF to the last shifted-out bit; for single-bit shifts, OF indicates a sign-bit change. SHR shifts right logically, filling the high bit with zero and setting CF to the last shifted-out bit; for a single-bit shift, OF receives the original most significant bit, and for larger counts OF is undefined. Arithmetic right shift SAR preserves the sign bit when filling, which suits signed division by powers of two (rounding toward negative infinity), and clears OF on single-bit shifts. Rotate variants ROL and ROR shift bits circularly without loss, copying the bit that wraps around into CF; for example, ROL EAX, 1 rotates left, with CF receiving the original most significant bit. Shifts set SF, ZF, and PF according to the result and leave AF undefined, whereas rotates affect only CF and OF.

Logical instructions perform bitwise operations, typically clearing CF and OF while setting other flags according to the result. AND computes the bitwise AND of source and destination, storing the result in the destination and setting ZF if it is zero; it masks bits, useful for clearing selected bits or testing. OR performs bitwise OR, setting bits where either operand has a 1, and XOR (exclusive OR) sets bits where the operands differ, so XOR EAX, EAX clears EAX to zero. NOT inverts all bits in the destination without flag changes, serving as a unary complement. TEST ANDs source and destination but discards the result, solely updating flags for conditional checks, such as TEST EAX, 1 to probe the least significant bit. These operate on any operand size and support memory operands.

Overflow handling relies on the OF flag, set by signed arithmetic instructions like ADD, SUB, and IMUL when the true result does not fit in the signed range of the destination (e.g., positive + positive yielding negative). The JO instruction jumps if OF is 1, branching to an overflow handler, while JNO jumps if OF is 0 to continue normal execution; both use relative offsets (short or near) without modifying flags. For example, following ADD EAX, EBX, JO overflow_label detects signed overflow, allowing the program to handle the error.

    ; Example: multi-precision addition with overflow check
    ADD EAX, EBX            ; Add low doublewords, setting CF on carry
    ADC EDX, ECX            ; Add high doublewords plus the carry
    JO  overflow_handler    ; Jump if signed overflow

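
The flag results of ADD and ADC can be reproduced with a short Python model, which may help when reasoning about the multi-precision sequence above. The names `add32` and `add64_via_32` are hypothetical, only CF, ZF, SF, and OF are modeled, and the operands are assumed to be 32 bits wide.

```python
MASK32 = 0xFFFFFFFF


def add32(a: int, b: int, carry_in: int = 0) -> tuple:
    """Model 32-bit ADD (carry_in=0) or ADC; return (result, flags)."""
    a &= MASK32
    b &= MASK32
    total = a + b + carry_in
    result = total & MASK32
    flags = {
        "CF": int(total > MASK32),          # carry out of bit 31
        "ZF": int(result == 0),
        "SF": result >> 31,                 # sign bit of the result
        # Signed overflow: both operands differ in sign from the result.
        "OF": (((a ^ result) & (b ^ result)) >> 31) & 1,
    }
    return result, flags


def add64_via_32(lo_a: int, hi_a: int, lo_b: int, hi_b: int) -> tuple:
    """Model ADD on the low halves followed by ADC on the high halves."""
    lo, low_flags = add32(lo_a, lo_b)
    hi, high_flags = add32(hi_a, hi_b, low_flags["CF"])
    return lo, hi, high_flags
```

Adding 1 to 0x7FFFFFFF sets OF and SF but not CF, whereas adding 1 to 0xFFFFFFFF sets CF and ZF but not OF, which is the distinction JO and JC test for.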

Control Flow Instructions

Control flow instructions in x86 assembly language enable dynamic alteration of program execution by transferring control to different addresses, either unconditionally or based on processor flags set by prior arithmetic or logic operations. These instructions are essential for implementing conditional logic, procedure calls, loops, and interrupt handling in both IA-32 and Intel 64 architectures. They operate by modifying the instruction pointer (IP, EIP, or RIP) and, in some cases, the code segment register (CS), supporting both near transfers (within the same code segment) and far transfers (across segments in non-flat memory models like real or protected mode).

Unconditional Transfers

Unconditional jumps, calls, and returns provide direct control flow changes without testing conditions. The JMP instruction transfers execution to a specified target address, either near (updating only IP/EIP/RIP) or far (also loading a new CS value in segmented modes). Near JMP supports immediate, register, or memory operands, while far JMP uses a pointer operand for segment:offset addressing. Neither variant affects flags. For example:

    JMP rel32             ; Relative jump by a 32-bit signed displacement
    JMP FAR ptr16:32      ; Far jump to segment:offset

The CALL instruction invokes a subroutine by pushing the return address (the EIP/RIP of the following instruction for near calls, or CS together with that address for far calls) onto the stack and jumping to the target, enabling modular code structure; far CALLs are a legacy feature in 64-bit mode. RET reverses this by popping the return address from the stack to resume execution, with an optional immediate operand that is then added to the stack pointer to discard arguments. Like JMP, CALL and RET do not modify flags and exist in both near and far variants. Example:

    CALL near_proc        ; Near call, pushes the return EIP/RIP
    RET 8                 ; Near return, pops EIP/RIP and then adds 8 to ESP/RSP

These instructions are available in all operating modes, including real, protected, and 64-bit modes.
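The relative form of JMP stores the distance from the end of the instruction to the target rather than the target itself. A minimal Python sketch of that arithmetic follows, assuming the 5-byte near form with opcode E9; the function names are hypothetical and the sketch ignores address wraparound.

```python
def encode_jmp_rel32(instr_addr, target_addr):
    """Return the 5 bytes of JMP rel32 placed at instr_addr and aimed at target_addr."""
    instr_len = 5                                   # opcode E9 plus 4 displacement bytes
    disp = target_addr - (instr_addr + instr_len)   # measured from the next instruction
    if not -2**31 <= disp <= 2**31 - 1:
        raise ValueError("target is outside the rel32 range")
    return bytes([0xE9]) + disp.to_bytes(4, "little", signed=True)

def jmp_rel32_target(instr_addr, code):
    """Recover the target address from an encoded JMP rel32."""
    if len(code) != 5 or code[0] != 0xE9:
        raise ValueError("not a JMP rel32 encoding")
    disp = int.from_bytes(code[1:5], "little", signed=True)
    return instr_addr + 5 + disp
```

A jump to its own address therefore carries a displacement of -5, and moving the code and its target together leaves the encoding unchanged, which is what makes relative transfers position-independent.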

Conditional Branches

Conditional jump instructions (Jcc) branch to a target only if a specific flag condition is met, facilitating conditional constructs such as if-else and loops. They use relative displacements (8-, 16-, or 32-bit signed) and do not alter flags themselves. Common variants include JZ (jump if ZF=1, e.g., after a CMP of equal operands) and JNZ (ZF=0, for inequality); JC (CF=1, e.g., after an unsigned carry or borrow) and JNC (CF=0); as well as the signed comparisons JG (greater: ZF=0 and SF=OF) and JL (less: SF≠OF). For instance:

    CMP EAX, EBX          ; Sets flags based on EAX - EBX
    JG  greater           ; Jump if EAX > EBX (signed)
    JNZ not_equal         ; Otherwise jump if EAX != EBX

These branches come in short (rel8) and near (rel16/rel32) forms; the displacement is always relative to the address of the following instruction (EIP, or RIP in 64-bit mode). There is no far conditional jump, so a conditional far transfer is coded as an inverted Jcc that skips over a far JMP. The branches test flags generated by arithmetic/logic instructions, such as ADD, SUB, or CMP.
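To see why the signed conditions compare SF with OF instead of looking at SF alone, the flag results of CMP can be modeled directly. This Python sketch is illustrative only; cmp_flags, CONDITIONS, and taken are hypothetical names, and only the conditions named above are included.

```python
def cmp_flags(a, b, bits=32):
    """Flags produced by CMP a, b, which computes a - b and discards the result."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    r = (a - b) & mask
    return {
        "ZF": int(r == 0),
        "CF": int(b > a),                              # borrow in the unsigned view
        "SF": int((r & sign) != 0),
        "OF": int(((a ^ b) & (a ^ r) & sign) != 0),    # signed overflow of a - b
    }

CONDITIONS = {
    "JZ":  lambda f: f["ZF"] == 1,
    "JNZ": lambda f: f["ZF"] == 0,
    "JC":  lambda f: f["CF"] == 1,
    "JNC": lambda f: f["CF"] == 0,
    "JG":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "JL":  lambda f: f["SF"] != f["OF"],
}

def taken(mnemonic, a, b):
    """Would this Jcc be taken right after CMP a, b?"""
    return CONDITIONS[mnemonic](cmp_flags(a, b))
```

With a = 0x80000000 (the most negative 32-bit value) and b = 1, the subtraction overflows and SF ends up clear, yet SF≠OF still reports "less", so JL is taken as expected.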

Loops

Loop instructions simplify repetitive execution by combining counter decrement with conditional jumps. The LOOP instruction decrements the count register (CX, ECX, or RCX, depending on the address size) and jumps to a label if the counter is non-zero, providing a basic counted loop that neither reads nor modifies the flags. It uses a relative 8-bit displacement and is supported in IA-32 and Intel 64 modes. Example:

    MOV ECX, 10           ; Set loop count
    loop_start:
        ; Loop body
        LOOP loop_start   ; Decrement ECX, jump if ECX != 0

REP (repeat) prefixes enhance string operations (like MOVS or CMPS) for iteration, repeating the instruction ECX/RCX times until the counter reaches zero. Variants include REPE/REPZ (repeat while equal: ZF=1, stops on mismatch or zero count) and REPNE/REPNZ (repeat while not equal: ZF=0, stops on match or zero count), useful for memory scans or copies. These do not affect flags directly but inherit effects from the repeated instruction. For example:

    REP MOVSB             ; Copy ECX bytes from [ESI] to [EDI]
    REPE CMPSB            ; Compare bytes until mismatch or ECX = 0

LOOP and REP family instructions are available across all x86 modes, with 64-bit extensions using RCX and RFLAGS.
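The termination rules of the REP prefixes are easy to misread, in particular where the pointers and counter end up after a mismatch. The following Python sketch models REP MOVSB and REPE CMPSB over a bytearray, assuming the direction flag is clear (ascending addresses); the function names are illustrative only.

```python
def rep_movsb(mem, esi, edi, ecx):
    """REP MOVSB with DF=0: copy ecx bytes, advancing ESI and EDI each step."""
    while ecx != 0:
        mem[edi] = mem[esi]
        esi += 1
        edi += 1
        ecx -= 1
    return esi, edi, ecx

def repe_cmpsb(mem, esi, edi, ecx):
    """REPE CMPSB with DF=0: compare until a mismatch or until ECX reaches zero.

    Returns (esi, edi, ecx, zf); zf is None when ECX was already zero,
    because in that case no comparison runs and the flags stay unchanged.
    """
    zf = None
    while ecx != 0:
        zf = int(mem[esi] == mem[edi])
        esi += 1                 # pointers and counter update before the ZF check,
        edi += 1                 # so they end up one element past a mismatch
        ecx -= 1
        if zf == 0:
            break
    return esi, edi, ecx, zf
```

After a mismatch the pointers have already stepped past the differing bytes, so code that needs their address subtracts one from ESI or EDI.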

Interrupts

Interrupt instructions raise software interrupts and return from their handlers. INT n causes a software interrupt by pushing the current flags, CS, and EIP/RIP onto the stack, clearing the interrupt flag (IF) in real mode or when the vector uses an interrupt gate, and jumping to the handler for interrupt number n (0-255), which indexes the interrupt vector table. It takes an immediate 8-bit n and operates in all modes, though vector handling differs (e.g., the IDT in protected mode). Example:

    INT 21h               ; DOS interrupt (legacy)

IRET (interrupt return) restores execution by popping EIP/RIP, CS, and the flags from the stack, reinstating the prior state including IF; the 64-bit variant IRETQ pops RIP and RFLAGS. Unlike RET, IRET handles privilege-level changes in protected mode. These instructions are fundamental to system calls and exception handling in x86 architectures.

Far control transfers, such as far JMP, CALL, RET, and IRET, reload CS in non-flat modes like real mode or segmented protected mode, enabling inter-segment jumps without flat-memory assumptions. In 64-bit mode, the direct far forms of JMP and CALL (ptr16:16 and ptr16:32) are invalid, and the remaining indirect far variants exist mainly for legacy support.
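In real mode the vector lookup performed by INT n is simple address arithmetic: each vector occupies four bytes at address n × 4, holding a 16-bit offset followed by a 16-bit segment. The Python sketch below illustrates that lookup over a byte buffer standing in for low memory; the function names are hypothetical, and the 20-bit wrap in linear_address assumes original 8086 addressing.

```python
def ivt_entry_address(n):
    """Physical address of real-mode interrupt vector n."""
    if not 0 <= n <= 255:
        raise ValueError("interrupt number must be 0-255")
    return n * 4

def read_vector(memory, n):
    """Return (segment, offset) stored for vector n; the offset word comes first."""
    base = ivt_entry_address(n)
    offset = int.from_bytes(memory[base:base + 2], "little")
    segment = int.from_bytes(memory[base + 2:base + 4], "little")
    return segment, offset

def linear_address(segment, offset):
    """Real-mode address translation: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment * 16) + offset) & 0xFFFFF
```

For INT 21h the entry therefore sits at address 0x84, and a stored pointer of F000:1234 would send execution to linear address 0xF1234.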

Stack Instructions

The stack in x86 serves as a last-in, first-out (LIFO) data structure primarily used for temporary storage during procedure calls, local variable allocation, and parameter passing. Stack instructions manage this structure by manipulating the stack pointer (SP, ESP, or RSP depending on mode) and facilitating stack frame creation for function prologues and epilogues. These operations provide efficient memory management without explicit address calculations, leveraging the hardware-supported stack segment (SS).

The PUSH instruction decrements the stack pointer by the size of the operand (2, 4, or 8 bytes in 16-, 32-, or 64-bit modes, respectively) and stores the source operand at the new top of the stack. For example, in 32-bit mode, PUSH EAX first subtracts 4 from ESP, then writes the value of EAX to memory at [ESP]. This instruction supports immediate values, registers, or memory operands but does not affect the flags register. Variants like PUSHF (or PUSHFD/PUSHFQ) push the flags register onto the stack for preservation during interrupts or context switches. Additionally, PUSHA/PUSHAD (16- and 32-bit modes only; invalid in 64-bit mode) push all eight general-purpose registers with a single instruction, providing a compact way to save register state.

Conversely, the POP instruction loads the value from the top of the stack into the destination and then increments the stack pointer by the operand size. For instance, POP EAX reads the 4-byte value at [ESP] into EAX and adds 4 to ESP in 32-bit mode. Like PUSH, it supports register or memory operands, but it cannot pop into the CS segment register; a far RET is used instead for control transfers that reload CS. The POPF (or POPFD/POPFQ) variant restores the flags register, while POPA/POPAD (likewise unavailable in 64-bit mode) restores the general-purpose registers, mirroring PUSHA/PUSHAD. These instructions do not modify flags except when popping them explicitly.
For procedure management, the ENTER instruction establishes a stack frame by pushing the frame pointer (EBP/RBP), setting it to the new frame base, and allocating space for local variables; its nesting-level operand supports block-structured languages like Pascal, whose nested procedures need access to enclosing frames. It takes two operands: the allocation size in bytes and the nesting level (0-31). The companion LEAVE instruction reverses this by restoring the stack pointer from the frame pointer (MOV ESP, EBP) and popping EBP, effectively deallocating the frame just before a RET. This pair shortens prologue/epilogue code compared to manual PUSH/MOV/SUB and MOV/POP sequences, though modern compilers generally avoid ENTER in favor of the explicit sequence because it is faster. For example, ENTER 8, 0 in 32-bit mode pushes EBP, sets EBP to ESP, and subtracts 8 from ESP for two local dwords.

In 64-bit mode under the System V ABI (common on Linux/Unix), the stack must be 16-byte aligned at every call site so that SIMD code can use aligned accesses; this requires padding if necessary when pushing or allocating. The ABI specifies that RSP is a multiple of 16 immediately before each CALL, so on function entry, after the return address has been pushed, RSP mod 16 equals 8, and the callee must adjust by a further 8 bytes (for example by pushing RBP) before making calls of its own. Misalignment can degrade performance or cause exceptions in aligned instructions like MOVAPS.

Stack overflow occurs when PUSH or ENTER exceeds the stack segment limit or runs into an unmapped page, triggering a #SS (stack-segment) fault or a page fault in protected mode; underflow from excessive POP or LEAVE accesses invalid memory, potentially causing a #GP (general protection) fault. These hardware-detected conditions rely on segment descriptors and page tables rather than EFLAGS bits like overflow (OF) or carry (CF), which apply to arithmetic operations. The operating system handles them, for example by expanding the stack or terminating the process.
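The pointer movements of PUSH, POP, ENTER, and LEAVE described above can be traced with a small model. The Python class below is a sketch of a 32-bit downward-growing stack, assuming ENTER is used with nesting level 0; Stack32 and its method names are invented for this illustration.

```python
class Stack32:
    """Toy model of a 32-bit x86 stack that grows toward lower addresses."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.esp = size          # empty stack: ESP points just past the buffer
        self.ebp = 0

    def push(self, value):
        self.esp -= 4                                          # decrement first...
        self.mem[self.esp:self.esp + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def pop(self):
        value = int.from_bytes(self.mem[self.esp:self.esp + 4], "little")
        self.esp += 4                                          # ...increment after reading
        return value

    def enter(self, alloc):
        """ENTER alloc, 0: PUSH EBP / MOV EBP, ESP / SUB ESP, alloc."""
        self.push(self.ebp)
        self.ebp = self.esp
        self.esp -= alloc

    def leave(self):
        """LEAVE: MOV ESP, EBP / POP EBP."""
        self.esp = self.ebp
        self.ebp = self.pop()
```

Running push, enter(8), leave, pop in that order returns ESP and EBP to their starting values, which is the invariant a correct prologue/epilogue pair must maintain.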

Floating-Point Instructions

The x87 floating-point unit (FPU) provides scalar floating-point operations in x86 assembly language; it originated as the separate 8087 coprocessor and was later embedded in the CPU core. It employs a stack-based architecture with eight 80-bit registers, denoted ST(0) through ST(7), where ST(0) serves as the top of the stack (TOS). Each register holds a value in extended-precision format: a 1-bit sign, a 15-bit biased exponent, and a 64-bit significand (with an explicit leading 1 for normalized numbers). The stack pointer TOP, stored in bits 11-13 of the FPU status word, indicates the current TOS, allowing implicit operand addressing relative to ST(0). The tag word tracks the content type of each register (valid, zero, special, or empty) to optimize operations and detect stack faults.

Basic arithmetic instructions in the x87 FPU operate on the TOS and one other operand, with the result replacing the TOS unless specified otherwise. The FADD instruction adds the source (ST(i) or a memory operand) to ST(0), storing the result in ST(0); for example, FADD ST(0), ST(1) computes ST(0) + ST(1) and places it in ST(0). Similarly, FSUB subtracts the source from ST(0), FMUL multiplies them, and FDIV divides ST(0) by the source. Popping variants such as FADDP store the result in ST(i) and then pop the stack, so that FADDP ST(1), ST(0) leaves the sum as the new TOS. Memory operands may be single (32-bit) or double (64-bit) precision, register operands hold the full 80 bits, and computations use the internal 80-bit format to minimize rounding errors. Opcodes vary by operand type, such as D8 /0 for FADD with a 32-bit memory operand or DC C0+i for FADD ST(i), ST(0).

For storing results, the FST instruction copies the TOS to a destination without altering the stack, such as FST m64fp to write ST(0) as a 64-bit double-precision value to memory; the popping variant FSTP additionally pops the register stack (incrementing TOP) and can also store the 80-bit format. These operations ensure compatibility with IEEE 754 formats when interfacing with memory, though internal computations retain extended precision for accuracy.
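The 80-bit register layout described above (sign, 15-bit biased exponent with bias 16383, 64-bit significand including the explicit integer bit) can be decoded by hand. The Python sketch below turns the ten little-endian bytes that FSTP would write for an m80fp operand into an exact fraction; the function name is hypothetical, and infinities and NaNs are rejected rather than modeled.

```python
from fractions import Fraction

def decode_x87_extended(raw):
    """Decode 10 little-endian bytes of x87 extended precision into a Fraction."""
    if len(raw) != 10:
        raise ValueError("extended-precision values are 10 bytes long")
    significand = int.from_bytes(raw[:8], "little")   # includes the explicit integer bit
    sign_exp = int.from_bytes(raw[8:], "little")
    sign = -1 if sign_exp & 0x8000 else 1
    exponent = sign_exp & 0x7FFF
    if exponent == 0x7FFF:
        raise ValueError("infinity or NaN is not handled in this sketch")
    if exponent == 0:
        exponent = 1              # zero and denormals use the minimum exponent
    # The significand is a 64-bit integer with the binary point after bit 63.
    return sign * Fraction(significand) * Fraction(2) ** (exponent - 16383 - 63)
```

The value 1.0, for instance, is stored with exponent field 0x3FFF and significand 0x8000000000000000, since the leading 1 is kept explicitly instead of being implied as in the 32- and 64-bit formats.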
Transcendental instructions compute specialized functions on the TOS. FSIN calculates the sine of ST(0) in radians (valid range -2^63 to +2^63), replacing ST(0) with the result and setting the C2 flag for out-of-range inputs; FCOS does likewise for cosine. FPATAN computes the arctangent of ST(1)/ST(0), stores it in ST(1), and pops the stack, which is useful for angle computations, with accuracy better than 1 ulp claimed for later processor generations.

Comparison instructions like FCOM evaluate the TOS against a source operand, setting condition codes C3, C2, and C0 in the status word to indicate the relation: all three clear when ST(0) is greater than the source, only C0 set when ST(0) is less than the source, only C3 set for equality, and all three set for unordered (NaN) operands. For instance, FCOM ST(1) compares ST(0) and ST(1), raising an invalid-operation exception if either is a NaN. Conditional branching then follows by transferring the codes to EFLAGS, typically with FSTSW AX followed by SAHF.

Control instructions manage FPU state: FINIT initializes the FPU by setting the control word to 037FH (all exceptions masked, round to nearest, 64-bit precision), clearing the status word, and tagging all registers as empty; FCLEX clears the exception flags in the status word after checking for pending unmasked exceptions, while FNCLEX does so without that check.
Instruction | Primary Operation | Key Flags/Effects | Example Usage
FADD | Addition | C1 reflects rounding of inexact results | FADD ST(0), ST(2) (ST(0) += ST(2))
FSUB | Subtraction | As above | FSUBR m32fp (ST(0) = memory - ST(0), reverse subtract)
FMUL | Multiplication | As above | FMULP ST(1), ST(0) (pops after multiply)
FDIV | Division | As above | FDIV ST(3), ST(0) (ST(3) /= ST(0))
FST | Store TOS | No stack pop | FST m64fp (store ST(0) as a double)
FSIN | Sine | C2=1 if out of range | FSIN (ST(0) = sin(ST(0)))
FCOS | Cosine | C2=1 if the magnitude of ST(0) is 2^63 or more | FCOS (ST(0) = cos(ST(0)))
FPATAN | Arctangent | Pops stack | FPATAN (ST(1) = atan(ST(1)/ST(0)), then pop)
FCOM | Compare | Sets C0/C2/C3 | FCOM m64fp (compare to a double in memory)
FINIT | Initialize | Resets to default | FINIT (clear exceptions, empty stack)
FCLEX | Clear exceptions | Clears flags | FCLEX (reset after error)
Although the x87 FPU remains fully supported in modern x86 processors for backward compatibility, it has been largely supplanted by SSE instructions for higher performance in scalar and vectorized floating-point tasks, yet it persists in applications demanding the extra precision of its 80-bit format to avoid intermediate rounding losses in chained computations.
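Branching on an FCOM result works because of where the condition codes sit in the status word: C0 is bit 8, C2 is bit 10, and C3 is bit 14, so after FSTSW AX they occupy the positions in AH that SAHF copies into CF, PF, and ZF. The Python sketch below models that path under those bit positions; the function names are hypothetical.

```python
import math

C0 = 1 << 8      # lands in CF after FSTSW AX / SAHF
C2 = 1 << 10     # lands in PF
C3 = 1 << 14     # lands in ZF

def fcom_status(st0, src):
    """Condition-code bits that FCOM leaves in the status word."""
    if math.isnan(st0) or math.isnan(src):
        return C3 | C2 | C0      # unordered
    if st0 > src:
        return 0
    if src > st0:
        return C0
    return C3                    # equal

def sahf_from_status(status_word):
    """Flags loaded by SAHF from AH, the high byte of the status word in AX."""
    ah = (status_word >> 8) & 0xFF
    return {"CF": ah & 1, "PF": (ah >> 2) & 1, "ZF": (ah >> 6) & 1}

def branch_taken(mnemonic, st0, src):
    """Would the given unsigned-style Jcc be taken after FCOM / FSTSW AX / SAHF?"""
    f = sahf_from_status(fcom_status(st0, src))
    if mnemonic == "JA":
        return f["CF"] == 0 and f["ZF"] == 0
    if mnemonic == "JB":
        return f["CF"] == 1
    if mnemonic == "JE":
        return f["ZF"] == 1
    if mnemonic == "JP":
        return f["PF"] == 1
    raise ValueError("condition not modeled in this sketch")
```

Because an unordered result sets CF and ZF as well as PF, code that must treat NaN separately tests JP before the JB or JE checks.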

SIMD Instructions

SIMD (Single Instruction, Multiple Data) instructions in x86 assembly language enable parallel processing of multiple data elements within a single operation, significantly enhancing performance for vectorized computations. These extensions build upon the scalar floating-point capabilities by introducing wider registers and specialized operations for packed data types, such as integers and floating-point values. Introduced progressively since the late 1990s, SIMD instructions form a cornerstone of data-parallel computing on x86 processors.

The earliest SIMD extension, MMX (MultiMedia eXtension), introduced in 1997, provides operations on 64-bit MMX registers (MM0 through MM7, aliasing the x87 FPU registers) for packed integers. It supports data types like 8 packed bytes, 4 packed words, 2 packed doublewords, or a single quadword, with instructions such as PADDB (add packed bytes with wraparound; PADDSB is the saturating form), PMULHW (multiply packed words, keeping the high halves), and MOVQ (move quadword). MMX enables parallel integer arithmetic, logical operations, and pack/unpack rearrangement for tasks like image processing, but requires EMMS to clear the FPU tags after use to avoid conflicts with floating-point code. It laid the groundwork for later SIMD sets but is limited to 64-bit width.

The foundational SIMD extension for floating-point, Streaming SIMD Extensions (SSE), utilizes 128-bit XMM registers (XMM0 through XMM7, extended to XMM15 in 64-bit mode) to handle packed data. SSE supports operations on single-precision floating-point (32-bit) vectors, and SSE2 adds double-precision and integer vectors, with key instructions including MOVAPS for aligned moves of packed single-precision floating-point values and ADDPS for adding such vectors element-wise. For example, the instruction ADDPS xmm1, xmm2 adds the packed single-precision values in xmm2 to those in xmm1, storing the result in xmm1. SSE instructions use the legacy (non-VEX) opcode encoding and are essential for basic vector processing.

Advanced Vector Extensions (AVX) extend SIMD capabilities to 256-bit YMM registers (YMM0 through YMM15), doubling the vector width for greater throughput.
AVX employs the VEX encoding prefix (2- or 3-byte) to specify vector length and operands, replacing the legacy SSE prefix and escape bytes. Instructions like VADDPD add packed double-precision floating-point (64-bit) values, as in VADDPD ymm1, ymm2, ymm3, which processes four elements simultaneously. The VEX.vvvv field encodes an additional source register, giving most instructions a non-destructive three-operand form. Building on SSE, AVX2 widens the integer instructions to 256 bits, such as PACKSSDW, which packs signed doublewords into signed words with saturation (e.g., VPACKSSDW ymm1, ymm2, ymm3), useful for narrowing data. Additionally, PSHUFB shuffles bytes based on a control mask (e.g., VPSHUFB ymm1, ymm2, ymm3), enabling flexible data rearrangement for tasks like byte-level reordering.

AVX-512 further advances to 512-bit ZMM registers (ZMM0 through ZMM31), supporting up to 16 single-precision or 8 double-precision elements per operation. It introduces the EVEX encoding (4-byte prefix) for features like writemasking (using k registers for element-wise control, e.g., {k1}{z} to zero non-masked elements) and broadcasting from memory. The instruction VGATHERDPD gathers double-precision values using 32-bit indices (e.g., VGATHERDPD zmm1 {k1}, vm32y), facilitating sparse access in irregular datasets. Per-lane operations allow independent processing of vector lanes, enhancing flexibility. AVX-512 instructions extend prior sets, such as VADDPD now supporting ZMM widths with masking.

These SIMD instructions find primary use in multimedia applications, where parallel operations accelerate video encoding, image filtering, and audio processing; for instance, ADDPS can adjust pixel values and PSHUFB can swap color channels. In machine learning, they optimize vectorized computations like matrix additions (VADDPD) and gather operations (VGATHERDPD) for neural network training on large datasets, providing substantial speedups in tensor operations.
Extension | Register Width | Key Registers | Encoding | Example Vector Capacity (Single-Precision Float)
SSE | 128-bit | XMM0-XMM15 | Legacy SSE | 4 elements
AVX | 256-bit | YMM0-YMM15 | VEX | 8 elements
AVX-512 | 512-bit | ZMM0-ZMM31 | EVEX | 16 elements
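The difference between merging and zeroing writemasks in AVX-512 is easiest to see element by element. The Python sketch below models a masked packed addition on lists of lanes, in the spirit of VADDPD zmm1 {k1}{z}, zmm2, zmm3; masked_add is a hypothetical helper and makes no claim about the hardware's rounding or exception behavior.

```python
def masked_add(dst, a, b, k, zeroing=False):
    """Model a writemasked packed add over equal-length lists of lanes.

    Bit i of k selects lane i. Unselected lanes keep the old destination
    value (merging) or become 0.0 when zeroing is requested ({z}).
    """
    out = []
    for i, (x, y) in enumerate(zip(a, b)):
        if (k >> i) & 1:
            out.append(x + y)        # lane enabled by the mask
        elif zeroing:
            out.append(0.0)          # zeroing-masking
        else:
            out.append(dst[i])       # merging-masking
    return out
```

With eight double-precision lanes and k = 0b00001111, only the low four lanes are updated; the upper four either keep their previous contents or are cleared, depending on whether {z} is specified.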

Program Flow and Examples

Program Flow Control

In x86 assembly language, program flow control encompasses mechanisms for structuring code execution beyond basic linear sequencing, including subroutine management, asynchronous event handling, and conditional logic. These features enable modular code, response to hardware events, and error recovery, forming the backbone of software ranging from operating systems to applications. Procedures allow for reusable code blocks, while interrupts and exceptions provide hooks for system-level interactions, all orchestrated through the processor's interrupt architecture and stack-based control transfers.

Procedures in x86 assembly are defined using assembler-specific directives and invoked via the CALL and RET instructions, which use the stack to preserve the return address. In Microsoft Macro Assembler (MASM), procedures are delimited by the PROC and ENDP directives, which mark the entry point and the end of the procedure, respectively. For instance, a simple procedure might be structured as follows:

    MyProc PROC
        ; procedure body
        ret
    MyProc ENDP

This setup supports parameter passing and return value handling according to established application binary interfaces (ABIs). The cdecl convention, common on Unix-like systems and in C, passes parameters on the stack from right to left, with the caller responsible for stack cleanup after the callee returns, which permits variable-argument functions. In contrast, the stdcall convention, prevalent in Win32 API calls, assigns the cleanup to the callee (typically via RET n), which keeps call sites smaller. These ABIs ensure interoperability between assembly and higher-level languages, with parameters often accessed via offsets from the EBP register in 32-bit modes or passed in registers under the 64-bit System V ABI.

Interrupt service routines (ISRs) handle asynchronous events from hardware or software, configured through the interrupt descriptor table (IDT), a system data structure that maps vectors to handler addresses. The IDT's base and limit are loaded into the processor using the LIDT instruction, with each entry being a gate descriptor that holds the ISR entry point, segment selector, and privilege level. ISRs are invoked automatically when an interrupt occurs, with the processor saving state on the stack before transferring control. To manage interrupt enabling and disabling, the CLI (Clear Interrupt Flag) and STI (Set Interrupt Flag) instructions toggle the IF bit in the EFLAGS register, allowing software to mask maskable interrupts during critical sections. An ISR concludes with IRET to restore the state and return.

Exceptions represent synchronous events triggered by instruction execution errors or violations, routed through the IDT like interrupts but classified as faults, traps, or aborts based on restartability. The #GP (General Protection) exception, vector 13, occurs on protection violations such as segment-limit overruns, privilege-level mismatches, or execution of privileged instructions in user mode, pushing an error code onto the stack for handler analysis.
Exception handlers, installed in the IDT as interrupt or trap gates, process the event (for example, reading the faulting address from CR2 for page faults) and typically execute IRET to resume execution. Hardware exceptions like #GP thus enable robust error handling in protected-mode environments.

High-level constructs like loops are implemented using conditional jumps that alter flow based on flag states set by preceding instructions. A typical loop decrements a counter and jumps back if non-zero, as in:

    mov ecx, 10           ; loop counter
    loop_start:
        ; loop body
        dec ecx
        jnz loop_start    ; jump if not zero

This leverages instructions like JNZ (jump if not zero) to test the ZF flag, providing efficient iteration without dedicated loop opcodes. Conditional assembly directives further control what is assembled; in the Netwide Assembler (NASM), %if evaluates an expression to include or exclude code blocks, and %ifdef tests whether a symbol is defined, which is useful for platform-specific variants.

Debugging relies on software breakpoints, where the INT 3 instruction (opcode CC) generates a #BP (Breakpoint) exception, vector 3, pausing execution for debugger intervention. This one-byte trap can overwrite the first byte of any instruction, with the IDT routing it to the debugger's handler, which can inspect registers and memory before single-stepping with the TF bit in EFLAGS.
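The gate descriptors held in the IDT, mentioned earlier in this section, have a fixed 8-byte layout in 32-bit protected mode: the low half of the handler offset, the code segment selector, a reserved byte, a type/attribute byte (present bit, privilege level, gate type), and the high half of the offset. The Python sketch below packs and unpacks such an entry as an illustration; make_gate and gate_handler are hypothetical helper names.

```python
INTERRUPT_GATE_32 = 0xE     # clears IF on entry
TRAP_GATE_32 = 0xF          # leaves IF unchanged

def make_gate(handler, selector, gate_type=INTERRUPT_GATE_32, dpl=0, present=True):
    """Pack an 8-byte 32-bit IDT gate descriptor."""
    type_attr = (int(present) << 7) | ((dpl & 3) << 5) | (gate_type & 0xF)
    return (
        (handler & 0xFFFF).to_bytes(2, "little")            # offset bits 15..0
        + (selector & 0xFFFF).to_bytes(2, "little")         # code segment selector
        + bytes([0, type_attr])                             # reserved byte, attributes
        + ((handler >> 16) & 0xFFFF).to_bytes(2, "little")  # offset bits 31..16
    )

def gate_handler(descriptor):
    """Recover the 32-bit handler address from a packed gate descriptor."""
    low = int.from_bytes(descriptor[0:2], "little")
    high = int.from_bytes(descriptor[6:8], "little")
    return (high << 16) | low
```

A breakpoint handler reachable from user code would typically use a gate with DPL 3, since a software INT 3 issued at privilege level 3 through a DPL 0 gate raises #GP instead.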

Basic Hello World Programs

A basic "Hello World" program in x86 assembly demonstrates fundamental output operations and program termination specific to the target operating system. These examples illustrate how assembly code interacts with the system for simple text output, highlighting differences in calling conventions, system calls, and linking requirements across platforms. The programs are kept minimal to focus on core concepts like data declaration, register usage, and invocation of OS services.

For 16-bit MS-DOS using MASM syntax, the program employs DOS interrupt 21h with function 09h in AH to print a string (terminated by '$'), followed by function 4Ch for program termination. The .model small directive selects the small memory model, suitable for DOS executables.

    ; hello.asm - 16-bit MS-DOS Hello World in MASM
    .model small
    .stack 128
    .data
    Msg db 'Hello, World!', 13, 10, '$' ; Message with CR/LF and terminator
    .code
    start:
        mov ax, @data       ; Load the data segment address
        mov ds, ax
        mov ah, 09h         ; DOS function: print '$'-terminated string
        lea dx, Msg
        int 21h
        mov ax, 4C00h       ; DOS function 4Ch: terminate with exit code 0
        int 21h
    end start

To assemble and link, use ml hello.asm to produce the executable hello.exe. This runs in real mode on MS-DOS or compatible emulators.

In 32-bit Windows using MASM syntax, a graphical "Hello World" can invoke MessageBoxA from user32.dll to display the message in a dialog box, with ExitProcess from kernel32.dll for termination. The .model flat directive enables flat memory addressing, and the program follows the stdcall calling convention.

    ; hello.asm - 32-bit Windows Hello World in MASM with MessageBoxA
    .386
    .model flat, stdcall
    option casemap:none
    include windows.inc
    include kernel32.inc
    include user32.inc
    includelib kernel32.lib
    includelib user32.lib
    .data
    titleMsg db 'x86 Assembly', 0
    msg db 'Hello, World!', 0
    .code
    Main:
        push 0                  ; MB_OK
        push offset titleMsg    ; Caption
        push offset msg         ; Text
        push 0                  ; HWND_DESKTOP
        call MessageBoxA
        push 0
        call ExitProcess
    end Main

Assemble with ml /c /coff hello.asm and link with link /subsystem:windows hello.obj user32.lib kernel32.lib /entry:Main /libpath:"C:\path\to\libs" to generate hello.exe.

For 32-bit Linux using NASM syntax, the program uses system call 4 (sys_write) via INT 80h to output to stdout (file descriptor 1), with the call number in EAX and arguments in EBX (descriptor), ECX (buffer), and EDX (length), followed by system call 1 (sys_exit) with EBX as the exit code. No external libraries are required beyond the kernel.

    ; hello.asm - 32-bit Linux Hello World in NASM
    SECTION .data
    msg db 'Hello, World!', 10
    msgLen equ $ - msg
    SECTION .text
    global _start
    _start:
        mov eax, 4          ; sys_write
        mov ebx, 1          ; stdout
        mov ecx, msg        ; buffer
        mov edx, msgLen     ; length
        int 80h
        mov eax, 1          ; sys_exit
        mov ebx, 0          ; exit code
        int 80h

Assemble with nasm -f elf32 hello.asm -o hello.o and link with ld -m elf_i386 hello.o -o hello to produce the executable.

In 64-bit Linux using NASM syntax, a higher-level approach links against libc to call printf for formatted output, following the x86-64 System V ABI, where the first argument is passed in RDI and AL holds the number of vector registers used by a variadic call. The program references the string with RIP-relative addressing so that the code is position-independent.

    ; hello.asm - 64-bit Linux Hello World in NASM with printf
    extern printf
    extern exit
    SECTION .data
    msg db 'Hello, World!', 10, 0
    SECTION .text
    global main
    main:
        sub rsp, 8              ; Realign the stack to 16 bytes before calling
        lea rdi, [rel msg]      ; First argument in RDI (RIP-relative)
        xor eax, eax            ; AL = 0: no vector-register arguments
        call printf wrt ..plt   ; Call through the PLT for position independence
        xor edi, edi            ; Exit status 0
        call exit wrt ..plt

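Assemble with nasm -f elf64 hello.asm -o hello.o and link with gcc hello.o -o hello, which adds libc and the C runtime startup code that calls main. Linking with ld directly (ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 hello.o -lc -o hello) would additionally require the startup objects or a _start entry point. This produces a dynamically linked executable.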

Advanced Usage Examples

Advanced usage of x86 assembly language often involves low-level manipulation of processor state and hardware interactions, enabling optimized or specialized code such as position-independent executables, dynamic code generation, and custom interrupt processing. These techniques leverage specific instructions to interact with flags, the instruction pointer, and system events, but require careful handling to ensure correctness across processor generations. Flag manipulation is crucial for conditional control in performance-critical loops, where instructions like ADD can set flags such as the Carry Flag (CF) and Zero Flag (ZF) based on arithmetic results. The ADD instruction adds the source operand to the destination and stores the result in the destination, setting CF if there is a carry out of the most significant bit for unsigned operations and ZF if the result is zero. Following this, the JC (Jump if Carry) instruction can branch to a label if CF is set, enabling efficient handling of overflow in unsigned arithmetic loops. For instance, in a loop accumulating values until overflow:

    mov eax, 0xFFFFFFFF       ; Initialize accumulator to max unsigned 32-bit value
    mov ecx, 10               ; Loop counter
    loop_start:
        add eax, 1            ; Increment; sets CF on unsigned overflow (INC would not)
        jc overflow_handler   ; Jump if carry (overflow)
        dec ecx
        jnz loop_start        ; Continue while iterations remain
        jmp done              ; Finished without overflow
    overflow_handler:
        ; Handle wrap-around
    done:

This pattern detects unsigned overflow without additional comparisons, optimizing tight loops in numerical computations.

Accessing the instruction pointer (IP, or RIP in 64-bit mode) supports position-independent code (PIC), essential for shared libraries and position-independent executables. The LEA (Load Effective Address) instruction computes the effective address of its source operand without memory access, storing it in the destination register; in 64-bit mode, RIP-relative addressing expresses that operand as an offset from the current instruction position. Using NASM syntax, lea rbx, [rel $] loads the address of the current instruction into RBX, giving the code a base address that is valid wherever it is loaded. An example in 64-bit PIC code that reaches data through an offset from that base:

    lea rbx, [rel $]          ; Load the address of this instruction into RBX
    add rbx, data_offset      ; Adjust to target data location (offset computed at link time)
    mov rax, [rbx]            ; Access data at a load-address-independent location

This avoids absolute addresses, ensuring the code relocates correctly when loaded at arbitrary base addresses.

Self-modifying code alters instructions at runtime, useful for just-in-time compilation or adaptive optimization, but requires care so that the processor fetches the modified instructions rather than stale, already-prefetched ones. After writing to a code region, executing a serializing instruction like CPUID ensures that previously fetched copies of the modified bytes are discarded. The CPUID instruction returns processor identification but also acts as a full serializing barrier, flushing the instruction pipeline. A simple self-modifying example jumps to a modifier, patches an opcode (e.g., changing NOP to a one-byte PUSH), and resumes at the patched location:

    jmp modify_code             ; Jump to modifier
    patch_site:
        nop                     ; Placeholder instruction to be patched
        jmp done                ; Avoid falling back into the modifier
    modify_code:
        mov byte [patch_site], 0x50 ; Patch NOP (0x90) to PUSH AX (0x50) - simplistic example; page must be writable
        cpuid                   ; Serialize (clobbers EAX, EBX, ECX, EDX)
        jmp patch_site          ; Resume at modified code
    done:

Such techniques incur performance penalties due to pipeline and cache invalidation but enable runtime code adaptation in embedded or virtualized environments.

Custom interrupt handlers allow direct hardware interaction, such as processing keyboard input via IRQ 1 (INT 09h under the default real-mode BIOS mapping). In protected mode, the interrupt descriptor table (IDT) routes hardware interrupts to handlers installed by the kernel, and the processor saves the current EFLAGS, CS, and EIP before transferring control. A basic keyboard handler reads the scancode from I/O port 0x60 and then acknowledges the interrupt at the interrupt controller. Example handler stub in 32-bit protected mode:

    keyboard_handler:
        pushad                ; Save registers
        in al, 0x60           ; Read scancode from keyboard controller
        ; Process scancode (e.g., map to ASCII)
        mov [key_buffer], al  ; Store in buffer
        mov al, 0x20          ; EOI command for the PIC
        out 0x20, al          ; Acknowledge interrupt
        popad
        iret                  ; Return, restoring EIP, CS, and EFLAGS

This setup, registered in the IDT at vector 33 (IRQ 1 + 32, with the PIC remapped to base vector 32), enables real-time input capture in kernel-level code.

In 64-bit mode, advanced usage extends to system calls via the SYSCALL instruction, which saves the address of the following instruction (RIP) to RCX and RFLAGS (the 64-bit extension of EFLAGS) to R11 before switching to kernel mode. RFLAGS carries condition codes and status bits, while RIP tracks the execution position; parameters are passed in registers, and SYSCALL itself touches no stack memory, which makes the transition fast. An example write system call on Linux:

    mov rax, 1                ; Syscall number: write
    mov rdi, 1                ; File descriptor: stdout
    mov rsi, msg              ; Buffer address
    mov rdx, len              ; Length
    syscall                   ; Invoke; RCX = saved RIP, R11 = saved RFLAGS

This preserves user-state for efficient return via SYSRET, minimizing overhead in high-frequency kernel interactions.
