Hubbry Logo
logo
X86 memory segmentation
Community hub

X86 memory segmentation

logo
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Contribute something to knowledge base
Hub AI

X86 memory segmentation AI simulator

(@X86 memory segmentation_simulator)

X86 memory segmentation

x86 memory segmentation is a term for the kind of memory segmentation characteristic of the Intel x86 computer instruction set architecture. The x86 architecture has supported memory segmentation since the original Intel 8086 (1978),[disputeddiscuss] but x86 memory segmentation is a plainly descriptive retronym. The introduction of memory segmentation mechanisms in this architecture reflects the legacy of earlier 80xx processors, which initially could only address 16, or later 64 KB of memory (16,384 or 65,536 bytes), and whose instructions and registers were optimised for the latter. Dealing with larger addresses and more memory was thus comparably slower, as that capability was somewhat grafted-on in the Intel 8086. Memory segmentation could keep programs compatible, relocatable in memory, and by confining significant parts of a program's operation to 64 KB segments, the program could still run faster.

In 1982, the Intel 80286 added support for virtual memory and memory protection; the original mode was renamed real mode, and the new version was named protected mode. The x86-64 architecture, introduced in 2003, has largely dropped support for segmentation in 64-bit mode.

In both real and protected modes, the system uses 16-bit segment registers to derive the actual memory address. In real mode, the registers CS, DS, SS, and ES point to the currently used program code segment (CS), the current data segment (DS), the current stack segment (SS), and one extra segment determined by the system programmer (ES). The Intel 80386, introduced in 1985, adds two additional segment registers, FS and GS, with no specific uses defined by the hardware. The way in which the segment registers are used differs between the two modes.

The choice of segment is normally defaulted by the processor according to the function being executed. Instructions are always fetched from the code segment. Any data reference to the stack, including any stack push or pop, uses the stack segment; data references indirected through the BP register typically refer to the stack and so they default to the stack segment. The extra segment is the mandatory destination for string operations (for example MOVS or CMPS); for this one purpose only, the automatically selected segment register cannot be overridden. All other references to data use the data segment by default. The data segment is the default source for string operations, but it can be overridden. The instruction format allows an optional segment prefix byte which can be used to override the default segment for selected instructions if desired.

In real mode or V86 mode, the fundamental size of a segment is 65,536 bytes, with individual bytes being addressed using 16-bit offsets.

The 16-bit segment selector in the segment register is interpreted as the most significant 16 bits of a linear 20-bit address, called a segment address, of which the remaining four least significant bits are all zeros. The segment address is always added to a 16-bit offset in the instruction to yield a linear address, which is the same as physical address in this mode. For instance, the segmented address 06EFh:9234h (here the suffix "h" means hexadecimal) has a segment selector of 06EFh, representing a segment address of 06EF0h, to which the offset is added, yielding the linear address 06EF0h + 9234h = 10124h.

(The leading zeros of the linear address, segmented addresses, and the segment and offset fields are shown here for clarity. They are usually omitted.)

Because of the way the segment address and offset are added, a single linear address can be mapped to up to 212 = 4096 distinct segment:offset pairs. For example, the linear address 08124h can have the segmented addresses 06EFh:1234h, 0812h:0004h, 0000h:8124h, etc. This could be confusing to programmers accustomed to unique addressing schemes, but it can also be used to advantage, for example when addressing multiple nested data structures.

See all
User Avatar
No comments yet.