Page (computer memory)

from Wikipedia

A page, memory page, or virtual page is a fixed-length contiguous block of virtual memory, described by a single entry in a page table. It is the smallest unit of data for memory management in an operating system that uses virtual memory. Similarly, a page frame is the smallest fixed-length contiguous block of physical memory into which memory pages are mapped by the operating system.[1][2][3]

A transfer of pages between main memory and an auxiliary store, such as a hard disk drive, is referred to as paging or swapping.[4]

Explanation

Computer memory is divided into pages so that information can be found more quickly.

The concept is named by analogy to the pages of a printed book. If a reader wanted to find, for example, the 5,000th word in the book, they could count from the first word. This would be time-consuming. It would be much faster if the reader had a listing of how many words are on each page. From this listing they could determine which page the 5,000th word appears on, and how many words to count on that page. This listing of the words per page of the book is analogous to a page table in a computer's virtual memory system.[5]

Page size

Page size trade-off

Page size is usually determined by the processor architecture. Traditionally, pages in a system had uniform size, such as 4,096 bytes. However, processor designs often allow two or more page sizes, sometimes simultaneously, because each size offers different benefits. There are several points that can factor into choosing the best page size.[6]

Page table size

A system with a smaller page size uses more pages, requiring a page table that occupies more space. For example, if a 2^32-byte virtual address space is mapped to 4 KiB (2^12-byte) pages, the number of virtual pages is 2^20 = (2^32 / 2^12). However, if the page size is increased to 32 KiB (2^15 bytes), only 2^17 pages are required. A multi-level paging algorithm can decrease the memory cost of allocating a large page table for each process by further dividing the page table up into smaller tables, effectively paging the page table.
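
As a quick check of this arithmetic, the following short C program (an illustration only, using the example values above) computes the entry counts for both page sizes:

#include <stdio.h>

/* Illustration only: how many page-table entries are needed to map a
   32-bit (4 GiB) virtual address space at two different page sizes. */
int main(void)
{
	unsigned long long address_space = 1ULL << 32; /* 2^32 bytes */
	unsigned long long small_page    = 1ULL << 12; /* 4 KiB */
	unsigned long long large_page    = 1ULL << 15; /* 32 KiB */

	printf("4 KiB pages:  %llu entries\n", address_space / small_page); /* 2^20 = 1,048,576 */
	printf("32 KiB pages: %llu entries\n", address_space / large_page); /* 2^17 = 131,072 */
	return 0;
}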

TLB usage

Since every access to memory must be mapped from a virtual to a physical address, reading the page table every time can be quite costly. Therefore, a very fast kind of cache, the translation lookaside buffer (TLB), is often used. The TLB is of limited size, and when it cannot satisfy a given request (a TLB miss) the mapping must be fetched from the page table (by hardware, firmware, or software, depending on the architecture). Larger page sizes mean that a TLB cache of the same size can keep track of larger amounts of memory, which avoids costly TLB misses.

Internal fragmentation

Processes rarely require an exact number of pages. As a result, the last page will likely be only partially full, wasting some amount of memory. Larger page sizes increase the amount of wasted memory, as more potentially unused memory is loaded into main memory. Smaller page sizes ensure a closer match to the actual amount of memory required in an allocation.

As an example, assume the page size is 1024 B. If a process allocates 1025 B, two pages must be used, resulting in 1023 B of unused space (where one page fully consumes 1024 B and the other only 1 B).
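
The same calculation can be expressed in a few lines of C (a sketch using the 1024 B example above; the round-up idiom is the usual way to compute whole pages for a request):

#include <stdio.h>

/* Illustration only: round a request up to whole pages and report the
   internal fragmentation, using the 1024 B page / 1025 B request example. */
int main(void)
{
	size_t page_size = 1024;
	size_t request   = 1025;

	size_t pages  = (request + page_size - 1) / page_size; /* round up to whole pages */
	size_t wasted = pages * page_size - request;            /* unused space in the last page */

	printf("%zu pages used, %zu bytes wasted\n", pages, wasted); /* 2 pages, 1023 bytes */
	return 0;
}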

Disk access

When transferring from a rotational disk, much of the delay is caused by seek time, the time it takes to correctly position the read/write heads above the disk platters. Because of this, large sequential transfers are more efficient than several smaller transfers. Transferring the same amount of data from disk to memory often requires less time with larger pages than with smaller pages.

Getting page size programmatically

Most operating systems allow programs to discover the page size at runtime. This allows programs to use memory more efficiently by aligning allocations to this size and reducing overall internal fragmentation of pages.

Unix-like operating systems

Unix-like systems may use the system function sysconf(),[7][8][9][10][11] as illustrated in the following example written in the C programming language.

#include <stdio.h>
#include <unistd.h> /* sysconf(3) */

int main(void)
{
	printf("The page size for this system is %ld bytes.\n",
		sysconf(_SC_PAGESIZE)); /* _SC_PAGE_SIZE is OK too. */

	return 0;
}

In many Unix-like systems, the command-line utility getconf can be used.[12][13][14] For example, getconf PAGESIZE will return the page size in bytes.

Windows-based operating systems

Win32-based operating systems, such as those in the Windows 9x and Windows NT families, may use the system function GetSystemInfo()[15][16] from kernel32.dll.

#include <stdio.h>
#include <windows.h>

int main(void)
{
	SYSTEM_INFO si;
	GetSystemInfo(&si);

	printf("The page size for this system is %u bytes.\n", si.dwPageSize);

	return 0;
}

Multiple page sizes

Some instruction set architectures can support multiple page sizes, including pages significantly larger than the standard page size. The available page sizes depend on the instruction set architecture, processor type, and operating (addressing) mode. The operating system selects one or more sizes from the sizes supported by the architecture. Note that not all processors implement all defined larger page sizes. This support for larger pages (known as "huge pages" in Linux, "superpages" in FreeBSD, and "large pages" in Microsoft Windows and IBM AIX terminology) allows for "the best of both worlds", reducing the pressure on the TLB cache (sometimes increasing speed by as much as 15%) for large allocations while still keeping memory usage at a reasonable level for small allocations.

Page sizes among architectures[17]
Architecture | Smallest page size | Larger page sizes
IA-32 (32-bit x86)[18] | 4 KiB | 4 MiB in PSE mode, 2 MiB in PAE mode[19]
x86-64[18] | 4 KiB | 2 MiB, 1 GiB (only when the CPU has the PDPE1GB flag)
IA-64 (Itanium)[20] | 4 KiB | 8 KiB, 64 KiB, 256 KiB, 1 MiB, 4 MiB, 16 MiB, 256 MiB[19]
Power ISA[21] | 4 KiB | 64 KiB, 16 MiB, 16 GiB
SPARC v8 with SPARC Reference MMU[22] | 4 KiB | 256 KiB, 16 MiB
UltraSPARC Architecture 2007[23] | 8 KiB | 64 KiB, 512 KiB (optional), 4 MiB, 32 MiB (optional), 256 MiB (optional), 2 GiB (optional), 16 GiB (optional)
ARMv7[24] | 4 KiB | 64 KiB, 1 MiB ("section"), 16 MiB ("supersection") (defined by a particular implementation)
AArch64[25] | 4 KiB | 16 KiB, 64 KiB, 2 MiB, 32 MiB, 512 MiB, 1 GiB
RISCV32[26] | 4 KiB | 4 MiB ("megapage")
RISCV64[26] | 4 KiB | 2 MiB ("megapage"), 1 GiB ("gigapage"), 512 GiB ("terapage", only for CPUs with a 43-bit or larger address space), 256 TiB ("petapage", only for CPUs with a 57-bit or larger address space)

Starting with the Pentium Pro and the AMD Athlon, x86 processors support 4 MiB pages (called the Page Size Extension), or 2 MiB pages when PAE is in use, in addition to their standard 4 KiB pages; newer x86-64 processors, such as AMD's AMD64 processors and Intel's Westmere[27] and later Xeon processors, can use 1 GiB pages in long mode. IA-64 supports as many as eight different page sizes, from 4 KiB up to 256 MiB, and some other architectures have similar features.

Although the processors in most contemporary personal computers support larger pages, they are not in common use outside of large-scale applications, such as those found in large servers and computational clusters, and the operating system itself. Their use commonly requires elevated privileges, cooperation from the application making the large allocation (usually by setting a flag to ask the operating system for huge pages), or manual administrator configuration; in addition, operating systems often cannot, sometimes by design, page them out to disk.

However, SGI IRIX has general-purpose support for multiple page sizes. Each individual process can provide hints and the operating system will automatically use the largest page size possible for a given region of address space.[28] Later work proposed transparent operating system support for using a mix of page sizes for unmodified applications through preemptible reservations, opportunistic promotions, speculative demotions, and fragmentation control.[29]

Linux has supported huge pages on several architectures since the 2.6 series via the hugetlbfs filesystem[30] and without hugetlbfs since 2.6.38.[31] Windows Server 2003 (SP1 and newer), Windows Vista and Windows Server 2008 support huge pages under the name of large pages.[32] Windows 2000 and Windows XP support large pages internally, but do not expose them to applications.[33] Reserving large pages under Windows requires a corresponding right that the system administrator must grant to the user because large pages cannot be swapped out under Windows. Beginning with version 9, Solaris supports large pages on SPARC and x86.[34][35] FreeBSD 7.2-RELEASE features superpages.[36] Note that until recently in Linux, applications needed to be modified in order to use huge pages. The 2.6.38 kernel introduced support for transparent use of huge pages.[31] On Linux kernels supporting transparent huge pages, as well as FreeBSD and Solaris, applications take advantage of huge pages automatically, without the need for modification.[36]
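
As an illustration of the application-side cooperation described above, the following C sketch (Linux-specific; it assumes a kernel built with transparent huge page support) creates an anonymous mapping and hints to the kernel, via madvise(MADV_HUGEPAGE), that it should be backed by huge pages. The hint is advisory and may be ignored.

#define _GNU_SOURCE /* exposes MADV_HUGEPAGE and MAP_ANONYMOUS under strict feature-test macros */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	size_t length = 64 * 1024 * 1024; /* 64 MiB anonymous region */

	void *region = mmap(NULL, length, PROT_READ | PROT_WRITE,
	                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (region == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Advisory hint: ask the kernel to back this range with transparent
	   huge pages; it is not an error for the kernel to decline. */
	if (madvise(region, length, MADV_HUGEPAGE) == -1)
		perror("madvise(MADV_HUGEPAGE)");

	munmap(region, length);
	return 0;
}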

from Grokipedia
In computer science, a page (also known as a memory page or virtual page) is a fixed-length contiguous block of virtual memory that serves as the fundamental unit for memory management in paging systems, typically measuring 4 KB (4096 bytes) on most modern architectures to balance efficiency and overhead. Pages enable the operating system to map logical addresses from a process's virtual address space to physical addresses in main memory (RAM) or secondary storage, allowing programs larger than available physical memory to execute by loading only necessary portions on demand. This abstraction divides the virtual address space into equal-sized pages and physical memory into corresponding frames, with a page table maintaining the mappings between them to facilitate address translation via the memory management unit (MMU).

Paging operates through demand paging, where pages are transferred from secondary storage (such as a hard disk or SSD) to physical memory only when accessed, triggered by a page fault if the page is not resident in RAM; the operating system then allocates a free frame, loads the page, and updates the page table entry. Page tables, stored in kernel memory, contain entries for each virtual page indicating its physical frame location, presence in memory, protection bits (e.g., read-only or executable), and validity; multi-level page tables (e.g., two- or four-level hierarchies) are commonly used in 64-bit systems to reduce their size and improve lookup efficiency via the translation lookaside buffer (TLB), a hardware cache for recent mappings. This mechanism supports features like copy-on-write for efficient process forking and shared memory between processes by mapping the same physical frame to multiple virtual pages.

The concept of paging emerged in the late 1950s with the development of the Atlas computer at the University of Manchester (1957–1962), which introduced fixed-size memory blocks to overlay programs and extend limited core memory using drum storage, marking a pivotal milestone in virtual memory by treating main memory as a cache for slower backing store. Early implementations, such as in the IBM System/360 Model 67 (1967), popularized paging in commercial systems, evolving to handle growing memory demands; by the 1970s, Unix on the PDP-11 adopted paging with 1 KB pages in its Version 7 release (1979), influencing standards in subsequent operating systems like Linux and Windows. Today, paging remains integral to 64-bit architectures, supporting terabyte-scale virtual address spaces while accommodating variable page sizes (e.g., 2 MB or 1 GB huge pages) for performance optimization in workloads like databases or virtualization.

Paging provides key benefits including the elimination of external fragmentation through non-contiguous allocation, simplified memory management by using fixed units, and the illusion of abundant memory via demand loading, enabling multitasking and larger applications without requiring all code and data in RAM simultaneously. It also enhances security and isolation by enforcing per-process address spaces and protection attributes, preventing unauthorized access between programs. However, drawbacks include internal fragmentation (wasted space within partially used pages), overhead from page table storage and TLB misses, and potential thrashing under heavy memory pressure, where excessive page faults degrade performance; these are mitigated by techniques like page replacement algorithms (e.g., LRU) and super-paging. Overall, paging underpins efficient resource utilization in contemporary computing environments.

Basic Concepts

Definition and Purpose

In paging-based memory management schemes used in modern operating systems, a page is defined as a fixed-size block of contiguous memory into which both physical and virtual address spaces are divided. This division allows the operating system to treat memory as a collection of uniform units, facilitating efficient handling of processes that may exceed available physical RAM. The primary purpose of pages is to enable virtual memory systems, which provide processes with the illusion of a large, contiguous address space despite physical memory constraints. By dividing memory into pages, the operating system can allocate non-contiguous physical frames to a process's virtual pages, abstracting away the limitations of RAM and allowing efficient swapping of inactive pages to secondary storage like disk. This approach supports demand paging, where only actively used pages are loaded into memory, thereby optimizing resource utilization and enabling multitasking without requiring contiguous allocation. Key characteristics of pages include their role as the fundamental unit for memory allocation, protection mechanisms (such as read/write/execute permissions), and data transfer operations between RAM and secondary storage. Pages are typically 4 KB in size on common architectures like x86, though this can vary, and they ensure that memory management operations occur in discrete, predictable chunks to minimize fragmentation and simplify hardware support via mechanisms like the memory management unit (MMU). For example, in a 32-bit virtual address space, a virtual address is split into a page number (the higher-order bits identifying which page) and an offset (the lower-order bits specifying the location within that page), allowing the operating system to map the page number to a physical frame via a page table.
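
A minimal C illustration of this split, assuming the 4 KiB pages and 32-bit addresses of the example (the address value itself is arbitrary):

#include <stdio.h>
#include <stdint.h>

/* Illustration only: split a 32-bit virtual address into page number and
   offset, assuming 4 KiB (2^12-byte) pages. */
int main(void)
{
	uint32_t vaddr       = 0x00403A7Cu;    /* arbitrary example address */
	uint32_t page_number = vaddr >> 12;    /* high-order bits: which page */
	uint32_t offset      = vaddr & 0xFFFu; /* low-order 12 bits: where in the page */

	printf("page number = 0x%X, offset = 0x%X\n",
		(unsigned)page_number, (unsigned)offset);
	return 0;
}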

Role in Virtual Memory Systems

In virtual memory systems, paging serves as a core mechanism for managing memory by dividing the virtual address space of a process into fixed-size units known as pages, while physical memory is similarly partitioned into page frames of the same size. These pages and frames enable a flexible mapping that allows the operating system to allocate non-contiguous physical memory to processes without the fragmentation issues associated with variable-sized allocations. The mappings between virtual pages and physical frames are stored in per-process page tables, which the operating system maintains to provide each process with the illusion of a large, contiguous, and private address space. This approach originated in early systems like the Atlas computer, where it was implemented as a one-level storage system to automate transfers between fast core memory and slower drum storage. Address translation in paging relies on decomposing a virtual address into two components: the virtual page number (VPN), which identifies the page within the virtual address space, and the offset, which specifies the byte position within that page. The MMU, a hardware component integrated into the CPU, uses the VPN to index the process's page table and retrieve the corresponding physical frame number (PFN); the offset is then appended to the PFN to form the physical address. This hardware-accelerated translation ensures efficient access while enforcing isolation between processes, as each operates within its own virtual address space. The MMU also verifies the validity of the mapping before proceeding, raising an exception if the page is not present or accessible. Paging provides key benefits to operating systems through techniques like demand paging, where pages are loaded into physical memory only upon first access, rather than pre-loading the entire address space. This is facilitated by a present bit in page table entries (PTEs) that indicates whether a page resides in memory; absent pages trigger a page fault handled by the OS to fetch the data from secondary storage. Additionally, copy-on-write optimization supports efficient process creation, such as during forking, by initially sharing pages between parent and child processes with read-only permissions; a private copy is created only when one process attempts to modify a page, minimizing unnecessary duplication. These mechanisms enhance resource utilization by supporting sparse address spaces and reducing memory overhead for common operations. A primary advantage of paging is its support for memory protection and sharing at the granularity of individual pages. Each PTE includes permission bits that specify access rights, such as read, write, or execute, allowing the OS to enforce fine-grained controls and prevent unauthorized access or modifications. For instance, the MMU checks these bits during translation and signals a fault for violations, enabling isolation of processes while permitting safe sharing of read-only pages, such as code segments or shared libraries, across multiple processes that map to the same physical frame. This per-page control promotes security and efficiency in multiprogrammed environments without requiring complex segment descriptors.
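
The following C sketch illustrates this translation logic with a hypothetical, single-level page table (the structure layout, field names, and table size are invented for illustration; real hardware and operating systems use multi-level, architecture-defined formats):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical single-level page table entry, purely for illustration. */
struct pte {
	uint32_t frame;  /* physical frame number (PFN) */
	bool present;    /* is the page resident in memory? */
	bool writable;   /* write-permission bit */
};

#define PAGE_SHIFT 12
#define NUM_PAGES  16 /* tiny table, purely for illustration */

static struct pte page_table[NUM_PAGES];

/* Returns the physical address, or -1 to stand in for a page/protection fault. */
static long translate(uint32_t vaddr, bool is_write)
{
	uint32_t vpn    = vaddr >> PAGE_SHIFT;
	uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);

	if (vpn >= NUM_PAGES || !page_table[vpn].present)
		return -1; /* page fault: the OS would load the page and retry */
	if (is_write && !page_table[vpn].writable)
		return -1; /* protection fault: write to a read-only page */
	return ((long)page_table[vpn].frame << PAGE_SHIFT) | offset;
}

int main(void)
{
	page_table[1] = (struct pte){ .frame = 42, .present = true, .writable = false };

	printf("read  0x1234 -> %ld\n", translate(0x1234, false)); /* resolves into frame 42 */
	printf("write 0x1234 -> %ld\n", translate(0x1234, true));  /* fails: page is read-only */
	return 0;
}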

Page Size Fundamentals

Common Page Sizes Across Systems

The most common page size in modern computing systems is 4 KB, serving as the standard base size for architectures such as x86 and ARM, as well as operating systems like Linux and Windows on these platforms. This size balances memory management efficiency with hardware constraints, originating from the memory management unit (MMU) designs in these processors. For instance, the x86 architecture fixes 4 KB as its granule size, while ARM supports it as one of several options but defaults to it in many Linux distributions on ARM64. Operating system implementations often align with their primary hardware targets. Linux defaults to 4 KB pages on x86 systems, enabling fine-grained virtual memory mapping. Windows similarly uses 4 KB as the standard page size across x86 and x86-64 processors, with support for larger pages only in specific configurations. In contrast, macOS on Intel-based systems employs 4 KB pages, while Apple Silicon (ARM-based) macOS and iOS since the 64-bit transition (A7 and later) expose 16 KB pages to user space for improved performance in memory-intensive tasks. Hardware architecture significantly influences base page sizes. The RISC-V specification mandates a minimum of 4 KB for its Sv32, Sv39, and Sv48 virtual memory modes, ensuring compatibility with standard MMU implementations. PowerPC systems, particularly in 64-bit variants like ppc64, commonly use 64 KB as the default page size in Linux distributions such as Fedora and Debian, leveraging the architecture's support for larger granules to reduce translation overhead. While 4 KB remains the default in most general-purpose systems, larger "huge" pages are supported as non-default variations for optimizing specific workloads, such as databases or virtualization. Common huge page sizes include 2 MB and 1 GB on x86-64 Linux and Windows, with ARM and RISC-V offering analogous superpages like 2 MB or 32 MB depending on the base granule. These are typically enabled explicitly rather than as system defaults.

Retrieving Page Size Programmatically

In Unix-like operating systems adhering to the POSIX standard, the page size can be retrieved at runtime using the sysconf function with the _SC_PAGESIZE parameter, which returns the size of a page in bytes as a long integer. This approach is preferred over the older getpagesize() call, which was marked legacy in SUSv2 and dropped from POSIX.1-2001, though it remains available in many implementations for backward compatibility and is equivalent to sysconf(_SC_PAGESIZE). The following C code snippet demonstrates the usage:

#include <unistd.h>
#include <stdio.h>

int main()
{
	long page_size = sysconf(_SC_PAGESIZE);
	if (page_size == -1) {
		perror("sysconf failed");
		return 1;
	}
	printf("Page size: %ld bytes\n", page_size);
	return 0;
}

This method queries the system's default page size dynamically, allowing applications to adapt memory allocations without hardcoding values. On Windows systems, the page size is obtained via the GetSystemInfo API function from the Windows API, which populates a SYSTEM_INFO structure where the dwPageSize member holds the page size in bytes as a DWORD. This call also provides additional system details like processor architecture but focuses here on the page size for memory management purposes. The example below in C++ (compatible with C via appropriate includes) illustrates this:

#include <windows.h>
#include <iostream>

int main()
{
	SYSTEM_INFO si;
	GetSystemInfo(&si);
	std::cout << "Page size: " << si.dwPageSize << " bytes" << std::endl;
	return 0;
}

For applications targeting multiple operating systems, cross-platform code often employs conditional compilation to invoke the appropriate OS-specific function, such as sysconf on POSIX systems or GetSystemInfo on Windows, potentially wrapped in a library or macro for portability. While the C standard library (<stdlib.h>) does not provide a direct query, headers like <unistd.h> on Unix-like systems facilitate this, and third-party libraries can abstract the differences for broader compatibility. These methods typically retrieve the system's default page size, assuming a uniform configuration; in environments supporting multiple page sizes, such as Linux with huge pages enabled, additional queries are needed, for instance, by parsing /proc/<pid>/smaps to inspect per-mapping fields like KernelPageSize or AnonHugePages. This limitation requires developers to handle variability explicitly for advanced memory usage.
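
A minimal sketch of such conditional compilation, assuming only the two functions shown earlier (the wrapper name query_page_size is illustrative, not a standard API):

#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
/* Windows: the page size comes from GetSystemInfo(). */
static long query_page_size(void)
{
	SYSTEM_INFO si;
	GetSystemInfo(&si);
	return (long)si.dwPageSize;
}
#else
#include <unistd.h>
/* POSIX: the page size comes from sysconf(). */
static long query_page_size(void)
{
	return sysconf(_SC_PAGESIZE);
}
#endif

int main(void)
{
	printf("Page size: %ld bytes\n", query_page_size());
	return 0;
}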

Support for Multiple Page Sizes

Modern operating systems support multiple page sizes to optimize memory management, allowing the use of superpages or huge pages (typically 2 MB or 1 GB) alongside the standard 4 KB base pages for improved efficiency in handling large memory regions. This approach reduces the number of page table entries required for large contiguous memory allocations, thereby minimizing translation lookaside buffer (TLB) pressure in scenarios with extensive memory access patterns. In Linux, support for multiple page sizes is implemented through two primary mechanisms: the HugeTLB filesystem for static huge pages and Transparent Huge Pages (THP) for dynamic allocation. HugeTLB enables explicit reservation of huge pages via kernel parameters such as /proc/sys/vm/nr_hugepages, which specifies the number of 2 MB or 1 GB pages to preallocate at boot or runtime, ensuring availability for applications without fragmentation risks if set early. THP, introduced to automate huge page usage, promotes contiguous 4 KB pages to 2 MB (or larger with multi-size THP) for anonymous memory and tmpfs/shmem mappings, operating in modes like "always" for system-wide enabling or "madvise" for application-specific hints. Userspace applications can allocate huge pages using the mmap system call with the MAP_HUGETLB flag; a non-default huge page size can be requested by encoding its base-2 logarithm in the flag bits, for example MAP_HUGETLB | MAP_HUGE_2MB (equivalent to 21 << MAP_HUGE_SHIFT). Windows provides large page support primarily for 64-bit server applications, allowing allocations of 2 MB pages (or larger, depending on hardware) alongside standard pages to enhance performance in memory-intensive workloads. To use large pages, processes must first acquire the SeLockMemoryPrivilege privilege using AdjustTokenPrivileges, then allocate memory via VirtualAlloc with the MEM_LARGE_PAGES flag, ensuring the allocation size and alignment are multiples of the minimum large page size obtained from GetLargePageMinimum. This non-pageable memory is locked in physical RAM, avoiding paging overhead but requiring early allocation to mitigate fragmentation. These mechanisms are particularly beneficial in use cases involving databases and virtual machines, where large memory footprints benefit from fewer TLB misses; for instance, Oracle Database leverages HugeTLB pages to manage large shared global areas (SGAs) more efficiently, reducing virtual memory overhead. In virtualized environments, such as those using KVM on Linux, huge pages back guest VM memory to accelerate address translation and lower latency for workloads like PostgreSQL servers handling terabyte-scale data.
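
A hedged sketch of the Linux HugeTLB path described above: it requests one explicitly huge-page-backed anonymous mapping at the system's default huge page size (assumed here to be 2 MiB) and will fail, typically with ENOMEM, unless huge pages have been reserved, for example via /proc/sys/vm/nr_hugepages.

#define _GNU_SOURCE /* exposes MAP_HUGETLB and MAP_ANONYMOUS under strict feature-test macros */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	size_t length = 2 * 1024 * 1024; /* one huge page, assuming a 2 MiB default size */

	/* Explicit HugeTLB mapping at the system's default huge page size; a
	   specific size could instead be encoded with the MAP_HUGE_* flags. */
	void *p = mmap(NULL, length, PROT_READ | PROT_WRITE,
	               MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap(MAP_HUGETLB)"); /* typically ENOMEM if no huge pages are reserved */
		return 1;
	}

	munmap(p, length);
	return 0;
}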

Page Size Trade-offs

Effects on Page Table Overhead

Page tables in virtual memory systems are hierarchical data structures that map virtual pages to physical frames, with the structure varying by architecture. In the IA-32 architecture using 32-bit paging, a two-level structure consists of a page directory followed by page tables, where each level typically holds 1024 entries for 4 KB pages. Larger page sizes, such as 4 MB supported via page size extensions, allow directory entries to directly map pages, eliminating the need for the lower-level page tables and thereby reducing the overall structure depth. The number of page table entries required scales inversely with page size, as determined by dividing the virtual address space by the page size. For a 32-bit virtual address space of 4 GB (2^{32} bytes) with 4 KB (2^{12} bytes) pages, this yields 1,048,576 entries (2^{20}):

\text{Number of page table entries} = \frac{\text{Virtual address space}}{\text{Page size}} = \frac{2^{32}}{2^{12}} = 2^{20} = 1{,}048{,}576

In contrast, using 2 MB (2^{21} bytes) pages reduces the entries to 2,048 (2^{11}), a factor of 512 fewer due to the larger granularity. Each page table entry typically consumes 4 bytes in 32-bit systems or 8 bytes in 64-bit systems, leading to significant memory overhead for smaller pages. For the 4 GB example with 4 KB pages and 4-byte entries, the page tables alone require about 4 MB (1,048,576 × 4 bytes), excluding the page directory. Switching to 2 MB pages cuts this to roughly 8 KB for the necessary directory entries (2,048 × 4 bytes, adjusted for multi-level allocation), saving approximately 4 MB of RAM per process. In multi-level paging, such as the four-level structure in Intel 64 architecture, page size influences the size and allocation of intermediate directories, including the page directory pointer table entries (PDPTEs). Each level uses 512 entries of 8 bytes for 4 KB pages, supporting a 48-bit virtual address space, but larger pages like 1 GB map directly at higher levels (e.g., via PDPT entries), minimizing the number of populated tables and associated overhead. This hierarchical approach ensures that only used portions of the address space consume memory, with larger pages further optimizing by collapsing levels.

Influence on TLB Performance

The Translation Lookaside Buffer (TLB) is a hardware cache that stores recent page table entries to accelerate virtual-to-physical address translations in paging systems. Modern CPU designs typically feature TLBs with a fixed number of entries, ranging from 64 to 1024 depending on the level (L1 or L2) and processor architecture. Page size directly influences TLB efficiency by determining how much virtual memory a given number of TLB entries can cover. With smaller pages, such as the common 4 KB size, a 64-entry TLB provides a "reach" of only 256 KB (64 entries × 4 KB), requiring frequent evictions and reloads for applications spanning larger memory footprints. In contrast, using larger pages like 2 MB allows the same 64-entry TLB to cover 128 MB, significantly reducing the need for TLB flushes in memory-intensive workloads. This effect arises because each TLB entry maps an entire page, so larger pages consolidate more translations into fewer entries, minimizing contention within the fixed-size buffer. A TLB miss incurs a substantial performance penalty, as it triggers a full page table walk across multiple levels of the page table hierarchy, often costing hundreds of CPU cycles—typically 100 to 120 cycles on average in contemporary systems. Larger page sizes mitigate this by lowering miss rates; for instance, in large-scale applications, employing huge pages can reduce TLB misses by up to 81% compared to base 4 KB pages, thereby avoiding repeated walks and improving overall execution speed. The key metric for quantifying this impact is TLB reach, defined as the product of the number of TLB entries and the page size, which represents the maximum contiguous virtual address space that can be translated without a miss. Huge pages dramatically extend this reach—for example, 1 GB pages with a 1024-entry L2 TLB yield a 1 TB coverage—enabling better performance in data-center and scientific computing scenarios where working sets exceed traditional limits. Systems supporting multiple page sizes, such as x86-64 architectures, leverage this to dynamically select larger pages for hot memory regions, further optimizing TLB utilization.
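
Expressed as a formula, using the 64-entry example from this section, the reach is simply the product of the entry count and the page size:

\text{TLB reach} = \text{number of TLB entries} \times \text{page size}

For example, $64 \times 4\ \text{KiB} = 256\ \text{KiB}$, whereas $64 \times 2\ \text{MiB} = 128\ \text{MiB}$.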

Internal Fragmentation Considerations

Internal fragmentation occurs in paging systems when the fixed-size pages allocated to a process or data structure leave unused space within the page because the actual memory requirement is smaller than the page size. For instance, if a process requires 3 KB of memory but the system uses 4 KB pages, 1 KB remains wasted within the allocated page. This waste arises because memory allocations must align to page boundaries, rounding up to the nearest full page. The average amount of internal fragmentation per process is typically half the page size, assuming a uniform distribution of allocation sizes modulo the page size. To derive this, consider the remainder $r$ when the allocation size is divided by the page size $p$, where $0 \leq r < p$. The waste for each allocation is $p - r$ if $r > 0$, or $0$ if $r = 0$. Over a uniform distribution, the expected waste is the integral $\int_0^p (p - r) \cdot \frac{1}{p} \, dr = \frac{p}{2}$. This quantification highlights how larger page sizes amplify average waste, potentially consuming up to 50% of allocated memory on average for mismatched sizes. Several factors influence the extent of internal fragmentation, primarily the size of a process's working set (the active portion of memory it uses). Processes with small, variable working sets experience higher fragmentation with larger pages, as much of the page goes unused. Conversely, applications with large, contiguous memory needs, such as databases or scientific computations, suffer less relative waste. The choice of page size thus balances fragmentation against other overheads, where smaller pages minimize waste but introduce more pages overall. To mitigate internal fragmentation, systems can employ variable page sizes or multiple supported page granularities, allowing allocations to better match request sizes through hierarchical or Fibonacci-based block schemes that reduce average waste by up to 30-50% compared to fixed sizes. For large contiguous allocations, huge pages (e.g., 2 MB or 1 GB) are used selectively to cover bulk needs efficiently, though they risk higher fragmentation if the data does not fill them completely; this approach is common in modern operating systems like Linux for performance-critical workloads.

Impact on Disk I/O Efficiency

In virtual memory systems, pages represent the atomic unit of data transfer between main memory and secondary storage during swapping operations, where the operating system moves pages to and from disk to manage memory pressure. Larger page sizes reduce the total number of I/O requests by consolidating more data per operation, thereby improving disk utilization and throughput as fixed overheads like seek times and rotational delays are amortized over greater data volumes. For instance, systems with small pages, such as the 512-byte pages in VAX/VMS, historically faced inefficient swapping due to excessive small transfers, prompting optimizations like page clustering to batch multiple pages into larger writes for better disk performance. This leads to a key trade-off: smaller pages, such as the common 4 KB size, support finer-grained swapping by loading only the minimally required data on a page fault, which can lower per-fault latency and minimize unnecessary data movement, but they increase the frequency of disk seeks and thus potential bottlenecks in throughput-heavy scenarios. In contrast, larger pages like 64 KB decrease seek overhead and enhance sequential I/O efficiency, though they may transfer excess data for localized faults, impacting responsiveness in random access patterns. These dynamics are particularly evident in demand paging, where pages are fetched on-demand from disk. Page size also influences file system operations through alignment with underlying block sizes, optimizing mapped file I/O and reducing fragmentation at the storage layer. For example, in Linux's ext4 file system, the default 4 KB block size aligns directly with the standard memory page size, avoiding read-modify-write cycles where partial page updates would otherwise require reading an entire block, modifying it in memory, and rewriting it to disk. Such misalignment, common when page and block sizes differ, incurs extra I/O overhead and latency, as the system must handle additional disk accesses to maintain data integrity. Overall, performance metrics underscore that larger pages boost I/O bandwidth by enabling sustained high-transfer rates suitable for bulk operations, while smaller pages excel in latency-sensitive environments for partial or sporadic updates, with empirical evaluations showing throughput gains of up to several times in large-page configurations for I/O-bound workloads.

Historical Development

Origins in Early Computing

The concept of paging in computer memory originated in the late 1950s and early 1960s as a solution to the limitations of physical memory in early computing systems. The Atlas computer, developed at the University of Manchester and operational from December 1962, introduced one of the first practical implementations of virtual memory through paging. This system treated core memory (16,000 words) and drum storage (96,000 words) as a unified "one-level store," using fixed-size pages of 512 words to automatically transfer data between fast and slow storage media. The motivation stemmed from the need to eliminate manual data movement by programmers, which consumed significant time and reduced productivity by up to threefold, allowing the system to present a large virtual address space of up to 1 million words despite limited physical resources. Paging was managed via page address registers and an associative memory mechanism, with a learning algorithm to predict and preload least-used pages, though pages were fixed to hardware-defined block sizes without initial support for variable granularity. Building on Atlas's innovations, the Multics (Multiplexed Information and Computing Service) project, initiated in 1965 as a collaboration between MIT's Project MAC, Bell Laboratories, and General Electric, further advanced paging as a core component of virtual memory for time-sharing systems. Multics employed paging alongside segmentation to enable flexible dynamic memory allocation, dividing the address space into pages of either 64 words or 1,024 words, which allowed the operating system to treat secondary storage as an extension of main memory. The primary motivations were to overcome memory constraints in multi-user batch processing environments and to support larger virtual address spaces than available physical RAM, facilitating interactive computing for scientific and business applications without requiring entire program swaps. Early Multics paging was hardware-dependent on fixed block sizes from the GE-645 computer and lacked full demand paging in initial designs, relying instead on supervisor-managed transfers. A key commercial milestone came with IBM's System/360 Model 67, announced in 1965 as an extension of the System/360 family launched in 1964, marking the first production system with built-in paging support for virtual memory. This model used 4,096-byte (4 KB) pages, organized into segments of up to 1 million bytes, with dynamic address translation hardware to map virtual addresses to physical locations via page and segment tables. Driven by the demands of time-sharing and batch systems with constrained RAM, it enabled virtual spaces far exceeding physical memory—up to 16 million bytes—by paging only active portions to auxiliary storage, thus improving multiprogramming efficiency. Like its predecessors, early paging on the Model 67 was tied to fixed hardware block sizes, with no provision for demand paging until software enhancements in the Time Sharing System (TSS).

Evolution of Paging Mechanisms

In the late 1970s, the VAX-11/780 system, introduced by Digital Equipment Corporation in 1977 alongside the VAX/VMS operating system, utilized 512-byte pages for virtual memory management, enabling a 31-bit virtual address space divided into pages of this size. This design balanced memory efficiency with the hardware capabilities of the era, supporting demand paging in a multiprogramming environment. During the same period, Unix implementations on the PDP-11 series adopted a 1 KB page size to accommodate the limited memory and I/O constraints of minicomputers, facilitating efficient swapping and process isolation. By the 1980s, paging mechanisms evolved with the Berkeley Software Distribution (BSD) variant of Unix, which standardized on 4 KB pages to reduce page table overhead and improve TLB utilization on systems like the VAX and Sun workstations. This shift from smaller pages addressed growing address space needs and hardware advancements, such as larger caches and faster memory buses, while maintaining compatibility with earlier Unix versions. Entering the 1990s, the x86 architecture standardized demand paging with 4 KB pages, as defined in the Intel 80386 processor documentation and adopted by operating systems like Windows NT and Linux, providing a de facto norm for 32-bit systems. To support larger physical memory beyond 4 GB, Physical Address Extension (PAE) was introduced in the Intel Pentium Pro processor in 1995, extending physical addressing to 36 bits while retaining the 4 KB page granularity for compatibility. In the 2000s, Linux introduced support for huge pages in kernel version 2.6.10 around 2004, allowing 2 MB or larger pages via the hugetlbfs filesystem to mitigate TLB contention in high-memory workloads like databases. Concurrently, the ARM64 architecture, defined in the ARMv8 specification released in 2011, provided variable page sizes including 4 KB, 16 KB, and 64 KB, enabling flexible configurations for embedded and server applications through configurable translation table levels. Security concerns drove further innovations, such as kernel page-table isolation (KPTI), implemented in Linux kernel 4.15 in early 2018 as a mitigation for the Meltdown vulnerability disclosed in January 2018, which unmaps kernel memory from user-space page tables during non-kernel execution to prevent speculative access. Recent trends reflect a shift toward larger default page sizes in mobile and embedded systems, exemplified by Android's support for 16 KB pages starting in Android 15 (2024), optimizing for ARM hardware with reduced TLB pressure and improved cache efficiency. Hardware advancements, particularly in ARM-based processors, increasingly favor 64 KB pages for power-sensitive environments, as seen in Neoverse cores, to enhance performance in IoT and edge computing while supporting multiple page sizes for versatility. In the 2020s, security enhancements in paging mechanisms have increasingly emphasized encrypted pages to provide page-level isolation and resist side-channel attacks. AMD's Secure Encrypted Virtualization (SEV) technology encrypts memory at the hardware level, protecting virtual machine pages from hypervisor access and mitigating risks such as ciphertext side-channel leaks during execution. 
Similarly, Intel's Trust Domain Extensions (TDX) enable confidential computing by isolating trust domains within virtual machines, using hardware-accelerated memory encryption to safeguard page contents against unauthorized access and side-channel vulnerabilities like those exploited in Spectre-era attacks. These features, introduced in the early 2020s, have become integral to cloud environments, with SEV-SNP and TDX variants enhancing resistance to page-fault and cache-based side channels through integrity checks and dynamic obfuscation techniques. Efficiency trends in paging have shifted toward larger and variable page sizes to optimize for modern workloads, particularly in AI and hyperscale computing. In RISC-V architectures, support for 64 KB base pages has gained traction for AI inference tasks. Linux kernel patches proposed since 2023 seek to enable this 64 KB configuration, breaking prior 4 KB limitations to better suit memory-intensive AI frameworks. Hyperscalers like AWS have adopted variable page sizes in their Graviton processors, allowing configurations such as 64 KB pages in ARM-based EC2 instances to enhance performance for diverse cloud workloads without fixed-size constraints. Looking toward future directions, paging systems are integrating with technologies like Compute Express Link (CXL) to enable memory pooling across devices, where coherent interconnects allow shared page access in disaggregated environments, reducing latency in tiered memory setups. Experimental systems are exploring sub-page granularity for migration and allocation to minimize fragmentation in heterogeneous memory, such as by migrating data at finer-than-page levels in huge-page contexts or using adaptive subrelease policies in non-moving allocators. As of 2025, huge pages have seen widespread adoption in cloud virtual machines to mitigate TLB overheads, with platforms like VMware ESXi supporting 2 MB or larger pages and Oracle Cloud Infrastructure configuring them up to a 60% cap of total memory in performance-critical VMs. Standards such as UEFI continue to specify a 4 KB base page size for compatibility, ensuring runtime services map memory into operating system address spaces at this granularity across x86 and ARM systems.
