Memory paging

from Wikipedia

In computer operating systems, memory paging is a memory management scheme that allows the physical memory used by a program to be non-contiguous.[1] This avoids the problem of external memory fragmentation and the need for compaction to reduce it.

Paging is often combined with the related technique of allocating and freeing page frames and storing pages on and retrieving them from secondary storage[a] in order to allow the aggregate size of the address spaces to exceed the physical memory of the system.[2] For historical reasons, this technique is sometimes referred to as swapping.

When combined with virtual memory, it is known as paged virtual memory. In this scheme, the operating system retrieves data from secondary storage in blocks of the same size (pages). Paging is an important part of virtual memory implementations in modern operating systems, using secondary storage to let programs exceed the size of available physical memory.

Hardware support is necessary for efficient translation of logical addresses to physical addresses. As such, paged memory functionality is usually hardwired into a CPU through its Memory Management Unit (MMU) or Memory Protection Unit (MPU), and separately enabled by privileged system code in the operating system's kernel. In CPUs implementing the x86 instruction set architecture (ISA), for instance, memory paging is enabled via the CR0 control register.
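
As a rough illustration of the last point, the sketch below shows how 32-bit x86 kernel code might enable paging: load the physical address of a page directory into CR3, then set the PG bit (bit 31) of CR0. The helper name and the assumption that a valid, identity-mapped page directory already exists are illustrative, not taken from any particular kernel.

    #include <stdint.h>

    #define CR0_PG 0x80000000u   /* CR0 bit 31: paging enable */

    /* Illustrative helper: assumes 'page_directory_phys' points to a valid,
       identity-mapped page directory so execution continues after the switch. */
    static inline void enable_paging(uint32_t page_directory_phys)
    {
        uint32_t cr0;
        __asm__ volatile ("mov %0, %%cr3" : : "r"(page_directory_phys) : "memory");
        __asm__ volatile ("mov %%cr0, %0" : "=r"(cr0));
        cr0 |= CR0_PG;
        __asm__ volatile ("mov %0, %%cr0" : : "r"(cr0) : "memory");
    }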

History

In the 1960s, swapping was an early memory management technique. An entire program or entire segment would be "swapped out" (or "rolled out") from random-access memory (RAM) to disk or drum, and another one would be swapped in (or rolled in).[3][4] A swapped-out program would be current but its execution would be suspended while the RAM was in use by another program; a program with a swapped-out segment could continue running until it needed that segment, at which point it would be suspended until the segment was swapped in.

A program might include multiple overlays that occupy the same memory at different times. Overlays are not a method of paging RAM to secondary storage[a] but merely of minimizing the program's RAM use. Subsequent architectures used memory segmentation, and individual program segments became the units exchanged between secondary storage and RAM. A segment was the program's entire code segment or data segment, or sometimes other large data structures. These segments had to be contiguous when resident in RAM, requiring additional computation and movement to remedy fragmentation.[5]

Ferranti's Atlas and its Atlas Supervisor, developed at the University of Manchester[6] (1962), formed the first system to implement memory paging. Subsequent early machines and operating systems supporting paging include the IBM M44/44X and its MOS operating system (1964),[7] the SDS 940[8] and the Berkeley Timesharing System (1966), a modified IBM System/360 Model 40 and the CP-40 operating system (1967), the IBM System/360 Model 67 and operating systems such as TSS/360, CP/CMS and the Michigan Terminal System (MTS) (1967), the RCA 70/46 and the Time Sharing Operating System (TSOS) (1967), the GE 645 and Multics (1969), and the DEC PDP-10 with added BBN-designed paging hardware and the TENEX operating system (1969).

Those machines, and subsequent machines supporting memory paging, use either a set of page address registers or in-memory page tables[d] to allow the processor to operate on arbitrary pages anywhere in RAM as a seemingly contiguous logical address space. These pages became the units exchanged between secondary storage[a] and RAM.

Page faults

When a process tries to reference a page not currently mapped to a page frame in RAM, the processor treats this invalid memory reference as a page fault and transfers control from the program to the operating system. The operating system must:

  1. Determine whether a stolen page frame still contains an unmodified copy of the page; if so, use that page frame.
  2. Otherwise, obtain an empty page frame in RAM to use as a container for the data, and:
    • Determine whether the page was ever initialized.
    • If so, determine the location of the data on secondary storage[a].
    • Load the required data into the available page frame.
  3. Update the page table to refer to the new page frame.
  4. Return control to the program, transparently retrying the instruction that caused the page fault.

When all page frames are in use, the operating system must select a page frame to reuse for the page the program now needs. If the evicted page frame was dynamically allocated by a program to hold data, or if a program modified it since it was read into RAM (in other words, if it has become "dirty"), it must be written out to secondary storage before being freed. If a program later references the evicted page, another page fault occurs and the page must be read back into RAM.

The method the operating system uses to select the page frame to reuse, which is its page replacement algorithm, affects efficiency. The operating system predicts the page frame least likely to be needed soon, often through the least recently used (LRU) algorithm or an algorithm based on the program's working set. To further increase responsiveness, paging systems may predict which pages will be needed soon, preemptively loading them into RAM before a program references them, and may steal page frames from pages that have been unreferenced for a long time, making them available. Some systems clear new pages to avoid data leaks that compromise security; some set them to installation defined or random values to aid debugging.

Page fetching techniques

Demand paging

When pure demand paging is used, pages are loaded only when they are referenced. A program from a memory-mapped file begins execution with none of its pages in RAM. As the program commits page faults, the operating system copies the needed pages from secondary storage (e.g., a memory-mapped file, paging file, or swap partition containing the page data) into RAM.

Anticipatory paging

Some systems use only demand paging—waiting until a page is actually requested before loading it into RAM.

Other systems attempt to reduce latency by guessing which pages not in RAM are likely to be needed soon, and pre-loading such pages into RAM before they are requested. (This is often combined with pre-cleaning, which guesses which pages currently in RAM are not likely to be needed soon and writes them out to storage in advance.)

When a page fault occurs, anticipatory paging systems will not only bring in the referenced page, but also other pages that are likely to be referenced soon. A simple anticipatory paging algorithm will bring in the next few consecutive pages even though they are not yet needed (a prediction using locality of reference); this is analogous to a prefetch input queue in a CPU. Swap prefetching will prefetch recently swapped-out pages if there are enough free pages for them.[9]
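
As a sketch of the consecutive-page heuristic just described, the routine below brings in the faulting page plus a small fixed window of following pages; the helper functions and the window size are hypothetical, and real systems tune the window dynamically.

    #include <stdbool.h>
    #include <stddef.h>

    #define PREFETCH_WINDOW 4   /* pages brought in beyond the faulting page */

    /* Hypothetical helpers assumed to be provided by the surrounding kernel. */
    extern bool page_is_resident(size_t vpn);
    extern void fetch_page_from_storage(size_t vpn);

    /* On a fault for 'vpn', also bring in the next few consecutive pages:
       a locality-of-reference prediction analogous to a prefetch input queue. */
    void handle_fault_with_prefetch(size_t vpn, size_t last_vpn)
    {
        fetch_page_from_storage(vpn);
        for (size_t i = 1; i <= PREFETCH_WINDOW && vpn + i <= last_vpn; i++) {
            if (!page_is_resident(vpn + i))
                fetch_page_from_storage(vpn + i);
        }
    }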

If a program ends, the operating system may delay freeing its pages, in case the user runs the same program again.

Some systems allow application hints; the application may request that a page be made available and continue without delay.

Page replacement techniques

Free page queue, stealing, and reclamation

The free page queue is a list of page frames that are available for assignment. Preventing this queue from being empty minimizes the computing necessary to service a page fault. Some operating systems periodically look for pages that have not been recently referenced and then free the page frame and add it to the free page queue, a process known as "page stealing". Some operating systems[e] support page reclamation; if a program commits a page fault by referencing a page that was stolen, the operating system detects this and restores the page frame without having to read the contents back into RAM.
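
A simplified sketch of a free page queue with low-watermark refill ("page stealing") follows; the types, watermark value, and helper functions are illustrative rather than any specific operating system's implementation.

    #include <stddef.h>

    #define LOW_WATERMARK 64     /* refill the queue when it gets this small */

    struct frame { size_t pfn; struct frame *next; };

    static struct frame *free_queue;   /* frames ready for immediate use */
    static size_t free_count;

    /* Hypothetical helpers assumed to exist elsewhere in the kernel. */
    extern struct frame *steal_unreferenced_frame(void);  /* page stealing */
    extern void unmap_frame(struct frame *f);

    struct frame *alloc_frame(void)
    {
        /* Keep the queue topped up so most faults are served without a scan. */
        struct frame *victim;
        while (free_count < LOW_WATERMARK &&
               (victim = steal_unreferenced_frame()) != NULL) {
            unmap_frame(victim);
            victim->next = free_queue;
            free_queue   = victim;
            free_count++;
        }
        if (free_queue == NULL)
            return NULL;               /* caller must wait for page-out */
        struct frame *f = free_queue;
        free_queue = f->next;
        free_count--;
        return f;
    }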

Pre-cleaning

The operating system may periodically pre-clean dirty pages: write modified pages back to secondary storage[a] even though they might be further modified. This minimizes the amount of cleaning needed to obtain new page frames at the moment a new program starts or a new data file is opened, and improves responsiveness. (Unix operating systems periodically use sync to pre-clean all dirty pages; Windows operating systems use "modified page writer" threads.)

Some systems allow application hints; the application may request that a page be cleared or paged out and continue without delay.

Thrashing

After completing initialization, most programs operate on a small number of code and data pages compared to the total memory the program requires. The pages most frequently accessed are called the working set.

When the working set is a small percentage of the system's total number of pages, virtual memory systems work most efficiently and an insignificant amount of computing is spent resolving page faults. As the working set grows, resolving page faults remains manageable until the growth reaches a critical point. Then faults go up dramatically and the time spent resolving them overwhelms time spent on the computing the program was written to do. This condition is referred to as thrashing. Thrashing can occur for a program that works with huge data structures, as its large working set causes continual page faults that drastically slow down the system. Satisfying page faults may require freeing pages that will soon have to be re-read from secondary storage.[a] "Thrashing" is also used in contexts other than virtual memory systems; for example, to describe cache issues in computing or silly window syndrome in networking.

A worst case might occur on VAX processors. A single MOVL instruction crossing a page boundary could have both a source operand and a destination operand that use displacement-deferred addressing, where the longword containing each operand's address crosses a page boundary and each operand itself also crosses a page boundary. This single instruction references ten pages; if not all are in RAM, each will cause a page fault. As each fault occurs the operating system needs to go through the extensive memory management routines, perhaps causing multiple I/Os which might include writing other process pages to disk and reading pages of the active process from disk. If the operating system could not allocate ten pages to this program, then remedying the page fault would discard another page the instruction needs, and any restart of the instruction would fault again.

To decrease excessive paging and resolve thrashing problems, a user can increase the number of pages available per program, either by running fewer programs concurrently or increasing the amount of RAM in the computer.

Sharing

In multi-programming or in a multi-user environment, many users may execute the same program, written so that its code and data are in separate pages. To minimize RAM use, all users share a single copy of the program. Each process's page table is set up so that the pages that address code point to the single shared copy, while the pages that address data point to different physical pages for each process.

Different programs might also use the same libraries. To save space, only one copy of the shared library is loaded into physical memory. Programs which use the same library have virtual addresses that map to the same pages (which contain the library's code and data). When programs want to modify the library's code, they use copy-on-write, so memory is only allocated when needed.

Shared memory is an efficient means of communication between programs. Programs can share pages in memory, and then write and read to exchange data.

Implementations

Ferranti Atlas

The first computer to support paging was the supercomputer Atlas,[10][11][12] jointly developed by Ferranti, the University of Manchester and Plessey in 1963. The machine had an associative (content-addressable) memory with one entry for each 512 word page. The Supervisor[13] handled non-equivalence interruptions[f] and managed the transfer of pages between core and drum in order to provide a one-level store[14] to programs.

Microsoft Windows

Windows 3.x and Windows 9x

Paging has been a feature of Microsoft Windows since Windows 3.0 in 1990. Windows 3.x creates a hidden file named 386SPART.PAR or WIN386.SWP for use as a swap file. It is generally found in the root directory, but it may appear elsewhere (typically in the WINDOWS directory). Its size depends on how much swap space the system has (a setting selected by the user under Control Panel → Enhanced under "Virtual Memory"). If the user moves or deletes this file, a blue screen will appear the next time Windows is started, with the error message "The permanent swap file is corrupt". The user will be prompted to choose whether or not to delete the file (even if it does not exist).

Windows 95, Windows 98 and Windows Me use a similar file, and the settings for it are located under Control Panel → System → Performance tab → Virtual Memory. Windows automatically sets the size of the page file to start at 1.5× the size of physical memory, and expand up to 3× physical memory if necessary. If a user runs memory-intensive applications on a system with low physical memory, it is preferable to manually set these sizes to a value higher than default.

Windows NT

The file used for paging in the Windows NT family is pagefile.sys. The default location of the page file is in the root directory of the partition where Windows is installed. Windows can be configured to use free space on any available drives for page files. It is required, however, for the boot partition (i.e., the drive containing the Windows directory) to have a page file on it if the system is configured to write either kernel or full memory dumps after a Blue Screen of Death. Windows uses the paging file as temporary storage for the memory dump. When the system is rebooted, Windows copies the memory dump from the page file to a separate file and frees the space that was used in the page file.[15]

Fragmentation

In the default configuration of Windows, the page file is allowed to expand beyond its initial allocation when necessary. If this happens gradually, it can become heavily fragmented which can potentially cause performance problems.[16] The common advice given to avoid this is to set a single "locked" page file size so that Windows will not expand it. However, the page file only expands when it has been filled, which, in its default configuration, is 150% of the total amount of physical memory.[17] Thus the total demand for page file-backed virtual memory must exceed 250% of the computer's physical memory before the page file will expand.

The fragmentation of the page file that occurs when it expands is temporary. As soon as the expanded regions are no longer in use (at the next reboot, if not sooner) the additional disk space allocations are freed and the page file returns to its original state.

Locking a page file size can be problematic if a Windows application requests more memory than the total size of physical memory and the page file, leading to failed requests to allocate memory that may cause applications and system processes to fail. Also, the page file is rarely read or written in sequential order, so the performance advantage of having a completely sequential page file is minimal. However, a large page file generally allows the use of memory-heavy applications, with no penalties besides using more disk space. While a fragmented page file may not be an issue by itself, fragmentation of a variable-size page file will over time create several fragmented blocks on the drive, causing other files to become fragmented. For this reason, a fixed-size contiguous page file is better, provided that the size allocated is large enough to accommodate the needs of all applications.

The required disk space may be easily allocated on systems with more recent specifications (e.g., a system with 3 GB of memory having a 6 GB fixed-size page file on a 750 GB disk drive, or a system with 6 GB of memory, a 16 GB fixed-size page file and 2 TB of disk space). In both examples, the system uses about 0.8% of the disk space with the page file pre-extended to its maximum.

Defragmenting the page file is also occasionally recommended to improve performance when a Windows system is chronically using much more memory than its total physical memory.[18] This view ignores the fact that, aside from the temporary results of expansion, the page file does not become fragmented over time. In general, performance concerns related to page file access are much more effectively dealt with by adding more physical memory.

Unix and Unix-like systems

Unix systems, and other Unix-like operating systems, use the term "swap" to describe the act of substituting disk space for RAM when physical RAM is full.[19] In some of those systems, it is common to dedicate an entire partition of a hard disk to swapping. These partitions are called swap partitions. Many systems have an entire hard drive dedicated to swapping, separate from the data drive(s), containing only a swap partition. A hard drive dedicated to swapping is called a "swap drive" or a "scratch drive" or a "scratch disk". Some of those systems only support swapping to a swap partition; others also support swapping to files.

Linux

The Linux kernel supports a virtually unlimited number of swap backends (devices or files), and also supports assignment of backend priorities. When the kernel swaps pages out of physical memory, it uses the highest-priority backend with available free space. If multiple swap backends are assigned the same priority, they are used in a round-robin fashion (which is somewhat similar to RAID 0 storage layouts), providing improved performance as long as the underlying devices can be efficiently accessed in parallel.[20]
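
For example, two swap devices can be activated with equal priority so that the kernel stripes pages across them. The sketch below uses the Linux swapon(2) system call; the device paths are placeholders, and the same effect is usually achieved with swapon -p or pri= options in /etc/fstab.

    #include <stdio.h>
    #include <sys/swap.h>

    /* Activate a swap device with an explicit priority (requires root). */
    static int add_swap(const char *path, int prio)
    {
        int flags = SWAP_FLAG_PREFER |
                    ((prio << SWAP_FLAG_PRIO_SHIFT) & SWAP_FLAG_PRIO_MASK);
        if (swapon(path, flags) != 0) { perror(path); return -1; }
        return 0;
    }

    int main(void)
    {
        add_swap("/dev/sdb1", 5);   /* same priority on both devices, so the */
        add_swap("/dev/sdc1", 5);   /* kernel stripes pages across them      */
        return 0;
    }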

Swap files and partitions

From the end-user perspective, swap files in versions 2.6.x and later of the Linux kernel are virtually as fast as swap partitions; the limitation is that swap files should be contiguously allocated on their underlying file systems. To increase performance of swap files, the kernel keeps a map of where they are placed on underlying devices and accesses them directly, thus bypassing the cache and avoiding filesystem overhead.[21][22] When residing on HDDs, which are rotational magnetic media devices, one benefit of using swap partitions is the ability to place them on contiguous HDD areas that provide higher data throughput or faster seek time. However, the administrative flexibility of swap files can outweigh certain advantages of swap partitions. For example, a swap file can be placed on any mounted file system, can be set to any desired size, and can be added or changed as needed. Swap partitions are not as flexible; they cannot be enlarged without using partitioning or volume management tools, which introduce various complexities and potential downtimes.

Swappiness

Swappiness is a Linux kernel parameter that controls the relative weight given to swapping out of runtime memory, as opposed to dropping pages from the system page cache, whenever a memory allocation request cannot be met from free memory. Swappiness can be set to a value from 0 to 200.[23] A low value causes the kernel to prefer to evict pages from the page cache, while a higher value causes the kernel to prefer to swap out "cold" memory pages. The default value is 60; setting it higher can cause high latency if cold pages need to be swapped back in (for example, when interacting with a program that had been idle), while setting it lower (even to 0) may cause high latency when files that had been evicted from the cache need to be read again, but will make interactive programs more responsive as they are less likely to need cold pages swapped back in. Swapping can also slow HDDs down further because it involves many random writes, while SSDs do not have this problem. The default value works well in most workloads, but desktops and interactive systems may benefit from lowering the setting, while batch processing and less interactive systems may benefit from increasing it.[24]
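
As a concrete illustration, swappiness is exposed through /proc/sys/vm/swappiness (equivalently, sysctl vm.swappiness). A minimal sketch that lowers it for a more interactive system, assuming root privileges:

    #include <stdio.h>

    /* Lower vm.swappiness so the kernel prefers dropping page-cache pages
       over swapping out anonymous ("cold") process memory. */
    int main(void)
    {
        FILE *f = fopen("/proc/sys/vm/swappiness", "w");
        if (f == NULL) { perror("swappiness"); return 1; }
        fprintf(f, "10\n");
        return fclose(f) == 0 ? 0 : 1;
    }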

Swap death

When the system memory is highly insufficient for the current tasks and a large portion of memory activity goes through a slow swap, the system can become practically unable to execute any task, even if the CPU is idle. When every process is waiting on the swap, the system is considered to be in swap death.[25][26]

Swap death can happen due to incorrectly configured memory overcommitment.[27][28][29]

The original description of the "swapping to death" problem relates to the X server. If code or data used by the X server to respond to a keystroke is not in main memory, then when the user enters a keystroke, the server will take one or more page faults, requiring those pages to be read from swap before the keystroke can be processed, slowing the response to it. If those pages do not remain in memory, they will have to be faulted in again to handle the next keystroke, making the system practically unresponsive even if it's actually executing other tasks normally.[30]

macOS

macOS uses multiple swap files. The default (and Apple-recommended) installation places them on the root partition, though it is possible to place them instead on a separate partition or device.[31]

AmigaOS 4

AmigaOS 4.0 introduced a new system for allocating RAM and defragmenting physical memory. It still uses a flat shared address space that cannot be defragmented. It is based on slab allocation and a paging system that allows swapping. Paging was implemented in AmigaOS 4.1. It can lock up the system if all physical memory is used up.[32] Swap memory can be activated and deactivated, allowing the user to choose to use only physical RAM.

Performance

The backing store for a virtual memory operating system is typically many orders of magnitude slower than RAM. Hard disks, for instance, introduce several milliseconds delay before the reading or writing begins. Therefore, it is desirable to reduce or eliminate swapping, where practical. Some operating systems offer settings to influence the kernel's decisions.

  • Linux offers the /proc/sys/vm/swappiness parameter, which changes the balance between swapping out runtime memory, as opposed to dropping pages from the system page cache.
  • Windows 2000, XP, and Vista offer the DisablePagingExecutive registry setting, which controls whether kernel-mode code and data can be eligible for paging out.
  • Mainframe computers frequently used head-per-track disk drives or drums for page and swap storage to eliminate seek time, and several technologies[33] to have multiple concurrent requests to the same device in order to reduce rotational latency.
  • Flash memory has a finite number of erase-write cycles (see limitations of flash memory), and the smallest amount of data that can be erased at once might be very large,[g] seldom coinciding with page size. Therefore, flash memory may wear out quickly if used as swap space under tight memory conditions. For this reason, mobile and embedded operating systems (such as Android) may not use swap space. On the positive side, flash memory has practically no access delay compared to hard disks and is not volatile like RAM chips. Schemes like ReadyBoost and Intel Turbo Memory are designed to exploit these characteristics.

Many Unix-like operating systems (for example AIX, Linux, and Solaris) allow using multiple storage devices for swap space in parallel, to increase performance.

Swap space size

In some older virtual memory operating systems, space in swap backing store is reserved when programs allocate memory for runtime data. Operating system vendors typically issue guidelines about how much swap space should be allocated.

Physical and virtual address space sizes

Paging is one way of allowing the size of the addresses used by a process, which is the process's "virtual address space" or "logical address space", to be different from the amount of main memory actually installed on a particular computer, which is the physical address space.

Main memory smaller than virtual memory

In most systems, the size of a process's virtual address space is much larger than the available main memory.[35] For example:

  • The address bus that connects the CPU to main memory may be limited. The i386SX CPU's 32-bit internal addresses can address 4 GB, but it has only 24 pins connected to the address bus, limiting installed physical memory to 16 MB. There may be other hardware restrictions on the maximum amount of RAM that can be installed.
  • The maximum memory might not be installed because of cost, because the model's standard configuration omits it, or because the buyer did not believe it would be advantageous.
  • Sometimes not all internal addresses can be used for memory anyway, because the hardware architecture may reserve large regions for I/O or other features.

Main memory the same size as virtual memory

A computer with true n-bit addressing may have 2^n addressable units of RAM installed. An example is a 32-bit x86 processor with 4 GB and without Physical Address Extension (PAE). In this case, the processor is able to address all the RAM installed and no more.

However, even in this case, paging can be used to support more virtual memory than physical memory. For instance, many programs may be running concurrently. Together, they may require more physical memory than can be installed on the system, but not all of it will have to be in RAM at once. A paging system makes efficient decisions on which memory to relegate to secondary storage, leading to the best use of the installed RAM.

In addition the operating system may provide services to programs that envision a larger memory, such as files that can grow beyond the limit of installed RAM. Not all of the file can be concurrently mapped into the address space of a process, but the operating system might allow regions of the file to be mapped into the address space, and unmapped if another region needs to be mapped in.

Main memory larger than virtual address space

A few computers have a main memory larger than the virtual address space of a process, such as the Magic-1,[35] some PDP-11 machines, and some systems using 32-bit x86 processors with Physical Address Extension. This nullifies a significant advantage of paging, since a single process cannot use more main memory than the amount of its virtual address space. Such systems often use paging techniques to obtain secondary benefits:

  • The "extra memory" can be used in the page cache to cache frequently used files and metadata, such as directory information, from secondary storage.
  • If the processor and operating system support multiple virtual address spaces, the "extra memory" can be used to run more processes. Paging allows the cumulative total of virtual address spaces to exceed physical main memory.
  • A process can store data in memory-mapped files on memory-backed file systems, such as the tmpfs file system or file systems on a RAM drive, and map files into and out of the address space as needed.
  • A set of processes may still depend upon the enhanced security features page-based isolation may bring to a multitasking environment.

The size of the cumulative total of virtual address spaces is still limited by the amount of secondary storage available.

from Grokipedia
Memory paging is a fundamental memory management scheme in operating systems that implements virtual memory by dividing both the virtual address space of a process and the physical memory into fixed-size blocks known as pages and frames, respectively, typically ranging from 4 KB to 8 KB in size. This allows processes to use a large, non-contiguous virtual address space that exceeds available physical memory, with pages mapped to frames via a page table that translates virtual addresses (consisting of a page number and offset) into physical addresses. When a process accesses a page not currently in physical memory, a page fault occurs, prompting the operating system to load the required page from disk into a frame, potentially evicting another page if necessary—a technique known as demand paging. Key advantages of memory paging include the elimination of external fragmentation, support for efficient memory allocation and sharing among processes, and the ability to provide each process with an illusion of dedicated, contiguous memory despite physical constraints. However, it introduces internal fragmentation within pages (unused portions at the end of the last page) and overhead from page table maintenance, which can require multiple memory accesses per address translation—often mitigated by hardware caches called Translation Lookaside Buffers (TLBs). Unlike segmentation, which uses variable-sized logical units, paging employs uniform fixed sizes for simplicity and to avoid external fragmentation, though modern paging implementations, such as those on x86-64, use multi-level page tables for handling large address spaces (e.g., 64-bit architectures). Overall, paging enables scalable virtual memory systems essential for multitasking environments, with performance relying on locality principles to minimize page faults and thrashing.

Fundamentals

Definition and Purpose

Memory paging is a technique in which the virtual address space of a process is divided into fixed-size blocks called pages, while physical memory is similarly partitioned into corresponding fixed-size units known as frames. This allows the operating system to allocate non-contiguous physical memory to a process by mapping virtual pages to available physical frames as needed. The primary purpose of memory paging is to implement virtual memory, enabling each process to operate as if it has access to a contiguous block of memory larger than the available physical RAM. By abstracting physical limitations, paging supports memory protection—preventing one process from accessing another's memory—and promotes efficient resource utilization through techniques like demand loading, where pages are fetched into physical memory only when accessed, potentially triggering a page fault if not present. This abstraction simplifies memory management for programmers, who can develop applications without concern for physical memory constraints. Memory paging emerged in the early 1960s as a response to the limitations of early computers, where program sizes were growing beyond the capacity of physical core memory, requiring manual overlay management that was time-consuming and error-prone. Pioneered in the Atlas computer at the University of Manchester, paging automated the transfer of data between fast core memory and slower drum storage, creating the illusion of a unified, large addressable space of up to 10^6 words. This innovation improved programmer productivity by a factor of three and facilitated multiprogramming for multiple users. Unlike segmentation, which divides memory into variable-sized logical units and suffers from external fragmentation due to scattered free blocks, paging employs uniform fixed-size pages to eliminate external fragmentation entirely, though it may introduce minor internal fragmentation within partially used pages. This fixed-size approach simplifies allocation and deallocation, making memory management more predictable and efficient compared to contiguous allocation schemes that required large unbroken blocks.

Page and Frame Concepts

In memory paging, a page represents a fixed-size block of a process's virtual memory, serving as the fundamental unit of allocation and of transfer between main memory and secondary storage. Pages enable the abstraction of virtual memory, allowing non-contiguous mapping to physical memory locations. Typically, the page size is 4 KB in modern systems, such as those using the x86 architecture, though this can vary based on hardware and operating system configurations. A frame, in contrast, is the corresponding fixed-size unit within physical memory (RAM) designed to hold a single page. Frames are pre-allocated slots in the physical address space, and the operating system manages a pool of free frames to load virtual pages as needed. The size of a frame matches the page size exactly, ensuring compatibility during page transfers from secondary storage to main memory. The selection of page size involves key trade-offs that impact system efficiency. Smaller page sizes, such as 512 bytes, minimize internal fragmentation by reducing wasted space within partially used pages, but they increase the overhead of page tables due to the greater number of entries required for the same address space. Larger sizes, up to 64 KB, reduce page table overhead and TLB (translation lookaside buffer) pressure but can lead to higher internal fragmentation, especially for small memory allocations. Common page sizes range from 512 bytes to 64 KB across various architectures, with 4 KB being the standard for x86-based systems to balance these factors. Both pages and frames must adhere to strict alignment requirements to prevent overlap and ensure efficient hardware access. Pages in virtual space and frames in physical space are aligned to multiples of their size (e.g., a 4 KB page starts at an address divisible by 4096), which simplifies address calculations and avoids boundary-crossing issues during memory operations. For larger data structures exceeding a single page, processes request multi-page allocations, where the operating system assigns contiguous or non-contiguous frames as available from its free frame list. This mechanism supports scalable memory usage for applications with varying needs, such as arrays or heaps that span multiple pages, while maintaining the paging system's flexibility.
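
Because the page size is a power of two, splitting an address into page number and offset reduces to a shift and a mask. A small sketch for the common 4 KB case (the address value is arbitrary):

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12u                      /* log2(4096) */
    #define PAGE_SIZE  (1u << PAGE_SHIFT)       /* 4 KB pages */

    int main(void)
    {
        uint32_t vaddr  = 0x00403A7Cu;              /* arbitrary virtual address */
        uint32_t vpn    = vaddr >> PAGE_SHIFT;      /* virtual page number */
        uint32_t offset = vaddr & (PAGE_SIZE - 1);  /* byte within the page */

        printf("vaddr 0x%08x -> page %u, offset 0x%03x\n", vaddr, vpn, offset);
        return 0;
    }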

Address Translation

Page Tables

Page tables serve as the core data structures in memory paging systems, enabling the operating system to map virtual page numbers (VPNs) from a process's virtual address space to physical frame numbers (PFNs) in main memory. Each process maintains its own page table, typically implemented as an array where the index corresponds to the VPN, and the entry holds the PFN along with metadata for protection and status. This per-process isolation ensures that virtual addresses are translated independently, preventing interference between concurrently running programs. A page table entry (PTE) generally consists of several fields to support efficient and secure memory access. The primary field is the PFN, which specifies the starting frame of the corresponding physical memory; for instance, in a system with 1 MB of RAM and 4 KB pages, the PFN requires 8 bits to address up to 256 frames. Additional bits include a valid/invalid flag to indicate whether the virtual page is currently mapped to a physical frame, protection bits to enforce permissions such as read, write, or execute (e.g., restricting kernel pages to supervisor mode only), and reference/modified bits to track usage patterns—the reference bit is set on any access for page replacement heuristics, while the modified bit (or dirty bit) flags writes to determine if swapping back to disk is needed. These bits collectively enable the memory management unit (MMU) to validate translations and trigger faults for protection violations. Page tables come in several organizational types to balance simplicity, memory efficiency, and performance across different address space sizes. The simplest form is the flat or single-level page table, an array with one entry per possible VPN in the virtual address space. While straightforward, this incurs high overhead for large address spaces; for a 32-bit virtual address space with 4 KB pages, it requires 2^{20} (1 million) entries, consuming 4 MB if each PTE is 4 bytes, which becomes prohibitive with many processes. To mitigate this, hierarchical or multi-level page tables organize mappings into a tree-like structure, allocating only sub-tables for used portions of the address space. In the two-level scheme used in 32-bit x86 architectures, a page directory (first level) indexes into secondary page tables (second level), each covering a subset of the VPN space—for example, the directory might use 10 bits to select one of 1,024 page tables, each handling 4 MB of virtual memory with 1,024 entries. This approach drastically reduces allocated memory for sparse usage, as unused directory entries point to nothing. Modern extensions originally used a four-level hierarchy in x86-64 to scale to 48-bit virtual addresses (e.g., PML4, PDP, PD, and PT levels, each using 9 bits for indexing), with support for huge pages for contiguous mappings; more recently, five-level paging (adding a PML5 level) extends this to 57-bit virtual addresses on supported hardware. For systems with extremely sparse or shared address spaces, inverted page tables reverse the mapping by maintaining a single system-wide table with one entry per physical frame, storing the VPN (and process ID) that occupies it. Lookups require hashing the VPN (often combined with the process ID) to find the matching frame, with collisions resolved via chaining in a hash anchor table, as implemented in architectures like PowerPC. This eliminates per-process overhead, using fixed memory proportional to physical RAM—e.g., less than 0.1% for 4 KB pages with 4-byte entries—making it ideal for environments with vast virtual spaces but limited active pages.
The memory overhead of page tables remains a key concern, as even efficient designs can consume significant resources; a single 4-byte PTE per 4 KB page equates to 0.1% overhead, but full allocation for unused address space exacerbates this. Solutions like sparse allocation in multi-level tables address this by lazily creating sub-tables only when pages are accessed, while inverted tables inherently avoid waste by tying size to physical memory. In practice, operating systems such as Linux employ these techniques alongside huge pages (e.g., 2 MB) to further minimize entry counts and overhead.
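
One way to picture the PTE fields discussed above is a bit-field struct; the layout below is a generic illustration rather than the exact encoding of any real architecture.

    #include <stdint.h>

    /* Illustrative PTE layout; real hardware formats differ in bit positions. */
    struct pte {
        uint32_t valid      : 1;   /* mapping present in RAM?             */
        uint32_t read       : 1;   /* protection bits                     */
        uint32_t write      : 1;
        uint32_t execute    : 1;
        uint32_t user       : 1;   /* accessible from user mode?          */
        uint32_t referenced : 1;   /* set by the MMU on any access        */
        uint32_t dirty      : 1;   /* set by the MMU on a write           */
        uint32_t pfn        : 20;  /* physical frame number (4 KB frames) */
    };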

Translation Process

In memory paging, the address translation process converts a virtual address issued by a program into the corresponding physical address in main memory using the page table associated with the process. The virtual address is partitioned into two primary components: the virtual page number (VPN), which identifies the page within the virtual address space, and the offset, which specifies the byte location within that page. For a typical 32-bit virtual address using 4 KB pages, the higher 20 bits form the VPN, while the lower 12 bits constitute the offset, as 4 KB equals 2^{12} bytes. The translation begins by extracting the VPN from the virtual address through masking and shifting operations. This VPN serves as an index to locate the entry in the page table, which contains the physical frame number (PFN) if the page is mapped. The PFN, representing the starting address of the physical frame, is then shifted left by the page shift amount and combined with the unchanged offset to yield the physical address. This concatenation ensures that the intra-page displacement remains consistent between virtual and physical spaces. The process can be formalized as \text{Physical Address} = (\text{PFN} \ll \text{page\_shift}) + \text{offset}, where \text{page\_shift} = \log_2(\text{page\_size}), equaling 12 for 4 KB pages. In systems with large address spaces, single-level page tables become inefficient due to their size, leading to the use of multi-level page tables for hierarchical translation. For a two-level scheme in a 32-bit system with 4 KB pages, the 20-bit VPN is divided into a 10-bit outer index and a 10-bit inner index. Translation proceeds by first using the outer index to access the page directory—a top-level table that points to secondary page tables. The inner index then indexes the appropriate secondary table to retrieve the PFN, which is concatenated with the offset as in the single-level case. This structure reduces memory overhead by allocating inner tables only as needed. During translation, if the page table entry for the VPN indicates an invalid mapping—such as an invalid bit set or a missing PFN—the hardware detects this condition and raises a page fault exception, transferring control to the operating system for resolution.
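
The two-level walk described above can be sketched in software for clarity (hardware normally performs it transparently); the entry types are minimal illustrations, and an unmapped entry at either level is reported as a fault:

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SHIFT  12u
    #define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)
    #define INDEX_BITS  10u
    #define INDEX_MASK  ((1u << INDEX_BITS) - 1)

    struct pte { uint32_t valid : 1; uint32_t pfn : 20; };   /* leaf entry      */
    struct pde { uint32_t valid : 1; struct pte *table; };   /* directory entry */

    /* Translate a 32-bit virtual address through a two-level page table.
       Returns false (a page fault) if either level is unmapped. */
    bool translate(struct pde *page_directory, uint32_t vaddr, uint32_t *paddr)
    {
        uint32_t dir_idx = (vaddr >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK;
        uint32_t tbl_idx = (vaddr >> PAGE_SHIFT) & INDEX_MASK;
        uint32_t offset  = vaddr & OFFSET_MASK;

        if (!page_directory[dir_idx].valid)
            return false;                            /* no inner table mapped */
        struct pte *pte = &page_directory[dir_idx].table[tbl_idx];
        if (!pte->valid)
            return false;                            /* page not present */

        *paddr = ((uint32_t)pte->pfn << PAGE_SHIFT) | offset;
        return true;
    }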

Page Faults

Detection and Types

Page faults are detected by the hardware memory management unit (MMU) during the virtual-to-physical address translation process. When the MMU attempts to access a page table entry and finds it invalid—such as a missing mapping or an unmapped virtual page—it generates a hardware trap, typically in the form of an exception. On x86 architectures, this corresponds to interrupt vector 14, which signals the processor to halt the current instruction and transfer control to the operating system's kernel via the interrupt descriptor table. This trap invokes a dedicated page fault handler in the OS kernel, which receives details about the fault, including the faulting virtual address (stored in the CR2 register on x86) and an error code indicating the reason for the failure. The kernel handler then categorizes the fault based on the page table entry's state and the nature of the access. Page faults are classified into several types depending on the underlying cause. A hard fault, also known as a major fault, occurs when the requested page is not present in physical memory and must be retrieved from secondary storage, such as disk or swap space, involving I/O operations and process suspension. In contrast, a soft fault or minor fault happens when the page is already resident in physical memory but requires remapping in the process's page tables, such as during copy-on-write scenarios or when sharing pages between processes without disk access. A protection fault arises from an access violation on a valid page, for instance, attempting to write to a read-only page or execute non-executable memory, as enforced by permission bits in the page table entry. The distinction between instruction and data faults is determined by the type of operation causing the fault: an instruction fault occurs during the fetch of executable code, while a data fault stems from load or store operations on operands. The kernel handler identifies this by examining the saved program counter and the faulting access type, allowing tailored responses such as restarting the instruction after resolution. In Unix-like systems, page fault statistics serve as key performance metrics, with counters tracking minor faults (min_flt) for quick resolutions and major faults (maj_flt) for those requiring I/O, enabling monitoring of memory efficiency and potential thrashing. These metrics, accessible via tools like ps or /proc, help quantify the overhead of virtual memory management.

Handling Mechanism

When a page fault occurs, the operating system (OS) invokes a dedicated page-fault handler to resolve it through a structured workflow. First, the current process context is saved, including the program counter and registers, to allow resumption after resolution. The OS then checks the validity of the faulting address by examining the page table entry (PTE); if the address is illegal or the access violates the page's protection, the process is typically terminated with a segmentation fault. If valid, and if physical memory is full, the OS selects a victim frame using a page replacement algorithm, such as least recently used (LRU). The required page is then loaded from disk (or backing store) into the selected frame via a scheduled I/O operation, during which the process is blocked. Upon completion, the PTE is updated to mark the page as present, including the physical frame number, reference and modification bits, and protection attributes. Finally, the process is rescheduled, and the faulting instruction is restarted, often triggering a TLB update if necessary. Page fault handling prioritizes based on context: faults in user mode are serviced in the kernel's process context, allowing preemption of other user processes, while kernel-mode faults (e.g., during system calls) occur in non-preemptible sections and must be resolved immediately to avoid deadlocks or panics, often pinning critical kernel pages in memory. In systems employing demand paging as the common loading strategy, this mechanism ensures pages are fetched only when accessed. For shared pages in mechanisms like copy-on-write (COW), used in process forking to optimize memory use, the initial fault on a write access to a shared read-only page triggers the OS to allocate a new private frame, copy the content, update the PTE to point to the copy, and mark it writable, allowing independent modification without immediate duplication on fork. Page faults are classified as minor or major based on resolution cost: minor faults, such as those resolved via in-memory operations (e.g., COW copy or zero-filling), incur low overhead in the range of microseconds to milliseconds, while major faults requiring disk I/O (e.g., fetching from swap) can take 8 milliseconds or more due to seek and transfer latencies, significantly impacting performance. The handling logic can be outlined in pseudocode as follows, illustrating the core decisions:

    handle_page_fault(faulting_address):
        save_process_context()
        pte = lookup_page_table(faulting_address)
        if pte.invalid or address_illegal(faulting_address):
            terminate_process("Segmentation fault")
        else:
            if no_free_frame():
                victim = select_victim(page_replacement_policy)
                if victim.dirty:
                    write_to_disk(victim)              // clean the victim before reuse
            frame = allocate_frame()
            read_from_disk(pte.disk_address, frame)    // process blocks during I/O
            update_pte(pte, frame, present=True, protection=original)
            resume_process()
            restart_instruction()

This pseudocode captures the essential steps, with variations across OS implementations.

Page Fetching

Demand Paging

Demand paging implements a lazy loading strategy in virtual memory systems, where pages are brought into main memory only upon their first reference by a process, initiated by a page fault. Initially, all page table entries for a process are marked as invalid to indicate that the corresponding pages reside on the backing store rather than in physical memory. Upon detecting a page fault during address translation, the operating system allocates a free physical frame, copies the required page from the backing store to that frame, updates the page table entry to reflect its valid status and physical location, and resumes process execution. This approach relies on the principle of locality of reference, ensuring that frequently accessed pages (the working set) are prioritized for residency in memory. The benefits of demand paging stem from its ability to defer loading until necessary, thereby reducing initial program load times and minimizing unnecessary operations for pages that may never be used. By exploiting spatial and temporal locality, it allows processes to operate with a virtual address space much larger than physical memory, improving overall system efficiency in multiprogramming environments without requiring the entire process image to be resident at startup. This technique supports multiprogramming systems by incrementally building the resident set in memory, avoiding the overhead of preloading irrelevant code or data segments. In implementation, the backing store consists of a dedicated swap partition on disk or a regular file, serving as the persistent repository for non-resident pages. When a page fault occurs, the kernel synchronously fetches the page from this store into a physical frame, while a background page daemon process—such as the pageout daemon in BSD variants—handles asynchronous tasks like scanning for low-memory conditions, cleaning dirty pages by writing them to the backing store, and maintaining a pool of free frames using algorithms like the clock replacement policy. This daemon awakens periodically (e.g., every 250 milliseconds) or when free memory falls below a threshold, ensuring proactive management without blocking foreground processes. An early example of demand paging in Unix systems appeared in 3BSD, released in late 1979, which introduced virtual memory support, including on-demand loading of pages from the file system and swap space, on VAX hardware. Despite these advantages, demand paging introduces significant drawbacks due to the overhead of page faults, which involve context switching, disk seeks, and data transfer—typically 100 to 1000 times slower than direct RAM access, depending on storage hardware and system load. This latency can degrade performance, particularly in scenarios with poor locality or high fault rates, leading to increased CPU utilization for fault handling and potential bottlenecks in I/O-bound environments.
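
Demand paging is directly visible to user programs through mmap(2): mapping a file only sets up the translation, and each first touch of a page raises a fault that the kernel satisfies from the file. A small sketch (the file name is a placeholder):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDONLY);        /* placeholder file name */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

        /* No file pages are read here; only the mapping is established. */
        unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Touching one byte per page faults that page in on demand. */
        long pagesize = sysconf(_SC_PAGESIZE);
        unsigned long sum = 0;
        for (off_t i = 0; i < st.st_size; i += pagesize)
            sum += p[i];

        printf("checksum of first byte of each page: %lu\n", sum);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }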

Prepaging Techniques

Prepaging techniques involve proactively loading pages into physical memory before they are explicitly referenced, aiming to exploit spatial and temporal locality to minimize page faults and improve overall system performance. Unlike demand paging, which reacts to faults by loading only the requested page, prepaging anticipates future accesses based on patterns observed in program behavior or historical data. This approach can significantly reduce the overhead of repeated disk I/O operations, particularly in environments with high disk bandwidth but substantial latency. However, it risks wasting resources by loading unused pages, potentially increasing memory pressure and I/O load on the system. Anticipatory or prepaged paging predicts and prefetches pages likely to be needed soon, such as adjacent pages in a program's address space, to leverage the principle of locality. For instance, upon detecting sequential access patterns, the system may load a cluster of nearby pages in advance, reducing future faults by amortizing the cost of I/O across multiple pages. This technique has been analyzed to show potential reductions in page traffic in phased program executions, though effectiveness depends on accurate prediction. Early implementations demonstrated that prepaging adjacent pages could cut initial fault rates in multiprogrammed systems, but over-prediction led to unnecessary evictions. The working set model provides a foundational framework for prepaging by estimating the set of pages actively used by a process over a defined reference window, typically measured in memory references. Upon startup or swap-in, the system loads the entire estimated working set to ensure the process has immediate access to its core pages, avoiding cold-start faults. Peter Denning's seminal work formalized this model, proving that maintaining the working set in memory prevents thrashing and bounds the number of faults to the rate of working-set changes. In practice, this has been adopted in systems like VAX/VMS, where prepaging the working set reduced startup latency by preallocating frames based on prior execution traces. Trade-offs include the computational cost of estimating the working set size, which can be mitigated by adaptive window tuning, but inaccurate estimates may load extraneous pages. Page clustering groups related pages—such as those from the same segment, library, or file block—into batches for simultaneous loading, optimizing I/O efficiency through larger, contiguous transfers. Off-line clustering algorithms analyze program traces to identify frequently co-accessed pages and store them adjacently on disk, enabling prepaging of entire clusters on fault. Research has shown that such clustering can minimize fault frequency while constraining memory occupancy, with bounds proving optimality for linear combinations of fault rates and residency. In modern operating systems like Linux, prepaging integrates with I/O schedulers such as Completely Fair Queuing (CFQ) through readahead mechanisms, which prefetch file-backed pages based on sequential read patterns detected during execution. The readahead window dynamically adjusts—starting small and expanding on confirmed sequential access—to balance prefetch accuracy and resource use, often loading up to 128 KB ahead. Mel Gorman's analysis highlights how this reduces faults in file-intensive applications by exploiting disk bandwidth, with empirical results showing 2-3x speedup in sequential scans, though random access patterns can lead to wasted I/O if the window overexpands.
Overall, these techniques underscore prepaging's value in reducing latency at the cost of potential inefficiency, with system designers tuning parameters to match workload characteristics.
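
Applications can also request prepaging explicitly. On Linux and other POSIX-like systems, madvise(MADV_WILLNEED) hints that a mapped range will be needed soon, letting the kernel read it ahead; a minimal sketch:

    #include <stddef.h>
    #include <sys/mman.h>

    /* Hint that the first 'len' bytes of the mapping at 'addr' will be needed
       soon, so the kernel may prefetch those pages asynchronously. */
    void hint_prefetch(void *addr, size_t len)
    {
        if (madvise(addr, len, MADV_WILLNEED) != 0) {
            /* The hint is advisory; on failure pages are simply demand-paged. */
        }
    }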

Page Replacement

Core Algorithms

The core algorithms for page replacement determine which resident page to evict as a victim when a page fault occurs and all memory frames are occupied, with the goal of minimizing future faults by leveraging observed access patterns. These policies form the foundation of virtual memory management, balancing simplicity, hardware support, and performance under the assumption of locality of reference. Seminal work in this area, beginning in the mid-1960s, established benchmarks for evaluating replacement strategies through trace-driven simulations on reference strings representing memory accesses. The First-In-First-Out (FIFO) algorithm selects the oldest page in memory for eviction, maintaining a simple queue of loaded pages without considering access recency or frequency. This approach is straightforward to implement in software and requires minimal hardware assistance, making it suitable for early paging systems. However, FIFO ignores temporal locality, often leading to poor performance on workloads with looping patterns, and it exhibits Belady's anomaly, where increasing the number of available frames paradoxically increases the fault rate for certain reference strings. For example, on the reference string 1,2,3,4,1,2,5,1,2,3,4,5 with 3 frames, FIFO incurs 9 faults, but with 4 frames, it incurs 10 faults. The Optimal (OPT) algorithm, also known as MIN or Belady's algorithm, evicts the page whose next reference occurs farthest in the future, achieving the theoretical minimum number of page faults for a given reference string. Introduced as an ideal for virtual storage systems, OPT serves primarily as a lower bound for comparison, as it demands complete foresight into future accesses, which is unavailable in practice. To simulate OPT faults, Belady's method processes the reference string forward, at each fault selecting the victim by scanning ahead to find the maximum forward distance to the next use; this deterministic simulation enables precise evaluation of other algorithms against the optimum. For the example string above, OPT incurs only 6 faults with 4 frames, highlighting the gap with suboptimal policies. The Least Recently Used (LRU) algorithm approximates OPT by evicting the page unused for the longest time, relying on the principle that recently accessed pages are likely to be reused soon due to temporal locality. LRU maintains an ordered list or stack of pages, moving accessed pages to the most-recent end; a full implementation tracks timestamps or uses hardware counters for each access. Its effectiveness stems from the stack distance model, where the distance to a page's previous reference correlates with future reuse probability, allowing LRU to mimic OPT under independent-reference assumptions without future knowledge. On the prior example, LRU incurs 8 faults with 4 frames, outperforming FIFO but trailing OPT. While LRU avoids Belady's anomaly, its overhead from list maintenance limits scalability in large memory systems. The Clock algorithm, or Second Chance, provides a low-overhead, hardware-assisted approximation of LRU by organizing pages in a circular queue and using a single "hand" pointer to scan for victims. Each page has a reference bit set to 1 by the memory management unit (MMU) on access; when seeking a victim, the algorithm advances the hand, evicting if the bit is 0, or clearing the bit and advancing if it is 1, effectively giving referenced pages a second chance before removal. This method leverages MMU hardware efficiently, avoiding per-access list updates, and degenerates to FIFO when all bits remain set.
It performs comparably to LRU on many traces while incurring lower implementation cost, making it a practical choice for resource-constrained systems. These algorithms are typically evaluated via trace-driven simulation on representative workloads, measuring fault rates against OPT as a baseline, and through the effective access time (EAT), which quantifies average latency per memory reference: \text{EAT} = (1 - p) \times t_{\text{mem}} + p \times t_{\text{fault}}, where p is the page fault probability, t_{\text{mem}} is the main memory access time, and t_{\text{fault}} is the service time for a fault (including disk I/O and OS overhead). For instance, with t_{\text{mem}} = 100 ns, t_{\text{fault}} = 10 ms, and p = 0.001 under LRU, EAT is approximately 10.1 μs, illustrating how even low fault rates amplify effective latency. Such metrics underscore FIFO's simplicity at the cost of a higher p, OPT's unattainable ideal, LRU's strong approximation, and Clock's balanced efficiency.
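
A minimal sketch of the clock hand described above: frames sit in a circular array, and the hand clears reference bits until it finds one that is already clear. In a real system the reference bit is set by the MMU; here it is just a flag in the frame record.

    #include <stdbool.h>
    #include <stddef.h>

    struct clock_frame {
        size_t page;        /* virtual page currently held in this frame    */
        bool   referenced;  /* set on access (by the MMU in a real system)  */
    };

    /* Advance the hand until a frame with a clear reference bit is found,
       clearing bits along the way so referenced frames get a second chance. */
    size_t clock_select_victim(struct clock_frame *frames, size_t nframes,
                               size_t *hand)
    {
        for (;;) {
            struct clock_frame *f = &frames[*hand];
            if (!f->referenced) {
                size_t victim = *hand;
                *hand = (*hand + 1) % nframes;
                return victim;                 /* evict the page in this frame */
            }
            f->referenced = false;             /* second chance */
            *hand = (*hand + 1) % nframes;
        }
    }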

Optimization Methods

Optimization methods in memory paging enhance the efficiency of page replacement policies by improving queue management, reducing latency during faults, and minimizing unnecessary evictions. These techniques build on foundational algorithms like the clock or LRU approximations to handle real-world workloads more effectively, particularly in systems like Linux where memory pressure triggers proactive reclamation. By maintaining free page queues and employing background processes, operating systems can allocate frames quickly and avoid thrashing under load.

The free page queue maintains a list of empty physical frames available for immediate allocation during page faults, ensuring low-latency mapping without invoking full replacement scans. When the queue depletes below a threshold, such as the minimum watermark in Linux memory zones, scavenging mechanisms activate to replenish it by reclaiming pages from inactive lists. This approach, implemented in Linux's zone-based page allocator, prioritizes quick access to pre-freed frames, reducing allocation overhead in multi-threaded environments.

Page stealing involves kernel threads, such as kswapd in Linux, proactively "stealing" pages from user processes or the page cache to refill the free page queue under memory pressure. This occurs when a zone's free pages fall below low watermarks, targeting pages that alleviate node-local shortages without immediately swapping them out. By isolating the stealing process from user contexts, it prevents direct interference with foreground tasks while maintaining system-wide balance, particularly in NUMA systems where inter-node transfers are costly.

Reclamation optimizes eviction by cleaning dirty pages (those modified but not yet written back) directly to the free list without fully removing them from address spaces. In Linux, this process uses the writepage() function to flush dirty data to backing storage, clearing the PG_dirty flag and marking the page as laundered, allowing reuse as a clean frame. This avoids the overhead of complete page-out operations during faults, especially for file-backed pages, and integrates with LRU scanning to batch cleanings efficiently.

Pre-cleaning employs background writeback of modified pages to proactively reduce page fault latency by keeping more pages in a clean state ready for reuse. Linux's flusher threads, controlled by parameters like dirty_background_ratio, periodically scan and write out dirty pages before allocation pressure builds, preventing synchronous I/O blocks during faults. This technique, akin to eager writeback strategies, balances memory traffic by distributing write operations across idle periods, improving overall throughput in I/O-intensive workloads.

The WSClock algorithm refines basic clock replacement by incorporating working set estimates to better approximate a process's locality, evicting pages outside the recent reference window while scanning a circular list. Proposed as a synthesis of working-set policy and clock simplicity, it uses a reference bit and aging to isolate tasks and replace locally, reducing faults in multiprogrammed systems compared to pure FIFO or clock methods. This variant promotes efficiency by avoiding global interference, making it suitable for environments with variable locality.

Age-based policies, such as Linux's approximation of LRU via aging counters, enhance replacement by decrementing an "age" value on unreferenced pages and incrementing it on accessed ones during scans, effectively prioritizing recently used pages without exact timestamp tracking.
In the CLOCK-Pro extension, this counter mechanism simulates LRU behavior within the clock structure, decreasing the age of inactive pages and increasing it for referenced ones, which adapts to workload phases and reduces unnecessary evictions. Implemented in Linux kernels since version 2.6, it provides a low-overhead way to handle diverse access patterns, improving hit rates in practice.
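The aging counters just described can be sketched as follows. This is a generic, simplified illustration rather than the actual Linux implementation; the page count, 8-bit counter width, and simulated reference bits are assumptions for the example.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPAGES 8

struct page_meta {
    bool    referenced; /* set since the last scan (the MMU would set this) */
    uint8_t age;        /* larger value = more recently and frequently used */
};

static struct page_meta pages[NPAGES];

/* Periodic scan performed by a reclaim thread: raise the age of pages
   referenced since the last pass, let idle pages decay toward zero. */
void age_scan(void)
{
    for (size_t i = 0; i < NPAGES; i++) {
        if (pages[i].referenced) {
            if (pages[i].age < 255)
                pages[i].age++;          /* recently used: protect from eviction */
        } else if (pages[i].age > 0) {
            pages[i].age--;              /* unreferenced: decay */
        }
        pages[i].referenced = false;     /* clear for the next interval */
    }
}

/* Choose the eviction victim: the page with the smallest age counter. */
size_t age_select_victim(void)
{
    size_t victim = 0;
    for (size_t i = 1; i < NPAGES; i++)
        if (pages[i].age < pages[victim].age)
            victim = i;
    return victim;
}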

System Issues

Thrashing

Thrashing occurs when a paged system experiences excessive paging activity, leading to severe performance degradation as the system spends more time handling page faults and swapping pages between main and secondary storage than executing useful computations. This phenomenon arises primarily when the total memory demand from active processes exceeds the available physical memory frames, causing frequent page replacements that interfere with each other under global paging algorithms such as FIFO or LRU. The root causes include a large traverse time between main and auxiliary memory, often spanning many virtual time units, and the allocation of insufficient frames to processes, allowing one process's memory needs to displace another's critical pages. In such scenarios, the system's efficiency drops sharply; for instance, as the missing-page probability increases, the effective service rate can collapse, with the effective access time dominated by I/O operations rather than CPU processing. Symptoms manifest as CPU underutilization, where the processor idles while waiting for disk I/O, and rapid swapping of pages in and out, resulting in a vicious cycle of faults that slows overall throughput to near zero.

Detection of thrashing typically involves monitoring key metrics such as the page fault rate and comparing the aggregate working set size of active processes against the total physical memory allocation. If the sum of working sets exceeds the available frames, or if fault rates climb significantly (for example, exceeding 10 faults per second in some early implementations), the system is likely thrashing; indicators such as CPU utilization below 40% alongside high fault activity can confirm this state in those contexts.

Prevention strategies focus on load control to maintain balance, such as suspending or swapping out low-priority processes to reduce the multiprogramming level until memory demand fits within physical limits, or employing local page replacement policies per process to isolate interference. The working set model, which ensures each process receives enough frames for its recently referenced pages, effectively resists thrashing by dynamically adjusting allocations based on a moving window of recent references. Global page replacement algorithms contribute to this inefficiency, but local variants mitigate it by limiting competition for frames.

Historically, thrashing was first prominently observed in the Multics system during the 1960s, where excessive paging demands on limited core memory led to frequent page removals using a least-recently-used algorithm, degrading efficiency until load control mechanisms like segment deactivation were implemented to free resources.
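The working set check used for load control can be sketched as follows. The reference trace, the window length tau, and the per-process size array are illustrative assumptions, not structures from any particular kernel.

#include <stdbool.h>
#include <stddef.h>

/* Count distinct pages referenced in the last `tau` entries of a trace:
   this is the working set W(t, tau) of one process. */
size_t working_set_size(const int *trace, size_t len, size_t tau)
{
    size_t start = (len > tau) ? len - tau : 0;
    size_t distinct = 0;
    for (size_t i = start; i < len; i++) {
        bool seen = false;
        for (size_t j = start; j < i; j++)
            if (trace[j] == trace[i]) { seen = true; break; }
        if (!seen)
            distinct++;
    }
    return distinct;
}

/* Load control: if the aggregate working set demand of all runnable
   processes exceeds the available frames, admitting more work would
   risk thrashing, so a process should be suspended instead. */
bool would_thrash(const size_t *ws_sizes, size_t nprocs, size_t total_frames)
{
    size_t demand = 0;
    for (size_t i = 0; i < nprocs; i++)
        demand += ws_sizes[i];
    return demand > total_frames;
}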

Page Sharing

In memory paging systems, page sharing allows multiple processes to access the same physical memory frame through distinct virtual addresses, achieved by mapping multiple page table entries (PTEs) to a single page frame number (PFN). This mechanism promotes efficiency by avoiding redundant allocation of physical memory for identical content, such as shared code or library segments.

Read-only sharing applies primarily to immutable segments like executable code, where processes map the same physical pages without modification, eliminating the need for duplication. For instance, shared libraries in dynamically linked executables, such as those in the Executable and Linkable Format (ELF), enable a single read-only copy of the library to be mapped across multiple processes. This approach can yield substantial savings; on systems with numerous processes, such as busy web servers, it prevents gigabytes of redundant allocation by preserving one shared instance per library.

Copy-on-write (COW) extends sharing to writable regions, particularly during process creation via the fork() system call, where the parent and child initially share pages marked as read-only. A write access triggers a page fault, prompting the operating system to allocate a new physical frame, copy the original page content, and update the child's PTE to point to the private copy while retaining the parent's reference to the shared frame. This deferral optimizes fork() performance, as only modified pages incur copying overhead; in practice, applications like editors or interpreters often update less than half their data segments, achieving over 50% reduction in copying time compared to full duplication.

The benefits of page sharing include minimized physical memory duplication and enhanced inter-process communication (IPC) through mechanisms like shared memory segments, where processes map the same pages into their address spaces for direct data exchange without kernel mediation. For example, shared memory APIs leverage paging to allocate and fault in shared pages on first access, supporting efficient producer-consumer patterns.

To manage deallocation safely, operating systems employ reference counting on physical frames, incrementing the count for each PTE mapping and decrementing upon unmapping; a frame is freed only when its count reaches zero. This ensures shared pages persist until all references are released, though concurrent access requires synchronization primitives to mitigate race conditions, such as time-of-check-to-time-of-use (TOCTOU) vulnerabilities where mappings change between validation and use.
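The following C example illustrates page sharing through the standard POSIX shared memory calls (shm_open, ftruncate, and mmap with MAP_SHARED); the object name "/paging_demo" is an arbitrary choice for the example, and on older glibc versions the program may need to be linked with -lrt.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* Create a 4 KB shared memory object: one page that both processes map. */
    const char *name = "/paging_demo";
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); return 1; }

    /* MAP_SHARED makes both PTEs point at the same physical frame. */
    char *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (shared == MAP_FAILED) { perror("mmap"); return 1; }

    pid_t pid = fork();   /* private pages of the parent are shared copy-on-write */
    if (pid == 0) {
        strcpy(shared, "written by child");  /* faults in the shared page once */
        return 0;
    }
    waitpid(pid, NULL, 0);
    printf("parent sees: %s\n", shared);     /* same frame, no copy was needed */
    shm_unlink(name);
    return 0;
}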

Implementations

Early Systems

The Atlas computer, delivered in 1962, was the first system to implement paged virtual memory, known as the "one-level store," which integrated core memory and magnetic drum storage into a unified address space of up to 1 million 48-bit words. Pages were fixed at 512 words (48 bits each), allowing demand paging in which absent pages triggered automatic loading from the drum backing store without explicit programmer intervention. An associative memory unit served as a precursor to the translation lookaside buffer (TLB), holding mappings for up to 32 recently accessed pages to accelerate address translation and reduce fault overhead.

The IBM System/360 Model 67, introduced in 1967 as part of the System/360 family announced in 1964, brought paging to a commercial mainframe architecture, supporting virtual memory through dynamic address translation. It employed a two-level hierarchical structure, with a segment table pointing to page tables, each page fixed at 4,096 bytes (1K words of 32 bits or 2K halfwords), enabling a virtual address space of up to 16 million bytes per user. Page faults were handled as program interruptions via a supervisor call, invoking the operating system to fetch pages from drum or disk backing store, which supported early paging implementations like those in the TSS/360 time-sharing system.

Multics, operational from 1969, combined segmentation with paging to create a hierarchical virtual memory system that influenced subsequent designs, including Unix. Segments were of variable length to match logical program units, while underlying pages were fixed at 1,024 36-bit words, allowing fine-grained protection and sharing across multiple users in a time-sharing environment. The system used a multi-level directory structure for segment descriptors, with each segment's page table managing mappings to physical frames, and page faults triggering indirect pointer resolution for loading from disk.

Early paging systems evolved from magnetic drum backing stores, as in the Atlas, to more reliable disk-based secondary storage by the late 1960s, improving fault recovery and capacity while introducing innovations like "absent" page indicators in page tables to mark unloaded pages without wasting table space. These advancements enabled efficient memory sharing on multiprogrammed systems, reducing the need for physical memory dedication per user and paving the way for interactive computing; the Atlas design, in particular, influenced subsequent Manchester University projects and broader adoption in research environments.

Windows Variants

In Windows 3.x and 9x series operating systems, each process was provided with a flat 4 GB virtual address space, though limited by the 16-bit and early 32-bit architecture constraints, enabling demand paging using 4 KB pages to manage memory efficiently on systems with limited RAM. The page frame number (PFN) database served as a core structure for tracking the state of physical memory frames, including allocation, residency, and modification status, facilitating frame management during paging operations.

Beginning with Windows NT in 1993, the memory management subsystem introduced support for symmetric multiprocessing (SMP), allowing multiple processors to access shared physical memory through coordinated paging mechanisms that ensured cache coherency and avoided race conditions in page table updates. Lock-free page allocation techniques were implemented to enable concurrent access without traditional locks, improving scalability in multiprocessor environments by using atomic operations for page list manipulations. Section objects provided a unified abstraction for file-backed pages, allowing memory-mapped files to be paged in on demand from disk, with the memory manager handling transitions between physical RAM and the paging file seamlessly.

In modern Windows versions post-XP, such as Windows 10 and 11, the SuperFetch service (renamed SysMain) implements prepaging by analyzing usage patterns to proactively load frequently accessed application data and pages into standby memory lists, reducing startup times and improving responsiveness on SSDs and HDDs. Variable page sizes are supported, including large pages of up to 2 MB, which minimize translation lookaside buffer (TLB) misses in high-memory workloads like databases and virtual machines, though allocation requires administrative privileges and sufficient contiguous physical memory.

To address fragmentation, Windows employs demand-zero pages for initializing newly committed virtual memory regions, where the first access triggers a page fault that fills the page with zeros from a dedicated zero page pool, ensuring secure and efficient startup without explicit user code for clearing. Memory compaction is performed periodically by the memory manager to defragment physical RAM, relocating pages within the standby and modified lists to consolidate free blocks, particularly to enable allocation of large pages and mitigate external fragmentation in low-memory scenarios.

The core kernel structures for paging are managed by the Mm (memory manager) subsystem, which maintains working sets for each process (the collection of physical pages currently resident in RAM and actively used) to optimize locality and minimize faults. The balance set manager, a component of the Mm subsystem, oversees page replacement by trimming the working sets of less active processes using a modified least-recently-used (LRU) policy, ensuring fair allocation of physical frames across competing workloads.
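A minimal sketch of requesting a large page with the Win32 VirtualAlloc and GetLargePageMinimum APIs is shown below. The call typically fails unless the account holds the "Lock pages in memory" privilege and enough contiguous physical memory is free, so this is an illustration under those assumptions rather than production code.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Large page size is hardware dependent; typically 2 MB on x86-64. */
    SIZE_T large = GetLargePageMinimum();
    if (large == 0) {
        printf("large pages not supported on this system\n");
        return 1;
    }

    /* MEM_LARGE_PAGES requires SeLockMemoryPrivilege and contiguous RAM. */
    void *p = VirtualAlloc(NULL, large,
                           MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                           PAGE_READWRITE);
    if (p == NULL) {
        printf("VirtualAlloc(MEM_LARGE_PAGES) failed: %lu\n", GetLastError());
        return 1;
    }

    printf("allocated one %lu-byte large page at %p\n", (unsigned long)large, p);
    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}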

Unix-like Systems

In early Unix systems during the 1970s, memory management relied primarily on swapping entire processes to disk rather than finer-grained paging. Unix Version 6, released in 1975, introduced demand loading for executable files but operated on a swap-based model without full paging support. Full virtual memory, including demand paging and page replacement, was not implemented until the 3BSD release in 1979, which added these features to support the VAX architecture and enable more efficient use of limited physical memory.

The Linux kernel, a prominent Unix-like system, standardizes on 4 KB pages for most architectures to balance TLB efficiency and fragmentation. Virtual memory areas are tracked via vm_area_struct structures, which describe contiguous regions of a process's address space, including permissions and backing stores, facilitating operations like mapping and unmapping. Memory reclamation is handled by the kswapd kernel thread, which proactively scans and evicts pages from the least recently used (LRU) lists when free memory falls below configurable thresholds, preventing immediate allocation failures. For applications requiring large contiguous memory, Linux supports huge pages (typically 2 MB or larger) through the hugetlbfs pseudo-filesystem, which allocates reserved physical pages to minimize TLB misses and page-table overhead in memory-intensive workloads.

macOS, built on the Darwin kernel that integrates the Mach microkernel with BSD components, inherits a virtual memory design rooted in Mach's port-based object model for managing address spaces. This heritage enables flexible memory objects that can be shared or copied-on-write across processes. Compressed memory was introduced in OS X 10.9 (Mavericks) to compress inactive pages in RAM using algorithms like LZ4, reducing swap usage by up to 50% in typical workloads while maintaining low latency for decompression. macOS also employs a unified buffer cache, merging file I/O buffers with the VM subsystem to avoid duplication and improve caching coherence for file-backed data.

Across Unix-like systems, pages are categorized as anonymous (backed by swap space for private data like stacks and heaps) or file-backed (mapped directly to filesystem objects for executables and data files), allowing the kernel to optimize reclamation policies: file-backed pages can be discarded and reloaded from disk, while anonymous pages require swapping. The madvise() system call enables applications to advise the kernel on access patterns, such as random or sequential usage, influencing prepaging decisions like Linux's read-ahead for mapped files. In Linux specifically, the Out-of-Memory (OOM) killer activates during extreme memory pressure, scoring processes based on factors like memory usage and niceness to select and terminate the least valuable one, averting system-wide thrashing.

Key differences in implementation include Linux's use of global LRU lists for system-wide page scanning and replacement, promoting balanced reclamation across all processes, contrasted with macOS's zone-based allocators for kernel objects and per-region management in Mach VM, which tailors reclamation to process-specific needs and hardware constraints like unified memory on Apple silicon.
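The madvise() hint can be illustrated with a short C example that maps a file and declares sequential access, which on Linux typically enlarges read-ahead for the mapping. The file path is a placeholder for the example, and the precise read-ahead behavior depends on the kernel version.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/var/log/syslog", O_RDONLY);   /* placeholder file */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { perror("fstat"); return 1; }

    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    /* Hint the expected pattern; MADV_RANDOM would instead curb read-ahead. */
    if (madvise(map, st.st_size, MADV_SEQUENTIAL) < 0)
        perror("madvise");

    /* Touch every page in order; faults are served with read-ahead applied. */
    volatile unsigned long sum = 0;
    for (off_t off = 0; off < st.st_size; off += 4096)
        sum += ((const unsigned char *)map)[off];

    munmap(map, st.st_size);
    close(fd);
    return 0;
}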

Performance Factors

Swap Space Management

Swap space serves as a dedicated area on disk that acts as backing storage for pages evicted from physical memory during paging operations, allowing the operating system to manage virtual memory larger than available RAM. This mechanism supports the illusion of abundant memory by temporarily storing inactive pages on disk, which can be retrieved via page faults when needed.

Overcommitment in swap space enables the allocation of virtual memory exceeding physical RAM limits, facilitating efficient multiprogramming where multiple processes share the system without immediate resource exhaustion. However, this introduces risks such as out-of-memory (OOM) conditions when both RAM and swap are depleted, potentially triggering the OOM killer in Linux to terminate processes based on heuristics like memory usage and priority. Strategies to mitigate overcommit risks include configuring the vm.overcommit_memory tunable (e.g., setting it to 2 for strict allocation checks) and monitoring via early allocation checks during memory requests.

Sizing swap space follows guidelines tailored to system needs, such as 1-2 times the RAM size to support hibernation, where the entire memory contents must be saved to disk. In modern operating systems like Linux, dynamic management is achieved through tunables like swappiness (ranging from 0 to 200), which controls the aggressiveness of swapping relative to file caching; lower values (e.g., 10) prioritize RAM usage, while higher ones (e.g., 100) favor early swapping to prevent OOM.

Swap space can be implemented as a dedicated partition or a file on the filesystem, with partitions offering superior performance by bypassing filesystem overhead for direct disk writes, thus reducing latency in page I/O operations. Alternatively, zram provides compressed in-RAM swap as a kernel module, creating a block device that compresses pages on the fly to extend effective memory without disk access, ideal for memory-constrained systems.

Performance bottlenecks arise from swap I/O, which is orders of magnitude slower than RAM access (e.g., 10,000–100,000 times for traditional disks), leading to thrashing if overused. Using SSDs for swap mitigates this by reducing latency compared to HDDs, but frequent random writes accelerate NAND flash wear; modern SSD controllers employ wear-leveling algorithms to distribute erasures evenly across cells, extending device lifespan.
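As a minimal illustration, the following Linux-specific C sketch reads the two tunables discussed above from procfs; changing them requires appropriate privileges and is normally done with sysctl(8) or /etc/sysctl.conf rather than from application code.

#include <stdio.h>

/* Print one integer tunable from procfs, e.g. /proc/sys/vm/swappiness. */
static void show(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return; }
    int value;
    if (fscanf(f, "%d", &value) == 1)
        printf("%s = %d\n", path, value);
    fclose(f);
}

int main(void)
{
    show("/proc/sys/vm/swappiness");        /* eagerness to swap vs. drop cache */
    show("/proc/sys/vm/overcommit_memory"); /* 0 heuristic, 1 always, 2 strict  */
    return 0;
}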

Address Space Configurations

In the most common configuration for modern computing systems, physical memory (RAM) is smaller than the virtual address space provided to processes, making paging essential for effective memory management. This setup allows multiple processes to share limited physical resources by mapping portions of their larger virtual spaces into RAM on demand, with unused pages swapped to disk when necessary. High page fault rates can occur under memory pressure, leading to increased swap usage and potential performance degradation if the working set exceeds available RAM.

When physical memory exactly matches the virtual address space in size, paging to secondary storage is typically unnecessary, as all virtual pages can reside directly in RAM without swapping. However, paging structures may still be used to support memory protection and flexible address translation.

A rarer scenario arises when physical memory exceeds the virtual address space, such as in 64-bit systems running small applications where RAM capacity surpasses the addressable virtual limit. In these cases, all virtual pages can remain resident in physical memory simultaneously, eliminating the need for swapping and restricting page faults primarily to access protection violations rather than capacity misses. This configuration is exemplified in environments with ample RAM but constrained virtual addressing, like legacy 32-bit applications on modern hardware.

Addressability limits play a critical role in these configurations; for example, 32-bit architectures cap the virtual address space at 4 GB, regardless of larger physical memory availability. To address this, Physical Address Extension (PAE) enables 32-bit systems to utilize up to 64 GB of physical RAM by expanding physical addressing to 36 bits while keeping the virtual space at 32 bits, thus supporting paging into extended physical memory without altering virtual limits.

Paging operates within broader address space configurations, including flat and segmented models. The flat model treats the virtual address space as a single contiguous space, simplifying paging by mapping linear virtual addresses directly via page tables without segment boundaries, which is the standard in modern x86 systems. In contrast, segmented models divide the address space into variable-sized segments before paging, adding complexity but enabling finer-grained protection; however, modern systems favor flat models to minimize overhead. These choices affect address space sizes; for instance, the x86-64 architecture employs a 48-bit virtual address in its four-level paging scheme, resulting in address spaces that cover 256 terabytes while fitting within practical hardware constraints.
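The 48-bit split can be made concrete with a short worked example: with 4 KB pages, a canonical x86-64 virtual address decomposes into four 9-bit table indices plus a 12-bit page offset (9 + 9 + 9 + 9 + 12 = 48), which is what covers 2^48 bytes, or 256 TB. The address used below is arbitrary.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t va = 0x00007f1234567abcULL;   /* arbitrary user-space address */

    unsigned offset = va & 0xfff;          /* bits 11..0  : offset into 4 KB page */
    unsigned pt     = (va >> 12) & 0x1ff;  /* bits 20..12 : page table index      */
    unsigned pd     = (va >> 21) & 0x1ff;  /* bits 29..21 : page directory index  */
    unsigned pdpt   = (va >> 30) & 0x1ff;  /* bits 38..30 : PDPT index            */
    unsigned pml4   = (va >> 39) & 0x1ff;  /* bits 47..39 : PML4 index            */

    printf("va=%#018lx pml4=%u pdpt=%u pd=%u pt=%u offset=%#x\n",
           (unsigned long)va, pml4, pdpt, pd, pt, offset);
    return 0;
}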

