Copy-on-write
from Wikipedia

Copy-on-write (COW), also called implicit sharing[1] or shadowing,[2] is a resource-management technique[3] used in programming to manage shared data efficiently. Instead of copying data right away when multiple programs use it, the same data is shared between programs until one tries to modify it. If no changes are made, no private copy is created, saving resources.[3] A copy is only made when needed, ensuring each program has its own version when modifications occur. This technique is commonly applied to memory, files, and data structures.

In virtual memory management

Copy-on-write finds its main use in operating systems, where it lets multiple processes share physical memory, most notably in the implementation of the fork() system call. Typically, the new process does not modify any memory and immediately executes a new program, replacing its address space entirely. It would waste processor time and memory to copy all of the old process's memory during the fork only to discard the copy immediately.[4]

Copy-on-write can be implemented efficiently using the page table by marking certain pages of memory as read-only and keeping a count of the number of references to the page. When data is written to these pages, the operating-system kernel intercepts the write attempt and allocates a new physical page, initialized with the copy-on-write data, although the allocation can be skipped if there is only one reference. The kernel then updates the page table with the new (writable) page, decrements the number of references, and performs the write. The new allocation ensures that a change in the memory of one process is not visible in another's.[citation needed]
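
The mechanism above can be modeled in a few dozen lines of user-space C++. This is a simulation under stated assumptions, not kernel code: the `Frame`, `AddressSpace`, and `PhysicalMemory` names are hypothetical, a page is a 16-byte vector, and the hardware write fault is modeled as a check of a per-page write bit.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model: a physical frame holds one page of bytes plus a reference count.
struct Frame {
    std::vector<char> bytes;
    int refs = 0;
};

// Hypothetical per-process page table: virtual page number -> frame index and a write bit.
struct AddressSpace {
    std::vector<std::size_t> frame_of;
    std::vector<bool> writable;
};

class PhysicalMemory {
public:
    static constexpr std::size_t kPageSize = 16;

    // A new process gets private, writable, zero-filled pages.
    AddressSpace create(std::size_t pages) {
        AddressSpace as;
        for (std::size_t i = 0; i < pages; ++i) {
            as.frame_of.push_back(new_frame(std::vector<char>(kPageSize, 0)));
            as.writable.push_back(true);
        }
        return as;
    }

    // fork(): share every frame, count the new reference, and write-protect both copies.
    AddressSpace fork(AddressSpace& parent) {
        AddressSpace child = parent;
        for (std::size_t vpn = 0; vpn < parent.frame_of.size(); ++vpn) {
            frames_[parent.frame_of[vpn]].refs += 1;
            parent.writable[vpn] = false;
            child.writable[vpn] = false;
        }
        return child;
    }

    char read(const AddressSpace& as, std::size_t vpn, std::size_t off) const {
        return frames_[as.frame_of[vpn]].bytes[off];
    }

    void write(AddressSpace& as, std::size_t vpn, std::size_t off, char value) {
        if (!as.writable[vpn]) {                  // stands in for the write-protection fault
            std::size_t old = as.frame_of[vpn];
            if (frames_[old].refs > 1) {          // still shared: give this process a copy
                frames_[old].refs -= 1;
                as.frame_of[vpn] = new_frame(frames_[old].bytes);
            }                                     // sole owner: skip the copy entirely
            as.writable[vpn] = true;
        }
        frames_[as.frame_of[vpn]].bytes[off] = value;
    }

    std::size_t frame_count() const { return frames_.size(); }

private:
    // Frames are never freed in this sketch; a real kernel returns them to a free list.
    std::size_t new_frame(std::vector<char> bytes) {
        frames_.push_back(Frame{std::move(bytes), 1});
        return frames_.size() - 1;
    }
    std::vector<Frame> frames_;
};
```

Forking allocates no frames; the child's first write allocates one private copy, and the parent's later write to the same page reuses the original frame because it now holds the only reference, mirroring the skipped allocation described above.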

The copy-on-write technique can be extended to support efficient memory allocation by keeping one page of physical memory filled with zeros. When memory is allocated, all the pages returned refer to the page of zeros and are all marked copy-on-write. This way, physical memory is not allocated for the process until data is written, allowing processes to reserve more virtual memory than physical memory and use memory sparsely, at the risk of running out of physical memory if too many of the reserved pages are later written. The combined algorithm is similar to demand paging.[3]
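
A similar sketch shows the zero-page extension. It is again a hypothetical user-space model (`ZeroPageHeap` is an illustrative name): frame 0 plays the role of the shared page of zeros, reserving pages only records mappings to it, and the first write to a page gives it a private frame. Because the source page is all zeros, the "copy" is simply a freshly zero-filled frame.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: every newly reserved page maps to frame 0, a single shared
// frame of zeros, until its first write gives it a private frame.
class ZeroPageHeap {
public:
    static constexpr std::size_t kPageSize = 64;
    static constexpr std::size_t kZeroFrame = 0;

    ZeroPageHeap() { frames_.emplace_back(kPageSize, 0); }  // frame 0: the shared zero page

    // Reserves `count` pages without allocating frames; returns the first page number.
    std::size_t reserve(std::size_t count) {
        std::size_t first = mapping_.size();
        mapping_.insert(mapping_.end(), count, kZeroFrame);
        return first;
    }

    char read(std::size_t page, std::size_t off) const { return frames_[mapping_[page]][off]; }

    void write(std::size_t page, std::size_t off, char value) {
        if (mapping_[page] == kZeroFrame) {       // copy-on-write fault on the zero page
            frames_.emplace_back(kPageSize, 0);   // private frame with the same (zero) contents
            mapping_[page] = frames_.size() - 1;
        }
        frames_[mapping_[page]][off] = value;
    }

    std::size_t physical_frames() const { return frames_.size(); }

private:
    std::vector<std::vector<char>> frames_;  // physical frames
    std::vector<std::size_t> mapping_;       // virtual page -> frame index
};
```

Reserving a thousand pages costs one physical frame; each page that is actually written costs one more.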

Copy-on-write pages are also used in the Linux kernel's same-page merging feature.[5]

In software

COW is also used in library, application, and system code.

Examples

The string class provided by the C++ standard library was specifically designed to allow copy-on-write implementations in the initial C++98 standard,[6] but not in the newer C++11 standard:[7]

std::string x("Hello");
std::string y = x;  // In a copy-on-write implementation, x and y share one buffer.
y += ", World!";    // Now y gets its own buffer; x still uses the original one.

In the PHP programming language, all types except references are implemented as copy-on-write. For example, strings and arrays are passed and assigned by sharing the underlying value rather than copying it, and when modified, they are duplicated if they are still shared, that is, if their reference count is greater than one. This allows them to act as value types without the performance problems of copying on assignment or making them immutable.[8]

In the Qt framework, many types are copy-on-write ("implicitly shared" in Qt's terms). Qt uses atomic compare-and-swap operations to increment or decrement the internal reference counter. Since the copies are cheap, Qt types can often be safely used by multiple threads without the need for locking mechanisms such as mutexes. The benefits of COW are thus valid in both single- and multithreaded systems.[9]

In Docker, a set of software for implementing operating-system-level virtualization, Docker images are built in a layered format, with lower layers being read-only and the upper layer available for editing. Creating a new image that shares the same base layers as another image does not copy the layers; instead, it follows COW principles and allows the two images to share layers until one is edited.[10][11]

In computer storage

COW is used as the underlying mechanism in file systems like ZFS, Btrfs,[12] ReFS, and Bcachefs, as well as in logical volume management and database servers such as Microsoft SQL Server.

In traditional file systems, modifying a file overwrites the original data blocks in place. In a copy-on-write (COW) file system, the original blocks remain unchanged. When part of a file is modified, only the affected blocks are written to new locations, and metadata is updated to point to them, preserving the original version until it’s no longer needed. This approach enables features like snapshots, which capture the state of a file at a specific time without consuming much additional space. Snapshots typically store only the modified data and are kept close to the original. However, they are considered a weak form of incremental backup and cannot replace a full backup.[13]
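
The block-redirection idea can be sketched as follows, assuming a toy in-memory block store rather than any real file system's on-disk format (`BlockStore`, `FileMap`, `cow_write`, `snapshot`, and `read_file` are hypothetical names). A file is a list of block pointers; a write appends a new block and repoints one entry, and a snapshot is a copy of the pointer list, so it keeps seeing the old blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical block store: blocks are never overwritten; each write appends a new block.
struct BlockStore {
    std::vector<std::string> blocks;
    std::size_t append(std::string data) {
        blocks.push_back(std::move(data));
        return blocks.size() - 1;
    }
};

// A file (or a snapshot of it) is only its metadata: a list of block pointers.
using FileMap = std::vector<std::size_t>;

// Copy-on-write update: write the new contents to a fresh block and repoint
// only the affected entry; every other block stays shared.
void cow_write(BlockStore& store, FileMap& file, std::size_t index, const std::string& data) {
    file[index] = store.append(data);
}

// A snapshot copies the pointer list, not the data blocks.
FileMap snapshot(const FileMap& file) { return file; }

std::string read_file(const BlockStore& store, const FileMap& file) {
    std::string out;
    for (std::size_t b : file) out += store.blocks[b];
    return out;
}
```

Freeing blocks that neither the live file nor any snapshot still references is omitted here; real systems need reference counts or similar bookkeeping for that.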

In order to create and start new containers quickly, container engines doing OS-level virtualization often perform copy-on-write in storage, either block-level copy-on-write (as described above) or file-level copy-on-write.

Some but not all filesystems support file-level copy-on-write as part of union mounting,[14] including OverlayFS, aufs, GlusterFS, and UnionFS.
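
File-level copy-on-write in a union mount can be sketched like this, assuming two in-memory maps from path to file contents stand in for the read-only lower layer and the writable upper layer (`UnionMount` is a hypothetical name, and whiteouts for deleted files are not modeled). Reads prefer the upper layer; the first modification copies the whole file up, which is what makes this file-level rather than block-level copy-on-write.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical union mount: lower layer is shared and never modified,
// upper layer holds this mount's changes.
class UnionMount {
public:
    explicit UnionMount(std::map<std::string, std::string> lower) : lower_(std::move(lower)) {}

    std::optional<std::string> read(const std::string& path) const {
        if (auto it = upper_.find(path); it != upper_.end()) return it->second;  // upper wins
        if (auto it = lower_.find(path); it != lower_.end()) return it->second;
        return std::nullopt;
    }

    // File-level copy-on-write: the first modification copies the whole file up.
    void append(const std::string& path, const std::string& text) {
        auto it = upper_.find(path);
        if (it == upper_.end()) {
            auto low = lower_.find(path);
            std::string base = (low != lower_.end()) ? low->second : std::string();
            it = upper_.emplace(path, std::move(base)).first;  // copy-up
        }
        it->second += text;
    }

    bool in_upper(const std::string& path) const { return upper_.count(path) != 0; }
    const std::map<std::string, std::string>& lower() const { return lower_; }

private:
    const std::map<std::string, std::string> lower_;  // read-only, shareable layer
    std::map<std::string, std::string> upper_;        // writable layer
};
```

Files that are only read never enter the upper layer, so many containers can share one lower layer.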

References

from Grokipedia
Copy-on-write (COW), also known as implicit sharing or shadowing, is a resource-management technique employed in computer systems to optimize the duplication of data structures by allowing multiple entities to initially share the same underlying resources, with private copies created only upon modification attempts. This approach defers the costly operation of copying until necessary, thereby reducing memory usage, execution time, and I/O overhead in scenarios like process creation or file cloning. In operating systems, COW is prominently used to implement the fork() system call, where a child process shares its parent's virtual memory pages until a write occurs, at which point the affected pages are duplicated to maintain isolation. This mechanism, introduced in later BSD Unix implementations such as 4.3BSD, significantly improves efficiency for short-lived processes, such as those in shell pipelines, by minimizing initial copying and swap space demands. Modern kernels, including Linux, leverage COW for fork() to duplicate only page tables initially, incurring low overhead until modifications trigger physical copies. Beyond memory management, COW extends to file systems for creating space-efficient snapshots and clones; for instance, in ZFS and Btrfs, updates allocate new blocks while preserving the original data for shared references, enabling features like versioning and backups without immediate full duplication. In programming languages and libraries, such as PHP's variable handling or pandas' DataFrames, COW facilitates mutable data sharing by treating derived objects as views until edits necessitate copies, enhancing performance in data-intensive applications. Overall, COW balances efficiency and safety across domains, though it can introduce complexity in handling concurrent accesses and can cause fragmentation over time.

Fundamentals

Definition and Core Concept

Copy-on-write (COW) is a resource-management technique in which multiple users or processes initially share a single copy of data or memory, and the data is duplicated only when one party attempts a modification, preserving the original for the others. This approach optimizes resource usage by avoiding unnecessary duplication during read operations. At its core, copy-on-write employs lazy copying to postpone the actual duplication of resources until necessary. The need to copy is typically detected via mechanisms like reference counting, which tracks the number of entities sharing the resource, or page protection flags that induce a fault upon write access, enforcing read-only access until the copies diverge. This gives all participants efficient shared read-only access while granting a modifier exclusive write access to its own copy, made at write time. The technique originated in the 1970s within early time-sharing operating systems, such as TENEX for the PDP-10, where it facilitated sharing of large portions of the address space for procedures and data, creating private copies solely for modified pages. It has since been generalized into a broader programming pattern applicable well beyond its original system contexts.

Mechanism of Operation

Copy-on-write (CoW) operates by initially allowing multiple entities, such as processes or threads, to share the same underlying resource, such as a memory page or data block, without immediate duplication. This sharing is established by pointing all relevant references to the single shared instance, often tracked via reference counting to monitor the number of sharers.

The mechanism proceeds in distinct steps. First, when a new entity that requires access to the resource is created, the system configures shared access by mapping all participants to the original resource, avoiding any copy at this stage. Second, read operations are performed directly on the shared resource without triggering any additional actions, since nothing is being modified. Third, when a write is attempted on the shared resource, the system detects it via a protection mechanism and intervenes to create a private copy for the modifying entity. Finally, the write is applied only to this new copy, while the original remains unchanged for the other sharers; reference counts are updated to reflect the split, decrementing the count on the original and initializing a new count for the copy.

Technical enablers enforce this lazy copying. Memory protection attributes, such as marking shared pages as read-only in page tables, trigger an exception or trap on write attempts, routing control to a handler that performs the copy. Alternatively, versioning metadata or flags in data structures can signal shared status and invoke copy logic without hardware traps. A generic algorithm for CoW can be illustrated in pseudocode as follows:

function access_resource(address, operation, data):
    if operation == READ:
        return perform_read(address)              // direct access to the shared resource
    // operation == WRITE
    if is_shared(address):                        // more than one reference to the page
        new_page = allocate_page()
        if new_page == NULL:
            return handle_allocation_failure()    // e.g., fail the operation
        copy_page(original_page(address), new_page)
        update_reference_count(original_page(address), decrement)
        update_reference_count(new_page, initialize=1)
        update_mapping(address, new_page)         // address now refers to the private copy
        mark_writable(new_page)
    perform_write(address, data)

This outlines the conditional copy triggered by writes, with reference count adjustments to maintain sharing integrity. For error handling, if allocation of the new copy fails, typically because too little memory is available, the write operation is aborted and the system may signal an error to the requesting entity, potentially leading to termination or to broader out-of-memory recovery procedures rather than a fallback to immediate full duplication.
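
The pseudocode can be turned into a small runnable C++ sketch. The assumptions are all illustrative: a page holds a single `int`, a fixed pool of four pages stands in for physical memory so that allocation can fail, and each entity's mapping is a vector from address to page index (`Pool`, `Status`, `read_at`, and `write_at` are hypothetical names). On allocation failure the write is aborted and the shared page is left untouched.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

enum class Status { Ok, OutOfMemory };

constexpr std::size_t kPoolPages = 4;

// Hypothetical fixed pool of pages with per-page reference counts.
struct Pool {
    std::array<int, kPoolPages> data{};
    std::array<int, kPoolPages> refs{};  // 0 means the slot is free

    std::optional<std::size_t> allocate_page() {
        for (std::size_t i = 0; i < kPoolPages; ++i)
            if (refs[i] == 0) { refs[i] = 1; return i; }  // new page starts with one reference
        return std::nullopt;                               // allocation failure
    }
};

// mapping[address] is the page an entity currently sees at that address.
using Mapping = std::vector<std::size_t>;

int read_at(const Pool& pool, const Mapping& m, std::size_t address) {
    return pool.data[m[address]];  // reads never copy
}

Status write_at(Pool& pool, Mapping& m, std::size_t address, int value) {
    std::size_t page = m[address];
    if (pool.refs[page] > 1) {                       // shared: copy before writing
        std::optional<std::size_t> fresh = pool.allocate_page();
        if (!fresh) return Status::OutOfMemory;      // abort; the shared page is untouched
        pool.data[*fresh] = pool.data[page];         // copy_page
        pool.refs[page] -= 1;                        // the original loses one sharer
        m[address] = *fresh;                         // update_mapping
        page = *fresh;
    }
    pool.data[page] = value;                         // perform_write on a private page
    return Status::Ok;
}
```

Returning a status instead of falling back to some other copying strategy matches the abort-and-report behavior described above.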

Benefits and Limitations

Advantages

Copy-on-write (CoW) enhances efficiency by permitting multiple processes or entities to share the same physical pages initially, thereby avoiding redundant allocations and minimizing the overall memory footprint. Only when a write operation occurs on a shared page does the system create a private copy, ensuring that unmodified data remains shared across all users. This mechanism is especially advantageous for read-heavy workloads with a low write fraction, such as applications that modify less than 50% of their data, leading to substantial reductions in memory usage compared with immediate full copying.

The technique delivers performance benefits by making the initial duplication step cheap, after which reads are served from the common resource without any copying overhead. By deferring the costly copy until an actual write is detected, CoW improves system responsiveness and reduces latency in scenarios involving frequent process creation or data duplication, as the expensive allocation and copying are postponed. This aligns with lazy allocation principles, where resources are provisioned only as needed.

CoW embodies an effective space-time trade-off, balancing storage savings against deferred computational costs. Quantitatively, for N sharers of a resource of original size S and a write fraction of M%, the approximate memory saved is (1 - M/100) × S × (N - 1), since only the modified portions are duplicated for each additional sharer. Empirical studies support this: in Franz Lisp, a write fraction of 23% yields high sharing efficiency, while a write fraction of about 35% still achieves notable savings relative to full copies. Additionally, CoW improves scalability in multi-user or multi-process environments through efficient resource sharing, which limits the proliferation of copies and lowers aggregate system load by keeping unchanged data shared.
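
The savings formula follows from comparing eager copying, where each of the N sharers holds a full copy, with copy-on-write, where one original is kept and each of the other N - 1 sharers copies only its modified portion. The derivation assumes every sharer modifies the same fraction M% of the data:

```latex
\begin{aligned}
\text{eager} &= N S, \qquad \text{CoW} = S + (N-1)\,\frac{M}{100}\,S,\\
\text{saved} &= N S - \Bigl[S + (N-1)\,\frac{M}{100}\,S\Bigr]
             = (N-1)\,S - (N-1)\,\frac{M}{100}\,S
             = \Bigl(1 - \frac{M}{100}\Bigr)\,S\,(N-1).
\end{aligned}
```

For an illustrative case with N = 4, S = 100 MB and M = 23, eager copying uses 400 MB, copy-on-write uses 100 + 3 × 23 = 169 MB, and the saving is 0.77 × 100 × 3 = 231 MB.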

Disadvantages and Trade-offs

Copy-on-write mechanisms impose significant overhead on write operations, because modifying shared data requires duplicating the affected portions before alteration, which adds latency for allocation and copying. The duplication temporarily doubles memory usage for the data involved, potentially straining resources in systems with limited memory. For instance, in virtual memory systems, the copy performed during a page fault amplifies this cost, especially for large pages or frequent modifications.

Repeated partial copies inherent to copy-on-write can lead to scattered allocations, fostering external fragmentation in which free space is split into non-contiguous pieces that hinder allocation of larger contiguous regions. This fragmentation complicates free-space management, as coalescing scattered free space becomes more resource-intensive over time, reducing overall system efficiency.

Implementing copy-on-write demands careful bookkeeping, such as reference counts or copy-on-write flags, to monitor sharing and trigger duplication at the right moment, which increases code complexity and maintenance burden. This added intricacy raises the risk of bugs, including race conditions in concurrent settings and mishandled sharing states, as documented vulnerabilities in operating system kernels over the past decades show; the 2016 "Dirty COW" race condition in the Linux kernel (CVE-2016-5195) is a well-known example. Concurrency pitfalls, such as non-atomic updates that lead to incorrect sharing detection, further compound these implementation challenges.

Copy-on-write is particularly disadvantageous in write-heavy scenarios, where frequent copying overhead outweighs the read-time benefits and can degrade performance compared with immediate full copies. Workload profiling, assessing read-write ratios and access patterns, is therefore important for deciding whether the technique fits, as aggressive use in mutation-dominated environments can lead to excessive copying.

Applications in Operating Systems

Virtual Memory Management

In virtual memory management, copy-on-write (CoW) integrates with paging by marking shared physical pages as read-only in the page tables of multiple processes, allowing initial sharing without immediate duplication. When a process attempts to write to such a page, the memory management unit (MMU) triggers a page fault, which the kernel's page fault handler intercepts to implement CoW: it allocates a new physical page, copies the original contents, updates the faulting process's page table to point to the new page with write permission, and leaves the original page unchanged for the other sharers. This mechanism leverages the hardware's protection features to enforce sharing while ensuring isolation upon modification.

At the kernel level, CoW relies on page table entries (PTEs) configured with read-only permissions and on reference counts for physical pages to track sharing; some systems use dedicated CoW bits in PTEs to flag shared writable mappings, while others, like Linux, achieve the effect through read-only marking and kernel-managed counters in the page structure (struct page). Upon a write fault, the handler verifies the sharing status, performs the copy if needed, and updates only the faulting process's mappings without altering the others, ensuring consistency across shared regions. In Windows, the kernel memory manager supports CoW through section objects mapped with PAGE_WRITECOPY protection, handling faults similarly within its hierarchical page tables.

CoW promotes resource conservation by enabling multiple processes to map the same physical pages at startup and deferring allocation until writes occur, which significantly reduces RAM usage in multitasking environments where processes often share library code and other segments without modifying them. For instance, in systems with frequent process creation, this lazy approach minimizes initial memory overhead, as only modified pages consume additional physical memory.

The technique evolved from early implementations like TENEX in the early 1970s, which supported CoW for mapped file pages to enable efficient sharing. It gained prominence in VAX/VMS starting in 1978, where it was used for process creation and library sharing to optimize memory use under the hardware constraints of the era. Today, CoW is a standard feature of modern kernels, including Linux, which has used it for efficient paging since its inception, and Windows, which uses it for sections.

Process Forking and Cloning

In Unix-like operating systems adhering to POSIX standards, the fork() system call creates a new child process by duplicating the parent process's address space using copy-on-write (CoW). Initially, the child shares the parent's physical memory pages, with the page table entries marked read-only to detect modifications; when either process attempts a write, the kernel copies the affected page so that each process has a private copy. The child therefore starts with an identical virtual memory layout without an immediate full duplication, optimizing resource use in multitasking environments.

The CoW mechanism for fork() emerged in 4.3BSD (1986) as part of advancements in Berkeley Software Distribution (BSD) Unix, building on paging support introduced around 1979 that initially lacked efficient duplication. Prior implementations, such as Version 7 Unix, copied the full address space, which was costly for larger processes; CoW addressed this by deferring copies until necessary, significantly reducing setup overhead. In practice, this replaces O(n) data copying at fork time, where n is the process size, with duplicating only the page tables, a much smaller amount of work; actual copying is handled lazily via page faults. For example, when the child immediately calls exec() to load a new program, few or no pages are copied.

Variants of process cloning extend this efficiency in modern kernels. In Linux, the clone() system call generalizes fork() by allowing fine-grained control over shared resources via flags; without the CLONE_VM flag, it employs CoW for each virtual memory area (VMA), duplicating page tables while sharing physical pages until writes occur. Similarly, Windows supports CoW through memory protection attributes in process creation and section mappings: with the CreateProcess API, file-backed sections mapped with PAGE_WRITECOPY protection are shared for reading and privately copied on modification, akin to Unix semantics for optimizing multiprocess scenarios.
Edge cases during forking highlight CoW's nuances, particularly with shared resources. Shared libraries and memory-mapped files, typically loaded with read-only or shared mappings (e.g., via mmap with MAP_SHARED), remain physically shared across parent and child without triggering copies, as writes are prohibited or redirected to the underlying file. Private mappings (e.g., MAP_PRIVATE) follow standard CoW, copying on write to preserve isolation. The subsequent exec() call disrupts this by unmapping the original address space and loading a new executable, effectively nullifying any pending CoW setup and preventing shared library inheritance from the parent.
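
The difference between private and shared mappings across fork() can be observed with real POSIX calls. The sketch assumes a system that provides `MAP_ANONYMOUS` (Linux, the BSDs, and macOS do); `fork_and_write` and `ForkResult` are hypothetical names. The child writes to both pages: the write to the `MAP_PRIVATE` page stays private to the child, while the write to the `MAP_SHARED` page is visible to the parent.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// What the parent reads back after the child has written 42 to both pages.
struct ForkResult {
    int private_value;  // from the MAP_PRIVATE (copy-on-write) mapping
    int shared_value;   // from the MAP_SHARED mapping
};

// Assumes MAP_ANONYMOUS is available. Returns {-1, -1} if any setup step fails.
ForkResult fork_and_write() {
    ForkResult result{-1, -1};
    const std::size_t len = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* priv = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    void* shar = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (priv != MAP_FAILED && shar != MAP_FAILED) {
        int* p = static_cast<int*>(priv);
        int* s = static_cast<int*>(shar);
        *p = 1;
        *s = 1;
        pid_t pid = fork();
        if (pid == 0) {   // child
            *p = 42;      // private mapping: the child ends up with its own copy
            *s = 42;      // shared mapping: the write lands in the page the parent sees
            _exit(0);
        }
        if (pid > 0 && waitpid(pid, nullptr, 0) == pid) result = {*p, *s};
    }
    if (priv != MAP_FAILED) munmap(priv, len);
    if (shar != MAP_FAILED) munmap(shar, len);
    return result;
}
```

A program can only observe these semantics; it cannot tell from user space whether the kernel copied the private page lazily, which is the part copy-on-write makes cheap.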

Applications in Software Development

Data Structure Optimization

Copy-on-write (CoW) techniques optimize data structures in user-space software by enabling efficient sharing of immutable or shared objects, particularly in collections like arrays, lists, and trees. In persistent data structures, mutations create new versions that share unchanged portions with the original, avoiding full copies and reducing memory overhead. This approach is foundational in functional programming, where immutability provides thread safety and versioned histories without explicit locking. Seminal work on purely functional data structures shows how CoW-like sharing lets updates to balanced structures run in logarithmic time by copying only the affected paths.

The synergy between CoW and immutability is evident in functional languages, where data structures maintain multiple versions through structural sharing. For instance, in a persistent binary search tree, an update to a specific node requires copying the path from the root to that node, while unmodified subtrees remain shared across versions. This path copying leaves the original tree intact, making it cheap to branch off new versions. Such patterns minimize allocation costs, making them suitable for applications that retain historical data without proportional memory growth.

Reference-counted buffers exemplify CoW for strings and similar sequential data: multiple references point to a shared buffer until a mutation triggers a private copy. This defers copying until necessary, which suits workloads with frequent reads and infrequent writes, such as string handling in libraries. The reference count tracks sharing, so a mutation isolates its changes without affecting other users.

Adoption trends reflect CoW's integration into modern languages. In Rust, the Cow<T> type (std::borrow::Cow) implements clone-on-write semantics, allowing borrowed data to be read without copying and cloned lazily only when mutation requires ownership, which supports zero-cost abstractions in generic code. Similarly, Java's CopyOnWriteArrayList, introduced in JDK 5, applies CoW to concurrent collections by replicating the entire array on every write; readers need no locks and its iterators never throw ConcurrentModificationException, which suits read-dominated environments. These implementations underscore CoW's role in balancing performance and safety in shared data scenarios.

Language and Library Examples

In C++, copy-on-write is commonly implemented with reference counting, often using smart pointers like std::shared_ptr to manage shared data buffers in custom classes such as strings. Multiple instances share the underlying data until a modification triggers a private copy, avoiding allocation for read-only access.

A prominent example is the Qt library's QString class, which employs implicit sharing with copy-on-write semantics. Each QString contains a pointer to a shared data block that includes a reference count; assignment or passing by value performs a shallow copy by incrementing the count, while any write operation checks the count and detaches, copying the data, if it exceeds 1. This optimization is particularly beneficial for GUI applications that copy strings frequently without modifying them.

In Python, immutable types such as tuples allow structure sharing across references, which behaves much like implicit copy-on-write: "copies" reuse the same object, and any apparent modification creates a new object instead. The sys.getrefcount() function reveals these shared references by returning the number of references to an object, which is generally one higher than expected because the call itself holds a temporary reference to its argument.

The pandas library implements explicit copy-on-write for its DataFrame and Series objects, allowing multiple objects to share the underlying data until one of them is modified, at which point a private copy is created to maintain isolation. This improves performance in workflows with frequent reads and occasional updates.

Lisp languages, such as Common Lisp, use cons cells for efficient data sharing in lists and trees, allowing substructures to be referenced from multiple places without duplication, akin to the sharing phase of copy-on-write. A cons cell, an ordered pair of two pointers (its car and cdr), enables persistent data structures in which shared parts change only if a program explicitly mutates them, supporting functional programming patterns. For instance, copy-list creates new cons cells for the top-level list structure while sharing the elements themselves, and append copies all of its arguments except the last, which becomes a shared tail of the result.

In Go, strings are immutable, so they can be shared freely without copy-on-write, since any modification produces a new string. Slices, by contrast, are views over a shared backing array: slicing creates a lightweight view, and writes through one slice are visible through every other slice of the same array. A private buffer appears only when append exceeds the slice's capacity and reallocates, or when the program copies the data explicitly, for example with the built-in copy() function, so copy-on-write behavior for slices must be implemented by the programmer.
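
The reference-counted approach can be sketched in C++ with `std::shared_ptr`. `CowString` is a hypothetical class, not Qt's or any standard library's implementation; it uses `use_count()` as its sharing test, which is only reliable in single-threaded code, whereas production designs such as Qt's manage atomic reference counts explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write string. The implicitly generated copy operations
// copy the shared_ptr, so copies share one buffer (a shallow copy).
// Not thread-safe: use_count() is only meaningful without concurrent copies.
class CowString {
public:
    explicit CowString(std::string s) : buf_(std::make_shared<std::string>(std::move(s))) {}

    const std::string& str() const { return *buf_; }
    const char* data() const { return buf_->data(); }

    void append(const std::string& tail) {
        detach();
        *buf_ += tail;
    }

    void set(std::size_t i, char c) {
        detach();
        (*buf_)[i] = c;
    }

private:
    // Take a private copy only if another CowString still refers to the buffer.
    void detach() {
        if (buf_.use_count() > 1) buf_ = std::make_shared<std::string>(*buf_);
    }

    std::shared_ptr<std::string> buf_;
};
```

Copying `x` into `y` shares one buffer; the first `append` on `y` detaches it, and later writes to `y` reuse its now-private buffer without another copy.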

Applications in Storage Systems

Copy-on-Write File Systems

Copy-on-write (CoW) mechanisms in file systems operate at the block level to enable efficient storage and updates by allowing multiple files or versions to share the same disk blocks until a modification occurs. When a write is initiated, the system allocates new blocks for the modified data, leaving the original blocks intact and updating metadata pointers to reference the new locations. This avoids in-place overwrites, which can leave data inconsistent if a crash interrupts them, and lets the file system gather new writes into sequential runs, which can improve performance on modern storage devices.

Block-level CoW typically relies on advanced metadata structures to track block mappings and changes. For instance, Btrfs, initiated in 2007, uses a copy-on-write-friendly B-tree as its core on-disk data structure: all metadata and file extents are organized in self-balancing trees that support efficient updates without linked leaves or in-place modifications. ZFS, developed by Sun Microsystems starting in 2001 and first released in 2005, employs a similar transactional model but organizes data and metadata into objects within a storage pool, using variable-sized blocks (from 512 bytes to 1 MB) managed via metaslabs for dynamic allocation. In both systems, writes are grouped into transactions: new blocks are written, metadata pointers are updated in the tree or object set, and the old state is released only after the transaction commits, so each transaction takes effect atomically.

Crash safety is a key benefit of CoW file systems, achieved through atomic updates of the root of the on-disk tree and built-in integrity checks that eliminate the need for traditional journaling. In ZFS, changes are collected into transaction groups, each committed by atomically writing a new uberblock; during crash recovery, the system selects the most recent valid uberblock to restore a consistent state, with per-block checksums verifying data integrity without additional logging overhead. Btrfs achieves similar consistency using generation numbers in block headers and checksums (such as CRC32C) to detect corruption during recovery, relying on CoW to prevent partial updates. This design keeps the file system resilient to power failures and crashes by always maintaining a valid on-disk image.

Space management in CoW file systems uses shared extents for deduplication and must handle overcommitment caused by delayed freeing of blocks. Blocks can be shared across files through reference counting, allowing identical data to occupy minimal space; for example, ZFS supports inline deduplication by hashing blocks and storing unique copies, while Btrfs uses extent reference counts in its extent tree to let files share extents, as with reflink copies, without explicit deduplication tools. Overcommitment arises because space is reserved for new blocks during writes while old blocks are freed only after the transaction commits, which can temporarily exhaust space if not monitored; both systems mitigate this with space maps and quotas that track free space in allocation groups or block groups.
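
The commit protocol can be reduced to a toy model, assuming a single file whose metadata is a flat list of block pointers and a `root` index standing in for the ZFS uberblock or the Btrfs superblock (`Disk`, `Transaction`, and `read_all` are hypothetical names). Real systems copy only the tree path above each changed block; this sketch copies the whole pointer list for brevity. Nothing is overwritten in place, and a transaction becomes visible only when `root` changes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical on-disk model: data blocks and metadata records are only appended.
struct Disk {
    std::vector<std::string> blocks;                 // data blocks
    std::vector<std::vector<std::size_t>> metadata;  // each record: the file's block pointers
    std::size_t root = 0;                            // which metadata record is committed
};

class Transaction {
public:
    explicit Transaction(Disk& d) : disk_(d), pointers_(d.metadata[d.root]) {}

    // Redirect the write: new data goes to a fresh block; only the pointer changes.
    void write(std::size_t index, const std::string& data) {
        disk_.blocks.push_back(data);
        pointers_[index] = disk_.blocks.size() - 1;
    }

    // Write the new metadata record, then flip the root in a single step.
    // A crash before the flip leaves `root` naming the old, intact state.
    void commit() {
        disk_.metadata.push_back(pointers_);
        disk_.root = disk_.metadata.size() - 1;
    }

private:
    Disk& disk_;
    std::vector<std::size_t> pointers_;  // private copy of the metadata, not of the data
};

std::string read_all(const Disk& d) {
    std::string out;
    for (std::size_t b : d.metadata[d.root]) out += d.blocks[b];
    return out;
}
```

An abandoned transaction, standing in for a crash, leaves `root` pointing at the old state; its orphaned block is wasted space that a real file system would reclaim.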

Snapshots and Versioning

In copy-on-write (CoW) file systems, snapshots are created instantaneously by recording a metadata pointer to the current root of the file system's block tree, without copying any data at creation time. The snapshot references the existing blocks, so its initial storage overhead is near zero. For example, in ZFS, snapshots capture a read-only, point-in-time image of a dataset or volume using this pointer-based method, enabling rapid creation even for large file systems.

CoW facilitates versioning by retaining previous versions of blocks, allowing users to track and access file history non-destructively. Old block versions remain intact as new writes allocate fresh blocks, preserving prior states for rollback or auditing. A prominent example is ZFS's send and receive features, which generate incremental backups by streaming the differences between snapshots, transmitting only the data changed since a baseline snapshot to a remote system or file. This supports efficient, space-optimized versioning across storage environments.

Snapshots stay consistent because they are read-only: ongoing modifications to the live file system are diverted to new blocks via CoW, so the snapshot reflects an unaltered view of the data at creation time. Reads from the snapshot access the original blocks, unaffected by subsequent changes, which provides reliable point-in-time copies for backup without interrupting operations. In ZFS, this separation is enforced through immutable snapshot metadata, preventing any writes from altering the captured state.

Despite these benefits, storage use grows over time, because each retained snapshot keeps alive the old blocks that the live file system has since overwritten or deleted; unmanaged snapshots can therefore consume substantial disk space. Linux Logical Volume Manager (LVM) snapshots illustrate the trade-off: a CoW snapshot requires pre-allocating or dynamically extending snapshot storage (typically 10-30% of the origin volume), and exceeding that capacity can invalidate the snapshot. In LVM, thick snapshots copy the original data on modification, growing in proportion to the changes, while thin snapshots share a pool but still demand monitoring to avoid overflow.
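
Thick LVM-style snapshots differ from the redirect-on-write snapshots of ZFS: the origin is updated in place, and the old contents of a block are copied into the snapshot area before its first overwrite. The sketch below models that behavior, assuming a snapshot area measured in whole blocks (`ThickSnapshot` is a hypothetical name, not an LVM interface).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical thick snapshot: the origin volume is written in place; before a
// block is overwritten for the first time, its old contents are saved into a
// fixed-size snapshot area. When that area is full, the snapshot becomes invalid.
class ThickSnapshot {
public:
    ThickSnapshot(std::vector<std::string>& origin, std::size_t capacity_blocks)
        : origin_(origin), capacity_(capacity_blocks) {}

    void write_origin(std::size_t block, const std::string& data) {
        if (valid_ && saved_.count(block) == 0) {            // first change since the snapshot
            if (saved_.size() == capacity_) valid_ = false;  // out of snapshot space
            else saved_[block] = origin_[block];             // preserve the old contents first
        }
        origin_[block] = data;
    }

    // Snapshot view: the preserved copy if the block changed, else the live origin block.
    std::string read_snapshot(std::size_t block) const {
        auto it = saved_.find(block);
        return it != saved_.end() ? it->second : origin_[block];
    }

    bool valid() const { return valid_; }
    std::size_t used_blocks() const { return saved_.size(); }

private:
    std::vector<std::string>& origin_;
    std::map<std::size_t, std::string> saved_;  // block number -> pre-change contents
    std::size_t capacity_;
    bool valid_ = true;
};
```

Two writes to the same block consume one slot; once a third distinct block changes with only two slots available, the snapshot is marked invalid, matching the overflow behavior described above.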
