Versioning file system
A versioning file system is any computer file system which allows a computer file to exist in several versions at the same time. Thus it is a form of revision control. Most common versioning file systems keep a number of old copies of the file. Some limit the number of changes per minute or per hour to avoid storing large numbers of trivial changes. Others instead take periodic snapshots whose contents can be accessed using methods similar to those for normal file access.
Similar technologies
Backup
A versioning file system is similar to a periodic backup, with several key differences.
- Backups are normally triggered on a timed basis, while versioning occurs when the file changes.
- Backups are usually system-wide or partition-wide, while versioning occurs independently on a file-by-file basis.
- Backups are normally written to separate media, while versioning file systems write to the same hard drive (and normally the same folder, directory, or local partition).
In comparison to revision control systems
Versioning file systems provide some of the features of revision control systems. However, unlike most revision control systems, they are transparent to users, not requiring a separate "commit" step to record a new revision.
Journaling file system
Versioning file systems should not be confused with journaling file systems. Whereas journaling file systems work by keeping a log of the changes made to a file before committing those changes to that file system (and overwriting the prior version), a versioning file system keeps previous copies of a file when saving new changes. The two features serve different purposes and are not mutually exclusive.
Object storage
Some object storage implementations, such as Amazon S3, offer object versioning.
Implementations
ITS
An early implementation of versioning, possibly the first, was in MIT's ITS. In ITS, a filename consisted of two six-character parts; if the second part was numeric (consisted only of digits), it was treated as a version number. When specifying a file to open for read or write, one could supply a second part of ">"; when reading, this meant to open the highest-numbered version of the file; when writing, it meant to increment the highest existing version number and create the new version for writing.
Another early implementation of versioning was in TENEX, which became TOPS-20.[1]
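The ">" convention described for ITS can be illustrated with a small Python sketch. This is not ITS code: the function name `resolve_its_name`, the representation of a directory as a list of (first, second) pairs, and the choice to start at version 1 when no numbered version exists are all assumptions made for the example.

```python
def resolve_its_name(existing, first, second, mode):
    """Resolve an ITS-style two-part filename (hypothetical helper).

    existing: iterable of (first, second) name pairs already stored.
    mode: "read" or "write".
    """
    if second != ">":
        # An explicit second part is used unchanged.
        return (first, second)
    # Only purely numeric second parts count as version numbers.
    versions = [
        int(s) for f, s in existing
        if f == first and s.isascii() and s.isdigit()
    ]
    if mode == "read":
        if not versions:
            raise FileNotFoundError(first)
        # Reading ">" opens the highest-numbered version.
        return (first, str(max(versions)))
    # Writing ">" creates the version after the highest existing one.
    # Starting at 1 when none exists is an assumption of this sketch.
    return (first, str(max(versions, default=0) + 1))
```

With versions 24 and 25 of a file present, reading ">" resolves to 25 and writing ">" resolves to 26.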
Files-11 (RSX-11 and OpenVMS)
A powerful example of a file versioning system is built into the RSX-11 and OpenVMS operating systems from Digital Equipment Corporation. In essence, whenever an application opens a file for writing, the file system automatically creates a new instance of the file, with a version number appended to the name. Version numbers start at 1 and count upward as new instances of a file are created. When an application opens a file for reading, it can either specify the exact file name including version number, or just the file name without the version number, in which case the most recent instance of the file is opened. The "purge" DCL/CCL command can be used at any time to manage the number of versions in a specific directory. By default, all but the highest-numbered version of each file in the current directory will be deleted; this behavior can be overridden with the /keep=n switch and/or by specifying directory path(s) and/or filename patterns. VMS systems are often scripted to purge user directories on a regular schedule; this is sometimes misconstrued by end-users as a property of the versioning system.
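As a rough illustration of the numbering and purge behavior described above, the following Python sketch models a directory as a list of names of the form `NAME;N`. It does not call any OpenVMS facility; `split_version`, `next_version_name` and `purge` are hypothetical helpers, and the `keep` parameter only imitates the effect of the /keep=n switch.

```python
from collections import defaultdict


def split_version(name):
    """Split 'DATA.TXT;3' into ('DATA.TXT', 3)."""
    base, sep, version = name.rpartition(";")
    if not sep or not version.isdigit():
        raise ValueError(f"no version suffix in {name!r}")
    return base, int(version)


def next_version_name(names, base):
    """Name a new instance of `base`: highest existing version plus one."""
    highest = max(
        (v for b, v in map(split_version, names) if b == base), default=0
    )
    return f"{base};{highest + 1}"


def purge(names, keep=1):
    """Return the names a purge would delete, keeping the `keep` newest."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    by_base = defaultdict(list)
    for name in names:
        base, version = split_version(name)
        by_base[base].append(version)
    doomed = []
    for base, versions in by_base.items():
        # Everything except the `keep` highest-numbered versions goes.
        for version in sorted(versions)[:-keep]:
            doomed.append(f"{base};{version}")
    return sorted(doomed)
```

For a directory holding `A.TXT;1`, `A.TXT;2`, `A.TXT;3` and `B.DAT;7`, `purge(names)` selects the first two names, and `purge(names, keep=2)` selects only `A.TXT;1`.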
Linux
- NILFS – A log-structured file system supporting versioning of the entire file system and continuous snapshotting. In this list, this is the only one that is stable and included in the mainline kernel.
- Tux3 – Most recent change was in 2014.[2]
- Next3 – Most recent update was in 2012.
- ext3cow – Most recent release was in 2005.
On February 8, 2004, Kiran-Kumar Muniswamy-Reddy, Charles P. Wright, Andrew Himmer, and Erez Zadok (all from Stony Brook University) proposed a stackable file system Versionfs, providing a versioning layer on top of any other Linux file systems.[3]
LMFS
The Lisp Machine File System supports versioning. This was provided by implementations from MIT, LMI, Symbolics and Texas Instruments. Symbolics Genera was one such operating system.
macOS
Starting with Lion (10.7), macOS has a feature called Versions which allows Time Machine-like saving and browsing of past versions of documents for applications written to use Versions. This functionality, however, takes place at the application layer, not the filesystem layer;[4] Lion and later releases do not incorporate a true versioning file system.
SCO OpenServer
HTFS, adopted as the primary filesystem for SCO OpenServer in 1995, supports file versioning. Versioning is enabled on a per-directory basis by setting the directory's setuid bit, which is inherited when subdirectories are created. If versioning is enabled, a new file version is created when a file or directory is removed, or when an existing file is opened with truncation. Non-current versions remain in the filesystem namespace, under the name of the original file but with a suffix attached consisting of a semicolon and version sequence number. All but the current version are hidden from directory reads (unless the SHOWVERSIONS environment variable is set), but versions are otherwise accessible for all normal operations. The environment variable and general accessibility allow versions to be managed with the usual filesystem utilities, though there is also an "undelete" command that can be used to purge and restore files, enable and disable versioning on directories, etc.
Others
- Subversion has a feature called "autoversioning": a WebDAV source with a Subversion backend can be mounted as a file system on systems that support this kind of mount (Linux, Windows and others do), and saves to that file system generate new revisions in the revision control system.[5]
- The commercial ClearCase configuration management and revision control software has also supported "MVFS" (multi version file system) on HP-UX, AIX and Windows since the early 1990s.
Related software
The following are not versioning filesystems, but allow similar functionality.
- APFS[6] and ZFS support instantaneous snapshots and clones.
- Btrfs supports snapshots.[7]
- HAMMER in DragonFlyBSD has the ability to store revisions in the filesystem.
- NILFS, which supports snapshotting.
- Plan 9's Fossil file system can provide a similar feature, taking periodic snapshots (often hourly) and making them available in /n/snap. Fossil can forever archive a snapshot into Venti (usually one snapshot each day) and make them available in /n/dump. If multiple changes are made to a file during the interval between snapshots, only the most recent will be recorded in the next snapshot.
- Write Anywhere File Layout - NetApp's storage solutions implement a file system called WAFL, which uses snapshot technology to keep different versions of all files in a volume around.
- pdumpfs, authored by Satoru Takabayashi, is a simple daily backup system similar to Plan 9's /n/dump, implemented in Ruby. It functions as a snapshotting tool, which makes it possible to copy a whole directory to another location by using hardlinks. Used regularly, this can produce an effect similar to versioning.[8]
- Microsoft Windows
  - Shadow Copy is a feature introduced by Microsoft with Windows Server 2003. It allows manual or automatic backup copies or snapshots of a file or folder on a specific volume to be taken at a specific point in time.
  - RollBack Rx allows snapshots of disk partitions to be taken. Each snapshot contains only the differences from previous snapshots and takes only seconds to create. It can be used to keep a Windows OS stable and/or protected from malware.
  - GoBack (discontinued): the GoBack software for Windows from Symantec enables reversion of files, directories or disks to previous states. It can record a maximum of 8 GB in changes, and temporarily stops recording each change in the event of high I/O activity.
  - Versomatic: Versomatic software by Acertant automatically tracks file changes and preemptively archives a copy of a file before it is modified.
- Cascade File System exposes a Subversion or Perforce repository via a file system driver. The user must still explicitly decide when to commit changes.
- git implementation documents call git a "content addressable filesystem with a VCS user interface written on top of it."[9] A third-party FUSE implementation also exists that may extend git into a mountable, read-write versioning filesystem.[10]
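The hard-link snapshot technique mentioned for pdumpfs in the list above can be sketched in Python. This is not pdumpfs itself (which is written in Ruby) and makes no claim about its internals: the function name `snapshot`, its parameters, and the byte-for-byte comparison used to detect unchanged files are assumptions for illustration only.

```python
import filecmp
import os
import shutil


def snapshot(source, dest, previous=None):
    """Copy `source` into `dest`, hard-linking files unchanged since `previous`.

    Unchanged files share one on-disk copy across snapshots, so each
    snapshot looks like a full directory tree but costs little extra space.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(dest, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.normpath(os.path.join(dest, rel, name))
            old = (
                os.path.normpath(os.path.join(previous, rel, name))
                if previous else None
            )
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                # Same content as the last snapshot: link instead of copying.
                os.link(old, dst)
            else:
                # New or modified file: store a fresh copy.
                shutil.copy2(src, dst)
```

Calling `snapshot(src, "day2", previous="day1")` once per day gives one directory per day in which unchanged files are the same inode as in the previous day's directory. Hard links require both snapshots to live on the same file system.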
References
1. Daniel G. Bobrow, Jerry D. Burchfiel, Daniel L. Murphy, Raymond S. Tomlinson, TENEX, A Paged Time Sharing System for the PDP-10 (Communications of the ACM, Vol. 15, pp. 135-143, March 1972)
2. linux-tux3 on GitHub.
3. Kiran-Kumar Muniswamy-Reddy; Charles P. Wright; Andrew Himmer; Erez Zadok (8 February 2004). A Versatile and User-Oriented Versioning File System. Third USENIX Conference on File and Storage Technologies (FAST 2004).
4. "Mac OS X Lion file versions, part 2". 6 August 2011. Retrieved 28 April 2012.
5. Version Control with Subversion: Next Generation Open Source Version Control
6. "About Apple File System". Apple Developer Documentation. Retrieved 2021-06-09.
7. "Snapshots, Clones, and Seed Devices", "snapshots" sub-bullet. http://www.oracle.com/technetwork/articles/servers-storage-admin/advanced-btrfs-1734952.html
8. pDumpFS Homepage
9. "Git Internals". "Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it."
10. "What is Gitfs". Presslabs. 24 July 2015. Retrieved 2022-03-07.
External links
- "How to make a file storage". WikiHow. Retrieved Jul 31, 2018.
Versioning file system
filename.txt;1, filename.txt;2) to allow direct access to specific revisions.[3][4] Subsequent research in the 1990s and 2000s produced prototypes such as the Elephant file system and the Comprehensive Versioning File System (CVFS), which emphasized fine-grained versioning at the write level and optimized metadata structures like journal-based inodes and multiversion B-trees to reduce storage overhead by up to 99% for directories while enabling long-term retention for security forensics.[1] User-oriented systems like Versionfs, a stackable layer introduced in 2004, extended versioning to any underlying file system with configurable retention policies (e.g., time-based or space-limited), achieving low performance overhead of 1-4% for typical workloads through sparse or compressed storage.[2] Similarly, the Wayback system, a FUSE-based user-level implementation for Linux from 2004, logs every write operation to create undoable histories, offering fine-grained access to versions dating back to file creation, though at a higher space cost (20-30 times that of tools like RCS).[5]
In contemporary computing, advanced file systems integrate versioning-like features through snapshot mechanisms, which capture point-in-time copies of entire datasets efficiently. For instance, ZFS, developed by Sun Microsystems in 2001 and now part of OpenZFS, uses copy-on-write to create instantaneous read-only snapshots, allowing users to revert files or directories to prior states without duplicating data, thus supporting rapid recovery and incremental backups.[6] Btrfs, initiated by Oracle in 2007 as a next-generation Linux file system, employs subvolumes and snapshots for similar purposes, enabling features like automatic rollback, quota management, and data integrity checks via checksums, which collectively enhance reliability in enterprise and cloud environments.[7] Despite these advances, challenges persist, including metadata bloat in comprehensive schemes and the need for policy-based pruning to manage storage growth, as highlighted in studies showing up to 80% space savings through optimized structures.[1]
Introduction
Definition
A versioning file system is a type of computer file system designed to automatically retain multiple versions of files each time they are modified, thereby enabling users to access and restore previous file states without relying on manual backups or external tools.[1] This approach addresses common issues such as accidental deletions, overwrites, or data corruption by preserving a complete history of changes directly within the file system structure.[8]
Key characteristics of versioning file systems include the automatic creation of new versions triggered by write operations or attribute modifications, ensuring that changes are captured transparently without altering application behavior.[1] These systems persistently store all versions in a manner that supports efficient space usage, often through techniques like copy-on-write, and provide mechanisms for users to query and access the version history of individual files as well as directories.[8] The versioning applies at a fine-grained level, typically per-file, allowing selective retrieval of historical data while maintaining standard file system semantics.[9]
In contrast to point-in-time snapshots, which capture the entire file system state at discrete intervals and may miss intermediate changes, versioning file systems maintain all versions concurrently addressable, supporting continuous and granular access to any prior modification. Some modern systems integrate versioning with snapshot mechanisms for hybrid recovery options.[1][9]
Versions in such systems are commonly identified using numerical suffixes appended to the file name, such as foo;1 in systems like OpenVMS or foo;f1 in Versionfs for a full copy of the first version, or alternatively by timestamps to indicate creation time.[10] This naming convention facilitates direct access to specific versions through standard file system interfaces.[8]
Basic Principles
In versioning file systems, the fundamental principle of non-destructive writes ensures that every modification to a file generates a new version, while prior versions are preserved according to configurable retention policies (which may include automatic pruning to manage storage). This prevents accidental data loss from overwrites.[1] This approach allows users to maintain a history of file changes, enabling recovery to any previous state as needed.[2]
Directory versioning extends this principle to structural changes, where operations like renames or deletions propagate updates across affected version histories without erasing existing records, thereby sustaining referential integrity for all files involved.[2] For instance, in the VMS file system, such changes update directory entries and mark deleted files for retention until purging, ensuring versions remain accessible.[4]
Users typically interact with versions through intuitive commands integrated into the file system interface. In VMS, the DIR/FULL command displays comprehensive details for all versions of a file, while specific versions can be selected by appending a version number to the filename (e.g., filename.ext;5), and the PURGE command allows manual removal of outdated versions to reclaim space.[4] These operations provide straightforward access without requiring specialized tools.
To manage storage growth, versioning file systems employ retention policies that automatically or manually limit versions based on criteria such as maximum count per file (e.g., 10–100 versions), time since creation (e.g., 2–5 days), or allocated space thresholds (e.g., 140 KB maximum).[2] In VMS, policies use parameters like minimum and maximum retention periods tied to access or creation times, with purging triggered when limits are exceeded to balance history preservation and disk usage.[4]
History
Early Developments
The early developments of versioning file systems arose within time-sharing operating systems of the 1960s and 1970s, primarily to address the challenges of multi-user access in collaborative academic and research environments, where simultaneous file modifications risked permanent data loss from overwrites. These innovations enabled users to retain and access prior file states automatically, fostering safer shared computing without manual backups.
The pioneering implementation appeared in the Incompatible Timesharing System (ITS), an operating system developed at the Massachusetts Institute of Technology's Artificial Intelligence Laboratory starting in 1967 for the PDP-6 computer and later ported to the PDP-10. In ITS, files incorporated version numbers directly in their names, formatted as a base name followed by a space and the version (e.g., "FOO 24"), allowing multiple iterations to persist on disk. Reading operations could target the highest version with "FOO >" or the lowest with "FOO <", while writing to an existing file via "FOO >" generated a new sequential version, such as "FOO 25". This approach supported the lab's hacker culture by minimizing disruptions in experimental workflows and enabling quick reversion to stable states.[11]
Subsequent advancements built on ITS concepts in commercial systems from Digital Equipment Corporation (DEC). The RSX-11 real-time operating system, initially released in 1972 as a PDP-11 adaptation of the earlier RSX-15, incorporated the Files-11 file system with automatic versioning using octal numbers from 0 to 77777; new files began at version 1, and modifications incremented the number to preserve history.[12] This feature catered to research and industrial applications requiring reliable multi-user file handling on minicomputers. By 1977, DEC's Virtual Memory System (VMS), later known as OpenVMS, refined Files-11 further, standardizing version delimiters as a semicolon followed by the number (e.g., "DATA.TXT;3"), which incremented on saves to prevent overwrites in enterprise-scale collaborative settings.[13] These evolutions from ITS influenced subsequent file management practices in research computing.
Modern Evolution
During the 1990s and early 2000s, research prototypes advanced versioning concepts with efficient mechanisms for fine-grained history retention. The Elephant file system, presented in 1999, automatically retained all important file versions using heuristics to discard less relevant ones, applying versioning to both files and directories for user error recovery.[14] Building on this, the Comprehensive Versioning File System (CVFS), introduced in 2003, provided exhaustive versioning of all file modifications with space-efficient metadata structures like journal-based inodes and multiversion B-trees, achieving up to 99% storage savings for directory metadata while supporting security forensics.[1] These systems emphasized comprehensive, transparent versioning without full data duplication.
Versioning file systems gained limited adoption in Unix-like environments, particularly through the High Throughput File System (HTFS) integrated into SCO OpenServer starting in 1995. HTFS enabled file versioning on a per-directory basis, allowing users to retain and access multiple versions of files for recovery purposes, such as undeleting inadvertently modified or removed data.[15] This feature was configurable system-wide or per filesystem, marking an early commercial implementation in enterprise-oriented Unix variants, though it did not extend to mainstream Linux distributions during this period.[16]
From the mid-2000s onward, the paradigm shifted toward snapshot-based approximations of versioning, prioritizing efficiency and scalability over traditional per-file version retention. The ZFS file system, developed by Sun Microsystems and released in 2005 as part of OpenSolaris, introduced instantaneous, read-only snapshots that capture point-in-time states of entire datasets, facilitating versioning-like rollback and cloning without the overhead of full copies. Similarly, Btrfs, initiated by Oracle in 2007 and merged into the Linux kernel in 2009, incorporated subvolume snapshots and copy-on-write mechanisms to enable efficient versioning behaviors, such as incremental backups and data integrity checks across large-scale storage pools. These advancements integrated versioning concepts with modern demands for fault tolerance and multi-device support, influencing open-source storage ecosystems.
In the 2010s and up to 2025, operating system vendors focused on embedding snapshot capabilities into core filesystems rather than developing new pure versioning systems. Apple's APFS, launched in 2017 with macOS High Sierra, natively supports snapshots that power Time Machine's local backups, automatically retaining hourly point-in-time copies of the startup disk for up to 24 hours to aid quick recovery without external drives.[17] On the Windows side, Microsoft's ReFS, introduced in Windows Server 2012 and iteratively enhanced through versions like those in Windows Server 2022, emphasizes resilience via features such as checksum-based integrity streams, block cloning for deduplication, and repair capabilities, providing indirect versioning support in high-availability enterprise scenarios.[18]
As of 2025, pure versioning file systems remain uncommon in consumer operating systems owing to their inherent complexity, including challenges in metadata management, storage efficiency, and compatibility with legacy applications. Instead, snapshot-enhanced filesystems like ZFS and Btrfs dominate enterprise storage arrays and cloud infrastructures, where they deliver scalable data protection and recovery at the volume level, underscoring a trend toward hybrid approaches over exhaustive per-file histories.[19]
Technical Mechanisms
Version Creation
In versioning file systems, new versions are typically triggered by file system operations that modify the state of a file or directory, ensuring that prior states are preserved non-destructively. Common triggers include write operations to file contents, renames that alter file metadata, and deletes that remove entries while retaining the affected data. For instance, in systems like Wayback, each write to a file automatically generates a new version at the write level, while directory operations such as mkdir, unlink, or rename also initiate versioning to capture changes atomically. Similarly, Btrfs employs copy-on-write mechanisms to create new versions in response to writes, renames, and deletes, propagating modifications through the file system's tree structures without overwriting existing data.[5][20]
A key efficiency technique in version creation is copy-on-write (COW), which avoids full file duplication by sharing unchanged data blocks across versions and only allocating new storage for modified portions. When a write occurs, the file system identifies and copies only the affected blocks to fresh locations, updating pointers in the metadata to reference the new data while leaving the original blocks intact for previous versions. This approach, utilized in Btrfs, involves creating new extent and page versions that ripple upward to the subvolume tree roots, enabling efficient snapshotting and cloning. In the Comprehensive Versioning File System (CVFS), COW integrates with a log-structured layout to further minimize overhead, sharing data blocks across versions to reduce storage costs.[20][21]
Version numbering schemes provide unique identifiers for distinguishing between iterations, often using sequential integers, timestamps, or a combination to maintain order and facilitate access. Early systems like OpenVMS employ sequential decimal integers starting from 1 for new files, incrementing by 1 on each save (up to 32,767), appended to the filename (e.g., file.txt;1, file.txt;2) to denote revisions without timestamps. Modern implementations, such as Wayback, combine sequential change numbers (with 1 as the most recent) and timestamps for each version, allowing users to reference specific points in time. In Btrfs, versions are tracked via generation numbers in metadata nodes, aligned with checkpoint serial numbers to ensure temporal consistency across the file system tree. Handling version collisions, which can arise in concurrent or distributed environments, typically involves timestamp-based resolution or unique keys (e.g., user-key/timestamp tuples in multiversion B-trees) to prevent overwrites.[22][5][20][21]
Atomicity in version creation ensures that the generation of a new version is an indivisible operation, preventing partial or inconsistent states during modifications. This is achieved through techniques like journaling or COW propagation, where all changes, from data blocks to metadata updates, are committed as a single unit. In CVFS, journal entries for metadata operations enable atomic roll-forward or roll-back, maintaining consistency even if a system crash occurs mid-version. Btrfs guarantees atomicity by batching COW updates into periodic checkpoints (every 30 seconds), with fsync operations using dedicated log-trees for file-specific atomic flushes. These mechanisms collectively safeguard the integrity of version transitions, aligning with the core principle of non-destructive writes in versioning systems.[21][20]
Storage and Access
Versioning file systems store file versions persistently using space-efficient models that balance completeness and optimization. Full copy models create complete duplicates of files for each version, ensuring independent access but consuming significant storage. In contrast, delta or block-level differencing models store only changes between versions, often leveraging shared unchanged blocks to minimize redundancy. For example, the Versionfs system implements full mode for exact copies, compressed mode for gzipped full copies, and sparse mode for block-level deltas via sparse files, achieving space savings of up to 74% in compressed modes for certain workloads.[23]
These storage approaches often integrate copy-on-write techniques, where modifications to a file create new blocks while preserving originals for prior versions. Block-level differencing is particularly effective in systems like ZFS, where snapshots initially share all data blocks with the active filesystem, with space usage growing only as changes accumulate post-snapshot. Similarly, Btrfs employs copy-on-write to share file extents across snapshots and subvolumes, enabling efficient persistent storage without immediate duplication.[24][25]
Metadata management in versioning file systems maintains the integrity and navigability of version histories through structured linkages. Version chains link successive versions linearly, facilitating quick traversal from current to historical states, while tree structures support branching for parallel histories. In the Versionfs implementation, metadata files track version numbers, timestamps, and storage modes in a chain per file, enabling O(1) lookups for version details. Audit trails in versioning systems further enhance this by forming chains of cryptographic authenticators, each verifying the transition to the next version and ensuring tamper-evident history. The SolFS system uses versioned inode chains to manage operation logs across file versions, supporting precise historical reconstruction.[23][26][27]
Access to stored versions is facilitated by specialized commands, APIs, or transparent interfaces that allow querying, restoration, and manipulation without altering user workflows. Users can query versions by number, date, or attributes using APIs like ioctls in Versionfs's libversionfs, which supports operations such as version-set statistics and recovery to a specific state. Restoration typically involves rolling back to a chosen version, as in ZFS's zfs rollback command, which reverts a dataset to a snapshot's state and discards changes made after that snapshot; rolling back past more recent snapshots requires destroying them. Branching creates writable clones from versions, enabling divergent histories; Btrfs achieves this via btrfs subvolume snapshot to fork subvolumes, with shared extents until modifications occur. Transparent access notations, such as appending ;N to filenames in Versionfs, allow standard tools to read historical versions directly.[23][24][25]
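The block sharing behind copy-on-write snapshots can be sketched with a toy in-memory model. This is not how ZFS or Btrfs are implemented: the class `CowVolume`, its fixed one-block-per-index layout, and its per-block reference counts are assumptions chosen to show that a snapshot costs no data copies until the live file diverges from it.

```python
class CowVolume:
    """Toy copy-on-write store for a single file (hypothetical model)."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes
        self.refs = {}       # block id -> number of holders
        self.live = []       # block ids making up the current file
        self.snapshots = {}  # snapshot name -> tuple of block ids
        self._next_id = 0

    def _alloc(self, data):
        bid = self._next_id
        self._next_id += 1
        self.blocks[bid] = data
        self.refs[bid] = 1
        return bid

    def _release(self, bid):
        self.refs[bid] -= 1
        if self.refs[bid] == 0:
            # No live file or snapshot holds this block any more.
            del self.refs[bid]
            del self.blocks[bid]

    def write(self, index, data):
        """Write one block; never modify a stored block in place."""
        new = self._alloc(data)
        if index < len(self.live):
            self._release(self.live[index])
            self.live[index] = new
        elif index == len(self.live):
            self.live.append(new)
        else:
            raise IndexError("sparse writes are not modeled")

    def snapshot(self, name):
        """Record the current block list; shares every block, copies none."""
        self.snapshots[name] = tuple(self.live)
        for bid in self.live:
            self.refs[bid] += 1

    def read(self, name=None):
        ids = self.live if name is None else self.snapshots[name]
        return b"".join(self.blocks[bid] for bid in ids)

    def delete_snapshot(self, name):
        for bid in self.snapshots.pop(name):
            self._release(bid)
```

After a snapshot, overwriting one block allocates exactly one new block; the snapshot still reads the old contents, and deleting the snapshot frees only the block that the live file no longer uses.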
Purging and retention mechanisms ensure long-term manageability by automatically deleting obsolete versions based on configurable policies. Common algorithms enforce limits on version count, age, or space usage, prioritizing recent or frequently accessed versions for retention. In Versionfs, a background cleaner daemon applies policies such as minimum/maximum versions (e.g., 10–100 per set), retention times (e.g., 2–5 days), and space thresholds (e.g., 140 KB per set), deleting the oldest compliant versions first. Snapshot-based systems like ZFS support policy-driven retention through scheduled creation and expiration, where snapshots are automatically removed after a defined period to free space. Btrfs tools such as btrbk implement retention via hourly/daily/weekly/monthly schedules, preserving a fixed number per interval (e.g., 24 hourlies, 7 dailies) while purging excess to maintain quotas. These policies prevent unbounded growth, with space reclamation occurring via reference counting on shared blocks.[23][24][25]
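A count-and-age retention policy of the kind described above can be sketched as follows. This is not the Versionfs cleaner or any ZFS or btrbk code: the `Version` record, the function `select_for_purge`, and the exact rule (delete oldest first while the set is over `max_keep` or the version is older than `max_age`, but never go below `min_keep`) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    number: int      # sequential version number, 1 is the oldest
    created: float   # creation time in seconds


def select_for_purge(versions, now, min_keep=1, max_keep=10, max_age=None):
    """Return the versions a cleaner would delete, oldest first."""
    ordered = sorted(versions, key=lambda v: v.number)
    doomed = []
    remaining = len(ordered)
    for version in ordered:
        if remaining <= min_keep:
            break  # never drop below the guaranteed minimum
        too_many = remaining > max_keep
        too_old = max_age is not None and now - version.created > max_age
        if not (too_many or too_old):
            # Newer versions are neither surplus nor older, so stop here.
            break
        doomed.append(version)
        remaining -= 1
    return doomed
```

With five versions created 100 seconds apart and `max_keep=3`, the two oldest versions are selected; with `max_age=250` evaluated 100 seconds after the newest version, the three oldest are selected.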
