BagIt

BagIt is a set of hierarchical file system conventions designed to support disk-based storage and network transfer of arbitrary digital content. A "bag" consists of a "payload" (the arbitrary content) and "tags," which are metadata files intended to document the storage and transfer of the bag. A required tag file contains a manifest listing every file in the payload together with its corresponding checksum. The name, BagIt, is inspired by the "enclose and deposit" method, sometimes referred to as "bag it and tag it."

Bags are ideal for digital content normally kept as a collection of files. They are also well-suited to the export, for archival purposes, of content normally kept in database structures that receiving parties are unlikely to support. Relying on cross-platform (Windows and Unix) filesystem naming conventions, a bag's payload may include any number of directories and sub-directories (folders and sub-folders). A bag can specify payload content indirectly via a "fetch.txt" file that lists URLs for content that can be fetched over the network to complete the bag; simple parallelization (e.g. running 10 instances of Wget) can exploit this feature to transfer large bags very quickly. Benefits of bags include:

BagIt is currently defined in RFC 8493. It defines a simple file naming convention used by the digital curation community for packaging up arbitrary digital content, so that it can be reliably transported via both physical media (hard disk drive, CD-ROM, DVD) and network transfers (FTP, HTTP, rsync, etc.). BagIt is also used for managing the digital preservation of content over time. Discussion about the specification and its future directions takes place on the Digital Curation discussion list.
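The "fetch.txt" mechanism for completing a bag over the network can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes each line of "fetch.txt" holds a URL, a length in octets (or "-" when unknown), and a file path under "data/", separated by whitespace. The names FetchEntry, parse_fetch, download, and complete_bag are hypothetical, and the thread pool of 10 workers stands in for the "10 instances of Wget" idea.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, NamedTuple, Optional
from urllib.request import urlopen


class FetchEntry(NamedTuple):
    url: str
    length: Optional[int]  # None when fetch.txt gives "-" (size unknown)
    path: str              # destination inside the bag, e.g. "data/..."


def parse_fetch(text: str) -> list[FetchEntry]:
    """Parse the text of a fetch.txt file into entries (assumed line format)."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        url, length, path = line.split(maxsplit=2)
        # Refuse destinations outside the payload directory.
        if not path.startswith("data/") or ".." in Path(path).parts:
            raise ValueError(f"unsafe fetch path: {path}")
        entries.append(FetchEntry(url, None if length == "-" else int(length), path))
    return entries


def download(url: str) -> bytes:
    """Default fetcher: read the whole resource into memory."""
    with urlopen(url) as response:
        return response.read()


def complete_bag(
    bag_dir: Path,
    entries: list[FetchEntry],
    fetch: Callable[[str], bytes] = download,
    workers: int = 10,
) -> None:
    """Fetch every listed file in parallel and store it at its payload path."""

    def fetch_one(entry: FetchEntry) -> None:
        target = bag_dir / entry.path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(entry.url))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces evaluation so that a worker exception is raised here.
        list(pool.map(fetch_one, entries))
```

The fetcher is passed in as a parameter so that the transfer step can be replaced, for example by a function that streams large files to disk instead of reading them into memory. After fetching, the payload should still be checked against the manifest.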

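The BagIt specification is organized around the notion of a "bag." A bag is a named file system directory that minimally contains a "data" directory holding the payload, at least one manifest file (for example, "manifest-sha256.txt") listing each payload file with its checksum, and a "bagit.txt" file that declares the BagIt version and the character encoding of the tag files.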

On receipt of a bag, a piece of software can examine the manifest file to make sure that the payload files are present and that their checksums are correct. This allows accidentally removed or corrupted files to be identified. Below is an example of a minimal bag myfirstbag that encloses two payload files. The contents of the tag files are included below their filenames.
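The receipt check described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names make_bag and verify_bag, the payload file names, and the placeholder file contents are all hypothetical, and SHA-256 is an assumed choice of checksum algorithm. The sketch only checks files listed in the manifest; it does not look for unlisted payload files or handle special characters in file names.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict[str, bytes]) -> None:
    """Write payload files under data/ plus bagit.txt and manifest-sha256.txt."""
    (bag_dir / "data").mkdir(parents=True, exist_ok=True)
    lines = []
    for rel_path, content in sorted(payload.items()):
        target = bag_dir / "data" / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        lines.append(f"{digest}  data/{rel_path}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )


def verify_bag(bag_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means every manifest entry checked out."""
    problems = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif hashlib.sha256(target.read_bytes()).hexdigest() != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "myfirstbag"
        # Placeholder payload: one image file and one text file.
        make_bag(bag, {"images/page.png": b"placeholder image bytes",
                       "images/page.txt": b"placeholder OCR text"})
        print(verify_bag(bag))  # []
        (bag / "data" / "images" / "page.txt").write_bytes(b"corrupted")
        print(verify_bag(bag))  # ['checksum mismatch: data/images/page.txt']
```

Returning a list of problems rather than a single boolean lets the receiver report every missing or corrupted file in one pass.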

In this example the payload happens to consist of a Portable Network Graphics image file and an Optical Character Recognition text file. In general the identification and definition of file formats is out of the scope of the BagIt specification; file attributes are likewise out of scope.

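The specification allows for several optional tag files (in addition to the manifest). Their character encoding must be identified in bagit.txt, which itself must always be encoded in UTF-8. The specification defines the following optional tag files: "bag-info.txt", which holds metadata about the bag as colon-separated label and value pairs; tag manifests (for example, "tagmanifest-sha256.txt"), which list tag files together with their checksums; and "fetch.txt", which lists URLs from which payload files can be retrieved.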
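The encoding rule for tag files can be sketched in Python: read "bagit.txt" as UTF-8, take the declared encoding from it, and use that encoding to read "bag-info.txt". This is a minimal illustration under stated assumptions, not a reference implementation. The function names read_tag_file and read_bag_metadata are hypothetical, and the sketch assumes "Label: value" lines in which an indented line continues the previous value.

```python
from pathlib import Path


def read_tag_file(path: Path, encoding: str = "utf-8") -> list[tuple[str, str]]:
    """Parse 'Label: value' lines; indented lines continue the previous value."""
    pairs: list[tuple[str, str]] = []
    for line in path.read_text(encoding=encoding).splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and pairs:
            # Continuation line: append it to the most recent value.
            label, value = pairs[-1]
            pairs[-1] = (label, f"{value} {line.strip()}")
        else:
            label, _, value = line.partition(":")
            pairs.append((label.strip(), value.strip()))
    return pairs


def read_bag_metadata(bag_dir: Path) -> list[tuple[str, str]]:
    """Return the label and value pairs from bag-info.txt."""
    # bagit.txt is always UTF-8 and names the encoding of the other tag files.
    declaration = dict(read_tag_file(bag_dir / "bagit.txt", "utf-8"))
    encoding = declaration["Tag-File-Character-Encoding"]
    return read_tag_file(bag_dir / "bag-info.txt", encoding)
```

The result is a list of pairs rather than a dictionary so that repeated labels and the original order of the metadata are preserved.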

Until version 15, the draft also described how to serialize a bag in an archive file, such as ZIP or TAR. From version 15 on, serialization is no longer part of the specification, for reasons of scope and focus rather than for technical reasons.
