List of web archiving file formats

A web archive file is an archive file that contains all the resources necessary to display a web page, including the base HTML as well as images, audio, video, CSS, scripts, and so on. Some web archive formats, such as the Mozilla Archive Format, can store more than one web page.

Known formats

Name | Filename extension | Description
Mozilla Archive Format (MAFF) | .maff | A legacy, open file format for Firefox[1] used to store one or more web pages, with their associated resources, in a single ZIP file.[2][3] The Mozilla extension that implements MAFF supports versions of Firefox released from 2007 to 2017 but not later ones, and there are no plans to update it.[4]
Microsoft Compiled HTML Help | .chm | A legacy, proprietary format originally developed for online help. It can store multiple web pages and all their associated resources, together with a table of contents for navigating those pages and an index for searching the CHM contents. CHM files use the LZX compression scheme. CHM files are sometimes used for e-books; Microsoft Reader's .lit format is a modification of the CHM format.[5]
MHTML | .mht, .mhtml | A proposed open standard for storing a single HTML file together with all its associated resources.[6] MHTML is a plain-text format: it uses the Base64 binary-to-text encoding to store binary resources such as images. Most modern Chromium-based web browsers support this format.
Web Archive | .webarchive | The web archive format of the Safari web browser. It can store a single HTML file and its associated resources.
WARC | .warc | An ISO standard that specifies a method for combining multiple digital resources, together with related information, into an aggregate archive file. The resulting WARC file can be replayed with appropriate software or used by archive websites such as the Wayback Machine. WARC is the successor to the Internet Archive's ARC_IA File Format, which has traditionally been used to store "web crawls" as sequences of content blocks.[7]
WACZ | .wacz | A ZIP-based format that bundles one or more WARC files along with indexes, a pages manifest, and a datapackage.json manifest for metadata. Because ZIP supports random access, a client can seek into a WACZ bundle and read individual files via HTTP Range requests without downloading the whole archive.[8]
EPUB | .epub | An e-book format that can store multiple HTML files and their associated resources inside a ZIP file.
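To illustrate the MHTML entry in the table, the following Python sketch packs one HTML page and a binary resource into a MIME multipart/related message and reads it back, using only the standard library's email package. It assumes each resource is identified by a Content-Location header; the function names build_mhtml and read_mhtml, and the "index.html" fallback key, are hypothetical. Files saved by real browsers carry additional headers that are omitted here.

```python
from email import encoders, message_from_bytes
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(html: str, resources: dict[str, tuple[str, bytes]]) -> bytes:
    """Pack one HTML page and its resources into a multipart/related message.

    resources maps a URL (used as Content-Location) to (MIME type, raw bytes).
    """
    root = MIMEMultipart("related", type="text/html")
    root.attach(MIMEText(html, "html", "utf-8"))  # the page itself comes first
    for location, (mime_type, data) in resources.items():
        maintype, subtype = mime_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)  # binary resource -> Base64 text
        part["Content-Location"] = location
        root.attach(part)
    return root.as_bytes()


def read_mhtml(raw: bytes) -> dict[str, bytes]:
    """Return decoded parts keyed by Content-Location ("index.html" if absent)."""
    out = {}
    for part in message_from_bytes(raw).walk():
        if part.is_multipart():
            continue  # skip the multipart/related container itself
        location = str(part.get("Content-Location", "index.html"))
        out[location] = part.get_payload(decode=True)  # undoes the Base64 encoding
    return out
```

Because every part is transfer-encoded as Base64, the whole archive stays plain ASCII text, which is why an MHTML file can be opened in a text editor even when it embeds images.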
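A WARC file, as listed in the table, is a sequence of records. Each record has a version line, named header fields (including the mandatory WARC-Record-ID, Content-Length, WARC-Date, and WARC-Type), a blank line, and a content block whose size is given by Content-Length, followed by two CRLF pairs. The Python sketch below writes and re-reads uncompressed "resource" records under those assumptions; it is not a validating parser, and the helper names make_resource_record and iter_records are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def make_resource_record(uri: str, content_type: str, block: bytes) -> bytes:
    """Serialize one WARC 'resource' record (uncompressed)."""
    fields = {
        "WARC-Type": "resource",
        "WARC-Record-ID": f"<urn:uuid:{uuid.uuid4()}>",
        "WARC-Date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "WARC-Target-URI": uri,
        "Content-Type": content_type,
        "Content-Length": str(len(block)),  # length of the content block in bytes
    }
    header = "WARC/1.1\r\n" + "".join(f"{k}: {v}\r\n" for k, v in fields.items())
    # A blank line ends the header; two CRLFs follow the content block.
    return header.encode("utf-8") + b"\r\n" + block + b"\r\n\r\n"


def iter_records(data: bytes):
    """Yield (version, fields, block) for each record in an uncompressed WARC byte string."""
    pos = 0
    while pos < len(data):
        header_end = data.index(b"\r\n\r\n", pos)
        version, *lines = data[pos:header_end].decode("utf-8").split("\r\n")
        fields = {}
        for line in lines:
            name, _, value = line.partition(":")
            fields[name.strip()] = value.strip()
        start = header_end + 4
        # Use Content-Length rather than scanning, since the block may contain CRLFs.
        length = int(fields["Content-Length"])
        yield version, fields, data[start : start + length]
        pos = start + length + 4  # skip the record's trailing CRLF CRLF
```

In practice, WARC files are usually gzip-compressed record by record (.warc.gz) and also contain other record types, such as request and response records, which this sketch does not generate.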
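The random access that the WACZ entry relies on (and that other ZIP-based formats such as MAFF and EPUB share) comes from the central directory stored at the end of every ZIP file. The Python sketch below extracts one member using only byte-range reads: it fetches the end-of-central-directory record from the file's tail, then the central directory, then just the requested member's data. The read_range callable is a hypothetical stand-in for an HTTP Range request, the archive member names in the demo are illustrative, and ZIP64 archives, encrypted entries, and compression methods other than stored and deflate are not handled.

```python
import io
import struct
import zipfile
import zlib
from typing import Callable

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory signature
EOCD_FMT = "<IHHHHIIH"  # 22-byte fixed part of the EOCD record
CDH_FMT = "<IHHHHHHIIIHHHHHII"  # 46-byte fixed part of a central directory header
LFH_SIZE = 30  # fixed part of a local file header


def read_member(read_range: Callable[[int, int], bytes], total_size: int, name: str) -> bytes:
    """Return the uncompressed bytes of one ZIP member using only ranged reads."""
    # 1. The EOCD record sits in the last 22 bytes plus an optional comment (max 65535).
    tail_len = min(total_size, 22 + 0xFFFF)
    tail = read_range(total_size - tail_len, tail_len)
    eocd = tail.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    _, _, _, _, count, cd_size, cd_offset, _ = struct.unpack_from(EOCD_FMT, tail, eocd)

    # 2. Fetch only the central directory and scan it for the requested name.
    cd = read_range(cd_offset, cd_size)
    pos = 0
    for _ in range(count):
        f = struct.unpack_from(CDH_FMT, cd, pos)
        method, csize = f[4], f[8]
        name_len, extra_len, comment_len, local_offset = f[10], f[11], f[12], f[16]
        entry_name = cd[pos + 46 : pos + 46 + name_len].decode("utf-8")
        pos += 46 + name_len + extra_len + comment_len
        if entry_name == name:
            break
    else:
        raise KeyError(name)

    # 3. The local header's name/extra lengths can differ from the central copy,
    #    so read it to find where the member's data actually starts.
    local = read_range(local_offset, LFH_SIZE)
    local_name_len, local_extra_len = struct.unpack_from("<HH", local, 26)
    data = read_range(local_offset + LFH_SIZE + local_name_len + local_extra_len, csize)
    if method == zipfile.ZIP_STORED:
        return data
    if method == zipfile.ZIP_DEFLATED:
        return zlib.decompress(data, -15)  # raw deflate stream, no zlib header
    raise NotImplementedError(f"unsupported compression method {method}")


if __name__ == "__main__":
    # Build a small WACZ-like bundle in memory (member names are illustrative).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("datapackage.json", '{"resources": []}', compress_type=zipfile.ZIP_DEFLATED)
        z.writestr("archive/data.warc", b"WARC/1.1\r\n", compress_type=zipfile.ZIP_STORED)
    blob = buf.getvalue()
    print(read_member(lambda start, n: blob[start : start + n], len(blob), "datapackage.json"))
```

Over HTTP, each read_range call would become one Range request, so a reader needs about three small requests per member instead of downloading the entire bundle.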
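As a rough illustration of the EPUB entry in the table, the following Python sketch writes the ZIP container layout that EPUB's Open Container Format expects: an uncompressed mimetype file as the first entry, a META-INF/container.xml file pointing to the package document, and the HTML content files. The OEBPS directory name and the function name build_epub_skeleton are illustrative assumptions, and the package document (content.opf) that lists the book's contents is omitted for brevity, so the output is a container skeleton rather than a valid e-book.

```python
import io
import zipfile

# Points reading systems to the package document; the OEBPS/ path is an illustrative choice.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub_skeleton(chapters: dict[str, str]) -> bytes:
    """Return ZIP bytes laid out as an EPUB container (package document omitted)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        # mimetype must be the first entry and stored uncompressed, so the
        # string "application/epub+zip" appears at a fixed offset in the file.
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        for name, xhtml in chapters.items():
            # Each HTML (XHTML) content file is an ordinary compressed ZIP member.
            z.writestr(f"OEBPS/{name}", xhtml, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

The fixed-offset mimetype entry lets tools recognize an EPUB file from its first bytes without parsing the ZIP structure.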

References
