Hubbry Logo
search
logo
Bzip2
Bzip2
current hub
2304406

Bzip2

logo
Community Hub0 Subscribers
2304406

Bzip2

logo
Community Hub0 Subscribers
Write something...
Be the first to start a discussion here.
Be the first to start a discussion here.
See all
Bzip2

bzip2 is a free and open-source file compression program that uses the Burrows–Wheeler algorithm. It only compresses single files and is not a file archiver. It relies on separate external utilities such as tar for tasks such as handling multiple files, and other tools for encryption, and archive splitting.

bzip2 was initially released in 1996 by Julian Seward. It compresses most files more effectively than older LZW and Deflate compression algorithms but is slower. bzip2 is particularly efficient for text data, and decompression is relatively fast. The algorithm uses several layers of compression techniques, such as run-length encoding (RLE), Burrows–Wheeler transform (BWT), move-to-front transform (MTF), and Huffman coding. bzip2 compresses data in blocks between 100 and 900 kB and uses the Burrows–Wheeler transform to convert frequently recurring character sequences into strings of identical letters. The move-to-front transform and Huffman coding are then applied. The compression performance is asymmetric, with decompression being faster than compression.

The algorithm has gone through multiple maintainers since its initial release, with Micah Snyder being the maintainer since June 2021. There have been some modifications to the algorithm, such as pbzip2, which uses multi-threading to improve compression speed on multi-CPU and multi-core computers.

bzip2 is suitable for use in big data applications with cluster computing frameworks like Hadoop and Apache Spark, as a compressed block can be decompressed without having to process earlier blocks.

The bundled bzip2recover utility tries recovering readable parts of damaged bzip2 data. It works by searching for individual blocks and dumping them into separate files.

Seward made the first public release of bzip2, version 0.15, in July 1996. The compressor's stability and popularity grew over the next several years, and Seward released version 1.0 in late 2000.[not verified in body] Following a nine-year hiatus of updates for the project since 2010, on 4 June 2019 Federico Mena accepted maintainership of the bzip2 project. Since June 2021, the maintainer is Micah Snyder.

bzip2 uses several layers of compression techniques stacked on top of each other, which occur in the following order during compression and the reverse order during decompression:

Any sequence of 4 to 255 consecutive duplicate symbols is replaced by the first 4 symbols and a repeat length between 0 and 251. Thus the sequence AAAAAAABBBBCCCD is replaced with AAAA\3BBBB\0CCCD, where \3 and \0 represent byte values 3 and 0 respectively. Runs of symbols are always transformed after 4 consecutive symbols, even if the run-length is set to zero, to keep the transformation reversible.

See all
User Avatar
No comments yet.