Compress (software)
| compress / uncompress | |
|---|---|
| Original author | Spencer Thomas |
| Initial release | February 1985 |
| Operating system | Unix, Unix-like, IBM i |
| Type | Command |
| compress .Z | |
|---|---|
| Filename extension | .Z |
| Internet media type | application/x-compress |
| Developed by | Spencer Thomas |
| Type of format | data compression |
compress is a shell command for compressing data based on the LZW algorithm.[1] uncompress is a companion shell command that restores files to their original state (both content and metadata) from a file created with compress.
Although once popular, compress has fallen out of favor because it uses the patented LZW algorithm. Its use has been replaced by commands such as gzip and bzip2 that use other algorithms and provide better data compression. Compared to gzip at its fastest setting, compress is slightly slower at compression, slightly faster at decompression, and has a significantly lower compression ratio.[2] compress uses 1.8 MiB of memory to compress the Hutter Prize data, slightly more than gzip at its slowest setting.[3]
compress and uncompress have maintained a presence on Unix and BSD systems and have been ported to IBM i.[4]
compress was standardized in X/Open CAE Specification in 1994,[5] and further in The Open Group Base Specifications, Issue 6 and 7.[6] Linux Standard Base does not require compress.[7]
compress is often excluded from the default installation of a Linux distribution but can be installed from a separate package.[8] compress is available for FreeBSD, OpenBSD, MINIX, Solaris and AIX.
compress is allowed for Point-to-Point Protocol in RFC 1977 and for HTTP/1.1 in RFC 9110, though it is rarely used in modern deployments as the better deflate/gzip is available.
Use
Files compressed by compress are typically named with the extension ".Z" and are therefore sometimes called .Z files. The extension derives from the earlier pack program, which used the extension ".z".
Most tar implementations support compression by piping data through compress when given the -Z command line option.
gunzip can decompress .Z files.[9]
Algorithm
The LZW algorithm used in compress was patented by Sperry Research Center in 1983. Terry Welch published an IEEE article on the algorithm in 1984,[10] but failed to note that he had applied for a patent on the algorithm. Spencer Thomas of the University of Utah took this article and implemented compress in 1984, without realizing that a patent was pending on the LZW algorithm. The GIF image format also incorporated LZW compression in this way, and Unisys later claimed royalties on implementations of GIF. Joseph M. Orost led the team and worked with Thomas et al. to create the final (4.0) version of compress and published it as free software to the net.sources USENET group in 1985. U.S. patent 4,558,302 was granted in 1985 – making compress unusable without paying royalties to Sperry Research (which later merged into Unisys).
The US LZW patent expired in 2003, so it is now in the public domain in the United States. Today, all LZW patents worldwide are expired (see Graphics Interchange Format#Unisys and LZW patent enforcement).
As of POSIX.1-2024 compress supports the DEFLATE algorithm used in gzip.[11]
File format
The compressed output consists of bit groups. Each bit group consists of codes with a fixed width of 9–16 bits. Every group except the last is aligned to eight times the number of bits per code and right-padded with zeroes; the last group is aligned to 8-bit octets and padded with zeroes. More information can be found in an issue on the ncompress GitHub repository.[12]
Example:
- Suppose the output has ten 9-bit codes, five 10-bit codes, and thirteen 11-bit codes. There are three groups to output containing 90 bits, 50 bits, and 143 bits of data.
- The first group will be 90 bits of data + 54 zero bits of padding, to align to a multiple of 72 bits (9 bits × 8).
- The second group will be 50 bits of data + 30 zero bits of padding, to align to a multiple of 80 bits (10 bits × 8).
- The third group will be 143 bits of data + 1 zero bit of padding, to align to a multiple of 8 bits (since this is the last group in the output).
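The alignment rule can be checked with a short sketch. The following Python snippet was written for this example and is not taken from any compress implementation; it simply reproduces the padding figures above:

```python
# Sketch: compute padding for groups of fixed-width LZW codes as described above.
# Every group except the last is padded to a multiple of (bits per code) * 8;
# the last group is padded to a whole octet.
def padded_group(code_bits, code_count, is_last):
    data_bits = code_bits * code_count
    unit = 8 if is_last else code_bits * 8   # alignment unit in bits
    padding = (-data_bits) % unit            # zero bits appended
    return data_bits, padding

groups = [(9, 10), (10, 5), (11, 13)]        # (bits per code, number of codes)
for i, (bits, count) in enumerate(groups):
    data, pad = padded_group(bits, count, is_last=(i == len(groups) - 1))
    print(f"{data} data bits + {pad} padding bits")
# Prints 90 + 54, 50 + 30, and 143 + 1, matching the three groups above.
```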
The existence of padding bits is actually a bug, as LZW does not require any alignment. The bug existed for more than 35 years and was present in the original UNIX compress, ncompress, gzip, and the Windows port, so effectively all application/x-compress files were created with this padding.
Some compress implementations write whatever random bits happen to be in an uninitialized buffer into the padding, so there is no guarantee that the padding bits are zeroes. For compatibility, the decompressor must ignore the values in the padding.
References
- ^ Frysinger, Mike. "ncompress: a public domain project". Retrieved 2014-07-30.
Compress is a fast, simple LZW file compressor. Compress does not have the highest compression rate, but it is one of the fastest programs to compress data. Compress is the de facto standard in the UNIX community for compressing files.
- ^ Gommans, Luc. "compression - What's the difference between gzip and compress?". Unix & Linux Stack Exchange.
- ^ "Large Text Compression Benchmark". mattmahoney.net.
compress 4.3d....
- ^ IBM. "IBM System i Version 7.2 Programming Qshell" (PDF). IBM. Retrieved 2020-09-05.
- ^ X/Open CAE Specification Commands and Utilities Issue 4, Version 2 (pdf), 1994, opengroup.org
- ^ compress – Shell and Utilities Reference, The Single UNIX Specification, Version 3 from The Open Group
- ^ Chapter 17. Commands and Utilities in Linux Standard Base Core Specification 5.0.0, linuxfoundation.org
- ^ ncompress, pkgs.org
- ^ "GNU Gzip". The GNU Operating System and the Free Software Movement. 2023-02-05. Retrieved 2024-04-03.
gunzip can currently decompress files created by gzip, zip, compress or pack. The detection of the input format is automatic.
- ^ Welch, Terry A. (1984). "A technique for high performance data compression" (PDF). IEEE Computer. 17 (6): 8–19. doi:10.1109/MC.1984.1659158. S2CID 2055321.
- ^ "compress". opengroup. Retrieved 2 November 2024.
- ^ "compression with 9 bits don't work · Issue #5 · vapier/ncompress". GitHub. Retrieved 2024-09-17.
External links
[edit]- : compress data – Shell and Utilities Reference, The Single UNIX Specification, Version 5 from The Open Group
- – Version 8 Unix Programmer's Manual
- – FreeBSD General Commands Manual
- – OpenBSD General Commands Manual
- – Solaris 11.4 User Commands Reference Manual
- ncompress - public domain compress/uncompress implementation for POSIX systems
- compress - original Unix compress (in a compress'd archive)
- compress - original Unix compress executable (gzip'd)
- Source Code for compress v4.0 (gzip'd sharchives)
- ZIP File containing a Windows port of the compress utility
- source code to the current version of fcompress.c from compress
- bit groups alignment - Explanation of bit groups alignment.
- lzws - New library and CLI, implemented without legacy code.
- ruby-lzws - Ruby bindings with streaming support.
Compress (software)
The compress command processes one or more input files, replacing each with a compressed version bearing the .Z file extension, while preserving original file attributes such as permissions and timestamps when possible; its counterpart, uncompress, restores files to their original form.[1][3] Introduced around 1984, compress became a standard component of Unix systems, using 9- to 16-bit codes to achieve typical compression ratios of 50% or more on text files, though performance varies by data type.[3][4]
The algorithm employed by compress is rooted in the LZ78 method developed by Abraham Lempel and Jacob Ziv in 1978, extended by Terry Welch's LZW in 1984, and further adapted as LZC to monitor compression efficiency by rebuilding the dictionary when ratios degrade.[2] This implementation draws directly from U.S. Patent 4,464,650 (1984) and U.S. Patent 4,558,302 (1985), both assigned to Sperry Corporation, enabling the utility to replace recurring patterns with shorter codes starting from 257 upward.[1] Key options include -b to specify the number of bits per code (defaulting to 12-16 bits depending on the system for optimal portability), -c for output to standard output without altering files, -f to force overwriting, and -v to report compression percentages.[1] Compressed files begin with the magic bytes 1F 9D, identifying the format with MIME type application/x-compress.[3]
Historically, compress spread rapidly through the Unix community in the mid-1980s and was integrated into BSD and System V releases, becoming the standard single-file compressor, typically used together with the tar archiver for multi-file bundles.[3] It was formalized in the X/Open CAE Specification in 1994 and later in POSIX.1 standards, including Issue 6 (2001), Issue 7 (2008), and POSIX.1-2017, ensuring portability across Unix variants.[1] However, its reliance on patented LZW technology led to licensing disputes in the late 1980s and 1990s, prompting the Unix community to phase it out in favor of patent-free alternatives.[2] By the early 1990s, gzip—developed by Jean-loup Gailly and Mark Adler using the DEFLATE algorithm—emerged as its direct successor, offering superior compression ratios without legal encumbrances, though compress remains available in many systems for legacy support.[5][2] The LZW patents expired in 2003, but by then, gzip and tools like bzip2 had become dominant.[2]
History and Development
Origins of LZW and Early Implementations
The Lempel–Ziv–Welch (LZW) algorithm, foundational to the compress software, was devised by Terry Welch in 1984 as an enhancement to the LZ78 dictionary-based compression method proposed by Abraham Lempel and Jacob Ziv in 1978. Welch, working at Sperry Corporation (later Unisys), refined LZ78 to improve efficiency for practical applications by making the dictionary construction more adaptive and suitable for hardware implementation, addressing limitations in encoding speed and dictionary management. This development was detailed in Welch's seminal paper, which emphasized the algorithm's ability to achieve high compression ratios without prior knowledge of the data, making it ideal for general-purpose use.[6]

In the 1980s computing landscape, the motivation for such advancements stemmed from the high costs of data storage and transmission; for instance, hard disk drives cost around $50–$100 per megabyte in 1984, and modem speeds were limited to 300–1200 bits per second, making efficient compression essential for managing growing volumes of text and numeric data in business and scientific environments.[7][8] LZW addressed these challenges by enabling lossless compression that could reduce typical text files by 50–70%, thereby lowering storage requirements and accelerating data transfer over limited-bandwidth networks. The algorithm's design prioritized simplicity and performance, allowing it to run effectively on contemporary hardware like minicomputers and early workstations.[9]

At its core, LZW employs an adaptive dictionary that begins with 256 fixed entries corresponding to the single-byte values (codes 0–255), which are initially output using 9-bit codes. As compression proceeds, the dictionary dynamically expands by adding new string entries derived from the input data, the code width growing from 9 up to 12 bits in Welch's original scheme to accommodate up to 4096 entries, before the table is reset or cleared to manage memory. This variable-length coding scheme, in which code lengths increase as the dictionary fills, optimizes bit usage while keeping the stream decodable without transmitting the dictionary itself.[6]

Implementations of LZW appeared shortly after Welch's publication. Spencer W. Thomas's initial compress utility, released in July 1984 for Unix systems at the University of Utah and developed on VAX minicomputers, demonstrated the algorithm's viability for file compression. In 1985, Thom Henderson of System Enhancement Associates incorporated LZW into the ARC archiver for MS-DOS, one of the first commercial applications, popularizing the algorithm in the personal computing and bulletin board system (BBS) communities for archiving multiple files. These implementations highlighted LZW's versatility across platforms, paving the way for broader adoption despite emerging patent issues.[10][2]

Integration into Unix Systems
The compress command was developed by Spencer W. Thomas at the University of Utah and first publicly released on July 5, 1984, through the net.sources Usenet newsgroup as version 1.0, implementing the LZW compression algorithm for Unix systems.[10] This initial release quickly gained traction within the Unix community, leading to its integration into the Berkeley Software Distribution (BSD) as a standard utility.[11] Compress was incorporated into 4.3BSD, released in June 1986 by the University of California, Berkeley, marking its formal adoption in a major Unix variant and establishing it as a core tool for file compression in academic and research environments.[12] Its inclusion in BSD facilitated widespread distribution via tape releases and source code sharing, contributing to its ubiquity in Unix-like systems.

By the late 1980s, compress had become a de facto standard across various Unix implementations, including derivatives from academic institutions and commercial vendors, owing to its efficiency and simplicity in handling text and binary files.[13] Its standardization came with IEEE Std 1003.2-1992 (POSIX.2), which defined compress as an optional command under the X/Open Systems Interface (XSI) extension, ensuring portability across conforming Unix systems. It was also included in AT&T's UNIX System V Release 4 (SVR4), broadening its presence in commercial Unix environments and solidifying its role until the early 1990s.

However, growing awareness of the LZW algorithm's patent held by Unisys led to efforts to replace compress; the gzip utility, using the patent-free DEFLATE algorithm, emerged in October 1992 as a direct alternative, accelerating the shift away from compress in new Unix distributions by 1993.[14]

Patent Controversies and Decline
The LZW compression algorithm, central to the compress utility, became the subject of significant legal contention due to U.S. Patent 4,558,302, issued to Sperry Corporation (which later became part of Unisys) on December 10, 1985, for "High speed data compression and decompression apparatus and method."[15] Although the patent was granted in 1985, Unisys did not actively enforce it until the early 1990s, beginning with licensing demands around 1992 that targeted implementations in software and hardware, including those using LZW for data compression.[16] This enforcement particularly affected free and open-source software distributions, as redistributing LZW-based tools without a license violated patent terms, prompting developers to avoid inclusion to prevent legal risks.[17]

The impact on compress was profound, leading to its removal from key open-source projects amid growing awareness of the patent. In 1993, the Free Software Foundation (FSF) explicitly stated in its GNU's Bulletin that it could not distribute a compress-compatible compressor due to the LZW patents, which prohibited implementation in free software without licensing fees.[18] This decision accelerated the shift to patent-free alternatives, most notably gzip, developed in 1992–1993 by Jean-loup Gailly and Mark Adler as a direct replacement using the deflate algorithm, which offered comparable or superior compression without legal encumbrances.[17] Commercial Unix vendors also faced licensing costs, further diminishing compress's viability in favor of royalty-free options.

The controversies extended beyond compress to broader applications of LZW, notably in the Graphics Interchange Format (GIF), igniting public backlash in the mid-1990s. Unisys's 1994 licensing announcements for GIF encoders and decoders—requiring fees from software developers—sparked widespread criticism in the open-source community, highlighting the stifling effect of software patents on innovation and accessibility.[19] This led to the rapid development of the Portable Network Graphics (PNG) format in 1995 by an independent working group, which employed the deflate algorithm to provide a patent-unencumbered alternative for lossless image compression, effectively sidelining GIF for new projects.[20]

Compress played a pivotal role in early awareness of these issues within Unix and open-source circles, as its widespread use in the 1980s exposed the risks of patent-dependent algorithms in freely distributable tools. The utility reached its peak adoption in the 1980s and early 1990s as a standard Unix compression tool, but the patent enforcement marked the beginning of its decline. By the late 1990s, with gzip and other alternatives dominant, compress was largely phased out of new software distributions, persisting only in legacy systems for compatibility with .Z files.[19] By the 2000s, its use had become negligible outside archival or historical contexts, as the worldwide expiration of the LZW patents in 2003–2004 failed to revive interest amid entrenched successors.[16]

Usage and Operation
Command Syntax and Basic Usage
The compress utility in Unix-like systems is invoked using the basic syntax compress [options] [file...], where optional flags control behavior and one or more file names are specified for compression using the adaptive Lempel-Ziv (LZW) coding algorithm.[21] By default, it processes the named files individually, replacing each input file with a compressed version bearing the .Z extension while preserving the original file's ownership, modes, and timestamps if the user has sufficient privileges; if no files are specified, it reads from standard input and writes to standard output.[21] The utility does not recursively process directories, treating them as invalid inputs for compression, and it skips files that would not reduce in size unless forced.[21]
Decompression is handled by the companion uncompress command, with the syntax uncompress [options] [file...], which restores files previously compressed by compress.[22] In its default mode, uncompress expects input files to have the .Z suffix, removes this extension upon successful decompression to produce the original file, and prompts for confirmation before overwriting an existing target unless suppression is enabled; like compress, it operates on standard input/output if no files are provided and preserves file attributes where possible.[22]
A related tool, zcat, facilitates viewing the contents of compressed files without modifying them, using the syntax zcat [file...].[23] By default, it decompresses the specified .Z files (appending the extension if absent) and concatenates their contents to standard output, allowing inspection via tools like more or piping; if no files are named or if the operand is -, it processes standard input.[23]
In terms of error handling, both compress and uncompress exit with a status greater than 0 upon encountering issues such as non-existent input files, resulting in no changes to the file system and diagnostic messages written to standard error.[21][22] Insufficient permissions on input files leave them unchanged, with an error status returned (typically 1), while attempts to create output files exceeding the system's {NAME_MAX} length limit also fail without alteration.[21] For zcat, invalid or inaccessible files similarly produce error diagnostics and a non-zero exit status without affecting the originals.[23]
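As a small illustration of scripting around these exit statuses, the following Python sketch assumes an uncompress binary is on the PATH and uses a made-up file name; it checks the return code and captures the diagnostic output described above:

```python
# Sketch: detect uncompress failure from a script by checking its exit status.
import subprocess

result = subprocess.run(
    ["uncompress", "does-not-exist.Z"],      # hypothetical missing input file
    capture_output=True, text=True,
)
if result.returncode != 0:
    # Diagnostics go to standard error; the file system is left unchanged.
    print("uncompress failed:", result.stderr.strip())
```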
Options and Advanced Features
The compress utility provides several command-line options to customize its behavior, allowing users to control output, overwriting, verbosity, and compression parameters. The -f flag forces compression even if it does not reduce file size or if a corresponding .Z file already exists, overwriting without prompting unless running in the background.[1] The -v option enables verbose output, printing the percentage reduction achieved for each file to standard error.[24] Similarly, the -c flag directs compressed output to standard output without modifying input files or creating .Z files, useful for piping or testing without altering originals.[25]
A key advanced feature is the -b bits option, which sets the maximum number of bits per code in the LZW algorithm, ranging from 9 to 16 bits.[24] In the original 4.3BSD implementation, the default is 12 bits, balancing compression ratio and performance on resource-constrained systems.[24] Higher values, such as -b 16, enable better compression ratios for larger files by allowing a larger dictionary of codes, though this increases memory usage and processing time; lower values like -b 9 prioritize speed but yield poorer compression.[26] The specified bits value is embedded in the output file header for compatibility with uncompress.[24]
Directory handling varies by implementation; the POSIX standard does not support recursion, leaving directories unchanged and operating only on named files.[25] However, compress inherently focuses on single-file lossless compression and does not support multi-file archiving or bundling, unlike combinations such as tar with compress; multiple files are handled individually, each producing a separate .Z file.[1] This design emphasizes simplicity over integrated packaging.[25]
Practical Examples and Best Practices
To compress a single text file using the compress utility, execute the command compress largefile.txt. This replaces the original file with largefile.txt.Z, a compressed version employing the adaptive Lempel-Ziv-Welch (LZW) algorithm, typically reducing the file size by 50-60% for text data with repetitive patterns.[26][27]
For scenarios requiring compressed data transfer without storing an intermediate file locally, pipe the output to a remote host via SSH: compress -c file.txt | ssh user@host cat > remote.txt.Z. The -c option directs the compressed stream to standard output, enabling efficient network transmission while preserving the original file on the source system.[26][27]
Best practices for compress emphasize its strengths with text-based files, where LZW excels by building a dictionary of repeated strings to achieve high compression ratios. Avoid applying it to already-compressed data like images or binaries, as such content lacks redundancy and may result in no size reduction or even slight expansion. For directory archiving, integrate with tar to bundle files before compression: tar cf - dir | compress > archive.tar.Z. This pipeline creates a single compressed archive, archive.tar.Z, suitable for backups or distribution.[28][29][26]
Common pitfalls include memory constraints when processing large files, as the LZW dictionary (controlled via the -b option, defaulting to 16 bits for up to 65,536 entries) can consume significant RAM; reduce the bits value (e.g., -b 12) on systems with limited resources to avoid failures. Additionally, handling symbolic links requires caution, as compress follows links to their targets when processing named files, potentially leading to unexpected results if cycles exist; prefer tar for link preservation in complex directory structures.[26][27]
Technical Specifications
LZW Algorithm Mechanics
The Lempel–Ziv–Welch (LZW) algorithm is a dictionary-based lossless compression method that builds a dynamic code table during encoding and decoding to replace repeated sequences of data with shorter codes. It operates without prior knowledge of the input data's statistics, adapting to patterns as they appear, and ensures that compressor and decompressor maintain identical dictionaries through synchronized updates. The core idea is to scan the input stream for the longest prefix that matches an existing dictionary entry, output its code, and extend the dictionary with a new sequence formed by appending the next input symbol.[30]

In the LZC variant used by the Unix compress utility, the dictionary is initialized with 256 entries representing single-byte strings, assigned codes 0 to 255. Special codes are reserved: 256 for the clear code (to reset the dictionary) and 257 for end-of-file (EOF). Dynamic dictionary entries begin at code 258. The algorithm reads the input stream character by character, maintaining a current string w that starts out empty. For each new input symbol k, it checks whether the concatenation w + k exists in the dictionary. If it does, w is updated to w + k; if not, the code for w is output, and a new dictionary entry for w + k is added with the next available code (starting from 258). Then w is reset to k. The clear code (256) is output when compression efficiency degrades (that is, when the observed compression ratio stops improving), resetting the dictionary to its initial state (codes 0–255) and discarding dynamic entries. This loop continues until the input is exhausted, at which point the code for the final w is output, followed by the EOF code (257).[31][32]

The following pseudocode outlines the compression loop:

initialize dictionary with codes 0-255 for single bytes
reserve 256 for clear code, 257 for EOF
w = empty string
while input not exhausted:
    k = read next input byte
    if w + k exists in dictionary:
        w = w + k
    else:
        output code for w
        add w + k to dictionary with next code (from 258)
        w = k
    monitor compression ratio; if it degrades since the last clear:
        output clear code (256)
        reset dictionary to 0-255
output code for final w
output EOF code (257)
Decompression rebuilds the same dictionary from the code stream. When a code arrives that is not yet in the dictionary (possible only when the encoder has just defined it), the decoder reconstructs the entry as the previous string plus its own first byte, as in the following pseudocode:

initialize dictionary with codes 0-255 for single bytes
reserve 256 for clear, 257 for EOF
read first code c (skip if clear or EOF)
if c == 256: reset dictionary
if c == 257: end
w = dictionary[c]
output w
while true:
    read next code c
    if c == 257: end            // EOF
    if c == 256:                // clear
        reset dictionary to 0-255
        continue
    if c in dictionary:
        entry = dictionary[c]
    else:
        entry = w + first byte of w
    output entry
    add w + first byte of entry to dictionary with next code (from 258)
    w = entry
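For illustration, the following Python sketch implements the two loops above. It follows the pseudocode's conventions (clear = 256, EOF = 257, new entries from 258) but emits plain integers rather than the packed 9- to 16-bit codes of the real utility, and it omits the ratio-based clear logic; it is not a reimplementation of compress.

```python
# Minimal LZW round trip following the pseudocode above (illustrative only:
# no variable-width bit packing, no .Z header, no ratio-based dictionary clear).
CLEAR, EOF, FIRST = 256, 257, 258

def lzw_compress(data: bytes) -> list[int]:
    dictionary = {bytes([i]): i for i in range(256)}   # codes 0-255 = single bytes
    next_code = FIRST
    w = b""
    out = []
    for b in data:
        wk = w + bytes([b])
        if wk in dictionary:
            w = wk
        else:
            out.append(dictionary[w])
            dictionary[wk] = next_code                 # define a new entry
            next_code += 1
            w = bytes([b])
    if w:
        out.append(dictionary[w])                      # flush the final string
    out.append(EOF)
    return out

def lzw_decompress(codes: list[int]) -> bytes:
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = FIRST
    it = iter(codes)
    first = next(it)
    if first == EOF:                                   # empty input
        return b""
    w = dictionary[first]
    out = bytearray(w)
    for c in it:
        if c == EOF:
            break
        entry = dictionary.get(c)
        if entry is None:                              # code not yet defined (KwKwK case)
            entry = w + w[:1]
        out += entry
        dictionary[next_code] = w + entry[:1]          # mirror the encoder's new entry
        next_code += 1
        w = entry
    return bytes(out)

sample = b"TOBEORNOTTOBEORTOBEORNOT"
assert lzw_decompress(lzw_compress(sample)) == sample
```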
File Format Structure
The .Z files produced by the compress utility feature a simple binary structure consisting of a compact header followed by the packed LZW-compressed data stream. The header is typically three bytes long, ensuring compatibility across Unix-like systems while allowing for basic configuration of the compression parameters. The first two bytes form the magic number, set to 0x1F followed by 0x9D, which uniquely identifies the file as LZW-compressed data from the compress tool. This marker enables tools like file or uncompress to detect and process the format correctly.

The third byte acts as a combined flags and configuration field. Its most significant bit (bit 7) denotes block mode: when set (value 0x80), it indicates that the compression uses dynamic dictionary resets via the clear code, which is the standard behavior for improving efficiency on varied data. The lower five bits (bits 0–4) specify the maximum code length in bits, with values typically ranging from 0x09 (9 bits) to 0x10 (16 bits); common defaults are 12 or 13 bits for balancing compression ratio and speed. Some implementations include an optional fourth byte for additional flags, though this is rare and not part of the core specification.

Following the header, the body contains the variable-length LZW codes packed directly into successive bytes without further delimiters. Codes begin at 9 bits per code and incrementally increase (to 10, 11, and so on) as the dictionary fills, up to the maximum defined in the header, with each code representing either a literal byte (0–255), the clear code (256), the EOF code (257), or a dictionary entry (258+). These codes are written bit by bit, aligned to byte boundaries, using little-endian ordering within bytes to minimize overhead; partial bytes at code boundaries are padded as needed to complete 8-bit units. The stream concludes with the end-of-file (EOF) code 257, often preceded by the clear code 256 to reset the dictionary if the compression ratio has degraded.

Variants exist across implementations, such as those in BSD-derived systems versus System V Unix, primarily in the third byte's flag interpretation or default maximum code size. For instance, BSD versions (originating from 4.3BSD) consistently enable block mode and support up to 16 bits, while some SysV ports may default to lower maxima or handle flag bits differently for compatibility with older hardware, though the magic number and overall layout remain invariant.
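As a concrete illustration of the header layout just described, the following Python sketch reads the first three bytes of a .Z file; the function and field names are invented for this example, not taken from any compress source:

```python
# Sketch: inspect the 3-byte header of a .Z file (magic 0x1F 0x9D, then flags).
def parse_z_header(path: str) -> dict:
    with open(path, "rb") as f:
        header = f.read(3)
    if len(header) < 3 or header[:2] != b"\x1f\x9d":
        raise ValueError("not a compress(1) .Z file")
    flags = header[2]
    return {
        "block_mode": bool(flags & 0x80),   # bit 7: dictionary resets via the clear code
        "max_bits": flags & 0x1f,           # bits 0-4: maximum code width (9-16)
    }

# Example (hypothetical file name): parse_z_header("archive.tar.Z")
# might return {"block_mode": True, "max_bits": 16}.
```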
Performance and Limitations

The compress utility, employing the LZW algorithm, achieves compression ratios typically ranging from 2:1 to 3:1 for text files with high redundancy, such as English prose or source code, while performing worse on binary data with low repetition, often yielding ratios closer to 1.5:1 or less.[31] These ratios depend heavily on the input's redundancy, as the algorithm builds a dictionary of repeated phrases to substitute shorter codes.[32] Historical benchmarks on English text files demonstrate an average size reduction of approximately 35-50%, with one study reporting compressed sizes around 37% of the original for representative corpora.[31]

In terms of speed, compress was optimized for rapid execution on 1980s hardware, processing data faster than subsequent algorithms like those in bzip2 due to its simpler dictionary management and lack of block-based preprocessing.[32] However, it is memory-intensive, requiring up to 512 KB for the dictionary implementation, which stores up to 65,536 entries in its hash table structure.[33]

Key limitations include the absence of true streaming support for extremely large files, as the fixed dictionary size prevents ongoing adaptation beyond the maximum capacity, necessitating resets or reduced efficiency for inputs exceeding several megabytes.[31] A phenomenon known as "dictionary explosion" can occur when the dictionary fills with unique, non-repeating phrases, leading to diminished compression ratios in later portions of the data.[33] Additionally, the fixed maximum code size of 16 bits caps the dictionary at 65,536 entries (starting from 9 bits and increasing as needed), after which the algorithm becomes non-adaptive and may output longer codes without further gains.[34]

Legacy and Compatibility
Availability in Modern Systems
In modern Linux distributions, the compress utility is available through the ncompress package, which provides the original LZW-based compression and decompression tools compatible with the historical Unix compress program.[35] This package can be installed using package managers such as apt on Debian and Ubuntu derivatives or dnf (successor to yum) on Fedora and Red Hat-based systems.[36] Additionally, utilities like zless and zmore, which support viewing compressed files including those in .Z format, are provided via the gzip package and leverage ncompress for LZW handling.[37]

On macOS and BSD variants such as FreeBSD, compress is accessible either as a built-in command or through third-party package managers. FreeBSD includes the compress and uncompress commands natively in its base system for handling .Z files.[12] macOS users can install ncompress via Homebrew, enabling compatibility with legacy .Z files through the command line, as the built-in Archive Utility does not support this format.[38]

For Windows, compress functionality is supported in Unix-like environments such as Cygwin, where the ncompress package is available for installation, or Windows Subsystem for Linux (WSL), which mirrors Linux package availability.[39] These ports allow processing of .Z files without native built-in support in Windows. As of 2025, compress remains maintained primarily for backward compatibility with existing .Z archives, though it is not recommended for new compression tasks due to superior alternatives like gzip offering better ratios and patent-free operation.[40]

Comparisons with Successor Tools
The gzip utility employs the Deflate algorithm, which combines LZ77 dictionary coding with Huffman entropy encoding, to achieve superior compression ratios compared to the LZW method used by compress, often resulting in files up to 20-30% smaller on general data sets.[41] For instance, in benchmarks on mixed data, gzip at default settings reduces a file to approximately 23 MB, while compress yields 39.5 MB for the same input.[41] Developed as a direct replacement for the patented LZW algorithm, gzip has been patent-free since its release, avoiding the licensing issues that affected compress.[5] Additionally, gzip supports streaming compression through standard output, enabling seamless integration with pipes for real-time processing without creating temporary files.[42]

In contrast, bzip2 utilizes a block-sorting transformation (Burrows-Wheeler) followed by Huffman coding, delivering even higher compression efficiency than both compress and gzip, particularly on text-heavy files where ratios can be 30-50% better than compress.[41] Representative benchmarks show bzip2 at default levels compressing the same data to around 19 MB, compared to compress's 39.5 MB, though this comes at the cost of significantly slower processing times—often 5-10 times longer for decompression alone.[41][43]

Compress exhibits notable feature limitations relative to its successors, operating solely on single files without native support for multi-file archiving, encryption, or large-file splitting—capabilities that gzip addresses through piping with tools like tar for multi-file handling and optional extensions for advanced features in modern implementations.[42][44]

| Tool | Compression Ratio Example (on benchmark data) | Compression Time (s) | Decompression Time (s) | Relative Strengths |
|---|---|---|---|---|
| compress | ~39.5 MB (poorest) | 2.64 (fastest) | 1.60 (moderate) | Speed on low-resource systems |
| gzip | ~23.2 MB (moderate) | 13.2 (balanced) | 1.25 (fastest) | Ratio/speed trade-off, streaming |
| bzip2 | ~18.9 MB (best) | 22.6 (slowest) | 5.38 (slowest) | Superior ratios on text |