Compress (software)
from Wikipedia
compress / uncompress
Original author: Spencer Thomas
Initial release: February 1985
Operating system: Unix, Unix-like, IBM i
Type: Command

compress .Z
Filename extension: .Z
Internet media type: application/x-compress
Developed by: Spencer Thomas
Type of format: data compression

compress is a shell command for compressing data based on the LZW algorithm.[1] uncompress is a companion shell command that restores files to their original state (both content and metadata) from a file created with compress.

Although once popular, compress has fallen out of favor because it uses the patented LZW algorithm. Its use has been replaced by commands such as gzip and bzip2 that use other algorithms and provide better data compression. Compared to gzip at its fastest setting, compress is slightly slower at compression, slightly faster at decompression, and has a significantly lower compression ratio.[2] 1.8 MiB of memory is used to compress the Hutter Prize data, slightly more than gzip at its slowest setting.[3]

compress and uncompress have maintained a presence on Unix and BSD systems and have been ported to IBM i.[4]

compress was standardized in X/Open CAE Specification in 1994,[5] and further in The Open Group Base Specifications, Issue 6 and 7.[6] Linux Standard Base does not require compress.[7]

compress is often excluded from the default installation of a Linux distribution but can be installed from a separate package.[8] compress is available for FreeBSD, OpenBSD, MINIX, Solaris and AIX.

compress is allowed as a compression method for the Point-to-Point Protocol in RFC 1977 and for HTTP/1.1 in RFC 9110, though it is rarely used in modern deployments because better-performing alternatives such as deflate and gzip are available.

Use


Files compressed by compress are typically named with extension ".Z" and therefore sometimes called .Z files. The extension derives from the earlier pack program which used extension ".z".

Most tar implementations support compression by piping data through compress when given the -Z command line option.

gunzip can decompress .Z files.[9]
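
The two conventions above can be combined from a script; the following is a minimal sketch, assuming tar, gzip and an LZW compress implementation are installed, with illustrative file and directory names:

    # Minimal sketch, assuming tar, gzip and a compress implementation are
    # installed; "docs/" and the archive names are illustrative.
    import subprocess

    # tar pipes its output through compress when given -Z, producing a .tar.Z file.
    subprocess.run(["tar", "-cZf", "docs.tar.Z", "docs/"], check=True)

    # gunzip recognizes the .Z suffix and can decompress files created by compress,
    # leaving docs.tar in place of docs.tar.Z.
    subprocess.run(["gunzip", "docs.tar.Z"], check=True)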

Algorithm


The LZW algorithm used in compress was patented by Sperry Research Center in 1983. Terry Welch published an IEEE article on the algorithm in 1984,[10] but failed to note that he had applied for a patent on the algorithm. Spencer Thomas of the University of Utah took this article and implemented compress in 1984, without realizing that a patent was pending on the LZW algorithm. The GIF image format also incorporated LZW compression in this way, and Unisys later claimed royalties on implementations of GIF. Joseph M. Orost led the team and worked with Thomas et al. to create the final (4.0) version of compress and published it as free software to the net.sources USENET group in 1985. U.S. patent 4,558,302 was granted in 1985 – making compress unusable without paying royalties to Sperry Research (which later merged into Unisys).

The US LZW patent expired in 2003, so it is now in the public domain in the United States. Today, all LZW patents worldwide are expired (see Graphics Interchange Format#Unisys and LZW patent enforcement).

As of POSIX.1-2024 compress supports the DEFLATE algorithm used in gzip.[11]

File format


The compressed output consists of bit groups. Each bit group consists of codes with a fixed number of bits (9–16). Each group, except the last group, is aligned to the number of bits per code multiplied by 8 and right-padded with zeroes. The last group is aligned to 8-bit octets and padded with zeroes. More information can be found in an issue on the ncompress GitHub repository.[12]

Example:

Suppose the output has ten 9-bit codes, five 10-bit codes, and thirteen 11-bit codes. There are three groups to output containing 90 bits, 50 bits, and 143 bits of data.
  • First group will be 90 bits of data + 54 zero bits of padding in order to be aligned to 72 bits (9 bits × 8).
  • Second group will be 50 bits of data + 30 zero bits of padding in order to be aligned to 80 bits (10 bits × 8).
  • Third group will be 143 bits of data + 1 zero bit of padding in order to be aligned to 8 bits (since this is the last group in the output).
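
The padding arithmetic in this example can be checked with a short calculation; the following sketch simply reproduces the alignment rule stated above and is not part of any compress implementation:

    # Reproduces the alignment rule described above: non-final groups are padded
    # to a multiple of (bits per code * 8); the final group is padded to octets.
    def group_padding(data_bits, bits_per_code, is_last):
        alignment = 8 if is_last else bits_per_code * 8
        return (-data_bits) % alignment

    groups = [(90, 9, False), (50, 10, False), (143, 11, True)]
    for data_bits, bits_per_code, is_last in groups:
        pad = group_padding(data_bits, bits_per_code, is_last)
        print(f"{data_bits} data bits at {bits_per_code} bits/code -> {pad} padding bits")
    # Prints 54, 30 and 1 padding bits, matching the three groups above.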

The existence of padding bits is actually a bug, as LZW does not require any alignment. This bug has existed for more than 35 years and is present in the original UNIX compress, ncompress, gzip and the Windows port. In practice, all application/x-compress files have been created with this padding behavior.

Some compress implementations write arbitrary bits from an uninitialized buffer into the padding, so there is no guarantee that the padding bits are zeroes. For compatibility, the decompressor must ignore the values in the padding.

References

from Grokipedia
Compress is a command-line utility in Unix-like operating systems designed to reduce the size of files through lossless compression using an adaptive variant of the Lempel-Ziv-Welch (LZW) algorithm, specifically LZC, which builds and maintains a dictionary of repeated substrings to encode data efficiently. The tool, invoked via the compress command, processes one or more input files, replacing each with a compressed version bearing the .Z file extension, while preserving original file attributes such as permissions and timestamps when possible; its counterpart, uncompress, restores files to their original form. Introduced around 1984, compress became a standard component of Unix systems, leveraging 9- to 16-bit codes to achieve typical compression ratios of 50% or more on text files, though performance varies by data type.

The algorithm employed by compress is rooted in the LZ78 method developed by Abraham Lempel and Jacob Ziv in 1978, extended by Terry Welch's LZW in 1984, and further adapted as LZC to monitor compression efficiency and rebuild the dictionary when ratios degrade. This implementation draws directly from U.S. Patent 4,464,650 (1984) and U.S. Patent 4,558,302 (1985), both assigned to Sperry, enabling the utility to replace recurring patterns with shorter codes starting from 257 upward. Key options include -b to specify the number of bits per code (defaulting to 12–16 bits depending on the system for optimal portability), -c for output to standard output without altering files, -f to force overwriting, and -v to report compression percentages. Compressed files begin with the magic bytes 1F 9D, identifying the format with media type application/x-compress.

Historically, compress emerged during the rapid evolution of Unix in the mid-1980s and was integrated into BSD and System V releases, serving as a foundational single-file compression tool often paired with archivers such as tar. It was formalized in the X/Open CAE Specification in 1994 and later in POSIX.1 standards, including Issue 6 (2001), Issue 7 (2008), and POSIX.1-2017, ensuring portability across Unix variants. However, its reliance on patented LZW technology led to licensing disputes in the late 1980s and 1990s, prompting the Unix community to phase it out in favor of patent-free alternatives. By the early 1990s, gzip, developed by Jean-loup Gailly and Mark Adler using the DEFLATE algorithm, emerged as its direct successor, offering superior compression ratios without legal encumbrances, though compress remains available in many systems for legacy support. The LZW patents expired in 2003, but by then gzip and tools like bzip2 had become dominant.

History and Development

Origins of LZW and Early Implementations

The Lempel-Ziv-Welch (LZW) algorithm, foundational to the compress software, was invented in 1984 by Terry Welch, building on work by Abraham Lempel and Jacob Ziv, as an enhancement to the LZ78 dictionary-based compression method originally proposed by Lempel and Ziv in 1978. Welch, working at Sperry Research Center (later part of Unisys), refined LZ78 to improve efficiency for practical applications by making the dictionary construction more adaptive and suitable for hardware implementation, addressing limitations in encoding speed and dictionary management. This development was detailed in Welch's seminal paper, which emphasized the algorithm's ability to achieve high compression ratios without prior knowledge of the data's statistics, making it ideal for general-purpose use.

In the 1980s computing landscape, the motivation for such advancements stemmed from the high costs of data storage and transmission; for instance, hard disk drives cost around $50–$100 per megabyte in 1984, and modem speeds were limited to 300–1200 bits per second, making efficient compression essential for managing growing volumes of text and numeric data in business and scientific environments. LZW addressed these challenges by enabling lossless compression that could reduce file sizes by 50–70% on typical text files, thereby lowering storage requirements and accelerating data transfer over limited-bandwidth networks. The algorithm's design prioritized simplicity and performance, allowing it to run effectively on contemporary hardware like minicomputers and early workstations.

At its core, LZW employs an adaptive dictionary that begins with 256 fixed entries corresponding to the standard 8-bit character values (codes 0–255), which are output using 9-bit codes initially. As compression proceeds, the dictionary dynamically expands by adding new entries derived from the input data, with codes extending up to 12 bits to accommodate up to 4096 entries from the 9-bit starting width, before potentially resetting or clearing the table to manage memory. This variable-length coding scheme, where code lengths increase from 9 to 12 bits as the dictionary fills, optimizes bit usage while maintaining decodability without transmitting the dictionary itself.

Early implementations of LZW appeared shortly after Welch's publication, including Spencer W. Thomas's initial compress utility released in July 1984 for Unix systems at the University of Utah, which was developed on VAX minicomputers and demonstrated the algorithm's viability for file compression. In 1985, Thom Henderson of System Enhancement Associates incorporated LZW into the ARC archiver for MS-DOS systems, marking one of the first commercial applications and popularizing it in the personal computing and bulletin board system (BBS) communities for archiving multiple files. These implementations highlighted LZW's versatility across platforms, paving the way for broader adoption despite emerging patent issues.

Integration into Unix Systems

The compress command was developed by Spencer W. Thomas at the University of Utah and first publicly released on July 5, 1984, through the net.sources Usenet newsgroup as version 1.0, implementing the LZW compression algorithm for Unix systems. This initial release quickly gained traction within the Unix community, leading to its integration into the Berkeley Software Distribution (BSD) as a standard utility. compress was incorporated into 4.3BSD, released in June 1986 by the Computer Systems Research Group at the University of California, Berkeley, marking its formal adoption in a major Unix variant and establishing it as a core tool for file compression in academic and research environments. Its inclusion in BSD facilitated widespread distribution via tape releases and source sharing, contributing to its ubiquity in Unix systems during the late 1980s.

By the late 1980s, compress had become a de facto standard across various Unix implementations, including derivatives from academic institutions and commercial vendors, due to its efficiency and simplicity in handling text and binary files. The utility's standardization came with IEEE Std 1003.2-1992 (POSIX.2), which defined compress as an optional command under the X/Open Systems Interface (XSI) extension, ensuring portability across conforming Unix systems. It was also included in AT&T's UNIX System V Release 4 (SVR4) in 1988, broadening its presence in commercial Unix environments and solidifying its role until the early 1990s. However, growing awareness of the LZW algorithm's patent, held by Unisys, led to efforts to replace compress; the gzip utility, using the patent-free DEFLATE algorithm, emerged in October 1992 as a direct alternative, accelerating the shift away from compress in new Unix distributions by 1993.

Patent Controversies and Decline

The LZW compression algorithm, central to the compress utility, became the subject of significant legal contention due to U.S. Patent 4,558,302, issued to Sperry Corporation (which later merged into Unisys) on December 10, 1985, for a "High speed data compression and decompression apparatus and method." Although the patent was granted in 1985, Unisys did not actively enforce it until the early 1990s, beginning with licensing demands that targeted implementations in software and hardware, including those using LZW for data compression. This enforcement particularly affected free software distributions, as redistributing LZW-based tools without a license violated patent terms, prompting developers to avoid inclusion to prevent legal risks.

The impact on compress was profound, leading to its removal from key open-source projects amid growing awareness of the patent. In 1993, the Free Software Foundation (FSF) explicitly stated in its GNU's Bulletin that it could not distribute a compress-compatible utility due to the LZW patents, which prohibited implementation in free software without licensing fees. This decision accelerated the shift to patent-free alternatives, most notably gzip, developed in 1992–1993 by Jean-loup Gailly and Mark Adler as a direct replacement using the DEFLATE algorithm, which offered comparable or superior compression without legal encumbrances. Commercial Unix vendors also faced licensing costs, further diminishing compress's viability in favor of royalty-free options.

The controversies extended beyond compress to broader applications of LZW, notably in the Graphics Interchange Format (GIF), igniting public backlash in the mid-1990s. Unisys's 1994 licensing announcements for GIF encoders and decoders, requiring fees from software developers, sparked widespread criticism in the open-source community and highlighted the stifling effect of software patents on freely distributable software. This led to the rapid development of the Portable Network Graphics (PNG) format in 1995 by an independent working group, which employed the patent-free DEFLATE algorithm to provide an unencumbered alternative for lossless image compression, effectively sidelining GIF for new projects. compress played a pivotal role in early awareness of these issues within Unix and open-source circles, as its widespread use in the 1980s exposed the risks of patent-dependent algorithms in freely distributable tools.

compress reached its peak adoption in the late 1980s and early 1990s as a standard Unix utility for file compression, but the patent enforcement marked the beginning of its decline. By the late 1990s, with gzip and other alternatives dominant, compress was largely phased out from new software distributions, persisting only in legacy systems for compatibility with .Z files. By the 2000s, its use had become negligible outside archival or historical contexts, as the worldwide expiration of the LZW patents in 2003–2004 failed to revive interest amid entrenched successors.

Usage and Operation

Command Syntax and Basic Usage

The compress utility in Unix-like systems is invoked using the basic syntax compress [options] [file...], where optional flags control behavior and one or more file names are specified for compression using the adaptive Lempel-Ziv (LZW) coding algorithm. By default, it processes the named files individually, replacing each input file with a compressed version bearing the .Z extension while preserving the original file's ownership, modes, and timestamps if the user has sufficient privileges; if no files are specified, it reads from standard input and writes to standard output. The utility does not recursively process directories, treating them as invalid inputs for compression, and it skips files that would not shrink unless forced.

Decompression is handled by the companion uncompress command, with the syntax uncompress [options] [file...], which restores files previously compressed by compress. In its default mode, uncompress expects input files to have the .Z suffix, removes this extension upon successful decompression to produce the original file, and prompts for confirmation before overwriting an existing target unless suppression is enabled; like compress, it operates on standard input/output if no files are provided and preserves file attributes where possible.

A related tool, zcat, facilitates viewing the contents of compressed files without modifying them, using the syntax zcat [file...]. By default, it decompresses the specified .Z files (appending the extension if absent) and concatenates their contents to standard output, allowing inspection via tools like more or piping; if no files are named or if the operand is -, it processes standard input.

In terms of error handling, both compress and uncompress exit with a status greater than 0 upon encountering issues such as non-existent input files, leaving the originals unchanged and writing diagnostic messages to standard error. Insufficient permissions on input files leave them unchanged, with an error status returned (typically 1), while attempts to create output files whose names would exceed the system's {NAME_MAX} limit also fail without alteration. For zcat, invalid or inaccessible files similarly produce error diagnostics and a non-zero exit status without affecting the originals.
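
The exit-status behavior described above can be exercised from a script; the following is a minimal sketch, assuming an LZW compress implementation (for example from the ncompress package) is installed, with an illustrative file name:

    # Minimal sketch of compress/uncompress/zcat invocation and exit statuses,
    # assuming an LZW compress implementation is on the PATH; file names are
    # illustrative.
    import subprocess
    from pathlib import Path

    sample = Path("example.txt")
    sample.write_text("hello hello hello hello\n" * 100)   # repetitive text compresses well

    # compress replaces example.txt with example.txt.Z; -v reports the reduction.
    result = subprocess.run(["compress", "-f", "-v", str(sample)])
    print("compress exit status:", result.returncode)       # 0 on success, >0 on error

    # zcat writes the decompressed contents to standard output without touching the .Z file.
    listing = subprocess.run(["zcat", "example.txt.Z"], capture_output=True, text=True)
    print("zcat exit status:", listing.returncode)

    # uncompress restores example.txt and removes the .Z suffix.
    print("uncompress exit status:", subprocess.run(["uncompress", "example.txt.Z"]).returncode)

    # A non-existent input yields a non-zero status and a diagnostic on standard error.
    missing = subprocess.run(["compress", "no-such-file"], capture_output=True, text=True)
    print("missing file exit status:", missing.returncode, "stderr:", missing.stderr.strip())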

Options and Advanced Features

The compress utility provides several command-line options to customize its behavior, allowing users to control output, overwriting, verbosity, and compression parameters. The -f flag forces compression even if it does not reduce the file size or if a corresponding .Z file already exists, overwriting without prompting unless running in the background. The -v option enables verbose output, printing the percentage reduction achieved for each file to standard error. Similarly, the -c flag directs compressed output to standard output without modifying input files or creating .Z files, useful for piping or testing without altering originals.

A key advanced feature is the -b bits option, which sets the maximum number of bits per code in the LZW algorithm, ranging from 9 to 16 bits. In the original 4.3BSD implementation, the default is 12 bits, balancing compression and memory use on resource-constrained systems. Higher values, such as -b 16, enable better compression ratios for larger files by allowing a larger dictionary of codes, though this increases memory usage and processing time; lower values like -b 9 prioritize speed but yield poorer compression. The specified bits value is embedded in the output file header for compatibility with uncompress.

Directory handling varies by implementation; the standard does not support recursion, leaving directories unchanged and operating only on named files. compress inherently focuses on single-file compression and does not support multi-file archiving or bundling, unlike combinations such as tar with compress; multiple files are handled individually, each producing a separate .Z file. This design emphasizes simplicity over integrated packaging.

Practical Examples and Best Practices

To compress a single text file using the compress utility, execute the command compress largefile.txt. This replaces the original file with largefile.txt.Z, a compressed version employing the adaptive Lempel-Ziv-Welch (LZW) algorithm, typically reducing the file size by 50–60% for text data with repetitive patterns. For scenarios requiring compressed data transfer without storing an intermediate file locally, pipe the output to a remote host via SSH: compress -c file.txt | ssh user@host cat > remote.txt.Z. The -c option directs the compressed stream to standard output, enabling efficient network transmission while preserving the original file on the source system.

Best practices for compress emphasize its strengths with text-based files, where LZW excels by building a dictionary of repeated strings to achieve high compression ratios. Avoid applying it to already-compressed data like images or binaries, as such content lacks redundancy and may result in no size reduction or even slight expansion. For directory archiving, integrate with tar to bundle files before compression: tar cf - dir | compress > archive.tar.Z. This pipeline creates a single compressed archive, archive.tar.Z, suitable for backups or distribution.

Common pitfalls include memory constraints when processing large files, as the LZW dictionary (controlled via the -b option, defaulting to 16 bits for up to 65,536 entries) can consume significant RAM; reduce the bits value (e.g., -b 12) on systems with limited resources to avoid failures. Additionally, handling symbolic links requires caution, as compress follows links to their targets when processing named files, potentially leading to unexpected results if cycles exist; prefer tar for link preservation in complex directory structures.

Technical Specifications

LZW Algorithm Mechanics

The Lempel-Ziv-Welch (LZW) algorithm is a dictionary-based method that builds a dynamic code table during encoding and decoding to replace repeated sequences of bytes with shorter codes. It operates without prior knowledge of the input data's statistics, adapting to patterns as they appear, and ensures that compressor and decompressor maintain identical dictionaries through synchronized updates. The core idea involves scanning the input stream for the longest prefix that matches an existing dictionary entry, outputting its code, and extending the dictionary with new sequences formed by appending the next input symbol.

In the LZC variant used by the Unix compress utility, the dictionary is initialized with 256 entries representing single-byte strings (byte values 0-255), assigned codes 0 to 255. Special codes are reserved: 256 for the clear code (to reset the dictionary) and 257 for end-of-file (EOF). Dynamic dictionary entries begin at code 258.

The algorithm reads the input stream character by character, maintaining a current string w that starts empty. For each new input symbol k, it checks whether the string w + k exists in the dictionary. If it does, w is updated to w + k; if not, the code for w is output, and a new dictionary entry for w + k is added with the next available code (starting from 258). Then w is reset to k. The clear code (256) is output when compression efficiency degrades (e.g., when the bits output per input byte exceed 1), resetting the dictionary to its initial state (codes 0-255) and discarding dynamic entries. This loop continues until the input is exhausted, at which point the code for the final w is output, followed by the EOF code (257). The following outlines the compression loop:

    initialize dictionary with codes 0-255 for single bytes
    reserve 256 for clear code, 257 for EOF
    w = empty string
    while input not exhausted:
        k = read next input byte
        if w + k exists in dictionary:
            w = w + k
        else:
            output code for w
            add w + k to dictionary with next code (from 258)
            w = k
        monitor compression ratio; if it degrades (e.g., bits/byte > 1 since last clear):
            output clear code (256)
            reset dictionary to codes 0-255
    output code for final w
    output EOF code (257)


This greedy approach ensures that the dictionary grows only with observed sequences, promoting efficiency on repetitive data. Decompression mirrors compression by reconstructing the dictionary on the fly from the sequence of output codes, without needing the original input. It starts with the same initial dictionary (codes 0-255 for single bytes), with 256 reserved for the clear code and 257 for EOF. The first code received is looked up and output as the current string w. For each subsequent code c: if c == 257 (EOF), decompression stops; if c == 256 (clear), the dictionary is reset to its initial state. Otherwise, if c exists in the dictionary, the corresponding string is output; a new entry is added consisting of the previous w concatenated with the first byte of that string, and the string becomes the new w. If c does not yet exist in the dictionary (the special case where the new sequence is the previous string plus its own first byte), the output is that constructed string, and it is added to the dictionary. This symmetric building ensures bit-for-bit reconstruction of the original data. The decompression pseudocode is as follows:

    initialize dictionary with codes 0-255 for single bytes
    reserve 256 for clear code, 257 for EOF
    read first code c (if clear, reset and read again; if EOF, end)
    w = dictionary[c]
    output w
    while true:
        read next code c
        if c == 257: end                      // EOF
        if c == 256:                          // clear
            reset dictionary to codes 0-255
            read next code c; w = dictionary[c]; output w
            continue
        if c in dictionary:
            entry = dictionary[c]
        else:
            entry = w + first byte of w       // code not yet in dictionary
        output entry
        add w + first byte of entry to dictionary with next code (from 258)
        w = entry


Upon dictionary overflow for the current code width, the bit length increases independently, allowing continued compression. The clear code enables adaptation without fixed size limits beyond the maximum code width. In the specific implementation employed by the Unix compress utility, codes begin at 9 bits (covering codes 0-511: 0-255 literals, 256 clear, 257 EOF, 258-511 dynamic), dynamically increasing to 10 bits after code 512, 11 bits after 1024, and 12 bits after 2048, with a configurable maximum of up to 16 bits. This variable-length encoding optimizes space by using shorter codes early when the dictionary is small, transitioning seamlessly as redundancy patterns emerge.
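
The code-level logic described above can be sketched in a few dozen lines; the following is an illustrative example only, following this article's convention (256 = clear, 257 = EOF, dynamic entries from 258) and emitting integer codes rather than the variable-width bit packing of the actual .Z format, so its output is not byte-compatible with compress:

    # Illustrative LZW sketch following the scheme described above
    # (256 = clear code, 257 = EOF code, dynamic entries from 258).
    # It works with lists of integer codes; the real compress utility also
    # packs these codes into a variable-width (9-16 bit) stream, and emits
    # the clear code when the compression ratio degrades (omitted here).
    CLEAR, EOF_CODE, FIRST_FREE, MAX_ENTRIES = 256, 257, 258, 1 << 16

    def lzw_compress(data: bytes) -> list[int]:
        table = {bytes([i]): i for i in range(256)}
        next_code = FIRST_FREE
        codes, w = [], b""
        for byte in data:
            wk = w + bytes([byte])
            if wk in table:
                w = wk
            else:
                codes.append(table[w])
                if next_code < MAX_ENTRIES:          # stop growing at the 16-bit limit
                    table[wk] = next_code
                    next_code += 1
                w = bytes([byte])
        if w:
            codes.append(table[w])
        codes.append(EOF_CODE)
        return codes

    def lzw_decompress(codes: list[int]) -> bytes:
        table = {i: bytes([i]) for i in range(256)}
        next_code = FIRST_FREE
        out, w = bytearray(), None
        for c in codes:
            if c == EOF_CODE:
                break
            if c == CLEAR:                           # reset the dictionary
                table = {i: bytes([i]) for i in range(256)}
                next_code, w = FIRST_FREE, None
                continue
            if w is None:                            # first code after start or clear
                w = table[c]
                out += w
                continue
            entry = table[c] if c in table else w + w[:1]   # special case: code not yet defined
            out += entry
            if next_code < MAX_ENTRIES:
                table[next_code] = w + entry[:1]
                next_code += 1
            w = entry
        return bytes(out)

    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    assert lzw_decompress(lzw_compress(sample)) == sample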

File Format Structure

The .Z files produced by the compress utility have a simple binary structure consisting of a compact header followed by the packed LZW-compressed data. The header is typically three bytes long, ensuring compatibility across systems while allowing for basic configuration of the compression parameters. The first two bytes form the magic number, set to 0x1F followed by 0x9D, which uniquely identifies the file as LZW-compressed output from the compress tool. This marker enables tools like file or uncompress to detect and process the format correctly.

The third byte acts as a combined flags and configuration field. Its most significant bit (bit 7) denotes block mode: when set (value 0x80), it indicates that the compression uses dynamic resets via the clear code, which is the standard behavior for improving efficiency on varied data. The lower five bits (bits 0-4) specify the maximum code length in bits, with values typically ranging from 0x09 (9 bits) to 0x10 (16 bits); common defaults are 12 or 13 bits for balancing memory use and speed. Some implementations include an optional fourth byte for additional flags, though this is rare and not part of the core specification.

Following the header, the body contains the variable-length LZW codes packed directly into successive bytes without further delimiters. Codes begin at 9 bits per code and incrementally increase (to 10, 11, and so on) as the dictionary fills, up to the maximum defined in the header, with each code representing either a literal byte (0-255), the clear code (256), the EOF code (257), or a dictionary entry (258+). These codes are written bit by bit into a byte-aligned stream, using little-endian ordering within bytes to minimize overhead; partial bytes at code boundaries are padded as needed to complete 8-bit units. The stream concludes with the end-of-file (EOF) code 257, often preceded by the clear code 256 to reset the dictionary if the compression ratio has degraded.

Variants exist across implementations, such as those in BSD-derived systems versus System V Unix, primarily in the third byte's flag interpretation or default maximum code size. For instance, BSD versions (originating from 4.3BSD) consistently enable block mode and support up to 16 bits, while some SysV ports may default to lower maxima or handle flag bits differently for compatibility with older hardware, though the magic number and overall layout remain invariant.
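
As a small illustration of the header layout described above, the following sketch reads the first three bytes of a .Z file and decodes the magic number, block-mode flag and maximum code width; the default file name is illustrative:

    # Minimal sketch: inspecting the 3-byte header of a .Z file as described
    # above; the default file name is illustrative.
    import sys

    def read_z_header(path: str) -> dict:
        with open(path, "rb") as f:
            header = f.read(3)
        if len(header) < 3 or header[0] != 0x1F or header[1] != 0x9D:
            raise ValueError("not a compress(1) .Z file (magic bytes 1F 9D missing)")
        flags = header[2]
        return {
            "block_mode": bool(flags & 0x80),   # bit 7: dictionary resets via clear code
            "max_bits": flags & 0x1F,           # bits 0-4: maximum code width (9-16)
        }

    if __name__ == "__main__":
        print(read_z_header(sys.argv[1] if len(sys.argv) > 1 else "example.txt.Z"))
        # e.g. {'block_mode': True, 'max_bits': 16}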

Performance and Limitations

The compress utility, employing the LZW algorithm, achieves compression ratios typically ranging from 2:1 to 3:1 for text files with high redundancy, such as English prose or source code, while performing worse on binary data with low repetition, often yielding ratios closer to 1.5:1 or less. These ratios depend heavily on the input's redundancy, as the algorithm builds a dictionary of repeated phrases to substitute shorter codes. Historical benchmarks on English text files demonstrate an average size reduction of approximately 35-50%, with one study reporting compressed sizes around 37% of the original for representative corpora.

In terms of speed, compress was optimized for rapid execution on 1980s hardware, processing data faster than subsequent algorithms such as those in gzip and bzip2, due to its simpler dictionary management and lack of block-based preprocessing. However, it is memory-intensive, requiring up to 512 KB for the dictionary implementation, which stores up to 65,536 entries in its table.

Key limitations include the absence of true streaming adaptation for extremely large files, as the fixed dictionary size prevents ongoing learning beyond the maximum capacity, necessitating resets or reduced efficiency for inputs exceeding several megabytes. A phenomenon known as "dictionary explosion" can occur when the dictionary fills with unique, non-repeating phrases, leading to diminished compression ratios in later portions of the data. Additionally, the fixed maximum code size of 16 bits caps the dictionary at 65,536 entries (starting from 9 bits and increasing as needed), after which the algorithm becomes non-adaptive and may output longer codes without further gains.

Legacy and Compatibility

Availability in Modern Systems

In modern Linux distributions, the compress utility is available through the ncompress package, which provides the original LZW-based compression and decompression tools compatible with the historical Unix compress program. This package can be installed using package managers such as apt on Debian and Ubuntu derivatives or dnf (the successor to yum) on Fedora and Red Hat-based systems. Additionally, utilities like zless and zmore, which support viewing compressed files including those in .Z format, are provided via the gzip package and leverage ncompress for LZW handling.

On macOS and BSD variants such as FreeBSD, compress is accessible either as a built-in command or through third-party package managers. FreeBSD includes the compress and uncompress commands natively in its base system for handling .Z files. macOS users can install ncompress via Homebrew, enabling command-line compatibility with legacy .Z files, as the built-in Archive Utility does not support this format.

For Windows, compress functionality is available in Unix-like environments such as Cygwin, where the ncompress package can be installed, or the Windows Subsystem for Linux (WSL), which mirrors Linux package availability. These ports allow processing of .Z files without native built-in support in Windows. As of 2025, compress remains maintained primarily for compatibility with existing .Z archives, though it is not recommended for new compression tasks because alternatives such as gzip offer better ratios and patent-free operation.

Comparisons with Successor Tools

The gzip utility employs the DEFLATE algorithm, which combines LZ77 dictionary coding with Huffman entropy encoding, to achieve superior compression ratios compared to the LZW method used by compress, often resulting in files up to 20-30% smaller on general data sets. For instance, in benchmarks on mixed data, gzip at default settings reduces a file to approximately 23 MB, while compress yields 39.5 MB for the same input. Developed as a direct replacement for the patented LZW algorithm, gzip has been patent-free since its release, avoiding the licensing issues that affected compress. Additionally, gzip supports streaming compression through standard output, enabling seamless integration with pipes for real-time processing without creating temporary files.

In contrast, bzip2 utilizes a block-sorting transformation (the Burrows-Wheeler transform) followed by Huffman coding, delivering even higher compression efficiency than both compress and gzip, particularly on text-heavy files where ratios can be 30-50% better than compress. Representative benchmarks show bzip2 at default levels compressing the same data to around 19 MB, compared to compress's 39.5 MB, though this comes at the cost of significantly slower processing times, often 5-10 times longer for decompression alone.

compress exhibits notable feature limitations relative to its successors, operating solely on single files without native support for multi-file archiving or large-file splitting, capabilities that gzip addresses through integration with tools like tar for multi-file handling and through optional extensions in modern implementations.
Tool        Compressed size (benchmark data)   Compression time (s)   Decompression time (s)   Relative strengths
compress    ~39.5 MB (poorest)                 2.64 (fastest)         1.60 (moderate)          Speed on low-resource systems
gzip        ~23.2 MB (moderate)                13.2 (balanced)        1.25 (fastest)           Ratio/speed trade-off, streaming
bzip2       ~18.9 MB (best)                    22.6 (slowest)         5.38 (slowest)           Superior ratios on text
Overall benchmarks indicate that while compress excels in speed and low memory usage on legacy hardware, completing compression in under 3 seconds versus over 10 for gzip, its ratios and decompression performance lag on modern CPUs, where optimized implementations of gzip and bzip2 provide better efficiency for most workloads.

Influence on Other Formats and Software

The Lempel-Ziv-Welch (LZW) algorithm, as implemented in the Unix compress utility, significantly influenced the adoption of dictionary-based compression in early image and document formats. In the Tagged Image File Format (TIFF), LZW was introduced as a compression option for raster images in the mid-1980s, enabling efficient storage of image data without quality degradation, as documented in the format's specifications for subtypes like TIFF Bitmap with LZW Compression. The Graphics Interchange Format (GIF), developed by CompuServe in 1987, mandated LZW for compressing indexed-color images, which facilitated its rapid proliferation in online graphics and animations during the early online era. Adobe's PostScript Level 2, released in 1990, incorporated LZW as a supported filter for compressing embedded images, improving the portability and file size of vector-based documents in printing workflows.

Beyond formats, the compress utility directly inspired archiving software on personal computing platforms. The ARC archiver, introduced by System Enhancement Associates in 1985, employed a modified LZW algorithm to achieve superior compression ratios for files, dominating bulletin board systems (BBS) until the late 1980s. This legacy extended to PKZIP, created by Phil Katz in 1989, which initially used LZW-style compression for cross-platform archiving before adopting the DEFLATE method in version 2.0, thereby establishing the ZIP format's foundational compression techniques. LZW's integration into embedded systems and operating systems further demonstrates its enduring software legacy, where it powered compressed archives and image handling in resource-constrained environments throughout the 1980s and 1990s.

On a broader scale, the algorithm's success catalyzed open research into LZ-family variants, influencing the development of LZ77-based methods that underpin web standards such as DEFLATE compression in PNG images and gzip for HTTP transfers. Today, LZW appears sparingly in legacy forensics and emulation tools, where it is essential for decompressing historical .Z files or early formats like GIF in digital investigations and retro computing simulations, though its use has diminished since the expiration of related patents in 2003 in favor of royalty-free alternatives.
