| Zstandard | |
|---|---|
| Original author | Yann Collet |
| Developers | Yann Collet, Nick Terrell, Przemysław Skibiński[1] |
| Initial release | 23 January 2015 |
| Stable release | 1.5.7 (20 February 2025) |
| Repository | github.com/facebook/zstd |
| Written in | C |
| Operating system | Cross-platform |
| Platform | Portable |
| Type | Data compression |
| License | BSD-3-Clause or GPL-2.0-or-later (dual-licensed) |
| Website | facebook |
Zstandard is a lossless data compression algorithm developed by Yann Collet at Facebook. Zstd is the corresponding reference implementation in C, released as open-source software on 31 August 2016.[2][3]
The algorithm was published in 2018 as RFC 8478, which also defines an associated media type "application/zstd", filename extension "zst", and HTTP content encoding "zstd".[4] RFC 8478 was obsoleted in 2021 by RFC 8878.[15]
Features
Zstandard was designed to give a compression ratio comparable to that of the DEFLATE algorithm (developed in 1991 and used in the original ZIP and gzip programs), but faster, especially for decompression. It is tunable with compression levels ranging from negative 7 (fastest)[5] to 22 (slowest in compression speed, but best compression ratio).
Starting from version 1.3.2 (October 2017), zstd optionally implements very-long-range search and deduplication (--long, 128 MiB window) similar to rzip or lrzip.[6]
Compression speed can vary by a factor of 20 or more between the fastest and slowest levels, while decompression is uniformly fast, varying by less than 20% between the fastest and slowest levels.[7] The Zstandard command-line utility has an "adaptive" (--adapt) mode that varies compression level depending on I/O conditions, mainly how fast it can write the output.
Zstd at its maximum compression level gives a compression ratio close to lzma, lzham, and ppmx, and achieves a better compression ratio than lza or bzip2.[8][9] Zstandard reaches the current Pareto frontier, as it decompresses faster than any other currently available algorithm with a similar or better compression ratio.[10][11]
Dictionaries can have a large impact on the compression ratio of small files, so Zstandard can use a user-provided compression dictionary. It also offers a training mode, able to generate a dictionary from a set of samples.[12][13] In particular, one dictionary can be loaded to process large sets of files with redundancy between files, but not necessarily within each file, such as for log files.
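A minimal C sketch of this training-and-reuse workflow, using libzstd's ZDICT_trainFromBuffer and ZSTD_compress_usingDict. The sample strings, buffer sizes, and compression level are illustrative placeholders, and a real training set would need far more data than shown:

```c
#include <stdio.h>
#include <string.h>
#include <zstd.h>
#include <zdict.h>

/* Train a small dictionary from sample buffers, then use it to
 * compress one payload. Error handling is abbreviated. */
int main(void)
{
    /* Hypothetical samples standing in for many similar small files. */
    const char *samples[] = { "GET /index.html HTTP/1.1",
                              "GET /style.css HTTP/1.1",
                              "GET /logo.png HTTP/1.1" };
    size_t sampleSizes[3];
    char samplesBuffer[256];
    size_t offset = 0;
    for (int i = 0; i < 3; i++) {                 /* concatenate the samples */
        sampleSizes[i] = strlen(samples[i]);
        memcpy(samplesBuffer + offset, samples[i], sampleSizes[i]);
        offset += sampleSizes[i];
    }

    char dict[1024];
    size_t dictSize = ZDICT_trainFromBuffer(dict, sizeof dict,
                                            samplesBuffer, sampleSizes, 3);
    if (ZDICT_isError(dictSize)) {                /* tiny sample sets may fail */
        fprintf(stderr, "training failed: %s\n", ZDICT_getErrorName(dictSize));
        return 1;
    }

    const char *payload = "GET /about.html HTTP/1.1";
    char dst[256];
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    size_t csize = ZSTD_compress_usingDict(cctx, dst, sizeof dst,
                                           payload, strlen(payload),
                                           dict, dictSize, 3 /* level */);
    if (!ZSTD_isError(csize))
        printf("compressed %zu -> %zu bytes with a trained dictionary\n",
               strlen(payload), csize);
    ZSTD_freeCCtx(cctx);
    return 0;
}
```

The command-line tool exposes the same capability through its --train and -D options, described later in this article.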
Design
Zstandard combines a dictionary-matching stage (LZ77) with a large search window and a fast entropy-coding stage. It uses both Huffman coding (used for entries in the Literals section)[14] and finite-state entropy (FSE) – a fast tabled version of ANS (tANS) – used for entries in the Sequences section. Because of the manner in which FSE carries over state between symbols, decompression involves processing symbols within the Sequences section of each block in reverse order (from last to first).
Usage
| Zstandard | |
|---|---|
| Filename extension | .zst[15] |
| Internet media type | application/zstd[15] |
| Magic number | 28 b5 2f fd[15] |
| Type of format | Data compression |
| Standard | RFC 8878 |
| Website | github |
| Zstandard Dictionary | |
|---|---|
| Magic number | 37 a4 30 ec[15] |
| Standard | RFC 8878 |
| Website | github |
The Linux kernel has included Zstandard since November 2017 (version 4.14) as a compression method for the btrfs and squashfs filesystems.[16][17][18]
In 2017, Allan Jude integrated Zstandard into the FreeBSD kernel,[19] and it was subsequently integrated as a compressor option for core dumps (both user programs and kernel panics). It was also used to create a proof-of-concept OpenZFS compression method[7] which was integrated in 2020.[20]
The AWS Redshift and RocksDB databases include support for field compression using Zstandard.[21]
In March 2018, Canonical tested[22] the use of zstd as a deb package compression method by default for the Ubuntu Linux distribution. Compared with xz compression of deb packages, zstd at level 19 decompresses significantly faster, but at the cost of 6% larger package files. Support was added to Debian (and subsequently, Ubuntu) in April 2018, with apt version 1.6~rc1.[23][22][24]
Fedora added Zstandard support to RPM in May 2018 (Fedora release 28) and used it for packaging the release in October 2019 (Fedora 31).[25] In Fedora 33, the Btrfs filesystem is transparently compressed with zstd by default.[26][27]
Arch Linux added support for zstd as a package compression method in October 2019 with the release of the pacman 5.2 package manager[28] and in January 2020 switched from xz to zstd for the packages in the official repository. Arch uses zstd -c -T0 --ultra -20 -; the size of all compressed packages combined increased by 0.8% (compared to xz), the decompression speed is 14 times faster, decompression memory increased by 50 MiB when using multiple threads, and compression memory increased but scales with the number of threads used.[29][30][31] Arch Linux later also switched to zstd as the default compression algorithm for mkinitcpio initial ramdisk generator.[32]
A full implementation of the algorithm with an option to choose the compression level is used in the .NSZ/.XCZ[33] file formats developed by the homebrew community for the Nintendo Switch hybrid game console.[34] It is also one of many supported compression algorithms in the .RVZ Wii and GameCube disc image file format.
On 15 June 2020, Zstandard was implemented in version 6.3.8 of the zip file format with codec number 93, deprecating codec number 20, which had been assigned in version 6.3.7, released on 1 June 2020.[35][36]
In March 2024, Google Chrome version 123 (and Chromium-based browsers such as Brave or Microsoft Edge) added zstd support in the HTTP header Content-Encoding.[37] In May 2024, Firefox release 126.0 added zstd support in the HTTP header Content-Encoding.[38]
License
The reference implementation is licensed under the BSD license, published at GitHub.[39] Version 1.0, published 31 August 2016,[40] added an additional Grant of Patent Rights.[41]
From version 1.3.1, released 20 August 2017,[42] this patent grant was dropped and the license was changed to a BSD + GPLv2 dual license.[43]
See also
- LZ4 (compression algorithm) – a fast member of the LZ77 family
- LZFSE – a similar algorithm by Apple used since iOS 9 and OS X 10.11 and made open source on 1 June 2016
- Zlib
- Brotli – also integrated into browsers
- Gzip – one of the most widely used compression tools
References
[edit]- ^ "Contributors to facebook/zstd". github.com. Archived from the original on 27 January 2021. Retrieved 26 January 2021.
- ^ Sergio De Simone (2 September 2016). "Facebook Open-Sources New Compression Algorithm Outperforming Zlib". InfoQ. Archived from the original on 7 October 2021. Retrieved 20 April 2019.
- ^ "Life imitates satire: Facebook touts zlib killer just like Silicon Valley's Pied Piper". The Register. 31 August 2016. Archived from the original on 3 September 2016. Retrieved 6 September 2016.
- ^ Collet, Yann (October 2018). Kucherawy, Murray S. (ed.). Zstandard Compression and the application/zstd Media Type. Internet Engineering Task Force Request for Comments. doi:10.17487/RFC8478. RFC 8478. Retrieved 7 October 2020.
- ^ "Release Zstandard v1.3.4 - faster everything · facebook/zstd". GitHub. Archived from the original on 11 September 2021. Retrieved 27 March 2024.
- ^ "Command Line Interface for Zstandard library". GitHub. 28 October 2021.
- ^ a b "ZStandard in ZFS" (PDF). open-zfs.org. 2017. Archived (PDF) from the original on 18 December 2019. Retrieved 20 April 2019.
- ^ Matt Mahoney. "Silesia Open Source Compression Benchmark". Archived from the original on 21 January 2022. Retrieved 10 May 2019.
- ^ Matt Mahoney (29 August 2016). "Large Text Compression Benchmark, .2157 zstd". Archived from the original on 31 March 2022. Retrieved 1 September 2016.
- ^ TurboBench: Static/Dynamic web content compression benchmark, PowTurbo, archived from the original on 17 March 2022, retrieved 21 March 2018
- ^ Matt Mahoney, Silesia Open Source Compression Benchmark, archived from the original on 21 January 2022, retrieved 5 April 2018
- ^ "Facebook developers report massive speedups and compression ratio improvements when using dictionaries" (PDF). Fermilab. 11 October 2017. Archived (PDF) from the original on 25 January 2018. Retrieved 27 March 2024.
- ^ "Smaller and faster data compression with Zstandard". Facebook. 31 August 2016. Archived from the original on 8 November 2020. Retrieved 3 September 2016.
- ^ "facebook/zstd". GitHub. 28 October 2021.
- ^ a b c d Collet, Yann (February 2021). Kucherawy, Murray S. (ed.). Zstandard Compression and the application/zstd Media Type. Internet Engineering Task Force Request for Comments. doi:10.17487/RFC8878. RFC 8878. Retrieved 26 February 2023.
- ^ Corbet, Jonathan (17 September 2017). "The rest of the 4.14 merge window [LWN.net]". lwn.net. Archived from the original on 22 November 2021. Retrieved 27 March 2024.
- ^ "Linux_4.14 - Linux Kernel Newbies". Kernelnewbies.org. 30 December 2017. Archived from the original on 10 January 2018. Retrieved 16 August 2018.
- ^ Larabel, Michael (8 September 2017). "Zstd Compression For Btrfs & Squashfs Set For Linux 4.14, Already Used Within Facebook - Phoronix". www.phoronix.com. Archived from the original on 25 July 2019. Retrieved 13 November 2017.
- ^ "Integrate ZSTD into the kernel · freebsd/Freebsd-SRC@28ef165". GitHub.
- ^ "Add ZSTD support to ZFS · openzfs/ZFS@10b3c7f". GitHub. Archived from the original on 10 September 2020. Retrieved 12 October 2020.
- ^ "Zstandard Encoding - Amazon Redshift". 20 April 2019. Archived from the original on 14 August 2021. Retrieved 24 January 2018.
- ^ a b Larabel, Michael (12 March 2018). "Canonical Working On Zstd-Compressed Debian Packages For Ubuntu". phoronix.com. Phoronix Media. Archived from the original on 16 August 2021. Retrieved 29 October 2019.
The developers at Canonical are considering a feature freeze exception to get this newly-developed Zstd Apt/Dpkg support in Ubuntu 18.04 LTS. In doing so, they mention they would be looking at enabling Zstd compression for packages by default in Ubuntu 18.10.
- ^ "New Ubuntu Installs Could Be Speed Up by 10% with the Zstd Compression Algorithm". Softpedia. 12 March 2018. Archived from the original on 6 October 2021. Retrieved 13 August 2018.
- ^ "Debian Changelog for apt". Debian. 19 April 2021. Retrieved 7 November 2022.
- ^ "Changes/Switch RPMS to ZSTD compression". Fedora Project Wiki. Archived from the original on 2 June 2019. Retrieved 8 July 2020.
- ^ "Fedora Workstation 34 feature focus: Btrfs transparent compression". Fedora Magazine. 14 April 2021. Retrieved 12 May 2022.
- ^ "Changes/BtrfsTransparentCompression". Fedora Project Wiki. Retrieved 12 May 2022.
- ^ Larabel, Michael (16 October 2019). "Arch Linux Nears Roll-Out of ZSTD Compressed Packages for Faster Pacman Installs". Phoronix. Archived from the original on 18 March 2022. Retrieved 21 October 2019.
- ^ Broda, Mara (4 January 2020). "Now using Zstandard instead of xz for package compression". Arch Linux. Archived from the original on 18 March 2022. Retrieved 5 January 2020.
- ^ Broda, Mara (25 March 2019). "RFC: (devtools) Changing default compression method to zstd". arch-dev-public (Mailing list). Archived from the original on 17 August 2021. Retrieved 5 January 2020.
- ^ Broda, Mara; Polyak, Levente (27 December 2019). "makepkg.conf: change default compression method to zstd". GitHub.
- ^ Razzolini, Giancarlo (19 February 2021). "News: Moving to Zstandard images by default on mkinitcpio". Arch Linux. Retrieved 28 December 2021.
- ^ "RELEASE - nsZip - NSP compressor/decompressor to reduce storage". GBAtemp.net - The Independent Video Game Community. 20 October 2019. Archived from the original on 15 August 2021. Retrieved 3 November 2019.
- ^ Bosshard, Nico (31 October 2019), nsZip is a tool to compress/decompress Nintendo Switch games using the here specified NSZ file format: nicoboss/nsZip, archived from the original on 27 March 2022, retrieved 3 November 2019
- ^ APPNOTE.TXT - .ZIP File Format Specification Version: 6.3.8, 15 June 2020, retrieved 7 July 2020
- ^ APPNOTE.TXT - .ZIP File Format Specification Version: 6.3.7, 1 June 2020, retrieved 6 June 2020
- ^ "New in Chrome 123 | Chrome Blog". Chrome for Developers. 19 March 2024. Retrieved 16 April 2024.
- ^ "Firefox 126.0, See All New Features, Updates and Fixes". Archived from the original on 13 May 2024. Retrieved 15 May 2024.
- ^ "Facebook open sources Zstandard data compression algorithm, aims to replace technology behind Zip". ZDnet. 31 August 2016. Retrieved 1 September 2016.
- ^ "Zstandard v1.0". GitHub. 31 August 2016. Archived from the original on 7 April 2023. Retrieved 23 January 2025.
- ^ "v1.3.0/PATENTS · facebook/zstd". GitHub. 30 August 2016. Archived from the original on 15 May 2021. Retrieved 27 March 2024.
- ^ "Release Zstandard v1.3.1 · facebook/zstd". GitHub. 20 August 2017. Archived from the original on 12 September 2020. Retrieved 27 March 2024.
- ^ "New license by Cyan4973 · Pull Request #801 · facebook/zstd". GitHub. 19 August 2017. Archived from the original on 12 September 2020. Retrieved 27 March 2024.
External links
- Official website

- zstd on GitHub
- 7zip with Zstandard on GitHub
- "Smaller and faster data compression with Zstandard", Yann Collet and Chip Turner, 31 August 2016, Facebook Announcement
- The Guardian is using ZStandard instead of zlib
History and Development
Origins at Facebook
Zstandard's development was initiated by Yann Collet, a software engineer at Facebook, in early 2015, building on his prior work with fast compression algorithms like LZ4.[8] The project stemmed from Facebook's need to manage exploding data volumes in real-time environments, where traditional tools like gzip and zlib proved inadequate due to their trade-offs between compression speed and ratio.[8][9] Specifically, Zstandard aimed to enable faster processing for high-throughput tasks such as compressing server logs and performing large-scale backups, which often involve terabytes of data daily across Facebook's infrastructure.[9]

Early prototypes integrated entropy coding techniques, including Finite State Entropy developed by Collet, and were iteratively refined through internal benchmarks using datasets like the Silesia corpus.[8] These tests compared Zstandard against established algorithms, showing that it achieved zlib-level compression ratios with 3–5 times faster speeds than gzip, while surpassing LZ4 in ratio without sacrificing much decompression speed.[8] Facebook's core infrastructure team played a pivotal role, providing resources and expertise to tailor the algorithm for production-scale deployment in data storage, transmission, and analytics pipelines.[9] Key early contributors included Collet as the lead designer, alongside team members who focused on hardware optimization for modern CPUs.[8]

Releases and Milestones
Zstandard's initial public release occurred on 23 January 2015, as version 0.1, developed by Yann Collet to address needs for fast, high-ratio compression in real-time scenarios.[10] Early versions introduced key features like dictionary support, enabling improved compression ratios for small or similar datasets by leveraging pre-trained dictionaries. The project advanced through beta iterations before its formal open-sourcing on 31 August 2016, with version 1.0.0, which stabilized the compression format and was hosted on GitHub under Facebook's organization. Following open-sourcing, Zstandard benefited from extensive community involvement, with contributions from numerous developers enhancing its portability, performance, and integration capabilities, while maintenance remained under Meta (formerly Facebook).

Significant milestones marked the evolution of Zstandard's capabilities. Version 1.3.2, released in October 2017, introduced long-range mode with a 128 MiB search window, allowing better deduplication and compression for large files with repetitive structures. Subsequent releases refined these features, balancing speed and ratio across diverse use cases. As of 20 February 2025, the latest stable release is version 1.5.7, incorporating over 500 commits focused on performance optimizations in compression, decompression, and multi-threading efficiency.

Features
Performance Characteristics
Zstandard delivers high performance in both compression and decompression, targeting real-time scenarios with speeds significantly surpassing traditional algorithms like gzip while maintaining comparable or superior compression ratios. Decompression routinely exceeds 500 MB/s on modern hardware, with benchmarks showing up to 1550 MB/s when decompressing data compressed at level 1 on a Core i7-9700K processor using the Silesia corpus (as of v1.5.7).[2] This efficiency stems from its design, which prioritizes low-latency decoding suitable for interactive applications.[8]

In version 1.5.7 (released February 2025), compression speed at fast levels like level 1 improved notably on small data blocks from the Silesia corpus, for example, from 280 MB/s to 310 MB/s for 4 KB blocks (+10%) and from 383 MB/s to 458 MB/s for 32 KB blocks (+20%). These enhancements benefit use cases in data centers and databases, such as RocksDB with 16 KB blocks.[11]

Compression speeds are highly tunable across 22 levels, from fast (level 1) to high-ratio (level 22), enabling trade-offs between throughput and size reduction; negative levels further prioritize speed at the expense of ratio. At level 1, Zstandard achieves a 2.896:1 ratio on the Silesia corpus with 510 MB/s compression speed, while the ultra-fast --fast=3 setting yields a 2.241:1 ratio at 635 MB/s (as of v1.5.7).[2] Compared to zlib (the basis for gzip and ZIP), Zstandard compresses 3-5 times faster at equivalent ratios and can produce 10-15% smaller outputs at the same speed, with higher levels offering up to 20-30% better ratios in certain workloads.[8] These characteristics make it versatile for scenarios requiring rapid processing without excessive resource use.

Zstandard scales effectively with hardware, leveraging multi-threading to distribute workload across multiple CPU cores for improved throughput on parallel systems, and incorporating SIMD instructions to accelerate block processing on contemporary processors.[12] Starting with v1.5.7, the command-line interface defaults to multi-threading using up to 4 threads based on system capabilities. This hardware awareness contributes to its decompression consistency, which remains high even under varied conditions.

The following table summarizes key performance metrics on the Silesia corpus (Core i7-9700K, Ubuntu 24.04, v1.5.7), contrasting Zstandard with gzip, LZ4, and Brotli at comparable fast settings:

| Algorithm | Compression Ratio | Compression Speed (MB/s) | Decompression Speed (MB/s) |
|---|---|---|---|
| Zstd level 1 | 2.896 | 510 | 1550 |
| Zstd --fast=3 | 2.241 | 635 | 1980 |
| LZ4 | 2.101 | 675 | 3850 |
| Gzip level 1 | 2.743 | 105 | 390 |
| Brotli level 0 | 2.702 | 400 | 425 |
Advanced Capabilities
Zstandard provides several advanced features that extend its utility beyond standard compression tasks, enabling efficient handling of specialized scenarios such as small datasets, streaming data, and large-scale processing.[12]

Dictionary compression allows Zstandard to use pre-trained dictionaries, typically derived from representative samples of the data domain, to enhance compression ratios for small payloads under 1 MB. This mode is particularly effective for repetitive or similar data streams, such as repeated JSON payloads or log entries, where it can improve compression ratios by 20-50% compared to dictionary-less compression by leveraging shared patterns without rebuilding the dictionary each time. In v1.5.7, dictionary compression saw further gains of up to 15% in speed and ratio for small blocks. The feature supports simple API calls like ZSTD_compress_usingDict for one-off uses or bulk APIs like ZSTD_createCDict for repeated applications, reducing latency after initial dictionary loading.[8][13]
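A sketch of the bulk (digested-dictionary) path described above, assuming dictBuf already holds a trained dictionary; the function name and parameters are illustrative, not part of the library API:

```c
#include <zstd.h>

/* Sketch: reuse one digested dictionary across many small payloads.
 * dictBuf/dictLen are assumed to hold a previously trained dictionary. */
size_t compress_with_cdict(const void *dictBuf, size_t dictLen,
                           const void *src, size_t srcLen,
                           void *dst, size_t dstCap)
{
    /* Digest the dictionary once; the cost is amortized over many calls. */
    ZSTD_CDict *cdict = ZSTD_createCDict(dictBuf, dictLen, 3 /* level */);
    ZSTD_CCtx  *cctx  = ZSTD_createCCtx();

    size_t csize = ZSTD_compress_usingCDict(cctx, dst, dstCap, src, srcLen, cdict);

    ZSTD_freeCCtx(cctx);
    ZSTD_freeCDict(cdict);           /* real code would keep and reuse both objects */
    return csize;                    /* caller checks the result with ZSTD_isError() */
}
```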
Streaming mode facilitates incremental compression and decompression of unbounded data streams, ideal for large files or real-time applications where full buffering is impractical. It employs structures like ZSTD_CStream for compression and ZSTD_DStream for decompression, processing input in chunks via functions such as ZSTD_compressStream2 (with directives for continuing, flushing, or ending) and ZSTD_decompressStream, which update buffer positions automatically to avoid memory overhead. This enables seamless handling of continuous data flows, such as network transmissions or log processing, without requiring the entire dataset in memory at once.[14]
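The following sketch shows the streaming loop this describes, compressing one stdio stream into another with ZSTD_compressStream2. Buffer sizes and error handling are simplified; production code would size buffers with ZSTD_CStreamInSize()/ZSTD_CStreamOutSize() and check every return value:

```c
#include <stdio.h>
#include <zstd.h>

/* Sketch of streaming compression from one FILE* to another. */
static int stream_compress(FILE *in, FILE *out, int level)
{
    char inBuf[1 << 16], outBuf[1 << 16];
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);

    int lastChunk = 0;
    while (!lastChunk) {
        size_t readSize = fread(inBuf, 1, sizeof inBuf, in);
        lastChunk = (readSize < sizeof inBuf);               /* short read: end of input */
        ZSTD_EndDirective mode = lastChunk ? ZSTD_e_end : ZSTD_e_continue;
        ZSTD_inBuffer input = { inBuf, readSize, 0 };
        size_t remaining;
        do {                                                  /* drain this chunk */
            ZSTD_outBuffer output = { outBuf, sizeof outBuf, 0 };
            remaining = ZSTD_compressStream2(cctx, &output, &input, mode);
            fwrite(outBuf, 1, output.pos, out);
        } while (lastChunk ? (remaining != 0) : (input.pos < input.size));
    }
    ZSTD_freeCCtx(cctx);
    return 0;
}
```

Decompression mirrors this structure with a ZSTD_DCtx and ZSTD_decompressStream.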
The very-long-range mode, introduced in version 1.3.2, extends the search window to 128 MiB to capture distant matches in large inputs, yielding better compression ratios for files exceeding typical block sizes. Activated via ZSTD_c_enableLongDistanceMatching or the --long command-line option, it increases memory usage but is beneficial for datasets like backups or genomic sequences where long-range redundancies exist, with the window size scaling up to the frame content if needed.[15]
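A small sketch, assuming libzstd 1.4 or later, of turning on long-distance matching with a 128 MiB window on a compression context; the helper name and chosen level are illustrative:

```c
#include <zstd.h>

/* Sketch: enable long-distance matching with a 128 MiB window
 * (windowLog = 27) on a compression context before streaming. */
static ZSTD_CCtx *make_long_range_cctx(void)
{
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 27);   /* 2^27 = 128 MiB */
    /* The decompressor must accept a window this large, e.g. via
     * ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, 27). */
    return cctx;
}
```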
Multi-threading support enables parallel processing of compression blocks for levels 1 through 19, distributing workload across multiple CPU cores to accelerate throughput on multi-core systems. Configured with parameters like ZSTD_c_nbWorkers (defaulting to 0, which means single-threaded operation, but scalable to available cores) and ZSTD_c_overlapLog for thread coordination, it processes independent blocks concurrently while maintaining sequential output, though it elevates memory requirements proportionally to thread count. Decompression remains single-threaded due to its inherent sequential nature.[16]
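A comparable sketch for multi-threaded compression; it assumes a libzstd build with multithreading enabled, and the worker count shown is arbitrary:

```c
#include <zstd.h>

/* Sketch: request 4 worker threads for block-parallel compression.
 * On a single-thread-only build, setting ZSTD_c_nbWorkers reports an
 * error and compression simply stays single-threaded. */
static ZSTD_CCtx *make_mt_cctx(void)
{
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 9);
    size_t r = ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 4);
    if (ZSTD_isError(r)) {
        /* multithreading unsupported in this build; continue single-threaded */
    }
    return cctx;
}
```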
Legacy support ensures compatibility with older Zstandard formats dating back to version 0.4.0, allowing decompression of legacy frames when enabled at compile time via ZSTD_LEGACY_SUPPORT. This fallback mechanism detects legacy identifiers and handles them transparently in modern builds, facilitating upgrades in environments with mixed-format archives without data loss.[12]
Version 1.5.7 also introduced the --max command-line option for achieving maximum compression ratios beyond level 22, providing finer control for ultra-high compression needs on large datasets like enwik9.[11]
Design
Core Architecture
Zstandard employs a block-based format where compressed files, known as frames, consist of one or more contiguous blocks, each limited to a maximum uncompressed size of 128 KB to facilitate efficient processing and memory management.[4] Each frame begins with a frame header of 2 to 14 bytes, which includes a 4-byte magic number (0xFD2FB528 in little-endian byte order) for identification, a frame header descriptor specifying parameters such as the presence of a dictionary ID, window size descriptor, frame content size, and an optional 4-byte checksum (using XXH64 for integrity verification).[4] The dictionary ID, if present, allows referencing an external dictionary for improved compression on repetitive data, while the window descriptor defines the maximum back-reference distance for matches.[4] Blocks themselves feature a 3-byte header indicating the last block flag, block type (raw, run-length encoded, or compressed), and size, enabling variable compressed sizes while capping uncompressed content at the block maximum.[4]

The compression process operates in stages, beginning with block splitting of the input data into segments no larger than 128 KB to balance compression efficiency and resource usage.[17] Within each block, Zstandard applies an LZ77-style dictionary matching algorithm to identify and deduplicate repeated sequences, producing literals (unmatched bytes) and sequences (match length, literal length, and offset triples).[4] Matching employs chained hash tables to probe for potential duplicates, with configurable parameters like hashLog determining the size of the initial hash table (powers of 2 from 64 KB to 256 MB) and chainLog setting the length of hash chains for deeper searches (up to 29 bits, or 512 MB).[17] For enhanced deduplication, binary trees can be integrated in advanced strategies (e.g., btopt or btultra), organizing recent data for logarithmic-time lookups and complementing hash chains to reduce redundancy.[17] These mechanisms prioritize speed and ratio through greedy or optimal parsing, avoiding exhaustive searches.

Zstandard utilizes an asymmetric sliding search window, where the compressor references prior data up to a configurable distance (window size, a power of 2 from 1 KB to 3.75 TB), with offsets in sequences pointing backward within this window to reconstruct matches during decompression.[4] The default and recommended minimum window size for interoperability is 8 MB, but long-range mode extends this to at least 128 MiB (windowLog=27) for handling datasets with distant repetitions, increasing memory demands while improving ratios on suitable inputs.[4] Chained hash tables accelerate literal and sequence matching by indexing recent bytes, enabling rapid candidate retrieval without full scans.[17]

Decompression proceeds block-by-block in a sequential manner due to inter-block dependencies via the sliding window. Each compressed block decodes its literals and sequences independently, using included or predefined tables for literals, followed by offset-based reconstruction of matches.[4] The process verifies the optional frame checksum upon completion to ensure data integrity. The core architecture has remained stable since the RFC 8878 standardization in 2021, with ongoing optimizations in subsequent releases up to version 1.5.7 (as of 2025).[5]
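As an illustration of the frame layout just described, the sketch below checks the little-endian magic number and queries the declared content size through libzstd's stable API; the function name is illustrative:

```c
#include <stdint.h>
#include <stdio.h>
#include <zstd.h>

/* Sketch: inspect a Zstandard frame header. Reads the little-endian
 * magic number (0xFD2FB528, stored as bytes 28 B5 2F FD) and asks
 * libzstd for the declared frame content size, which may be unknown. */
static void inspect_frame(const unsigned char *buf, size_t len)
{
    if (len < 4) return;
    uint32_t magic = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8)
                   | ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    if (magic != ZSTD_MAGICNUMBER) {             /* 0xFD2FB528 */
        printf("not a Zstandard frame\n");
        return;
    }
    unsigned long long contentSize = ZSTD_getFrameContentSize(buf, len);
    if (contentSize == ZSTD_CONTENTSIZE_UNKNOWN)
        printf("frame header does not declare the content size\n");
    else if (contentSize == ZSTD_CONTENTSIZE_ERROR)
        printf("malformed frame header\n");
    else
        printf("frame declares %llu uncompressed bytes\n", contentSize);
}
```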
The overall format adheres to RFC 8878, published in 2021, which standardizes frame parameters for consistent implementation across tools and libraries.[4] This architecture integrates with subsequent entropy coding stages to produce the final compressed output.[4]

Entropy Coding Mechanisms
Zstandard employs two primary entropy coding mechanisms to achieve efficient compression of literals and sequences: Huffman coding for rapid processing in simpler scenarios and Finite State Entropy (FSE), a table-based implementation of tabled asymmetric numeral systems (tANS), for superior compression ratios in more complex cases.[4][18] Huffman coding is prioritized at lower compression levels for its speed, while FSE is utilized at higher levels to approach the performance of arithmetic coding with reduced computational overhead.[4] These coders operate on probability distributions derived from the input data within each block, enabling adaptive encoding that minimizes redundancy.[4]

Literals, which are unmatched bytes in the compression block, are encoded using Huffman trees that can be either static or dynamically constructed based on recent data statistics.[4] The dynamic Huffman trees are built by first counting symbol frequencies in the literals section, then assigning weights inversely proportional to these frequencies (higher weights for more frequent symbols, in range 0-11). Bit lengths are derived as nbBits = 11 - weight (clamped between 0 and 11), with weight 0 indicating unused symbols. Symbols are then sorted by increasing bit length (decreasing weight) and by symbol value for ties to assign canonical prefix codes sequentially. This approach allows Huffman coding to handle literals efficiently, with maximum code lengths capped at 11 bits to balance speed and compression. The resulting tree description is compressed using FSE for transmission to the decoder.[4]

Sequences, comprising offsets, match lengths, and literal lengths, are encoded using FSE tables that model their respective probability distributions.[4] For each sequence component, probabilities are normalized to a power-of-2 total (defined by Accuracy_Log, typically 5 to 12), and symbols are distributed across table states using a spreading algorithm to ensure even coverage: starting from an initial position, subsequent placements are offset by (tableSize >> 1) + (tableSize >> 3) + 3 and masked to the table size minus one.[4] FSE decoding proceeds via a state machine where the initial state is seeded with Accuracy_Log bits from the bitstream; subsequent states are updated using precomputed tables that incorporate the read bit values with baselines derived from the probability distribution, enabling sequential symbol recovery without multiplications.[4] This normalization ensures precise probability representation, with the state carried over between symbols for cumulative encoding.[4]
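The spreading rule quoted above can be illustrated with a short sketch; this is a simplified rendering of the step described in RFC 8878 and omits the special placement of "less than 1" probability symbols:

```c
#include <stddef.h>

/* Sketch of the FSE symbol-spreading step: each symbol occupies as many
 * table cells as its normalized frequency, and cells are visited with a
 * fixed stride of (tableSize>>1) + (tableSize>>3) + 3. Because the stride
 * is odd and tableSize is a power of two, the walk touches every cell
 * exactly once before repeating. */
static void fse_spread_symbols(unsigned char *table, size_t tableSize,
                               const unsigned *normCount, unsigned maxSymbol)
{
    size_t const step = (tableSize >> 1) + (tableSize >> 3) + 3;
    size_t const mask = tableSize - 1;            /* tableSize is a power of 2 */
    size_t pos = 0;
    for (unsigned s = 0; s <= maxSymbol; s++) {
        for (unsigned i = 0; i < normCount[s]; i++) {
            table[pos] = (unsigned char)s;        /* assign this cell to symbol s */
            pos = (pos + step) & mask;
        }
    }
}
```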
This design yields compression ratios close to the Shannon limit while operating at speeds comparable to Huffman, with FSE's table-driven nature reducing complexity relative to full arithmetic coders.[19][4]
Usage
Command-Line Tool
The Zstandard command-line tool, commonly referred to as zstd, provides a standalone utility for compressing and decompressing files using the Zstandard algorithm, with a syntax designed to be familiar to users of tools like gzip and xz.[12] The basic syntax for compression is zstd [options] <source> [-o <destination>], where <source> specifies the input file or files, and -o optionally sets the output path; if omitted, the output defaults to appending .zst to the input filename.[12] Compression levels range from 1 (fastest, lowest ratio) to 22 (slowest, highest ratio), with a default of 3; faster modes use --fast=N (where N is a positive integer, equivalent to negative levels like -1), while higher ratios can be achieved using the --ultra flag to enable levels 20-22 (maximum), though these require significantly more memory and are recommended only for specific use cases.[12]
Common options enhance flexibility for various workflows. The -d or --decompress flag decompresses .zst files, restoring the original content, while unzstd serves as a convenient alias for single-file decompression.[12] Dictionary training, useful for improving compression on small or similar datasets, is invoked with --train <files> -o <dictionary>, generating a reusable dictionary file that can then be applied via -D <dictionary>.[12] Multi-threading is supported with -T# (e.g., -T0 for automatic detection or -T4 for four threads), accelerating compression on multi-core systems.[12] For handling large inputs, --long=[windowLog] enables long-distance matching, expanding the search window up to 128 MB or more (e.g., --long=24 for a 16 MB window), at the cost of increased memory usage.[12]
Practical examples illustrate typical usage. To compress a file, run zstd file.txt, which produces file.txt.zst; decompression follows with zstd -d file.txt.zst or simply unzstd file.txt.zst.[12] For streaming data, piping is effective: cat file.txt | zstd -c > file.txt.zst compresses input from standard input and outputs to a file (the -c flag ensures output to stdout for further piping).[12] Compressed files use the .zst extension by convention and include optional integrity checks via xxHash, a fast non-cryptographic hash function, controlled with the --check and --no-check options, to verify data integrity post-decompression.[12]
The tool operates on single files or streams by default and does not recurse into directories unless specified with -r; this design prioritizes simplicity for basic tasks, while advanced integrations are available through the Zstandard library.[12]
Library and API Integration
The reference implementation of Zstandard is provided as the C library libzstd, typically distributed as a shared object file such as libzstd.so on Unix-like systems, which exposes a comprehensive API for embedding compression and decompression capabilities directly into applications.[3] This library supports both simple block-based operations and advanced streaming modes, enabling efficient handling of data in real-time scenarios without requiring external tools. The API is designed for portability, compiling on various platforms including Windows, Linux, and macOS, and is optimized for modern hardware, with multi-threading support that can be enabled at build time.[20]
The core API revolves around simple, high-level functions for non-streaming use cases, such as ZSTD_compress(), which takes input data, a compression level (ranging from 1 for fastest to 22 for maximum ratio), and an output buffer to produce compressed data, returning the compressed size or an error code if the operation fails.[21] Complementing this, ZSTD_decompress() performs the reverse, accepting compressed input and decompressing it into a provided buffer, with bounds checked via ZSTD_decompressBound() to ensure safe allocation. For streaming scenarios, where data arrives incrementally, the library uses stateful contexts: compression is managed through ZSTD_CStream objects created with ZSTD_createCStream() and processed via ZSTD_compressStream(), which allows partial input processing and output flushing; decompression follows a similar pattern with ZSTD_DStream. These contexts maintain internal state across calls, supporting dictionary-based compression for improved ratios on similar data sets. Error handling is standardized with return values inspected via ZSTD_isError() and detailed codes from ZSTD_getErrorCode(), covering issues like insufficient output space or corrupted input. Buffer management is explicit, requiring users to pre-allocate input/output buffers, with helper functions like ZSTD_compressBound() estimating maximum output size to prevent overflows.[21]
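A compact sketch of this one-shot API, assuming libzstd is installed; the buffer contents and compression level are arbitrary, and real code would also handle ZSTD_CONTENTSIZE_UNKNOWN:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>

/* Sketch of the one-shot API: compress a buffer, then decompress it,
 * sizing destinations with ZSTD_compressBound() and the frame's
 * declared content size. Error handling is abbreviated. */
int main(void)
{
    const char src[] = "zstd zstd zstd zstd zstd zstd zstd zstd";
    size_t const srcSize = sizeof src;

    size_t const cCap = ZSTD_compressBound(srcSize);      /* worst-case output size */
    void *cBuf = malloc(cCap);
    size_t const cSize = ZSTD_compress(cBuf, cCap, src, srcSize, 3 /* level */);
    if (ZSTD_isError(cSize)) {
        fprintf(stderr, "compress: %s\n", ZSTD_getErrorName(cSize));
        return 1;
    }

    /* The one-shot compressor records the content size in the frame header. */
    unsigned long long const rSize = ZSTD_getFrameContentSize(cBuf, cSize);
    void *rBuf = malloc((size_t)rSize);
    size_t const dSize = ZSTD_decompress(rBuf, (size_t)rSize, cBuf, cSize);

    printf("%zu -> %zu -> %zu bytes, round-trip %s\n", srcSize, cSize, dSize,
           (!ZSTD_isError(dSize) && dSize == srcSize &&
            memcmp(src, rBuf, srcSize) == 0) ? "ok" : "FAILED");

    free(cBuf);
    free(rBuf);
    return 0;
}
```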
To facilitate integration beyond C, Zstandard offers official and community-maintained bindings for several popular languages, ensuring consistent performance across ecosystems. The Python binding, python-zstandard, provides a CFFI-based interface mirroring the C API, including classes like ZstdCompressor and ZstdDecompressor for both block and streaming operations, and is recommended for its compatibility with the standard library's compression.zstd module since Python 3.14.[22] For Java, the zstd-jni library wraps libzstd via JNI, offering classes such as ZstdCompressCtx for direct API access and supporting Android environments.[23] In Go, the zstd package from the klauspost/compress module implements a pure-Go encoder/decoder with streaming support, achieving near-native speeds without CGO dependencies in some modes.[24] Rust bindings are available through the zstd crate, which uses unsafe bindings to libzstd for high performance, including traits for streaming compression and safe buffer handling via std::io. Community wrappers exist for additional languages like .NET and Node.js, but the official ports prioritize these core ones for reliability.[25]
Zstandard's library integration shines in scenarios requiring in-memory compression, such as database storage where PostgreSQL has incorporated it since version 15 for compressing server-side base backups and since version 16 for pg_dump, reducing storage needs for large datasets while supporting efficient queries.[26] Similarly, it serves as an efficient alternative in network protocols, including HTTP/2 implementations where servers like NGINX can use Zstandard modules to compress responses, offering better speed-to-ratio trade-offs than Brotli for dynamic content transfer.
Best practices for API usage emphasize proactive resource management to avoid runtime errors or excessive memory use. Developers should allocate compression contexts using ZSTD_estimateCCtxSize(level) to determine the required heap size based on the desired compression level, ensuring the context fits within application constraints before calling ZSTD_createCCtx(). For partial or streaming frames, always check return values for continuation needs—such as non-zero outputs from ZSTD_compressStream() indicating more data to process—and flush streams explicitly with ZSTD_endStream() to finalize frames. The command-line tool zstd serves as a convenient wrapper around this library for testing and ad-hoc operations.[21]
