Zstd
from Wikipedia

Zstandard
Original author: Yann Collet
Developers: Yann Collet, Nick Terrell, Przemysław Skibiński[1]
Initial release: 23 January 2015
Stable release: 1.5.7 / 20 February 2025
Repository: github.com/facebook/zstd
Written in: C
Operating system: Cross-platform
Platform: Portable
Type: Data compression
License: BSD-3-Clause or GPL-2.0-or-later (dual-licensed)
Website: facebook.github.io/zstd/

Zstandard is a lossless data compression algorithm developed by Yann Collet at Facebook. Zstd is the corresponding reference implementation in C, released as open-source software on 31 August 2016.[2][3]

The algorithm was published in 2018 as RFC 8478, which also defines an associated media type "application/zstd", filename extension "zst", and HTTP content encoding "zstd".[4]

Features


Zstandard was designed to give a compression ratio comparable to that of the DEFLATE algorithm (developed in 1991 and used in the original ZIP and gzip programs), but faster, especially for decompression. It is tunable with compression levels ranging from negative 7 (fastest)[5] to 22 (slowest in compression speed, but best compression ratio).
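The level range above is exposed directly by the reference library. The following C sketch is illustrative only (the buffer contents and chosen level are arbitrary, and error handling is minimal): it queries the level bounds reported by libzstd and compresses a small buffer at a negative, speed-oriented level.

/* Illustrative sketch (not from the article): query libzstd's level range and
   compress at a negative, speed-oriented level. */
#include <zstd.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    printf("supported levels: %d (fastest) to %d (strongest)\n",
           ZSTD_minCLevel(), ZSTD_maxCLevel());

    const char src[] = "example example example example example";
    char dst[256];

    ZSTD_CCtx* cctx = ZSTD_createCCtx();
    /* Negative values select the speed-oriented levels (e.g. -7). */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, -7);
    size_t const cSize = ZSTD_compress2(cctx, dst, sizeof dst, src, strlen(src));
    if (!ZSTD_isError(cSize))
        printf("compressed %zu -> %zu bytes at level -7\n", strlen(src), cSize);
    ZSTD_freeCCtx(cctx);
    return 0;
}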

Starting from version 1.3.2 (October 2017), zstd optionally implements very-long-range search and deduplication (--long, 128 MiB window) similar to rzip or lrzip.[6]

Compression speed can vary by a factor of 20 or more between the fastest and slowest levels, while decompression is uniformly fast, varying by less than 20% between the fastest and slowest levels.[7] The Zstandard command-line tool has an "adaptive" (--adapt) mode that varies the compression level depending on I/O conditions, mainly how fast it can write the output.

Zstd at its maximum compression level gives a compression ratio close to lzma, lzham, and ppmx, and performs better[vague] than lza or bzip2.[improper synthesis?][8][9] Zstandard reaches the current Pareto frontier, as it decompresses faster than any other currently available algorithm with similar or better compression ratio.[as of?][10][11]

Dictionaries can have a large impact on the compression ratio of small files, so Zstandard can use a user-provided compression dictionary. It also offers a training mode, able to generate a dictionary from a set of samples.[12][13] In particular, one dictionary can be loaded to process large sets of files with redundancy between files, but not necessarily within each file, such as for log files.
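A minimal C sketch of this workflow, assuming libzstd's zdict.h training API is available; the sample strings, buffer sizes, and level are invented for illustration, and real training normally needs far more samples:

/* Sketch of dictionary training and use with libzstd (zdict.h / zstd.h).
   Sample data and sizes are illustrative, not from the article. */
#include <zstd.h>
#include <zdict.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    /* Concatenated training samples plus their individual sizes. */
    const char samples[] = "GET /user/1 HTTP/1.1GET /user/2 HTTP/1.1GET /user/3 HTTP/1.1";
    size_t sampleSizes[] = { 20, 20, 20 };

    /* Train a small dictionary from the samples. */
    char dict[1024];
    size_t dictSize = ZDICT_trainFromBuffer(dict, sizeof dict,
                                            samples, sampleSizes, 3);
    if (ZDICT_isError(dictSize)) {  /* training can fail on tiny sample sets */
        fprintf(stderr, "training failed: %s\n", ZDICT_getErrorName(dictSize));
        return 1;
    }

    /* Compress one small payload with the trained dictionary. */
    const char payload[] = "GET /user/42 HTTP/1.1";
    char dst[256];
    ZSTD_CCtx* cctx = ZSTD_createCCtx();
    size_t cSize = ZSTD_compress_usingDict(cctx, dst, sizeof dst,
                                           payload, strlen(payload),
                                           dict, dictSize, 3 /* level */);
    if (!ZSTD_isError(cSize))
        printf("compressed %zu -> %zu bytes\n", strlen(payload), cSize);
    ZSTD_freeCCtx(cctx);
    return 0;
}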

Design


Zstandard combines a dictionary-matching stage (LZ77) with a large search window and a fast entropy-coding stage. It uses both Huffman coding (for entries in the Literals section)[14] and finite-state entropy (FSE), a fast tabled variant of asymmetric numeral systems (tANS), for entries in the Sequences section. Because of the way FSE carries state over between symbols, decompression involves processing the symbols within the Sequences section of each block in reverse order (from last to first).

Usage

Zstandard
Filename extension: .zst[15]
Internet media type: application/zstd[15]
Magic number: 28 b5 2f fd[15]
Type of format: Data compression
Standard: RFC 8878
Website: github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md

Zstandard Dictionary
Magic number: 37 a4 30 ec[15]
Standard: RFC 8878
Website: github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#dictionary-format

The Linux kernel has included Zstandard since November 2017 (version 4.14) as a compression method for the btrfs and squashfs filesystems.[16][17][18]

In 2017, Allan Jude integrated Zstandard into the FreeBSD kernel,[19] and it was subsequently integrated as a compressor option for core dumps (both user programs and kernel panics). It was also used to create a proof-of-concept OpenZFS compression method[7] which was integrated in 2020.[20]

The AWS Redshift and RocksDB databases include support for field compression using Zstandard.[21]

In March 2018, Canonical tested[22] the use of zstd as a deb package compression method by default for the Ubuntu Linux distribution. Compared with xz compression of deb packages, zstd at level 19 decompresses significantly faster, but at the cost of 6% larger package files. Support was added to Debian (and subsequently, Ubuntu) in April 2018 (in version 1.6~rc1).[23][22][24]

Fedora added Zstandard support to RPM in May 2018 (Fedora release 28) and used it for packaging the release in October 2019 (Fedora 31).[25] In Fedora 33, the filesystem is compressed by default with zstd.[26][27]

Arch Linux added support for zstd as a package compression method in October 2019 with the release of the pacman 5.2 package manager[28] and in January 2020 switched from xz to zstd for the packages in the official repository. Arch uses zstd -c -T0 --ultra -20 -; the combined size of all compressed packages increased by 0.8% compared to xz, decompression is 14 times faster, decompression memory use increased by 50 MiB when using multiple threads, and compression memory use increased but scales with the number of threads used.[29][30][31] Arch Linux later also switched to zstd as the default compression algorithm for the mkinitcpio initial ramdisk generator.[32]

A full implementation of the algorithm with an option to choose the compression level is used in the .NSZ/.XCZ[33] file formats developed by the homebrew community for the Nintendo Switch hybrid game console.[34] It is also one of many supported compression algorithms in the .RVZ Wii and GameCube disc image file format.

On 15 June 2020, Zstandard was implemented in version 6.3.8 of the zip file format with codec number 93, deprecating the previous codec number 20, which had been assigned in version 6.3.7, released on 1 June.[35][36]

In March 2024, Google Chrome version 123 (and Chromium-based browsers such as Brave or Microsoft Edge) added zstd support in the HTTP header Content-Encoding.[37] In May 2024, Firefox release 126.0 added zstd support in the HTTP header Content-Encoding.[38]

License


The reference implementation is licensed under the BSD license, published at GitHub.[39] From version 1.0, published 31 August 2016,[40] it also included an additional Grant of Patent Rights.[41]

From version 1.3.1, released 20 August 2017,[42] this patent grant was dropped and the license was changed to a BSD + GPLv2 dual license.[43]

See also

  • LZ4 (compression algorithm) – a fast member of the LZ77 family
  • LZFSE – a similar algorithm by Apple used since iOS 9 and OS X 10.11 and made open source on 1 June 2016
  • Zlib
  • Brotli – also integrated into browsers
  • Gzip – one of the most widely used compression tools

References

from Grokipedia
Zstandard, commonly abbreviated as zstd, is a lossless data compression algorithm developed by Yann Collet at Facebook (now Meta) and first released as open-source software on August 31, 2016, under the BSD license. It is designed to deliver high compression ratios comparable to or better than zlib, while achieving significantly faster compression and decompression speeds suitable for real-time scenarios. Zstd supports configurable compression levels ranging from ultra-fast modes (e.g., level 1 achieving around 2.9:1 ratios at over 500 MB/s) to higher levels for denser compression (up to 20+ for maximum ratios), and it performs well in both general-purpose and specialized use cases such as streaming.

A key innovation of Zstd is its dictionary compression mode, which trains custom dictionaries on sample datasets to improve efficiency for small or repetitive data (e.g., boosting ratios by 30-50% on 1 KB payloads from sources like user profiles). The algorithm employs a combination of finite-state entropy coding, Huffman encoding, and LZ77-style matching, enabling it to outperform alternatives such as zlib in speed and LZ4 in compression ratio for many workloads, as demonstrated in benchmarks on modern hardware such as Intel Core i7 processors. Decompression speeds routinely exceed 1 GB/s, making it well suited to bandwidth-constrained environments.

Since its release, Zstd has seen widespread adoption across industries, including integration into archiving tools, support in databases such as RocksDB for storage optimization, and inclusion in operating systems such as Linux (via the kernel since version 4.14). It is also standardized for web use through RFC 8878, which defines the "application/zstd" media type and "zstd" content encoding for HTTP transport, enabling efficient delivery of compressed resources. Ongoing development by Meta and the open-source community continues to enhance its capabilities, with the latest stable release, v1.5.7 (February 2025), introducing performance improvements and features such as multi-threaded compression and long-distance matching for even larger datasets; recent adoptions include backup compression in SQL Server 2025.

History and Development

Origins at Facebook

Zstandard's development was initiated by Yann Collet, a software engineer at Facebook, in early 2015, building on his prior work with fast compression algorithms like LZ4. The project stemmed from Facebook's need to manage exploding data volumes in real-time environments, where traditional tools such as zlib proved inadequate due to their trade-offs between compression speed and ratio. Specifically, Zstandard aimed to enable faster processing for high-throughput tasks such as compressing server logs and performing large-scale backups, which often involve terabytes of data daily across Facebook's infrastructure.

Early prototypes integrated innovative techniques, including Finite State Entropy, developed by Collet, and were iteratively refined through internal benchmarks on datasets such as the Silesia corpus. These tests compared Zstandard against established algorithms, revealing that it achieved zlib-level compression ratios at 3-5 times faster speeds than zlib, while surpassing LZ4 in ratio without sacrificing much decompression velocity. Facebook's core infrastructure team played a pivotal role, providing resources and expertise to tailor the algorithm for production-scale deployment in storage, transmission, and data pipelines. Key early contributors included Collet as the lead designer, alongside team members who focused on hardware optimization for modern CPUs.

Releases and Milestones

Zstandard's initial public release occurred on January 23, 2015, as version 0.1, developed by Yann Collet to address needs for fast, high-ratio compression in real-time scenarios. Early versions introduced key features like dictionary support, enabling improved compression ratios for small or similar datasets by leveraging pre-trained dictionaries. The project advanced through beta iterations before its formal open-sourcing on August 31, 2016, with version 1.0.0, which stabilized the compression format and was hosted on GitHub under Facebook's organization. Following open-sourcing, Zstandard benefited from extensive community involvement, with contributions from numerous developers enhancing its portability, performance, and integration capabilities, while maintenance remained under Meta (formerly Facebook).

Significant milestones marked the evolution of Zstandard's capabilities. Version 1.3.2, released in October 2017, introduced a long-range mode with a 128 MiB search window, allowing better deduplication and compression for large files with repetitive structures. Subsequent releases refined these features, balancing speed and ratio across diverse use cases. As of February 20, 2025, the latest stable release is version 1.5.7, incorporating over 500 commits focused on optimizations in compression, decompression, and multi-threading efficiency.

Features

Performance Characteristics

Zstandard delivers high throughput in both compression and decompression, targeting real-time scenarios with speeds significantly surpassing traditional algorithms such as zlib while maintaining comparable or superior compression ratios. Decompression routinely exceeds 500 MB/s on modern hardware, with benchmarks showing up to 1550 MB/s for level 1 compression on a Core i7-9700K processor using the Silesia corpus (as of v1.5.7). This efficiency stems from its design, which prioritizes low-latency decoding suitable for interactive applications. In version 1.5.7 (released February 2025), compression speed at fast levels like level 1 improved notably on small data blocks from the Silesia corpus, for example from 280 MB/s to 310 MB/s for 4 KB blocks (+10%) and from 383 MB/s to 458 MB/s for 32 KB blocks (+20%). These enhancements benefit use cases in data centers and databases that operate on 16 KB blocks.

Compression speeds are highly tunable across 22 levels, from fast (level 1) to high-ratio (level 22), enabling trade-offs between throughput and size reduction; negative levels further prioritize speed at the expense of ratio. At level 1, Zstandard achieves a 2.896:1 ratio on the Silesia corpus at 510 MB/s compression speed, while the ultra-fast --fast=3 setting yields a 2.241:1 ratio at 635 MB/s (as of v1.5.7). Compared to zlib (the basis for gzip and ZIP), Zstandard compresses 3-5 times faster at equivalent ratios and can produce 10-15% smaller outputs at the same speed, with higher levels offering up to 20-30% better ratios in certain workloads. These characteristics make it versatile for scenarios requiring rapid processing without excessive resource use.

Zstandard scales effectively with hardware, leveraging multi-threading to distribute the workload across multiple CPU cores for improved throughput on parallel systems, and incorporating SIMD instructions to accelerate block processing on contemporary processors. Starting with v1.5.7, the zstd command-line tool defaults to multi-threading, using up to 4 threads depending on system capabilities. This hardware awareness contributes to its decompression consistency, which remains high even under varied conditions.

The following table summarizes key performance metrics on the Silesia corpus (Core i7-9700K, Ubuntu 24.04, v1.5.7), contrasting Zstandard with gzip, LZ4, and Brotli at comparable fast settings:
Algorithm          Ratio   Compression speed (MB/s)   Decompression speed (MB/s)
Zstd level 1       2.896   510                        1550
Zstd --fast=3      2.241   635                        1980
LZ4                2.101   675                        3850
Gzip level 1       2.743   105                        390
Brotli level 0     2.702   400                        425
In comparison to bzip2, Zstandard provides dramatically higher speeds; bzip2 typically compresses at 18-20 MB/s on large text datasets while achieving ratios around 2.5-3:1, but its slower performance limits real-time use cases. Overall, Zstandard's metrics position it as a strong alternative, offering gzip-like ratios with up to 10x faster compression in optimized modes.

Advanced Capabilities

Zstandard provides several advanced features that extend its utility beyond standard compression tasks, enabling efficient handling of specialized scenarios such as small datasets, streaming, and large-scale processing.

Dictionary compression allows Zstandard to use pre-trained dictionaries (typically derived from representative samples of the data domain) to enhance compression ratios for small payloads under 1 MB. This mode is particularly effective for repetitive or similar streams, such as repeated payloads or log entries, where it can improve compression ratios by 20-50% compared to dictionary-less compression by leveraging shared patterns without rebuilding the dictionary each time. In v1.5.7, dictionary compression saw further gains of up to 15% in speed and ratio for small blocks. The feature supports simple calls like ZSTD_compress_usingDict for one-off uses or bulk APIs like ZSTD_createCDict for repeated applications, reducing latency after the initial dictionary load.

Streaming mode facilitates incremental compression and decompression of unbounded data streams, ideal for large files or real-time applications where full buffering is impractical. It employs structures like ZSTD_CStream for compression and ZSTD_DStream for decompression, processing input in chunks via functions such as ZSTD_compressStream2 (with directives for continuing, flushing, or ending) and ZSTD_decompressStream, which update buffer positions automatically to avoid memory overhead. This enables seamless handling of continuous data flows, such as network transmissions or log processing, without requiring the entire dataset in memory at once.

The very-long-range mode, introduced in version 1.3.2, extends the search window to 128 MiB to capture distant matches in large inputs, yielding better compression ratios for files exceeding typical block sizes. Activated via ZSTD_c_enableLongDistanceMatching or the --long command-line option, it increases memory usage but is beneficial for datasets like backups or genomic sequences where long-range redundancies exist, with the window size scaling up to the frame content size if needed.

Multi-threading support enables parallel processing of compression blocks for levels 1 through 19, distributing the workload across multiple CPU cores to accelerate throughput on multi-core systems. Configured with parameters like ZSTD_c_nbWorkers (0 by default, meaning single-threaded, but scalable to the available cores) and ZSTD_c_overlapLog for thread coordination, it processes independent blocks concurrently while maintaining sequential output, though it raises memory requirements proportionally to the thread count. Decompression remains single-threaded due to its inherently sequential nature.

Legacy support ensures compatibility with older Zstandard formats dating back to version 0.4.0, allowing decompression of legacy frames when enabled at build time via ZSTD_LEGACY_SUPPORT. This fallback mechanism detects legacy frame identifiers and handles them transparently in modern builds, facilitating upgrades in environments with mixed-format archives without data loss. Version 1.5.7 also introduced the --max command-line option for achieving maximum compression ratios beyond level 22, providing finer control for ultra-high compression needs on large datasets like enwik9.
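As a sketch of how these options combine in the public C API (the level 19, 128 MiB window, and 4 worker threads are arbitrary choices, the file handles are assumed to be open, and error checking is abbreviated), streaming compression with long-distance matching and multi-threading might look like this:

/* Sketch: streaming compression with long-distance matching and worker threads.
   Parameters are illustrative; a multithreaded libzstd build is assumed for nbWorkers. */
#include <zstd.h>
#include <stdio.h>
#include <stdlib.h>

static void compress_file(FILE* fin, FILE* fout) {
    ZSTD_CCtx* cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 27);   /* 128 MiB window, as with --long */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 4);    /* parallel block compression */

    size_t const inSize  = ZSTD_CStreamInSize();
    size_t const outSize = ZSTD_CStreamOutSize();
    void* inBuf  = malloc(inSize);
    void* outBuf = malloc(outSize);

    size_t readBytes;
    do {
        readBytes = fread(inBuf, 1, inSize, fin);
        ZSTD_EndDirective const mode = (readBytes < inSize) ? ZSTD_e_end : ZSTD_e_continue;
        ZSTD_inBuffer input = { inBuf, readBytes, 0 };
        size_t remaining;
        do {   /* drain the compressor until this chunk (and, at the end, the frame) is done */
            ZSTD_outBuffer output = { outBuf, outSize, 0 };
            remaining = ZSTD_compressStream2(cctx, &output, &input, mode);
            fwrite(outBuf, 1, output.pos, fout);
        } while (mode == ZSTD_e_end ? remaining != 0 : input.pos != input.size);
    } while (readBytes == inSize);

    free(inBuf); free(outBuf);
    ZSTD_freeCCtx(cctx);
}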

Design

Core Architecture

Zstandard employs a block-based format in which compressed files, known as frames, consist of one or more contiguous blocks, each limited to a maximum uncompressed size of 128 KB to facilitate efficient processing and memory management. Each frame begins with a 4-byte magic number (0xFD2FB528 in little-endian byte order) for identification, followed by a frame header of 2 to 14 bytes whose descriptor specifies parameters such as the presence of a dictionary ID, a window size descriptor, the frame content size, and whether an optional 4-byte checksum (using XXH64 for integrity verification) is appended. The dictionary ID, if present, allows referencing an external dictionary for improved compression of repetitive data, while the window descriptor defines the maximum back-reference distance for matches. Blocks themselves carry a 3-byte header indicating the last-block flag, the block type (raw, run-length encoded, or compressed), and the size, enabling variable compressed sizes while capping uncompressed content at the block maximum.

The compression process operates in stages, beginning with splitting of the input data into blocks no larger than 128 KB to balance compression efficiency and resource usage. Within each block, Zstandard applies LZ77-style dictionary matching to identify and deduplicate repeated sequences, producing literals (unmatched bytes) and sequences (match length, literal length, and offset triples). Matching employs chained hash tables to probe for potential duplicates, with configurable parameters like hashLog determining the size of the initial hash table (powers of 2 from 64 KB to 256 MB) and chainLog setting the length of hash chains for deeper searches (up to 29 bits, or 512 MB). For enhanced deduplication, binary trees can be used in advanced strategies (e.g., btopt or btultra), organizing recent data for logarithmic-time lookups and complementing hash chains to reduce redundancy. These mechanisms prioritize speed and ratio through greedy or optimal parsing, avoiding exhaustive searches.

Zstandard utilizes an asymmetric sliding search window, in which the encoder may reference prior data up to a configurable window size (a power of 2 from 1 KB to 3.75 TB), with offsets in sequences pointing backward within this window to reconstruct matches during decompression. Decoders are recommended to support window sizes of at least 8 MB by default, while long-range mode extends the window to at least 128 MiB (windowLog=27) for handling datasets with distant repetitions, increasing memory demands while improving ratios on suitable inputs. Chained hash tables accelerate literal and match searching by indexing recent bytes, enabling rapid candidate retrieval without full scans.

Decompression proceeds block by block in a sequential manner because of inter-block dependencies through the sliding window. Each compressed block decodes its literals and sequences independently, using included or predefined entropy tables, followed by offset-based reconstruction of matches. The process verifies the optional frame checksum upon completion to ensure data integrity. The overall format adheres to RFC 8878, published in 2021, which standardizes frame parameters for consistent implementation across tools and libraries; the core architecture has remained stable since that standardization, with ongoing optimizations in subsequent releases up to version 1.5.7 (as of 2025). This architecture integrates with the subsequent entropy-coding stage to produce the final compressed output.
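The frame and dictionary magic numbers given above are enough to identify Zstandard data without linking libzstd. The short C sketch below (the file name is a placeholder) reads the first four bytes of a file and compares them against the little-endian values 0xFD2FB528 and 0xEC30A437:

/* Sketch: identify a Zstandard frame by its 4-byte magic number (0xFD2FB528,
   stored little-endian as the byte sequence 28 B5 2F FD). */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    FILE* f = fopen("archive.zst", "rb");   /* placeholder file name */
    if (!f) return 1;

    unsigned char hdr[4];
    if (fread(hdr, 1, 4, f) != 4) { fclose(f); return 1; }
    fclose(f);

    uint32_t magic = (uint32_t)hdr[0] | ((uint32_t)hdr[1] << 8)
                   | ((uint32_t)hdr[2] << 16) | ((uint32_t)hdr[3] << 24);

    if (magic == 0xFD2FB528u)
        puts("Zstandard frame");
    else if (magic == 0xEC30A437u)           /* dictionary magic (37 a4 30 ec) */
        puts("Zstandard dictionary");
    else
        puts("not a Zstandard frame");
    return 0;
}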

Entropy Coding Mechanisms

Zstandard employs two primary entropy coding mechanisms to compress literals and sequences efficiently: Huffman coding for rapid processing in simpler scenarios, and Finite State Entropy (FSE), a table-based implementation of tabled asymmetric numeral systems (tANS), for superior compression ratios in more complex cases. Huffman coding is favored where speed matters most, while FSE is used to approach the efficiency of arithmetic coding with reduced computational overhead. These coders operate on probability distributions derived from the input data within each block, enabling adaptive encoding that minimizes redundancy.

Literals, the unmatched bytes in a compressed block, are encoded using Huffman trees that can be either static or dynamically constructed from recent data statistics. Dynamic Huffman trees are built by first counting symbol frequencies in the literals section, then assigning weights that grow with frequency (more frequent symbols receive higher weights, in the range 0-11, with weight 0 indicating unused symbols). Bit lengths are derived as nbBits = Max_Number_of_Bits + 1 - weight for weights greater than zero, with the maximum code length capped at 11 bits to balance speed and compression. Symbols are then sorted by increasing bit length (decreasing weight), with ties broken by symbol value, to assign canonical prefix codes sequentially. The resulting tree description is itself compressed using FSE for transmission to the decoder.

Sequences, comprising offsets, match lengths, and literal lengths, are encoded using FSE tables that model their respective probability distributions. For each sequence component, probabilities are normalized to a power-of-2 total (defined by Accuracy_Log, typically 5 to 12), and symbols are distributed across table states using a spreading algorithm to ensure even coverage: starting from an initial position, subsequent placements are offset by (tableSize >> 1) + (tableSize >> 3) + 3 and masked to the table size minus one. FSE decoding proceeds via a state machine in which the initial state is seeded with Accuracy_Log bits from the bitstream; subsequent states are updated using precomputed tables that combine the bits read with baselines derived from the probability distribution, enabling sequential symbol recovery without multiplications. This normalization ensures precise probability representation, with the state carried over between symbols for cumulative encoding. The design yields compression ratios close to the Shannon limit while operating at speeds comparable to Huffman coding, with FSE's table-driven nature reducing complexity relative to full arithmetic coders.
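The spreading step described above can be illustrated on a toy table. The following self-contained C sketch uses an invented three-symbol alphabet and a 16-entry table (Accuracy_Log = 4); it demonstrates only the placement formula, not a full FSE encoder:

/* Toy illustration of the FSE state-table "spread" step: symbols are placed
   across the table by repeatedly stepping the position by
   (tableSize >> 1) + (tableSize >> 3) + 3, masked to tableSize - 1.
   The alphabet and normalized counts are invented for demonstration. */
#include <stdio.h>

int main(void) {
    enum { ACCURACY_LOG = 4, TABLE_SIZE = 1 << ACCURACY_LOG };
    /* Normalized counts (sum == TABLE_SIZE) for a 3-symbol alphabet: A, B, C. */
    const char symbols[] = { 'A', 'B', 'C' };
    const int  norm[]    = { 9, 5, 2 };

    char table[TABLE_SIZE];
    unsigned pos = 0;
    const unsigned step = (TABLE_SIZE >> 1) + (TABLE_SIZE >> 3) + 3;  /* = 13 here */

    for (int s = 0; s < 3; s++)
        for (int i = 0; i < norm[s]; i++) {
            table[pos] = symbols[s];
            pos = (pos + step) & (TABLE_SIZE - 1);  /* step is co-prime with 16, so every slot is hit once */
        }

    for (int i = 0; i < TABLE_SIZE; i++) putchar(table[i]);
    putchar('\n');   /* the symbols come out interleaved rather than clustered */
    return 0;
}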

Usage

Command-Line Tool

The Zstandard command-line tool, commonly referred to as zstd, provides a standalone utility for compressing and decompressing files using the Zstandard algorithm, with a syntax designed to be familiar to users of tools like gzip and xz. The basic syntax for compression is zstd [options] <source> [-o <destination>], where <source> specifies the input file or files and -o optionally sets the output path; if omitted, the output defaults to appending .zst to the input filename. Compression levels range from 1 (fastest, lowest ratio) to 22 (slowest, highest ratio), with a default of 3; faster modes use --fast=N (where N is a positive integer, equivalent to negative levels such as -1), while higher ratios can be achieved using the --ultra flag to enable levels 20-22, though these require significantly more memory and are recommended only for specific use cases.

Common options enhance flexibility for various workflows. The -d or --decompress flag decompresses .zst files, restoring the original content, while unzstd serves as a convenient alias for single-file decompression. Dictionary training, useful for improving compression on small or similar datasets, is invoked with --train <files> -o <dictionary>, generating a reusable dictionary file that can then be applied via -D <dictionary>. Multi-threading is supported with -T# (e.g., -T0 for automatic detection or -T4 for four threads), accelerating compression on multi-core systems. For handling large inputs, --long=[windowLog] enables long-distance matching, expanding the search window to 128 MB or more (e.g., --long=24 for a 16 MB window), at the cost of increased memory usage.

Practical examples illustrate typical usage. To compress a file, run zstd file.txt, which produces file.txt.zst; decompression follows with zstd -d file.txt.zst or simply unzstd file.txt.zst. For streaming data, piping is effective: cat file.txt | zstd -c > file.txt.zst compresses input from standard input and outputs to a file (the -c flag ensures output to stdout for further piping). Compressed files use the .zst extension by convention and include optional integrity checks via xxHash, a fast non-cryptographic hash function, controlled with the --check option and verified after decompression. The tool operates on single files or streams by default and does not recurse into directories unless told to with -r; this design prioritizes simplicity for basic tasks, while advanced integrations are available through the Zstandard library.

Library and API Integration

The reference implementation of Zstandard is provided as the C library libzstd, typically distributed as a shared object such as libzstd.so on Unix-like systems, which exposes a comprehensive C API for embedding compression and decompression capabilities directly into applications. The library supports both simple one-shot operations and advanced streaming modes, enabling efficient handling of data in real-time scenarios without requiring external tools. It is designed for portability, compiling on platforms including Windows, Linux, and macOS, and is optimized for modern hardware, with multi-threading support enabled through a dedicated build option.

The core API revolves around simple, high-level functions for non-streaming use cases, such as ZSTD_compress(), which takes input data, a compression level (ranging from 1 for fastest to 22 for maximum ratio), and an output buffer to produce compressed data, returning the compressed size or an error code if the operation fails. Complementing this, ZSTD_decompress() performs the reverse, accepting compressed input and decompressing it into a provided buffer, with bounds checked via ZSTD_decompressBound() to ensure safe allocation. For streaming scenarios, where data arrives incrementally, the library uses stateful contexts: compression is managed through ZSTD_CStream objects created with ZSTD_createCStream() and processed via ZSTD_compressStream(), which allows partial input processing and output flushing; decompression follows a similar pattern with ZSTD_DStream. These contexts maintain internal state across calls and support dictionary-based compression for improved ratios on similar data sets. Error handling is standardized, with return values inspected via ZSTD_isError() and detailed codes from ZSTD_getErrorCode(), covering issues such as insufficient output space or corrupted input. Buffer management is explicit, requiring users to pre-allocate input and output buffers, with helper functions like ZSTD_compressBound() estimating the maximum output size to prevent overflows.

To facilitate integration beyond C, Zstandard offers official and community-maintained bindings for several popular languages, ensuring consistent performance across ecosystems. The Python binding, python-zstandard, provides a CFFI-based interface mirroring the C API, including classes like ZstdCompressor and ZstdDecompressor for both block and streaming operations, and is recommended for its compatibility with the standard library's compression.zstd module introduced in Python 3.14. For Java, the zstd-jni library wraps libzstd via JNI, offering classes such as ZstdCompressCtx for direct API access and supporting Android environments. In Go, the zstd package from klauspost/compress implements a pure-Go encoder/decoder with streaming support, achieving near-native speeds without CGO dependencies in some modes. Rust bindings are available through the zstd crate, which wraps libzstd for high performance, including traits for streaming compression and safe buffer handling via std::io. Community wrappers exist for additional languages such as .NET and Node.js, but the official ports prioritize these core ones for reliability.

Zstandard's library integration shines in scenarios requiring in-memory compression, such as database storage, where PostgreSQL has incorporated it since version 15 for compressing server-side base backups and since version 16 for pg_dump, reducing storage needs for large datasets while supporting efficient queries. Similarly, it serves as an efficient alternative in network protocols, where HTTP servers can use Zstandard modules to compress responses, offering better speed-to-ratio trade-offs than gzip for dynamic content transfer.

Best practices for library usage emphasize proactive memory management to avoid runtime errors or excessive memory use. Developers should size compression contexts using ZSTD_estimateCCtxSize(level) to determine the required heap space for the desired compression level, ensuring the context fits within application constraints before calling ZSTD_createCCtx(). For partial or streaming frames, return values should always be checked for continuation needs (for example, non-zero results from ZSTD_compressStream() indicating more data to flush), and streams should be finalized explicitly with ZSTD_endStream() to complete frames. The command-line tool zstd serves as a convenient wrapper around this library for testing and ad-hoc operations.
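A minimal round-trip through the one-shot API described above might look like the following C sketch; the input string and compression level are arbitrary, and a production caller would handle allocation failures as well:

/* Sketch of the one-shot API: compress with ZSTD_compress(), size the output with
   ZSTD_compressBound(), and round-trip with ZSTD_decompress(). Input is illustrative. */
#include <zstd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    const char src[] = "zstd zstd zstd zstd zstd zstd zstd zstd";
    size_t const srcSize = strlen(src);

    size_t const bound = ZSTD_compressBound(srcSize);   /* worst-case compressed size */
    void* comp = malloc(bound);
    size_t const cSize = ZSTD_compress(comp, bound, src, srcSize, 3 /* level */);
    if (ZSTD_isError(cSize)) {
        fprintf(stderr, "compress error: %s\n", ZSTD_getErrorName(cSize));
        return 1;
    }

    /* Simple frames record the original size, so it can be read back before allocating. */
    unsigned long long const rSize = ZSTD_getFrameContentSize(comp, cSize);
    if (rSize == ZSTD_CONTENTSIZE_UNKNOWN || rSize == ZSTD_CONTENTSIZE_ERROR) return 1;

    char* out = malloc((size_t)rSize + 1);
    size_t const dSize = ZSTD_decompress(out, (size_t)rSize, comp, cSize);
    if (ZSTD_isError(dSize)) {
        fprintf(stderr, "decompress error: %s\n", ZSTD_getErrorName(dSize));
        return 1;
    }
    out[dSize] = '\0';
    printf("%zu -> %zu -> %zu bytes, round-trip %s\n",
           srcSize, cSize, dSize, strcmp(out, src) == 0 ? "ok" : "mismatch");
    free(comp); free(out);
    return 0;
}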

Licensing and Standards

License Terms

Zstandard is distributed under a dual licensing model, allowing users to choose between the BSD 3-Clause License and the GNU General Public License version 2 (GPLv2) or later, effective from version 1.3.1, released on August 20, 2017. This dual structure provides flexibility for different use cases: the BSD license is permissive, permitting modification, redistribution in source or binary forms, and proprietary use as long as the copyright notice, conditions, and disclaimer are retained, and without implying endorsement by Facebook or its contributors. In contrast, the GPLv2 imposes copyleft requirements, mandating that any distributed modifications or derivative works be released under the same license, including provision of source code to recipients. Copyright for Zstandard is held by Meta Platforms, Inc. and its affiliates, with significant contributions from Yann Collet and other developers. Prior to version 1.3.1, the software was licensed solely under the BSD 3-Clause License. Neither license imposes royalties or fees for use, distribution, or modification, making Zstandard suitable for both commercial applications and open-source projects.

Standardization Efforts

The Zstandard compression format was formally specified in IETF RFC 8878, published in February 2021, which defines the frame format, including magic bytes, headers, data blocks, and optional checksums, and registers the "application/zstd" media type for MIME-based transport to ensure interoperability across systems. RFC 8878, authored by Yann Collet with Murray S. Kucherawy as editor, is an informational document rather than a standards-track specification; it covers features such as streaming compression, dictionary-based optimization, and entropy coding with Huffman and finite-state-entropy methods. A follow-up informational document, RFC 9659, published in September 2024, defines window sizing for Zstandard content encoding in HTTP transports.

Zstandard's adoption in operating systems began with its integration as a module compression option in the Linux kernel version 4.14, released in November 2017, enabling efficient storage and loading of kernel modules. In FreeBSD, Zstandard was incorporated into the kernel in 2018, initially for compressed crash dumps and later extended to tools like newsyslog for log rotation. By 2020, OpenZFS version 2.0 unified support across Linux and BSD platforms, adding Zstandard as a compression option with levels 1 through 19, offering ratios comparable to gzip but speeds closer to LZ4.

Major software ecosystems have embraced Zstandard for performance gains. Google Chrome version 123, released in March 2024, introduced support for Zstandard via the Content-Encoding header, enabling faster page loads and reduced bandwidth usage as an alternative to gzip and Brotli. Similarly, Firefox version 126, released in May 2024, added Zstandard compression support in the Accept-Encoding header, enhancing web content delivery efficiency. In cloud and database environments, Amazon transitioned S3 storage compression from gzip to Zstandard in 2022, achieving approximately 30% reductions in compressed object sizes at exabyte scale. Apache Kafka has supported Zstandard as a message compression codec since version 2.1.0, balancing throughput and storage in distributed streaming.

The permissive BSD or GPLv2 dual license has facilitated broad adoption by allowing seamless integration into proprietary and open-source projects alike. Zstandard's reference implementation on GitHub features contributions from over 380 developers and ongoing fuzz testing via Google's OSS-Fuzz, reflecting a vibrant ecosystem. As of 2024, further web-infrastructure components have implemented Zstandard compression support, enhancing performance for web delivery, including over HTTP/3 (QUIC-based) protocols.
