List of codecs
From Wikipedia
The following is a list of compression formats and related codecs.
Audio compression formats
Non-compression
- Linear pulse-code modulation (LPCM, generally described simply as PCM) is the format for uncompressed audio in media files and is also the standard for CD-DA; in computers, LPCM is usually stored in container formats such as WAV, AIFF, or AU, or as a raw audio file, although a container is not technically necessary.
- Pulse-density modulation (PDM)
- Direct Stream Digital (DSD) is standard for Super Audio CD
- foobar2000 Super Audio CD Decoder (based on MPEG-4 DST reference decoder)
- FFmpeg (based on dsd2pcm)
- Pulse-amplitude modulation (PAM)
Lossless compression
- Actively used
- Most popular
- Free Lossless Audio Codec (FLAC)[1]
- libFLAC
- FFmpeg
- Apple Lossless Audio Codec (ALAC)
- Apple QuickTime
- libalac
- FFmpeg
- Apple Music[2]
- Monkey's Audio (APE)
- Monkey's Audio SDK
- FFmpeg (decoder only)
- OptimFROG (OFR)
- Tom's verlustfreier Audiokompressor (TAK)
- TAK SDK
- FFmpeg (decoder only)
- WavPack (WV)
- libwavpack
- FFmpeg
- True Audio (TTA)
- libtta
- FFmpeg
- Windows Media Audio Lossless (WMAL)
- Windows Media Encoder
- FFmpeg (decoder only)
- Other
- DTS-HD Master Audio, also known as DTS++ and DCA XLL
- libdca (decoder only)
- FFmpeg (decoder only)
- Dolby TrueHD – Standard for Blu-ray Disc (mathematically based on MLP)
- FFmpeg
- Meridian Lossless Packing (MLP), also known as Packed PCM (PPCM) – Standard for DVD-Audio
- FFmpeg
- MPEG-4 Audio Lossless Coding (MPEG-4 ALS)
- SSC, DST, ALS and SLS reference software (ISO/IEC 14496-5:2001/Amd.10:2007)
- FFmpeg (decoding only)
- MPEG-4 Scalable Lossless Coding (MPEG-4 SLS) – Parts of it are used in HD-AAC.
- SSC, DST, ALS and SLS reference software (ISO/IEC 14496-5:2001/Amd.10:2007)
- RealAudio Lossless
- RealPlayer
- FFmpeg (decoding only)
- BFDLAC (BFD Lossless Audio Compression)[3] – ongoing development
- FXpansion's BFD3 drum software (2013–2017)
- L2HC - Huawei
- Huawei Music
- NearLink
- Huawei FreeBuds[4]
- Oddball
- ATRAC Advanced Lossless (AAL) – Extremely unpopular
- FFmpeg (lossy decoder only)
- Direct Stream Transfer (DST) - Only used for Direct Stream Digital
- SSC, DST, ALS and SLS reference software (ISO/IEC 14496-5:2001/Amd.10:2007)
- FFmpeg (decoder only)
- Original Sound Quality (OSQ) - Only used in WaveLab
- FFmpeg (decoding only)
- Discontinued
- Lossless Audio (LA)[5] – No update for 10+ years
- Shorten (SHN)[6] – Officially discontinued.
- libshn
- FFmpeg (decoding only)
- Lossless Predictive Audio Compression (LPAC) – Predecessor of MPEG-4 ALS
- Lossless Transform Audio Compression (LTAC) – Predecessor of LPAC
- MPEG-1 Audio Layer III HD (mp3HD) – Officially discontinued
- RK Audio (RKAU)[7] – Officially discontinued
- FFmpeg (decoding only)
Lossy compression
- Discrete cosine transform (DCT)
- Modified discrete cosine transform (MDCT, used in most of the audio codecs listed below)
General/Speech hybrid
- Unified Speech and Audio Coding (USAC, MPEG-D Part 3, ISO/IEC 23003-3)
- exhale (encoder only; open source)
- FFmpeg (decoder only; open source)[8]
- IETF standards:
- Opus (RFC 6716) – based on SILK vocoder and CELT codec
- libopus
- FFmpeg (decoding and experimental encoding)
- IETF Internet Draft
- IPMR Speech Codec[9] - used in Spirit DSP's TeamSpirit Voice&Video Engine[10]
Neural audio codecs
- Lyra (codec) - used in Google Duo
- Lyra V2 - based on SoundStream neural codec
- Satin (used by Microsoft Teams)
- Facebook EnCodec
- WavTokenizer[11]
General
- Adaptive differential pulse-code modulation (ADPCM, also called adaptive delta pulse-code modulation)
- Adaptive Transform Acoustic Coding (ATRAC, used in MiniDisc devices)
- FFmpeg (decoder only)
- ATSC/ETSI standards:
- Dolby Digital (AC3, ATSC A/52, ETSI TS 102 366)
- FFmpeg
- liba52 (decoder only)
- Dolby Digital Plus (E-AC-3, ATSC A/52:2012 Annex E, ETSI TS 102 366 Annex E)
- FFmpeg
- DTS Coherent Acoustics (DTS, Digital Theatre System Coherent Acoustics, ETSI TS 102 114)
- FFmpeg
- libdca (decoder only)
- Dolby AC-4 (ETSI TS 103 190)
- Impala Blackbird audio codec
- ITU standards:
- G.719
- G.722
- FFmpeg
- G.722.1 (subset of Siren7) and G.722.1 Annex C (subset of Siren14)
- libg722_1
- libsiren (part of libmsn and msn-pecan)
- G.722.2
- 3GPP TS 26.173 – AMR-WB speech Codec (C-source code) – reference implementation[12]
- opencore-amr (decoder)
- VisualOn AMR-WB encoder
- FFmpeg (decoding only)
- EVS
- MPEG-1 Audio and MPEG-2 Audio
- layer I (MP1) (MPEG-1, MPEG-2 and non-ISO MPEG-2.5)
- FFmpeg (decoder only)
- layer II (MP2) (MPEG-1, MPEG-2 and non-ISO MPEG-2.5)
- FFmpeg
- tooLame (encoding only)
- twoLame (encoding only)
- layer III (MP3) (MPEG-1, MPEG-2 and non-ISO MPEG-2.5)
- FFmpeg (decoding only)
- LAME (encoding only)
- Advanced Audio Coding (AAC) (MPEG-2 Part 7)
- FAAC (encoder) and FAAD (decoder)
- FFmpeg
- iTunes
- Nero AAC Codec
- VisualOn AAC Encoder (a.k.a. libvo_aacenc)
- Fraunhofer FDK AAC
- libaacplus
- MPEG-4 Audio
- Advanced Audio Coding (AAC, MPEG-4 Part 3 subpart 4), HE-AAC and AAC-LD
- FAAC, FAAD2
- FFmpeg
- iTunes
- Nero AAC Codec
- MPEG-4 AAC reference software (ISO/IEC 14496-5:2001)
- Harmonic and Individual Lines and Noise (HILN, MPEG-4 Parametric Audio Coding)
- MPEG-4 reference software (ISO/IEC 14496-5:2001)
- TwinVQ
- MPEG-4 reference software (ISO/IEC 14496-5:2001)
- FFmpeg (decoding only)
- BSAC (Bit-Sliced Arithmetic Coding)
- MPEG-4 reference software (ISO/IEC 14496-5:2001)
- MPEG-H
- Musepack (a.k.a. MPEGplus)
- Musepack SV8 Tools
- FFmpeg (decoding only)
- NICAM
- AT&T Perceptual Audio Coder
- Precision Adaptive Subband Coding (PASC; a variant of MP1; used in Digital Compact Cassette)
- QDesign (purchased by DTS)
- QDesign Music Codec – used in Apple QuickTime
- FFmpeg (decoding only)
- PictureTel (purchased by Polycom)
- Siren 7
- libg722_1
- libsiren (part of libmsn and msn-pecan)
- FFmpeg (decoder only)
- Siren 14
- libg722_1
- vgmstream (decoder only)
- Siren 22
- NTT TwinVQ
- FFmpeg (decoder only)
- NTT TwinVQ Encoder, NTT TwinVQ Player
- Voxware MetaSound (a variant of NTT TwinVQ)
- Windows Media Player (voxmsdec.ax)
- FFmpeg (decoder only)
- Vorbis
- Windows Media Audio (WMA)
- Windows Media Encoder
- FFmpeg
- ADC codec (Adaptive Differential Coding)
- Cook Codec (Cooker; RealAudio 6)
- FFmpeg (decoder only)
AES3
- SMPTE 302M
- FFmpeg (decoder only)
- Dolby E
- FFmpeg (decoder only)
Bluetooth
- Bluetooth Special Interest Group
- Low Complexity Subband Coding (SBC)
- CVSD 8 kHz - used in Hands-Free Profile (HFP)
- modified SBC (mSBC) - used in Hands-Free Profile (HFP)
- BlueZ's SBC library (libsbc)[13]
- Fluoride Bluetooth stack
- FFmpeg
- SBC XQ
- PulseAudio's bluetooth stack[14] (encoder only)
- PipeWire's bluetooth stack[15] (encoder only)
- LC3 (Low Complexity Communication Codec)
- Google's liblc3 (open source) - used in Android 13 and later
- ETSI
- LC3plus (ETSI TS 103 634)
- Google's liblc3 (open source)
- Qualcomm Technologies International (formerly CSR)
- aptX (a.k.a. apt-X)
- Qualcomm libaptX[16]
- FFmpeg
- aptX HD
- Qualcomm libaptXHD[16]
- FFmpeg
- aptX Low Latency
- aptX Adaptive
- FastStream - a variant of SBC codec for bi-directional audio transmission
- Sony
- LDAC
- libldac (encoder only)[17] - used in Android Oreo
- libldacdec (decoder only)
- HWA Alliance/Savitech
- HiBy
- Ultra Audio Transmission (UAT)
- Samsung
- Samsung HD/UHQ-BT codec
- Samsung Scalable codec
- Samsung Seamless codec
- MQA
- MQair
Digital radio
- Hybrid Digital Coding - used in HD Radio (a.k.a. NRSC-5)
- NRSC-5 receiver for rtl-sdr (decoder only)
- gr-nrsc5 (encoder only)[20]
Voice
(low bit rate, optimized for speech)
- Linear predictive coding (LPC, used in most of the speech codecs listed below)
- Xiph.Org Foundation
- Dialogic ADPCM (VOX)
- FFmpeg (decoder only)
- ITU standards:
- G.711 (a-law and μ-law companding; 64 kbit/s), also known as PCM of voice frequencies
- Sun Microsystems's public domain implementation[22]
- FFmpeg (libavcodec)
- G.711.0 (G.711 LLC)
- G.711.1 (Wideband extension for G.711; 64/80/96 kbit/s)
- G.711.1D (Super-wideband extension for G.711.1; 96/112/128 kbit/s)
- G.718 (8/12/16/24/32 kbit/s)
- G.718B (Super-wideband extension for G.718; 28–48 kbit/s)
- G.719
- G.721 (superseded by G.726; 32 kbit/s)
- Sun Microsystems's public domain implementation[22]
- G.722 (SB-ADPCM; 48/56/64 kbit/s)
- FFmpeg
- G.722B (Super-wideband extension for G.722; 64/80/96 kbit/s)
- G.722.2 (AMR-WB)
- 3GPP TS 26.173 – AMR-WB speech Codec (C-source code) – reference implementation[12]
- opencore-amr (decoder)
- FFmpeg (decoder only)
- G.723 (24 and 40 kbit/s DPCM, extension to G.721, superseded by G.726)
- Sun Microsystems's public domain implementation[22]
- G.723.1 (MPC-MLQ or ACELP; 5.3/6.3 kbit/s)
- FFmpeg
- G.726 (ADPCM; 16/24/32/40 kbit/s)
- Sun Microsystems's public domain implementation[22]
- FFmpeg (libavcodec)
- G.727
- Sun Microsystems's public domain implementation[22]
- G.728 (LD-CELP; 16 kbit/s)
- FFmpeg (decoder only)
- G.729 (CS-ACELP; 8 kbit/s)
- FFmpeg (decoder only)
- G.729a
- G.729b
- G.729ab
- G.729d (6.4 kbit/s)
- FFmpeg (decoder only)
- G.729e (11.8 kbit/s)
- G.729.1 (G.729 Annex J; Wideband extension for G.729; 8–32 kbit/s)
- G.729.1E (Super-wideband extension for G.729.1)
- Google
- internet Speech Audio Codec (iSAC)
- WebRTC
- Nellymoser Asao Codec
- FFmpeg (libavcodec)
- RealNetworks
- RealAudio 1 (VSELP 14.4 kbit/s)
- FFmpeg (decoder only)
- RealAudio 2 (LD-CELP 28.8 kbit/s)
- FFmpeg (decoder only)
- PictureTel PT716, PT716plus
- PictureTel PT724
- RTAudio – used by Microsoft Live Communication Server
- SVOPC – used by Skype
- OpenLPC – created by Future Dynamics[22]
- HawkVoice (libHVDI)
- ANSI/SCTE
- ANSI/SCTE 24-21 2006 (BroadVoice16)
- BroadVoice Speech Codec Open Source C Code
- ANSI/SCTE 24-22 2013 (iLBCv2.0)
- ANSI/SCTE 24-23 2007 (BroadVoice32)
- BroadVoice Speech Codec Open Source C Code
- IETF RFCs:
- Internet Low Bit Rate Codec (iLBC, RFC 3951) – developed by Global IP Solutions/Google
- WebRTC
- IETF Internet Draft
- MPEG-4 Audio
- MPEG-4 CELP
- MPEG-4 HVXC
- Skyphone MPLP
- Inmarsat
- INMARSAT-M IMBE
- Inmarsat Mini-M AMBE
- Meta MLow - used in Instagram, Messenger, and WhatsApp.[25]
Microsoft DirectPlay
These codecs are used by many PC games that provide voice chat via the Microsoft DirectPlay API.
- Voxware MetaVoice
- Windows Media Player (voxmvdec.ax)
- Truespeech
- Windows Media Player (tssoft32.acm)
- FFmpeg (decoder only)
- MS GSM
- Windows Media Player (msgsm32.acm)
- libgsm
- FFmpeg (decoder only)
- MS-ADPCM
- Windows Media Player (msadp32.acm)
- FFmpeg
Digital Voice Recorder
- International Voice Association (IVA) standards:
- Digital Speech Standard / Standard Play (DSS-SP)
- FFmpeg (decoding only)
- Digital Speech Standard / Quality Play (DSS-QP)
- Sony LPEC
- Truespeech Triple Rate CODER (TRC)[26] – used in some pocket recorders
- Micronas Intermetall MI-SC4 - used by voice recorders such as RadioShack Digital Recorder[27] and I-O DATA HyperHyde[28]
- FFmpeg (decoder only)
- Sanyo LD-ADPCM - used by Sanyo ICR series[29]
- FFmpeg (decoder only)[29]
Mobile phone
Generation 2
- European Telecommunications Standards Institute (ETSI) GSM
- Full Rate (GSM 06.10, RPE-LTP)
- libgsm
- FFmpeg (decoder only)
- Half Rate (GSM 06.20, VSELP 5.6 kbit/s)
- Enhanced Full Rate (GSM 06.60, ACELP 12.20 kbit/s, compatible with AMR mode AMR_12.20)
- Telecommunications Industry Association (TIA) IS-95 (a.k.a. cdmaOne)
- IS-96A (QCELP 8 kbit/s)
- IS-127 (EVRC 8 kbit/s)
- IS-733 (QCELP 13 kbit/s)
- Telecommunications Industry Association (TIA) IS-54/IS-136 (a.k.a. Digital AMPS)
- IS-85 (VSELP 8kbit/s)
- ITU-T G.191's IS-54 implementation
- IS-641 (ACELP 7.4 kbit/s, compatible with AMR mode AMR_7.40)
- Association of Radio Industries and Businesses (ARIB) RCR STD-27 (PDC)
Generation 3/4
- 3rd Generation Partnership Project (3GPP)
- Adaptive Multi-Rate (AMR)
- AMR-NB
- 3GPP TS 26.073 – AMR speech Codec (C-source code) – reference implementation[30]
- opencore-amr (one may compile FFmpeg with --enable-libopencore-amrnb to incorporate the OpenCORE library)
- FFmpeg (decoder only by default, but see the compile option above to incorporate the OpenCORE library)
- AMR-WB
- 3GPP TS 26.173 – AMR-WB speech Codec (C-source code) – reference implementation[12]
- opencore-amr (decoder), from OpenCORE (one may compile FFmpeg with --enable-libopencore-amrwb to incorporate the OpenCORE library)
- vo-amrwbenc (encoder), from VisualOn, included in Android (one may compile FFmpeg with --enable-libvo-amrwbenc to incorporate the VisualOn library)
- FFmpeg (decoder only by default, but see the compile options above)
- AMR-WB+
- 3GPP TS 26.273 – AMR-WB+ speech Codec (C-source code) – reference implementation[31]
- Enhanced Voice Services (EVS)
- 3GPP TS.26.443 – Codec for Enhanced Voice Services (EVS) – ANSI C code (floating-point)[32]
- 3rd Generation Partnership Project 2 (3GPP2)
- Enhanced Variable Rate Codec (EVRC, a.k.a. IS-127) – based on RCELP
- FFmpeg (decoder only)
- Enhanced Variable Rate Codec B (EVRC-B)
- QCELP (Qualcomm Code Excited Linear Prediction)
- QCELP-8 (a.k.a. SmartRate or IS-96C)
- FFmpeg (decoder only)
- QCELP-13 (a.k.a. PureVoice or IS-733)
- FFmpeg (decoder only)
- Selectable Mode Vocoder (SMV)
- Variable Multi Rate – WideBand (VMR-WB)
Professional mobile radio
- APCO
- Project 25 Phase 2 Enhanced Full-Rate (AMBE+2 4400 bit/s with 2800 bit/s FEC)
- Project 25 Phase 2 Half-Rate (AMBE+2 2450 bit/s with 1150 bit/s FEC) – also used in NXDN and DMR
- mbelib (decoder only)
- Project 25 Phase 1 Full Rate (IMBE 7200 bit/s)
- mbelib (decoder only)
- European Telecommunications Standards Institute (ETSI)
- ETS 300 395-2 (TETRA ACELP 4.6 kbit/s)
- TETRAPOL
- RPCELP 6 kbit/s
- D-STAR Digital Voice (AMBE 2400 bit/s with 1200 bit/s FEC)
- mbelib (decoder only)
- Professional Digital Trunking System Industry Association (PDT Alliance) standards:
- NVOC – used in China
- Spirit DSP RALCWI
- DSPINI
- SPR Robust
- TWELP Robust
- Codec2
- libcodec2
- RL-CELP (used in Japanese railways[33][34])
Military
- U.S. Department of Defense (DoD) Federal Standard:
- United States Military Standard (MIL-STD)
- MIL-STD-188 113 (CVSD 16 kbit/s and 32 kbit/s)
- SoX (libsox)
- MIL-STD-3005 (a.k.a. MELP)
- Texas Instruments' 2.4 kbit/s MELP Proposed Federal Standard speech coder
- NATO
- STANAG 4198 (a.k.a. LPC-10e)
- SpanDSP (open source)
- STANAG-4591 (a.k.a. MELPe)
- Microsoft Speech coder
- BBN NRV – developed in DARPA program[35]
Video games
- Bink Audio, Smacker Audio
- FFmpeg (decoder only)
- Actimagine (Nintendo European Research & Development) FastAudio[36]
- MobiclipDecoder (decoder only)
- FFmpeg (decoder only)
- Nintendo GCADPCM[37] (a.k.a. DSP ADPCM or THP ADPCM) - used in GameCube, Wii and Nintendo 3DS.
- vgmstream (decoder only)
- VGAudio
- FFmpeg (decoder only)
- Sony VAG[37] (a.k.a. Sony PSX ADPCM)
- vgmstream (decoder only)
- FFmpeg (decoder only)
- Sony HEVAG[37] - used in PS Vita.[38]
- vgmstream (decoder only)
- Sony ATRAC9[37] - used in PS4 and PS Vita.
- VGAudio (decoder only)
- FFmpeg (decoder only)
- Microsoft XMA[37] - WMA variants for Xbox 360 hardware decoding.[39]
- FFmpeg (decoder only)
- Xbox ADPCM
- vgmstream (decoder only)
- FFmpeg (decoder only)
- CRI ADX ADPCM
- vgmstream (decoder only)
- VGAudio
- FFmpeg
- CRI AHX
- CRI HCA/HCA-MX - used in CRI ADX2 middleware.[40]
- vgmstream (decoder only)
- VGAudio
- FFmpeg (decoder only)
- libcgss
- HCADecoder (decoder only)
- FMOD FADPCM[41]
- vgmstream (decoder only)
Text compression formats
- BiM
- Continuous Media Markup Language (CMML)
- MPEG-4 Part 17 (e.g. 3GPP Timed Text)
- ttyrec
Video compression formats
- RGB 4:4:4 (uncompressed only when linear; transfer conversion and bit reduction act as a mild form of compression, up to about 3:1 for HDR)
- YUV 4:4:4/4:2:2/4:1:1/4:2:0 (chroma subsampling below 4:4:4 is a form of spatial compression, up to 2:1 for 4:2:0, with characteristic colour distortions; a worked size comparison follows this list)
- Intel IYUV
- 10-bit uncompressed video
- Composite digital signal - used by SMPTE D-2 and D-3 broadcast digital videocassettes
- Avid DNxUncompressed (SMPTE RDD 50)
- V210 - defined by Apple and used by serial digital interface (SDI) input/output video cards[42]
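The roughly 2:1 figure for 4:2:0 above follows from simple sample counting. The sketch below only illustrates that arithmetic; the frame_bytes helper and the 1920×1080 frame are assumptions for the example, not part of any codec or API:

```python
def frame_bytes(width, height, samples_per_pixel, bits_per_sample=8):
    """Raw frame size for a given average number of samples per pixel."""
    return width * height * samples_per_pixel * bits_per_sample / 8

w, h = 1920, 1080
full = frame_bytes(w, h, 3)      # RGB or YUV 4:4:4: three samples per pixel
sub420 = frame_bytes(w, h, 1.5)  # YUV 4:2:0: one luma plus half a chroma sample per pixel
print(full / sub420)             # -> 2.0, the roughly 2:1 ratio noted above
```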
Analog signals
- PAL broadcast signal
- Pyctools-PAL (open source)
- NTSC broadcast signal
- gr-ntsc (open source)
- LaserDisc RF signal
- ld-decode (open source)
- VHS / S-VHS / U-Matic RF signal
- VHS-Decode (open source)
- Composite Video Baseband Signal (CVBS)
- VHS-Decode's CVBS-Decode (open source)
Lossless compression
- ITU-T/ISO/IEC standards:
- H.264 lossless
- H.265 lossless[43]
- Motion JPEG 2000 lossless
- libopenjpeg
- JPEG XS lossless
- FastTICO-XS
- IETF standards:
- FFV1 (RFC 9043)[47] – FFV1's compression factor is comparable to Motion JPEG 2000, but based on quicker algorithms (allows real-time capture). Written by Michael Niedermayer and published as part of FFmpeg under GNU LGPL.
- FFmpeg
- SMPTE standards:
- Alparysoft Lossless Video Codec (Alpary)
- Apple Animation (QuickTime RLE)
- QuickTime
- FFmpeg
- ArithYuv
- AV1
- AVIzlib
- LCL (VfW codec) MSZH and ZLIB[48]
- FFmpeg
- Autodesk Animator Codec (AASC)
- FFmpeg (decoder only)
- CAI Format
- CamStudio GZIP/LZO
- FFmpeg (decoder only)
- Chennai Codec (EVX-1)
- Cairo Experimental Video Codec (open source)
- Dxtory
- FFmpeg (decoder only)
- FastCodec
- Flash Screen Video v1/v2[49]
- FFmpeg
- FM Screen Capture Codec
- FFmpeg (decoder only)
- Fraps codec (FPS1)[50]
- FFmpeg (decoder only)
- Grass Valley Lossless
- Grass Valley Codec Option
- FFmpeg (decoder only)
- Huffyuv – Huffyuv (or HuffYUV) was written by Ben Rudiak-Gould and published under the terms of the GNU GPL as free software, intended to replace uncompressed YCbCr as a video capture format. It uses very little CPU but takes a lot of disk space. See also ffvhuff, an FFmpeg-only variant of it.
- FFmpeg
- IgCodec
- Intel RLE
- innoHeim/Rsupport Screen Capture Codec
- FFmpeg (decoder only)
- Lagarith – a more up-to-date fork of Huffyuv[51]
- Lagarith Codec (VfW codec)
- FFmpeg (decoder only)
- LOCO[52] - based on JPEG-LS
- FFmpeg (decoder only)
- MagicYUV[53]
- MagicYUV SDK
- FFmpeg
- Microsoft RLE (MSRLE)
- FFmpeg
- MSU Lossless Video Codec
- MSU Screen Capture Lossless
- CorePNG - based on PNG
- FFmpeg
- ScreenPresso (SPV1)
- FFmpeg (decoder only)
- ScreenPressor[54] - a successor of MSU Screen Capture Lossless
- FFmpeg (decoder only)
- SheerVideo
- FFmpeg (decoder only)
- Snow lossless
- FFmpeg
- TechSmith Screen Capture Codec (TSCC)[55]
- EnSharpen Video Codec for QuickTime
- FFmpeg (decoder only)
- Toponoky
- Ut Video Codec Suite[56][57]
- libutvideo
- FFmpeg
- VBLE[58]
- FFmpeg (decoder only)
- VP9 by Google[59]
- libvpx
- FFmpeg (decoder only)
- YULS
- ZeroCodec
- FFmpeg (decoder only)
- ZMBV (Zip Motion Block Video) Codec - used by DOSBox
- FFmpeg
Lossless game codecs
- DXA
- ScummVM Tools (encoder only)
- FFmpeg (decoder only)
Lossy compression
- Discrete cosine transform (DCT, used in Digital Betacam[60] and most of the video codecs listed below)
General
- ITU-T/ISO/IEC standards:
- H.120
- H.261 (a.k.a. Px64)
- FFmpeg H.261 (libavcodec)
- Microsoft H.263
- MPEG-1 Part 2 (MPEG-1 Video)
- FFmpeg
- MainConcept MPEG-1
- TMPGEnc
- H.262/MPEG-2 Part 2 (MPEG-2 Video)
- Canopus ProCoder
- Cinema Craft Encoder
- FFmpeg
- InterVideo Video Decoder
- MainConcept MPEG-2
- Microsoft H.263
- TMPGEnc
- NVDEC (for NVIDIA GPU)
- H.263
- FFmpeg H.263 (libavcodec)
- MPEG-4 Part 2 (MPEG-4 Advanced Simple Profile)
- H.264/MPEG-4 AVC or MPEG-4 Part 10 (MPEG-4 Advanced Video Coding), approved for Blu-ray
- CoreAVC (decoder only; limited to below Hi10P profile)
- MainConcept
- Nero Digital
- QuickTime H.264
- Sorenson AVC Pro codec, Sorenson's new implementation
- OpenH264 (baseline profile only)
- x264 (encoder only; supports some of Hi422P and Hi444PP features)
- FFmpeg (decoder only)
- NVDEC/NVENC(for NVIDIA GPU)
- MPEG-4 AVC variants:
- MPEG-4 Web Video Coding or MPEG-4 Part 29 – a subset of MPEG-4 AVC baseline profile
- XAVC
- HEVC (High Efficiency Video Coding, H.265, MPEG-H part 2)
- x265 (encoder only)
- NVDEC/NVENC (for NVIDIA GPU)
- Versatile Video Coding (H.266, VVC)
- VVC Test Model (VTM reference software for VVC; open source)
- Fraunhofer Versatile Video Decoder (open source; decoder only)
- Fraunhofer Versatile Video Encoder (open source; encoder only)
- FFmpeg (decoder only)
- Video Coding for Browsers (VCB)/VP8 (MPEG-4 Part 31, ISO/IEC 14496-31, RFC 6386)
- libvpx
- FFmpeg
- NVDEC (for NVIDIA GPU)
- Internet Video Coding (ISO/IEC 14496-33, MPEG-4 IVC)
- Essential Video Coding (EVC; MPEG-5 Part 1; under development)
- eXtra-fast Essential Video Encoder (open source; encoder only)
- eXtra-fast Essential Video Decoder (open source; decoder only)
- IETF Internet Draft (NETVC)
- SMPTE standards:
- VC-1 (SMPTE 421M, subset of Windows Media Video)
- FFmpeg (decoder only)
- NVDEC (for NVIDIA GPU)
- Dirac (SMPTE 2042-1)
- Schrödinger
- dirac-research
- FFmpeg (decoder only)
- Alliance for Open Media
- Xiph.Org Foundation
- Apple Video (Apple RPZA)
- Blackbird FORscene video codec
- Firebird[63] Original FORscene video codec
- Digital Video Interactive standards:
- RTV 2.1 (a.k.a. Indeo 2)
- FFmpeg (decoder only)
- PLV (Production Level Video)
- ActionMedia II driver (decoder only)
- Indeo 3[64]/4/5[65]
- FFmpeg (decoder only)
- Microsoft Video 1 (MSV1, MS-CRAM, based on MotiVE)
- FFmpeg (decoder only)
- Open Media Commons standards:
- On2 Technologies TrueMotion VP3/VP4, VP5, VP6, VP7; under the name The Duck Corporation: TrueMotion S, TrueMotion 2, TrueMotion RT 2.0
- FFmpeg (decoder only)
- RealVideo 1, G2, 8, 9 and 10
- FFmpeg
- RealMedia HD SDK
- RealVideo Fractal Codec (a.k.a. Iterated Systems ClearVideo)
- FFmpeg (decoder only)
- RealMedia HD (a.k.a. RealVideo 11 or RV60)
- RealMedia HD SDK
- FFmpeg (decoder only)
- Snow Wavelet Codec
- Sorenson Video,[66] Sorenson Spark
- FFmpeg
- VP9 by Google; VP10 was not released and instead was integrated into AV1
- libvpx
- FFmpeg
- NVDEC (for NVIDIA GPU)
- Windows Media Video (WMV)
- WAX (Part of the Windows Media Series)
- FFmpeg
- Guobiao standards (GB/T)
- Audio Video Standard (AVS)
- AVS1-P2 (GB/T 20090.2-2006) - used in China Blue High-definition Disc.
- FFmpeg (decoding only)
- AVS1-P7 (AVS-M; under development)
- AVS2-P2 (GB/T 33475.2-2016, IEEE 1857.4 (draft))
- uAVS2 Encoder
- xavs2 (encoder only)
- davs2 (libdavs2; decoder only)
- AVS3-P2 (draft, IEEE1857.10)
- uavs3e (encoder only)
- uavs3d (decoder only)
AI-based / AI-enhanced video codecs
- AIVC[67]
- Deep Render codec[68][69]
- MPAI
- AI-Enhanced Video Coding (MPAI-EVC; under development)
- AI-based End-to-End Video Coding (MPAI-EEV; under development)
Scalable / Layered
VP8,[70] VP9,[70] AV1,[70] and H.266/VVC support scalable modes by default.
- ITU-T/ISO/IEC standards:
- Scalable Video Coding (H.264/SVC; H.264/MPEG-4 AVC Annex G; an extension of H.264/MPEG-4 AVC)
- Scalable High Efficiency Video Coding (SHVC; an extension of H.265/HEVC)
- Low Complexity Enhancement Video Coding (LCEVC; MPEG-5 Part 2)
- LCEVC Decoder SDK (open source; decoder only)
- V-Nova LCEVC SDK
- SMPTE standards
- VC-4 Layered Video Extension (SMPTE ST 2058-1:2011)
Intra-frame-only
- Motion JPEG
- ISO/IEC standard
- Motion JPEG 2000 (ISO/IEC 15444-3, ITU-T T.802)
- JPEG XS (ISO/IEC 21122) – lightweight, low-latency video codec
- intoPIX fastTICO-XS[74]
- DV (IEC 61834)
- FFmpeg
- MPEG-4 SStP (ISO/IEC 14496-2)
- FFmpeg[75]
- Motion JPEG XR (ISO/IEC 29199-3, ITU-T T.833)
- Animated JPEG XL (ISO/IEC 18181)
- libjxl[76]
- IETF Internet Draft
- Advanced Professional Video (APV)[77]
- OpenAPV (open source)
- FFmpeg (decoder only)
- Apple ProRes 422/4444
- FFmpeg
- Apple Intermediate Codec
- FFmpeg (decoder only)
- Apple Pixlet
- FFmpeg (decoder only)
- AVC-Intra
- x264 (encoder only)
- FFmpeg (decoder only)
- AVC-Ultra – a subset of MPEG-4 AVC Hi444PP profile
- XAVC-I
- CineForm HD
- CineForm-SDK – developed by GoPro (open source)
- FFmpeg
- SMPTE standard
- VC-2 SMPTE standard (a.k.a. Dirac Pro. SMPTE ST 2042)
- Schrödinger
- dirac-research
- VC-2 Reference Encoder and Decoder – developed by BBC (open source)
- FFmpeg (the encoder only supports VC-2 HQ profile)
- VC-3 SMPTE standard (SMPTE ST 2019)
- VC-5 SMPTE standard (SMPTE ST 2073; a superset of CineForm HD)
- VC-6 SMPTE standard (SMPTE ST 2117-1)
- V-Nova VC-6 SDK
- Grass Valley HQ/HQA/HQX
- Grass Valley Codec Option
- FFmpeg (decoder only)
- NewTek NT25
- NewTek SpeedHQ - used in Network Device Interface (NDI) protocol
- NewTek Codec[78]
- FFmpeg
Stereoscopic 3D / Multiview
- Multiview Video Coding
- Multiview High Efficiency Video Coding (MV-HEVC; an extension of H.265/HEVC)
- MainConcept MV-HEVC Encoder add-on
- FFmpeg (decoder only)
- x265 v4.0 or later (encoder only)
- NVENC[79] (for NVIDIA GPU)
Security and surveillance cameras
- Guobiao standards (GB/T)
- AVS-S-P2 (suspended[80])
- SVAC (GB/T 25724-2010)
- Infinity CCTV Codec (IMM4/IMM5/IMM6)
CD-ROM or CD-related video codecs
- CDXL codec
- FFmpeg (decoder only)
- Cinepak[83] (a.k.a. Apple Compact Video)
- FFmpeg
- Photo CD codec
- FFmpeg (decoder only)
- MotionPixels - used in MovieCD
- FFmpeg (decoder only)
- CD+G (CD+Graphics) codec
- FFmpeg (decoder only)
- VLC (decoder only)
- CD+EG (CD+Extended Graphics) codec
Network video codecs
- SMPTE RDD
- LLVC (Low Latency Video Codec; SMPTE RDD 34) - used in Networked Media Interface (NMI; SMPTE RDD 40)
- HEVC-SCC (Screen Content Coding Extensions)
- x265 v4.0 or later (encoder only)
- FFmpeg (decoder only)
- ZRLE (RFC 6143 7.7.6) - used by VNC
- Sun Microsystems's CellB video (RTP payload type 25) - used in Solaris's SunVideo Plus[84] and Lawrence Berkeley National Laboratory's vic (Video Conferencing Tool)[85]
- Xerox PARC's Network Video (nv; RTP payload type 28) - used in Xerox's nv and Lawrence Berkeley National Laboratory's vic (Video Conferencing Tool)
- CU-SeeMe video codec
- GoToMeeting codec
- FFmpeg (decoder only)
- Microsoft
Screen capture video codecs
- Microsoft Camcorder Video (based on the GDI interface) - used in Microsoft Office 97's Microsoft Camcorder
- VMnc VMware screen codec[89] (based on the RFB protocol of VNC[90]) - used by VMware Workstation
- vmnc.dll[90]
- FFmpeg (decoder only)
Bayer/Compressed RAW video codecs
- CinemaDNG (created by Adobe; used in Blackmagic cameras)
- Redcode RAW (used in RED cameras) – a modified version of JPEG 2000[91]
- libredcode
- ArriRaw (used in Arri cameras)
- Cineform RAW (used in Silicon Imaging cameras)
- CineForm-SDK
- Blackmagic RAW (used in Blackmagic cameras)
- Blackmagic RAW SDK
- Cintel RAW (used in Cintel Scanner[92])
- FFmpeg (decoder only)
- Apple ProRes RAW
- FFmpeg (decoder only)[93]
- intoPIX TICO RAW[94]
- intoPIX fastTICO-RAW SDK & TICO-RAW FPGA/ASIC libraries[95]
- Canon CRX - used in Canon Cinema Raw Light movie
- Canon RAW Plugin for Avid Media Access
- LibRaw (decoder only; open source)
- Sony X-OCN
Video games
- Bink Video, Smacker video
- FFmpeg
- libavcodec
- Nintendo Mobiclip video codec
- FFmpeg (decoder only)
- CRI Sofdec codec - an MPEG variant with 11-bit DC and color space correction;[96] used in Sofdec middleware
- CRI P256 - used in Sofdec middleware for Nintendo DS[97]
- Indeo Video Interactive (aka Indeo 4/5) - used in PC games for Microsoft Windows
- FFmpeg (decoder only)
- Intel Indeo Video
Real-time
- RivaTuner video codec (RTV1/RTV2)
- FFmpeg (RTV1 decoder only)
- Hap/Hap Alpha/Hap Q
- VIDVOX hap codec
- FFmpeg
- DXV Codec
- Resolume DXV Codec
- FFmpeg
- NotchLC
- FFmpeg (decoder only)
- VESA Display Stream Compression (DSC)
- VESA Display Compression-M (VDC-M)
References
- ^ FLAC (Free Lossless Audio Codec), Version 1.1.2 Library of Congress
- ^ "About lossless audio in Apple Music". 25 October 2021.
- ^ "BFDLAC: A Fast lossless Audio Compression Algorithm For Drum Sounds" (PDF). Archived from the original (PDF) on 2017-01-18. Retrieved 2017-01-17.
- ^ Matsui, Emiko (2023-09-19). "Huawei L2HC 3.0 delivers 1.5Mbps lossless sound quality, 4X faster than Apple's AAC". Huawei Central. Retrieved 2024-04-29.
- ^ "Lossless Audio Homepage". www.lossless-audio.com.
- ^ Shorten Lossless Audio Compression Format (SHN), Version 3.5.1 Library of Congress
- ^ "RK Audio - Hydrogenaudio Knowledgebase". wiki.hydrogenaud.io.
- ^ FFmpeg 7.1 Released With VVC Decoder Promoted To Stable, Vulkan H.264/H.265 Encode. Phoronix. 30 September 2024.
- ^ IPMR Speech Codec - draft-spiritdsp-ipmr-01.txt IETF
- ^ TeamSpirit Voice&Video Engine PC. Spirit DSP
- ^ WavTokenizer: A Breakthrough Acoustic Codec Model Redefining Audio Compression. Marktechpost Media. September 3, 2024
- ^ a b c 3GPP (2008-12-11) 3GPP TS 26.173 - AMR-WB speech Codec; version 8.0.0 Release 8, retrieved 2009-09-09
- ^ Release of sbc-1.1, BlueZ Project, April 30, 2013
- ^ PulseAudio 15 Released With Bluetooth Improvements, Better Hardware Support. Phoronix. July 28, 2021
- ^ PipeWire: Bluetooth support status update. Collabora. April 29, 2022
- ^ a b Integration of the aptX and aptX-HD codecs for A2DP source, Android Open Source Project, January 4, 2017
- ^ The contribution of LDAC encoder, Android Open Source Project, January 10, 2017
- ^ "What is LHDC". hwa-audio. Retrieved 2019-04-30.
- ^ "What is LLAC™?". LHDC org.[dead link]
- ^ "hdc-encoder".
- ^ Speex Audio Codec, Version 1.2 Library of Congress
- ^ a b c d e f Finding voice codecs for free software. Linux.com. October 14, 2005
- ^ SILK Speech Codec - draft-vos-silk-02 IETF
- ^ Constrained-Energy Lapped Transform (CELT) Codec - draft-valin-celt-codec-02 IETF
- ^ WhatsApp adds new features to the calling experience, including support for 32-person video calls. TechCrunch. June 13, 2024
- ^ "DSP Group Unveils Total Telephony Solutions(TM) For Digital Cordless Telephony Applications". Archived from the original on August 23, 2016. Retrieved June 24, 2015.
- ^ RadioShack Digital Recorder OWNER'S MANUAL p.38. RadioShack. 2002.
- ^ HyperHyde Operation Manual. p.40. I-O DATA. 2000.
- ^ a b Whisper it: FFmpeg 8 can now subtitle your videos on the fly. The Register. August 28, 2025
- ^ 3GPP (2008-12-11) 3GPP TS 26.073 - AMR speech Codec; version 8.0.0 Release 8, retrieved 2009-09-08.
- ^ 3GPP (2008-12-18) 3GPP TS 26.273 - AMR-WB+ speech Codec; version 8.0.0 Release 8, retrieved 2009-09-09
- ^ 3GPP TS 26.443. Codec for Enhanced Voice Services (EVS); ANSI C code (floating-point).
- ^ INFORMATION COLLECTION SURVEY FOR THE MEGA MANILA SUBWAY PROJECT IN THE REPUBLIC OF THE PHILIPPINES Japan International Cooperation Agency September, 2015
- ^ 東北上越新幹線デジタル列車 無線システムの開発 (in Japanese) East Japan Railway Company 2003
- ^ Obranovich, Charles R.; Golusky, John M.; Preuss, Robert D.; Fabbri, Darren R.; Cruthirds, Daniel R.; Aylward, Erin M.; Freebersyser, James A.; Kolek, Stephen R. (2010). "300 BPS noise robust vocoder". 2010 - Milcom 2010 Military Communications Conference. pp. 298–303. doi:10.1109/MILCOM.2010.5680311. ISBN 978-1-4244-8178-1. S2CID 8991597.
- ^ Actimagine allège le multimédia sur les terminaux portables (in French), IT Industrie & Technologies, June 25, 2004
- ^ a b c d e AudioCompressionFormat, Unity Technologies
- ^ Audio Clip, Unity Technologies
- ^ Differences Between Windows and Xbox 360, Microsoft
- ^ 【ひらブラ vol.37】音数を諦めず/音質を妥協せず/負荷を極小にする方法(iOS&Android) (in Japanese), Kadokawa Dwango, September 26, 2014
- ^ FMOD Studio 1.06 and FMOD at GDC expo program announced, Gamasutra, February 17, 2015
- ^ Faster professional 10-bit video conversions. Open Broadcast Systems
- ^ "Lossless". x265.readthedocs.io.
- ^ "HEVC Decoding". x265.
- ^ "FFmpeg Now Supports HEVC/H.265 Decoding". phoronix.
- ^ "Encode/H.265". FFmpeg.
- ^ Niedermayer, Michael; Rice, Dave; Martinez, Jérôme (August 2021). "rfc9043 - FFV1 Video Coding Format Version 0, 1, and 3". datatracker.ietf.org.
- ^ "Lossless Codec Libraries". multimedia.cx.
- ^ "FFmpeg: libavcodec/flashsv.c File Reference". ffmpeg.org.
- ^ "FRAPS show fps, record video game movies, screen capture software". www.fraps.com.
- ^ "Lagarith Lossless Video Codec". lags.leetcode.net.
- ^ "LOCO - MultimediaWiki". wiki.multimedia.cx.
- ^ "MagicYUV – Lossless video codec".
- ^ "ScreenPressor by Infognition - lossless video codec for screen capture". infognition.com.
- ^ "Downloads". TechSmith. Archived from the original on 2011-10-22. Retrieved 2011-07-14.
- ^ "#534 (Ut Video Support) – FFmpeg". ffmpeg.org.
- ^ "Ut Video Codec Suite - a new lossless video codec for Windows! [Archive] - Doom9's Forum". doom9.org.
- ^ "VBLE - MultimediaWiki". wiki.multimedia.cx.
- ^ "The WebM Project - VP8 Encode Parameter Guide". webmproject.org.
- ^ Medoff, Norman; Fink, Edward J. (September 10, 2012). Portable Video: ENG & EFP. CRC Press. p. 221. ISBN 9781136047701.
- ^ Samuelsson, J. and P. Hermansson (July 2, 2018). "The xvc video codec". datatracker.ietf.org.
- ^ Fuldseth, Arild; Bjontegaard, Gisle; Midtskogen, Steinar; Davies, Thomas; Zanaty, Mo (October 31, 2016). "Thor Video Codec". tools.ietf.org.
- ^ "Live demonstration". Forbidden.
- ^ Indeo Video Codec, Version 3 Library of Congress
- ^ Indeo Video Codec, Version 5 Library of Congress
- ^ Sorenson Video Codec, Version 3 Library of Congress
- ^ What Is AI Video Compression?. MASV. January 5, 2023
- ^ Streamers look to AI to crack the codec code. International Broadcasting Convention. 25 June 2024
- ^ Intel Ignite Selects Startups for Spring ’23 Cohorts. Intel
- ^ a b c Scalable Video Coding (SVC) Extension for WebRTC - 4. Operational model, World Wide Web Consortium, September 26, 2020
- ^ "M-JPEG Codec". Montpellier, France: Morgan Multimedia. Archived from the original on April 17, 2018. Retrieved April 28, 2018.
- ^ "M-JPEG2000 Codec". Montpellier, France: Morgan Multimedia. Archived from the original on April 29, 2018. Retrieved April 28, 2018.
- ^ "dcpPlayer". Montpellier, France: Morgan Multimedia. Retrieved April 28, 2018.
- ^ "FastTICO-XS Codec". Mont-Saint-Guibert, Belgium: intoPIX.
- ^ FFmpeg-cvslog - mpeg4video: Add support for MPEG-4 Simple Studio Profile., FFmpeg Project, April 2, 2018
- ^ FFmpeg Adds Support For Animated JPEG-XL, Phoronix, June 8, 2023
- ^ "Advance Professional Video". datatracker.ietf.org. March 1, 2024.
- ^ NewTek Codec Notes NewTek
- ^ NVENC Video Encoder API Programming Guide - MultiView Video Coding in HEVC (MV-HEVC) NVIDIA
- ^ Achievement, Audio Video Coding Standard Workgroup of China
- ^ FFmpeg-cvslog - avcodec: add IMM4 decoder, FFmpeg Project, August 21, 2018
- ^ FFmpeg-cvslog - avcodec: add IMM5 decoder, FFmpeg Project, August 29, 2019
- ^ Cinepak Library of Congress
- ^ SunVideo Plus for PCI User's Guide - Audio Video Conferencing. Oracle
- ^ vic: Change History. Lawrence Berkeley National Laboratory
- ^ Remote Desktop Protocol: RemoteFX Codec Extension. Microsoft
- ^ a b Survey of Virtual Desktop Infrastructure System draft-ma-appsawg-vdi-survey-00. IETF. May 13, 2011
- ^ Remote Desktop Protocol: NSCodec Extension. Microsoft
- ^ "VMware Video". multimedia.cx.
- ^ a b VMware VMnc AVI video codec image height heap overflow. Carnegie Mellon University
- ^ libredcode
- ^ Cintel Scanner p.35. Blackmagic Design. May, 2020.
- ^ FFmpeg 8.0 Released With OpenAI Whisper Filter, Many Vulkan Video Improvements. Phoronix. August 22, 2025
- ^ intoPIX Tico Raw is a format with a huge potential
- ^ intoPIX Tico Raw
- ^ 独自コーデックを搭載したニンテンドーDS版Sofdec (in Japanese) CRI Middleware, May 11, 2006
- ^ CRI・ミドルウェア、ゲーム開発者向けブログ「CRIチャンネル」を開設 (in Japanese) Impress Watch Corporation, April 19, 2007
List of codecs
From Grokipedia
Audio Codecs
Uncompressed Audio Formats
Uncompressed audio formats represent digital audio signals in their raw form, without applying any data reduction techniques, ensuring that the original analog waveform is digitized and stored with complete fidelity. These formats primarily rely on pulse-code modulation (PCM), where continuous audio is sampled at regular intervals and quantized into discrete binary values. Key characteristics include variable sample rates, such as 44.1 kHz for consumer compact disc audio or 48 kHz for professional video and broadcasting applications, which determine the frequency range captured according to the Nyquist theorem. Bit depths typically range from 16 bits per sample for standard dynamic range to 24 bits or higher for extended resolution, allowing for greater precision in amplitude representation and lower noise floors. Channel configurations support mono for single-source recordings, stereo for two-channel playback, and multichannel setups like 5.1 surround for immersive audio.[8][9][10] A prominent example is linear PCM (LPCM), an uncompressed variant of PCM that uses uniform quantization steps and serves as the foundation for several container formats. In the WAV (Waveform Audio File Format), LPCM data is stored in little-endian byte order, preceded by a RIFF-based header that includes metadata like sample rate, bit depth, and channel count, making it ideal for Windows environments. The AIFF (Audio Interchange File Format) employs big-endian ordering for LPCM, with a similar header structure based on Apple's IFF specification, facilitating cross-platform compatibility in Macintosh systems. Broadcast WAV (BWF), an extension of WAV standardized by the European Broadcasting Union, adds fields for timecode and originator information in its header while retaining LPCM payload, enhancing interoperability in professional workflows. These formats maintain the integrity of the audio stream by avoiding any processing beyond basic encapsulation.[11][12][9] The historical roots of uncompressed audio trace back to the development of digital recording in the 1970s, but widespread adoption began with the Compact Disc Digital Audio (CD-DA) standard, known as the Red Book, published in 1980 by Philips and Sony. This specification defined stereo LPCM at 16-bit depth and 44.1 kHz sample rate, enabling approximately 74 minutes of playback per disc and revolutionizing consumer audio distribution. Over time, formats evolved to support higher resolutions; for instance, Direct Stream Digital (DSD), introduced in 1999 by Sony and Philips for Super Audio CD (SACD), uses 1-bit delta-sigma modulation at a 2.8224 MHz sample rate to achieve extended frequency response up to 100 kHz, positioning it as an uncompressed high-resolution alternative to multibit PCM. These advancements addressed limitations in dynamic range and bandwidth while preserving raw data integrity.[13][9][14] In professional applications, uncompressed formats are essential for studio mastering, where engineers require unaltered signals to apply precise equalization and dynamics processing without introducing artifacts. They are also critical in live sound reinforcement, supporting real-time multichannel mixing at high sample rates to minimize latency and ensure accurate reproduction during performances. For archival storage, formats like BWF are recommended by institutions for long-term preservation, as their uncompressed nature allows future generations to access the full original data without degradation from decoding errors. 
Unlike compressed alternatives, these formats retain every detail of the source material, making them indispensable where audio purity is paramount.[11][15][16] The storage requirements for uncompressed audio can be calculated using the formula for PCM data size, which accounts for the parameters defining the signal:
file size (bytes) = sample rate (Hz) × (bit depth ÷ 8) × number of channels × duration (seconds)
For example, a 3-minute stereo recording at 44.1 kHz and 16-bit depth yields approximately 31.8 MB (about 10.6 MB per minute), highlighting the trade-off between fidelity and file size in resource-intensive environments. This equation underscores the format's direct proportionality to audio specifications, guiding decisions in storage and transmission planning.[17][18]
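A minimal sketch of this calculation, assuming CD-quality parameters; the pcm_size_bytes helper is an illustrative name, not part of any standard API:

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, duration_s):
    """Uncompressed LPCM size: rate x (bit depth / 8) x channels x seconds."""
    return sample_rate_hz * (bit_depth // 8) * channels * duration_s

# CD-quality stereo, 3 minutes: 44100 * 2 * 2 * 180 = 31,752,000 bytes (~31.8 MB)
print(pcm_size_bytes(44_100, 16, 2, 180))
```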
Lossless Audio Codecs
Lossless audio codecs employ reversible compression techniques to reduce file sizes while enabling bit-identical reconstruction of the original waveform, distinguishing them from uncompressed formats that serve as the baseline for such reconstruction.[19] Core principles include predictive modeling, such as linear prediction, to estimate audio samples and generate residuals, followed by decorrelation and entropy coding methods like Huffman, Rice, or arithmetic coding to efficiently encode these residuals without data loss.[19] These approaches exploit redundancies in audio signals, achieving typical compression ratios of 40-60% file size reduction for CD-quality audio, though performance varies by content and codec.[20] Error detection mechanisms, such as CRC checksums, ensure integrity during storage and transmission, while support for multi-channel audio up to 8 or more channels accommodates surround sound formats.[19] The development of lossless audio codecs accelerated in the late 1990s and early 2000s as an alternative to the rising popularity of lossy formats like MP3, prioritizing archival quality and high-fidelity playback for audiophiles.[21] One early milestone was Shorten, introduced in 1993 by Tony Robinson at Cambridge University, which pioneered open-source lossless compression using linear prediction and Huffman coding, though it is now largely deprecated.[22] By the mid-2000s, FLAC had emerged as the de facto open-source standard, with its specification finalized in 2001 and adoption growing through integration in media players and hardware by 2007.[23] This period marked a shift toward royalty-free, efficient codecs suitable for both archiving and emerging streaming applications. Key examples illustrate the diversity in design and application. FLAC (Free Lossless Audio Codec), developed starting in 1999 by Josh Coalson and maintained by the Xiph.Org Foundation since 2003, uses fixed predictors (orders 0-4) or higher-order linear prediction and Rice coding for residuals, supporting metadata tagging, fast seeking, and streaming via Ogg containers.[23] It achieves compression ratios around 1.8:1 (approximately 55% of original size) and includes per-frame CRCs and a whole-stream MD5 signature for error detection across multi-channel streams up to 8 channels.[19] ALAC (Apple Lossless Audio Codec), introduced by Apple in 2004 and released as open source under the Apache License in 2011, employs LPC with long-term prediction and Golomb-Rice or arithmetic coding, offering similar ratios (around 1.83:1) while supporting up to 24-bit/192 kHz resolution and integration in MP4 containers for iOS and macOS ecosystems.[24] Monkey's Audio (APE), released in 2000 by Matthew T.
Ashland, combines adaptive predictors with range-style arithmetic encoding for superior compression (up to 1.87:1 ratios), though at higher computational cost, and features redundant CRCs for error resilience in multi-channel files.[25] OptimFROG, developed by Florin Ghido around 2001, prioritizes maximal compression through multi-layer adaptive filtering and range coding, often outperforming others in file size reduction for archiving, albeit with slower encoding.[26] These codecs have been integrated into broader ecosystems, with FLAC gaining native support in extended WAV formats via RIFF chunks and powering lossless streaming on services like Tidal, which has delivered HiRes FLAC up to 24-bit/192 kHz since 2021.[27] Such adoption underscores their role in preserving audio fidelity across professional production, consumer playback, and distribution platforms.[19] A minimal sketch of the predict-then-entropy-code idea shared by these codecs follows the table below.
| Codec | Developer/Organization | Initial Release | Typical Compression Ratio | Key Features |
|---|---|---|---|---|
| FLAC | Xiph.Org Foundation | 2001 | 1.79:1 | Metadata, streaming, 8-channel support, CRC |
| ALAC | Apple Inc. | 2004 | 1.83:1 | MP4 integration, Hi-Res up to 192 kHz |
| Monkey's Audio | Matthew T. Ashland | 2000 | 1.87:1 | High compression, redundant CRCs |
| OptimFROG | Florin Ghido | ~2001 | >1.9:1 (optimized) | Maximal size reduction, adaptive filters |
| Shorten | Tony Robinson | 1993 | ~1.5-2:1 | Early linear prediction, open-source precursor |
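The codecs in the table differ in detail, but the predict-then-entropy-code pipeline described above can be sketched with a toy example: a FLAC-style fixed order-2 predictor followed by Rice coding of the residuals. The sketch assumes 16-bit integer input and does not reproduce any codec's actual bitstream:

```python
import numpy as np

def order2_residuals(samples):
    """FLAC-style fixed order-2 predictor: predict s[n] from 2*s[n-1] - s[n-2]."""
    s = np.asarray(samples, dtype=np.int64)
    res = s.copy()
    res[2:] = s[2:] - (2 * s[1:-1] - s[:-2])   # first two samples kept verbatim as warm-up
    return res

def rice_encode(residuals, k):
    """Rice-code residuals with parameter k after zigzag-mapping signed values."""
    bits = []
    for r in map(int, residuals):
        u = 2 * r if r >= 0 else -2 * r - 1            # zigzag: signed -> unsigned
        q, rem = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0" + format(rem, f"0{k}b"))
    return "".join(bits)

# A slowly varying 16-bit signal: residuals cluster near zero and code compactly.
signal = (1000 * np.sin(np.arange(256) / 10)).astype(int)
coded = rice_encode(order2_residuals(signal), k=3)
print(len(coded) / 8, "coded bytes vs", len(signal) * 2, "raw bytes")
```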
General Lossy Audio Codecs
General lossy audio codecs are designed for compressing music and general audio signals by discarding data that is imperceptible to the human ear, achieving significant file size reductions while preserving subjective quality. These codecs rely on perceptual coding principles, which leverage psychoacoustic models to analyze the audio signal and allocate bits preferentially to perceptually salient components. By exploiting limitations in human auditory perception, such as the inability to discern fine details in the presence of stronger sounds, these methods enable efficient storage and transmission without audible degradation for most listeners.[28] At the core of perceptual coding are psychoacoustic models that compute masking thresholds to shape quantization noise below audible levels. Frequency masking occurs when a louder masker signal raises the detection threshold for nearby frequencies within critical bands, typically spanning 100-200 Hz at low frequencies and widening at higher ones; this allows quieter spectral components to be quantized more coarsely. Temporal masking complements this by suppressing audibility before (pre-masking, up to 20 ms) and after (post-masking, 100-200 ms) a strong signal. Transform coding, particularly the Modified Discrete Cosine Transform (MDCT), plays a key role by converting the time-domain signal into a frequency-domain representation with high resolution (e.g., 576 lines in MP3, up to 1024 in AAC), facilitating precise noise control and bit allocation while minimizing aliasing through windowing techniques.[29][30] A foundational example is MP3 (MPEG-1 Audio Layer III), developed by the Fraunhofer Society and standardized in ISO/IEC 11172-3 in 1993, which popularized digital music distribution. MP3 employs a hybrid filterbank combining polyphase and MDCT for spectral analysis, supporting bitrates from 32 to 320 kbps in constant bitrate (CBR) or variable bitrate (VBR) modes to adapt to signal complexity. Its dominance in the 2000s stemmed from widespread adoption in portable players like the iPod, enabling hours of storage on limited devices.[31] AAC (Advanced Audio Coding), standardized by MPEG in ISO/IEC 13818-7 in 1997, serves as MP3's successor, delivering equivalent quality at approximately 70% of the bitrate through enhanced tools like temporal noise shaping (TNS) and improved joint stereo coding. It supports multichannel audio up to 48 channels and includes HE-AAC profiles for low-bitrate efficiency (e.g., below 64 kbps per channel) via spectral band replication. By the 2010s, AAC supplanted MP3 in streaming platforms due to its superior efficiency for broadband delivery.[31] Ogg Vorbis, introduced by the Xiph.Org Foundation in 2000 with its bitstream frozen on May 8, emerged as a royalty-free, open-source alternative supporting sample rates from 8 to 48 kHz and bitrates from 45 to 500 kbps. Using MDCT-based analysis and advanced perceptual entropy coding, it matches or exceeds MP3 quality at comparable rates, promoting adoption in free software ecosystems.[32] Quality assessment for these codecs often uses the Mean Opinion Score (MOS), a subjective scale from 1 (bad) to 5 (excellent) derived from listener tests, where scores above 4.0 signify transparency—indistinguishability from the original. 
Bitrate ladders provide practical guidance: MP3 achieves near-transparent quality around 256-320 kbps (MOS ~4.2-4.5), while AAC reaches similar transparency at 192-256 kbps (MOS ~4.3-4.6), approximating CD-quality (16-bit/44.1 kHz PCM) fidelity for stereo music.[33][34] These codecs find primary applications in music streaming services, such as Apple Music (using 256 kbps AAC) and Spotify (employing AAC or Ogg Vorbis variants up to 320 kbps), where they balance bandwidth constraints with perceptual quality for on-demand playback. They also power portable devices, from smartphones to wireless earbuds, enabling efficient offline storage and Bluetooth transmission without compromising listening experiences.[35]
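The MDCT analysis underlying these codecs can be illustrated with a direct, unoptimized transform. The sketch below assumes a sine (Princen-Bradley) window and 50% overlap, and demonstrates only the transform and its time-domain aliasing cancellation, not any codec's psychoacoustic model or bit allocation:

```python
import numpy as np

def mdct(frame, window):
    """Direct MDCT: one 2N-sample windowed frame -> N frequency coefficients."""
    N = len(frame) // 2
    n = np.arange(2 * N)[:, None]
    k = np.arange(N)[None, :]
    basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    return (window * frame) @ basis

def imdct(coeffs, window):
    """Inverse MDCT -> 2N windowed samples, to be overlap-added with hop N."""
    N = len(coeffs)
    n = np.arange(2 * N)[:, None]
    k = np.arange(N)[None, :]
    basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    return (2.0 / N) * window * (basis @ coeffs)

N = 64                                                      # hop size (half frame)
win = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))    # sine (Princen-Bradley) window
x = np.random.randn(8 * N)
out = np.zeros_like(x)
for t in range(0, len(x) - 2 * N + 1, N):                   # 50%-overlapped frames
    out[t:t + 2 * N] += imdct(mdct(x[t:t + 2 * N], win), win)
# Interior samples are recovered exactly by time-domain aliasing cancellation.
print(np.allclose(out[N:-N], x[N:-N]))                      # True
```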
Speech-Focused Audio Codecs
Speech-focused audio codecs are designed to efficiently compress human voice signals for transmission over bandwidth-limited channels, prioritizing intelligibility and naturalness at low bitrates while minimizing computational complexity for real-time applications such as telephony.[36] These codecs exploit the parametric structure of speech, modeling it as a source-filter system where the vocal tract shapes a source excitation (glottal pulses for voiced sounds or noise for unvoiced), enabling significant bandwidth reduction compared to general-purpose audio codecs.[37] A core technique in these codecs is linear predictive coding (LPC), which estimates the speech waveform as a linear combination of past samples to model the vocal tract's all-pole filter, thereby capturing formant frequencies that define vowel quality and speaker identity.[37] LPC parameters are typically derived over short frames (10-30 ms) using methods like autocorrelation or covariance analysis, allowing reconstruction with residual excitation. Code-excited linear prediction (CELP) extends LPC by using an analysis-by-synthesis approach to select excitation vectors from codebooks, optimizing perceptual quality at bitrates below 10 kbps.[38] Pitch detection plays a crucial role in voiced speech modeling, identifying the fundamental frequency (typically 50-400 Hz) via autocorrelation or normalized cross-correlation to enable periodic excitation that preserves formants and prosody.[39] The ITU-T G-series recommendations form the backbone of speech codec standards, evolving from 1980s public switched telephone network (PSTN) requirements to support voice over IP (VoIP) and mobile networks. Early developments focused on narrowband (300-3400 Hz) coding for analog-to-digital transition in telephony, with subsequent standards incorporating wideband extensions for improved clarity.[40] Key examples of speech-focused codecs include the following; an illustrative LPC analysis sketch follows the table.
| Codec | Year | Organization | Bitrate (kbps) | Technique |
|---|---|---|---|---|
| G.711 | 1988 (orig. 1972) | ITU-T | 64 | μ-law/A-law PCM |
| G.729 | 1996 | ITU-T | 8 | CS-ACELP |
| Opus | 2012 | IETF | 6-510 | Hybrid CELP/MDCT |
| Speex | 2003 | Xiph.Org | 2-44 | CELP-based |
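The LPC analysis step shared by these codecs can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is a toy illustration on a synthetic frame, not the implementation of any standardized codec:

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation method with Levinson-Durbin: A(z) = 1 + a1*z^-1 + ... + ap*z^-p."""
    x = np.asarray(frame, dtype=float)
    r = np.array([np.dot(x[:len(x) - i], x[i:]) for i in range(order + 1)])
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

# Synthetic voiced-like frame (30 ms at 8 kHz): two tones plus a little noise
# to keep the recursion well conditioned.
n = np.arange(240)
frame = np.sin(2 * np.pi * 300 * n / 8000) + 0.5 * np.sin(2 * np.pi * 2300 * n / 8000)
frame += 0.01 * np.random.randn(len(n))
a = lpc_coefficients(frame, order=10)
pred = -np.array([np.dot(a[1:], frame[i - 10:i][::-1]) for i in range(10, len(frame))])
residual = frame[10:] - pred
print(np.sum(residual ** 2) / np.sum(frame[10:] ** 2))      # residual energy is tiny
```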
Neural Audio Codecs
Neural audio codecs represent a class of data-driven compression techniques that leverage deep learning models to encode and decode audio signals, achieving higher efficiency and perceptual quality compared to traditional methods, particularly at low bitrates. These codecs typically employ an encoder-decoder architecture where the encoder compresses raw audio into a compact latent representation, often quantized into discrete tokens, and the decoder reconstructs the signal, frequently enhanced by generative components to minimize artifacts. Unlike rule-based approaches, neural codecs learn representations directly from large audio datasets, enabling adaptability to diverse content such as speech and music.[46][47] Core neural architectures in these codecs include autoencoders, which form the backbone for dimensionality reduction and reconstruction; generative adversarial networks (GANs), used to train high-fidelity decoders that produce realistic waveforms; and diffusion models, which iteratively denoise latent representations for improved generation quality. Autoencoders, often convolutional or transformer-based, map audio to a lower-dimensional space, while GANs, such as those employing discriminators to refine outputs, ensure perceptual fidelity during decoding. Diffusion models have emerged in recent designs to handle stochastic sampling in the latent space, enhancing codec robustness for variable-rate compression.[46][48][49] Prominent examples include Google's Lyra, introduced in 2021 as a hybrid neural codec targeting speech at approximately 3 kbps, utilizing a neural vocoder for efficient real-time encoding on resource-constrained devices. Building on this, Google's SoundStream, released the same year, pioneered an end-to-end neural architecture capable of compressing both speech and music at bitrates as low as 3 kbps, outperforming conventional codecs in subjective quality tests through its residual vector quantizer and GAN-trained decoder. Meta's EnCodec, launched in 2022, advanced high-fidelity compression to 3 kbps for stereo audio at 48 kHz, employing a multi-scale quantization scheme that supports real-time operation on consumer hardware.[50][46][47] Post-2020 advancements have integrated transformer architectures for scalable sequence modeling, allowing codecs to capture long-range dependencies in audio more effectively than earlier convolutional designs. These models are typically trained on expansive datasets like LibriSpeech, a 1,000-hour corpus of English speech, which facilitates learning of diverse acoustic patterns and improves generalization. Such integrations have enabled variable bitrate allocation, where the quantizer dynamically adjusts token usage based on content complexity, yielding up to 10x compression ratios over baselines like MP3 at equivalent quality levels.[51][52][47] Key benefits of neural audio codecs include adaptive bitrate control, which optimizes transmission in bandwidth-limited scenarios, and artifact reduction through post-processing neural vocoders that refine reconstructed signals for naturalness. These features make them suitable for applications requiring low latency, such as real-time communication, where traditional codecs often introduce audible distortions at sub-6 kbps rates.[53][46] As of 2025, neural audio codecs see experimental adoption in streaming protocols like WebRTC for voice AI interfaces, with open-source releases of models like EnCodec accelerating research and integration into hybrid systems. 
Challenges persist in computational efficiency for edge devices, but ongoing efforts such as the Low-Resource Audio Codec challenge continue to drive innovation in low-bitrate performance.[54][55][56]
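The residual vector quantization stage used by codecs such as SoundStream and EnCodec can be illustrated with a toy example. The codebooks below are random and untrained, and the names and sizes are assumptions for illustration, not any library's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def rvq_encode(latent, codebooks):
    """Residual VQ: each stage quantizes what the previous stages left over."""
    residual = latent.copy()
    indices = []
    for cb in codebooks:                                 # cb shape: (codebook_size, dim)
        idx = int(np.argmin(np.linalg.norm(residual - cb, axis=1)))
        indices.append(idx)
        residual = residual - cb[idx]                    # remainder goes to the next stage
    return indices

def rvq_decode(indices, codebooks):
    return sum(cb[i] for cb, i in zip(codebooks, indices))

dim, n_stages, cb_size = 8, 4, 256                       # 4 stages x 8 bits = 32 bits per vector
codebooks = [rng.normal(size=(cb_size, dim)) * 0.5 ** s for s in range(n_stages)]
latent = rng.normal(size=dim)                            # stand-in for one encoder output frame
codes = rvq_encode(latent, codebooks)
print(codes, float(np.linalg.norm(latent - rvq_decode(codes, codebooks))))
```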
Game Audio Codecs
Game audio codecs are designed to meet the unique demands of interactive entertainment, where real-time responsiveness is paramount. Unlike general-purpose audio compression, these codecs prioritize low-latency decoding to synchronize sounds with gameplay events, such as footstep impacts or weapon discharges, while maintaining dynamic range for immersive effects like explosions or ambient environments. They often support procedural audio generation, where sounds are synthesized on-the-fly based on game logic, and integrate with spatialization techniques like Head-Related Transfer Function (HRTF) for 3D positioning in virtual spaces. Middleware platforms such as FMOD and Wwise facilitate this by providing tools for codec selection, mixing, and real-time parameter adjustment, ensuring compatibility across hardware constraints.[57] The evolution of game audio codecs traces back to the resource-limited era of 1980s consoles, where simple chiptune synthesis dominated due to hardware constraints, evolving into sampled audio formats that balanced storage and performance. Early examples include Adaptive Differential Pulse Code Modulation (ADPCM), a predictive compression technique that reduced bitrate by encoding differences between samples, used in the Nintendo Entertainment System (NES) via its 1-bit delta-modulation (DPCM) channel and in the Super Nintendo Entertainment System (SNES), whose S-DSP chip decodes 4-bit BRR-compressed samples to 16-bit output across eight channels at 32 kHz.[58][59] By the mid-1990s, platform-specific innovations emerged, such as Sony's VAG (Variable Audio Generator) codec for the PlayStation 1, an ADPCM-based format that compressed 16-bit audio into 4-bit blocks for efficient storage of sound effects and music on CDs. Microsoft's XMA (Xbox Media Audio), introduced for the Xbox 360, built on the Windows Media Audio (WMA) framework to offer both lossy and lossless modes with up to 32 channels and higher bitrates, optimizing for the console's multi-core architecture. In modern engines, OGG Vorbis has become a staple for its open-source, royalty-free compression, supporting variable bitrates from 45 kbps to 500 kbps and seamless integration in titles across PC and consoles.[60][61][62] This progression has extended into the 2020s with advanced codecs like Bink Audio from RAD Game Tools, integrated into Unreal Engine, offering high-quality compression and fast decoding for high-fidelity assets.[63] Lossless codecs like FLAC are occasionally referenced in development pipelines for lossless asset storage prior to final compression.[57] Technical characteristics of game audio codecs emphasize minimal CPU overhead to avoid impacting frame rates, typically targeting decode times under 1 ms per frame on mid-range hardware. They incorporate features like looping sample support for seamless music transitions and variable bitrate encoding to allocate more bits to high-impact sounds (e.g., action sequences) versus ambient noise, reducing overall memory footprint without perceptible artifacts. Latency is critically managed through buffer optimizations and hardware-accelerated decoding, often achieving end-to-end delays below 5 ms to align audio with visual cues in fast-paced scenarios.[57][64] These codecs are deployed across diverse platforms, including consoles like PlayStation and Xbox for standardized hardware pipelines, PC for customizable high-resolution playback, and mobile devices where power efficiency is key to extending battery life during extended sessions.
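The difference-coding idea behind the ADPCM family described above can be illustrated with a simplified 4-bit delta codec using an adaptive step size; this is a toy sketch, not the IMA ADPCM format or any console's actual bitstream:

```python
import math

def _step_update(step, magnitude):
    """Grow the step after large codes, shrink it after small ones (shared by both ends)."""
    if magnitude >= 6:
        step *= 2
    elif magnitude <= 1:
        step //= 2
    return max(1, min(2048, step))

def encode(samples):
    predicted, step, codes = 0, 16, []
    for s in samples:
        diff = s - predicted
        code = (8 if diff < 0 else 0) | min(abs(diff) // step, 7)   # 1 sign bit + 3 magnitude bits
        codes.append(code)
        delta = (code & 7) * step * (-1 if code & 8 else 1)
        predicted = max(-32768, min(32767, predicted + delta))      # mirror the decoder exactly
        step = _step_update(step, code & 7)
    return codes

def decode(codes):
    predicted, step, out = 0, 16, []
    for code in codes:
        delta = (code & 7) * step * (-1 if code & 8 else 1)
        predicted = max(-32768, min(32767, predicted + delta))
        out.append(predicted)
        step = _step_update(step, code & 7)
    return out

pcm = [int(8000 * math.sin(i / 20)) for i in range(200)]
err = max(abs(a - b) for a, b in zip(pcm, decode(encode(pcm))))
print(err)   # coarse but bounded error at 4 bits per sample
```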
Middleware like Wwise enables cross-platform codec handling, such as converting ADPCM for legacy compatibility or Ogg Vorbis for modern streaming, while FMOD focuses on low-overhead integration for indie titles.[57][65]
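The predictive principle behind console ADPCM formats can be illustrated with a short sketch. The following is a minimal, non-adaptive 4-bit DPCM encoder and decoder in Python; it is a toy illustration of difference coding only, not the actual VAG, BRR, or XMA bitstream format, and the fixed step size is an arbitrary assumption.

```python
# Toy 4-bit DPCM: each 16-bit PCM sample is stored as a quantized difference
# from the previously reconstructed sample. Real console codecs (e.g. PS1 VAG,
# SNES BRR) additionally use adaptive predictors, filters, and block headers.

STEP = 512  # fixed quantization step (assumption; real codecs adapt this per block)

def dpcm_encode(samples):
    codes, prev = [], 0
    for s in samples:
        diff = s - prev
        code = max(-8, min(7, round(diff / STEP)))          # fit difference into 4 bits
        codes.append(code & 0xF)
        prev = max(-32768, min(32767, prev + code * STEP))  # track what the decoder will see
    return codes

def dpcm_decode(codes):
    out, prev = [], 0
    for c in codes:
        code = c - 16 if c >= 8 else c                      # sign-extend the 4-bit value
        prev = max(-32768, min(32767, prev + code * STEP))
        out.append(prev)
    return out

if __name__ == "__main__":
    pcm = [0, 600, 1300, 1800, 1500, 900, 200, -400]
    enc = dpcm_encode(pcm)
    print(enc)
    print(dpcm_decode(enc))  # approximate reconstruction of the input
```

Even this crude scheme halves storage relative to 8-bit PCM and quarters it relative to 16-bit PCM, at the cost of quantization error in the reconstructed waveform.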
Image Codecs
Uncompressed Image Formats
Uncompressed image formats store digital images as raw pixel data without any form of data reduction or encoding, preserving every bit of information exactly as captured or generated. These formats represent images through direct pixel-by-pixel arrays, typically in color spaces such as RGB or RGBA, where each pixel's value is stored sequentially. They support various bit depths, commonly 8-bit per channel for standard color reproduction or 16-bit for higher precision in professional workflows, ensuring no loss of detail during storage or basic transfer.[66][67][68]

A defining characteristic of these formats is the absence of entropy reduction techniques, resulting in straightforward, predictable file structures that prioritize accessibility over efficiency. Pixel data is organized in row-major order, with optional headers specifying dimensions, color depth, and orientation. For instance, RGB storage allocates separate values for red, green, and blue components per pixel, while RGBA adds an alpha channel for transparency. Bit depths determine the range of color values: 8-bit allows 256 levels per channel (24-bit total for RGB), whereas 16-bit extends this to 65,536 levels, reducing quantization artifacts in gradients and shadows.[66][69][70]

Key examples include the Bitmap (BMP) format, developed by Microsoft in the mid-1980s as a device-independent raster format for Windows systems, supporting uncompressed storage of 1- to 32-bit pixel data in a simple header-plus-pixel-array structure. The Truevision Targa (TGA) format, introduced by Truevision in 1984 for its graphics adapter hardware, features raw pixel arrays in uncompressed modes, accommodating 8- to 32-bit depths with flexible color mapping and optional alpha channels. The Portable Bitmap (PBM) and Portable Pixmap (PPM) formats, part of the Netpbm suite created by Jef Poskanzer in the late 1980s, provide minimalist uncompressed representations: PBM for binary monochrome (1-bit) and PPM for RGB color (up to 24-bit in binary form), emphasizing portability across Unix-like systems.[66][71][67][69][68][70]

The storage size of an uncompressed image can be calculated using the formula: file size (in bytes) = width (pixels) × height (pixels) × number of channels × (bit depth per channel / 8). This equation reflects the direct mapping of pixel data, where channels typically number three for RGB or four for RGBA, and the division by 8 converts bits to bytes; for example, a 1920×1080 RGB image at 8-bit depth yields approximately 6.22 MB.[72][73]

Historically, these formats emerged in the early days of digital imaging during the 1980s, coinciding with the rise of personal computers and graphics hardware, serving as foundational standards for scanners, frame buffers, and early displays before compression became prevalent. BMP standardized raster handling in Microsoft ecosystems, TGA supported high-end video capture, and Netpbm formats facilitated open-source image processing on Unix platforms, all predating widespread adoption of lossy methods.[71][69][68]

In applications, uncompressed formats are essential in professional photography for raw camera sensor dumps, where every pixel's original data must be retained for post-processing. They are also critical in medical imaging for archiving diagnostic scans without alteration, scientific visualization for precise rendering of simulations and microscopy data, and graphics pipelines requiring unaltered input for rendering engines.
Compared to their compressed variants, these formats yield larger files but guarantee fidelity.[74][75]
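The storage formula above translates directly into code. The following is a small sketch; the function name and defaults are illustrative and not part of any of the formats described.

```python
def uncompressed_size_bytes(width, height, channels=3, bit_depth=8):
    """Raw image size: width x height x channels x (bit depth / 8)."""
    return width * height * channels * bit_depth // 8

# 1920x1080 RGB at 8 bits per channel -> 6,220,800 bytes (about 6.22 MB)
print(uncompressed_size_bytes(1920, 1080))

# The same frame as 16-bit RGBA roughly quadruples in size -> 16,588,800 bytes
print(uncompressed_size_bytes(1920, 1080, channels=4, bit_depth=16))
```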
Lossless Image Codecs
Lossless image codecs enable the compression of raster images while preserving every bit of original data, allowing perfect reconstruction without any quality degradation. This makes them essential for applications requiring pixel-perfect fidelity, such as technical illustrations, line art, and documents where alterations could introduce errors. Unlike uncompressed formats, which store raw pixel data without reduction, lossless codecs exploit redundancies in image structure through reversible algorithms to achieve file size reductions while maintaining exact reproducibility.[76]

Common techniques in lossless image compression include run-length encoding (RLE), which efficiently represents sequences of identical pixels by storing the value once along with the count of repetitions, particularly effective for images with large uniform areas. Lempel-Ziv-Welch (LZW) compression builds a dictionary of recurring patterns to substitute repeated sequences with shorter codes, offering good performance on images with repetitive elements. DEFLATE, which combines LZ77 sliding-window matching with Huffman coding, is paired in PNG with per-scanline predictive filtering that minimizes pixel differences before compression, further improving efficiency.[77][78]

The Graphics Interchange Format (GIF), introduced by CompuServe in 1987, was one of the earliest widely adopted lossless codecs, supporting up to 256 colors per frame and using LZW compression for simple raster images and animations. It became a staple for early web graphics due to its compact size over slow connections but is limited by its color palette, making it unsuitable for photographs. The Portable Network Graphics (PNG) format, developed in the mid-1990s by the PNG Development Group and standardized by the W3C in 1996, emerged as a patent-free alternative to GIF, incorporating DEFLATE compression with per-row filtering and support for full-color images, transparency via alpha channels, and interlacing for progressive loading.[79][76][80]

The Tagged Image File Format (TIFF), originally created by Aldus Corporation in the mid-1980s and later maintained by Adobe, supports a wide range of lossless compression options including LZW and DEFLATE, along with multi-page documents and extensive metadata embedding such as EXIF tags for camera settings and geolocation. Evolving from the needs of the desktop publishing and printing industries, TIFF provides flexibility for high-bit-depth images and layered data, though its verbosity can result in larger files than PNG. Typical compression ratios for PNG range from 2:1 to 5:1 depending on image content, balancing size and computational effort.[81][78][82][77]

These codecs find primary use in web graphics (PNG and GIF for transparency and icons), CAD drawings and vector-to-raster conversions (TIFF for precision), and archival photography where data integrity is paramount, ensuring no loss during repeated encoding-decoding cycles.[83]
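PNG's per-row filtering can be made concrete with its Paeth filter, which predicts each byte from the left, above, and upper-left neighbours and stores only the residual for DEFLATE to compress. The sketch below follows the predictor defined as filter type 4 in the PNG specification; the row helper is simplified (it ignores the per-pixel byte offset used for multi-byte pixels) and is illustrative only.

```python
def paeth_predictor(a, b, c):
    """PNG Paeth predictor: a = left, b = above, c = upper-left byte."""
    p = a + b - c                              # initial estimate
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:                  # choose the neighbour closest to the estimate
        return a
    return b if pb <= pc else c

def filter_row_paeth(row, prev_row):
    """Residuals that the DEFLATE stage would compress for one scanline."""
    out = []
    for x, byte in enumerate(row):
        a = row[x - 1] if x > 0 else 0
        b = prev_row[x]
        c = prev_row[x - 1] if x > 0 else 0
        out.append((byte - paeth_predictor(a, b, c)) % 256)
    return out

# Smooth gradients produce small, highly compressible residuals
print(filter_row_paeth([10, 11, 12, 13], [9, 10, 11, 12]))
```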
Lossy Image Codecs
Lossy image codecs apply perceptual compression to photographic and web images, discarding data that minimally impacts human perception to achieve substantial file size reductions, often 10-20 times smaller than uncompressed formats while maintaining acceptable visual quality. These codecs transform and quantize image data in ways that exploit visual redundancies, making them ideal for bandwidth-constrained environments. Unlike lossless methods, which preserve all original data for exact reconstruction, lossy approaches introduce irreversible changes but enable practical storage and transmission of high-resolution images.[84]

Central principles include the Discrete Cosine Transform (DCT), which divides images into 8×8 pixel blocks and converts spatial data to frequency components, concentrating energy in the lower frequencies for efficient compression; quantization, which scales and rounds the DCT coefficients using a quality-dependent table to eliminate fine detail; and chroma subsampling, typically 4:2:0, which halves or quarters color (chrominance) resolution relative to luminance because the human eye is less sensitive to color variation.[84] Applied sequentially in codecs like JPEG, these steps prioritize perceptual fidelity over pixel-perfect accuracy.[84]

Prominent examples include JPEG, standardized in 1992 as ITU-T T.81 and ISO/IEC 10918-1 with the JFIF file format for interchange, featuring adjustable quality levels from 1 (highly compressed) to 100 (near-lossless) to control the trade-off between size and artifacts.[84][85] WebP, released by Google in 2010 and based on the VP8 video codec, supports lossy compression alongside animation, transparency, and an optional lossless mode, yielding 25-34% smaller files than equivalent JPEGs or PNGs.[86] JPEG 2000, finalized by the JPEG committee in 2000 as ISO/IEC 15444, shifts to discrete wavelet transforms for better rate-distortion performance, enabling scalable quality and a lossless fallback while outperforming DCT-based methods at low bit rates.[87] AVIF (AV1 Image File Format), developed by the Alliance for Open Media and standardized in 2020 as ISO/IEC 23000-22, leverages intra-frame encoding from the AV1 video codec for lossy compression, achieving 20-50% size reductions over JPEG and WebP with support for HDR, wide color gamut, and transparency; as of 2025 it is supported by all major web browsers and is increasingly used for efficient image delivery.[88]

Key milestones encompass JPEG's rapid ubiquity in digital cameras during the 1990s, where it standardized image encoding for consumer devices, fueling the shift from film to digital photography with efficient storage for millions of pixels.[89] WebP achieved broad browser adoption by the early 2020s, with native support in Chrome (since 2013), Firefox (2019), Edge (2019), and Safari (2020), which accelerated its integration into web ecosystems for optimized image delivery.[90]

Common artifacts at aggressive compression include blocking, visible grid-like boundaries from independent block processing, and ringing, wave-like distortions near edges caused by the Gibbs phenomenon in quantized frequencies, both more pronounced in uniform or high-contrast areas.[91] The Peak Signal-to-Noise Ratio (PSNR) serves as a standard metric to evaluate these, computing the logarithmic ratio of peak signal power to mean squared error, where values above about 30 dB typically indicate good perceptual quality.[92]
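The PSNR metric mentioned above is straightforward to compute. The following is a minimal sketch for 8-bit pixel sequences in pure Python; the input data is made up for illustration.

```python
import math

def psnr(original, compressed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length 8-bit pixel sequences."""
    mse = sum((o - c) ** 2 for o, c in zip(original, compressed)) / len(original)
    if mse == 0:
        return float("inf")          # identical images have infinite PSNR
    return 10 * math.log10(peak ** 2 / mse)

# Example: a slightly distorted 4-pixel block (about 46 dB, i.e. barely visible error)
print(round(psnr([52, 55, 61, 66], [53, 55, 60, 64]), 1))
```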
These codecs power applications in web display, where formats like WebP reduce load times on sites serving billions of images daily; mobile photography, enabling quick capture and sharing on devices with limited storage; and social media, facilitating efficient uploads of user-generated photos across platforms like Instagram and Twitter.[86][93]
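Chroma subsampling, one of the lossy principles described earlier in this section, can be sketched as averaging each 2×2 block of a full-resolution chroma plane to produce the quarter-size plane used in 4:2:0. The example below is a toy assuming planar input with even dimensions, not the averaging filter of any particular codec.

```python
def subsample_420(plane, width, height):
    """Average each 2x2 block of a full-resolution chroma plane (list of rows)."""
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block = (plane[y][x] + plane[y][x + 1] +
                     plane[y + 1][x] + plane[y + 1][x + 1])
            row.append(block // 4)   # one chroma sample now covers four luma samples
        out.append(row)
    return out

# A 4x2 chroma plane becomes 2x1 after 4:2:0 subsampling
cb = [[100, 102, 200, 202],
      [101, 103, 201, 203]]
print(subsample_420(cb, width=4, height=2))  # [[101, 201]]
```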
RAW and Specialized Image Codecs
RAW image formats store unprocessed or minimally processed data directly from a digital camera's image sensor, enabling extensive post-production adjustments while preserving the sensor's full dynamic range and color fidelity. Unlike processed formats, RAW files maintain data in a linear color space with gamma 1.0, avoiding perceptual adjustments that could limit editing flexibility. They often incorporate proprietary headers that encode sensor-specific details, such as filter array patterns and decoding instructions, which vary by manufacturer.[94]

A common structure in RAW files is the Bayer pattern color filter array, where individual photosites capture only one color (red, green, or blue), with green filters typically doubled to match human visual sensitivity. This results in monochrome grayscale values per pixel, necessitating demosaicing algorithms during conversion to full-color RGB images; these algorithms interpolate missing color data using surrounding pixels and metadata about the filter arrangement.[94]

Prominent examples include Adobe's Digital Negative (DNG), introduced in 2004 as an open, archival standard based on TIFF/EP to promote interoperability across camera brands. Canon's CR2 format, launched in 2004 with the EOS 20D camera, uses lossless JPEG compression for sensor data within a TIFF structure.[95] Nikon's NEF (Nikon Electronic Format) captures uncompressed or compressed sensor output, supporting 12- or 14-bit depths depending on the model. For specialized needs, OpenEXR (EXR), developed by Industrial Light & Magic in 1999, supports high dynamic range (HDR) imaging with 16-bit half-precision floating-point values per channel, allowing representation of over 30 stops of dynamic range without integer limitations.[96][97][98]

These formats typically feature 12- to 16-bit depth per channel, providing 4,096 to 65,536 tonal levels for smoother gradients and reduced banding compared to 8-bit formats. Embedded metadata includes camera settings like exposure time, ISO sensitivity, and white balance multipliers, which can be non-destructively adjusted in software without altering the original sensor data.[94][97][99]

Over the 2010s and 2020s, adoption of standardized formats like DNG accelerated due to concerns over vendor lock-in, where proprietary RAW files risked becoming obsolete as manufacturers discontinued support, complicating long-term archiving and software compatibility. Adobe's open DNG specification addressed this by embedding original proprietary data while providing a universal wrapper, though many vendors persist with native formats for optimized performance.[100]

In professional workflows, RAW formats are essential for photography, allowing precise control over highlights, shadows, and color corrections without quality loss. In visual effects (VFX), formats like EXR facilitate multi-channel HDR compositing and layering for film production. Specialized applications extend to medical imaging, where RAW-like minimally processed data supports accurate diagnostic scans by retaining full sensor precision.[101][102][103]
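Demosaicing can be illustrated with a deliberately crude nearest-neighbour reconstruction of an RGGB Bayer mosaic. Real RAW converters use far more sophisticated interpolation at full resolution; the pattern layout and half-resolution output here are assumptions made for the sketch.

```python
def demosaic_rggb_nearest(mosaic):
    """Crude demosaic: each 2x2 RGGB cell becomes one RGB pixel (half resolution).

    mosaic is a list of rows of single-channel sensor values laid out as
        R G
        G B
    Real demosaicing interpolates a full-resolution colour image instead.
    """
    out = []
    for y in range(0, len(mosaic), 2):
        row = []
        for x in range(0, len(mosaic[0]), 2):
            r = mosaic[y][x]
            g = (mosaic[y][x + 1] + mosaic[y + 1][x]) // 2   # average the two green sites
            b = mosaic[y + 1][x + 1]
            row.append((r, g, b))
        out.append(row)
    return out

raw = [[120, 80, 130, 82],
       [78, 60, 79, 61]]
print(demosaic_rggb_nearest(raw))  # [[(120, 79, 60), (130, 80, 61)]]
```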
Text Compression Formats
Lossless Text Codecs
Lossless text codecs are algorithms designed to compress textual data in a reversible manner, ensuring exact reconstruction of the original input without any information loss. These methods exploit redundancies in text, such as repeated patterns and symbol frequencies, to reduce storage and transmission sizes while maintaining data integrity. Widely adopted in file archiving, data transfer protocols, and backup systems, they form the backbone of tools like ZIP and gzip, enabling efficient handling of documents, logs, and source code.[104]

The primary techniques in lossless text compression are dictionary-based methods, Huffman coding, and the Burrows-Wheeler transform (BWT). Dictionary-based approaches, such as the LZ77 and LZ78 variants, build a dynamic or static dictionary of substrings from the input text to replace repetitions with shorter references; LZ77 uses a sliding window over previous data for matches, while LZ78 constructs a dictionary incrementally from the input stream. Huffman coding, a prefix-free entropy encoding scheme, assigns variable-length binary codes to symbols based on their frequency probabilities, with shorter codes for more common characters to minimize average code length. BWT rearranges the input into a permuted form that groups similar characters together, facilitating subsequent run-length encoding and entropy coding for improved compression ratios.

Key examples trace the evolution from early archivers to modern implementations. The ARC format, introduced in 1985 by System Enhancement Associates, pioneered LZW-based compression for archiving multiple files.[105] ZIP, developed by PKWARE in 1989, incorporated the DEFLATE algorithm (a combination of LZ77 and Huffman coding) in 1993, giving it broad compatibility in software distribution. Gzip, released in 1992 by Jean-loup Gailly and Mark Adler, uses DEFLATE (the tool can also decompress older LZW-based .Z files) and became a standard for web compression via HTTP. Bzip2, authored by Julian Seward in 1996, integrates BWT with Huffman coding to achieve higher ratios on larger files. LZMA, integrated into 7-Zip by Igor Pavlov around 1999 and formalized in its SDK by 2007, extends LZ77 with range encoding for superior compression, particularly on repetitive text. More recently, Zstandard (zstd), developed by Yann Collet at Facebook and released in 2016, balances high compression ratios with fast decompression speeds using a finite-state entropy coder alongside LZ77-style matching.

Compression ratios for lossless text codecs typically range from 2x to 10x on repetitive content like logs or code, depending on the algorithm and input entropy; for instance, DEFLATE often yields 3-5x on English text. These limits are bounded by Shannon's entropy, which quantifies the minimum average bits per symbol as $H = -\sum_i p_i \log_2 p_i$, where $p_i$ is the probability of symbol $i$, representing the inherent information content below which no lossless method can compress. Applications span software distribution for smaller downloads, log file archiving to save storage, and backups to optimize data retention, ensuring fidelity for analytical and archival purposes.[106]
| Codec | Year | Developer | Core Methods | Notable Feature |
|---|---|---|---|---|
| ARC | 1985 | SEA | LZW (LZ78 variant) | Early multi-file archiver |
| ZIP (DEFLATE) | 1993 | PKWARE | LZ77 + Huffman | Ubiquitous file format |
| gzip | 1992 | Gailly/Adler | DEFLATE | Web and Unix standard |
| bzip2 | 1996 | Seward | BWT + Huffman | High ratios for large files |
| LZMA | ~1999/2007 | Pavlov (7-Zip) | LZ77 + range coding | Excellent for binaries/text |
| zstd | 2016 | Collet (Facebook) | LZ77 + FSE | Fast decompression |
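The Huffman step used by DEFLATE and bzip2 can be sketched by building a code table from symbol frequencies. The construction below is the generic textbook bottom-up merge, not the canonical-code layout those formats actually serialize, and the sample string is arbitrary.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix-free code table from the symbol frequencies in `text`."""
    heap = [[freq, i, {sym: ""}] for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate case: one distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)             # two least frequent subtrees
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], tiebreak, merged])
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
encoded = "".join(codes[ch] for ch in "abracadabra")
print(codes)
print(len(encoded), "bits vs", 8 * len("abracadabra"), "bits uncompressed")
```

Frequent symbols receive short codes (here "a" gets a single bit), which is exactly the property DEFLATE exploits after its LZ77 matching stage.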
Lossy Text Codecs
Lossy text codecs represent a niche category of compression techniques designed for structured textual data, such as HTML, XML, or other markup languages, where the irreversible discard of non-essential elements like whitespace, comments, or redundant attributes is permissible without compromising core functionality or semantic integrity. Unlike general-purpose lossless text compression, which preserves every byte for exact reconstruction, lossy approaches prioritize bandwidth and storage efficiency in scenarios where human readability of the source is secondary to performance, such as web delivery or resource-constrained environments. These methods emerged prominently in the 2010s amid web optimization efforts, driven by the need to accelerate page loads on mobile and low-bandwidth networks, and often build on lossless foundations by applying targeted approximations before compression. Dedicated lossy text codecs remain rare as of 2025, with most implementations focusing on preprocessing steps like minification followed by lossless compression; emerging AI-based methods, such as the denoising autoencoders in tools like TextEconomizer, show promise for semantic-preserving compression.[107][108][109]

Core approaches in lossy text compression include semantic pruning, which systematically eliminates structurally insignificant components to reduce file size while retaining meaning; for instance, pruning techniques parse the text into a syntactic tree and remove nodes such as optional formatting elements, achieving higher ratios than pure entropy coding. Tokenization contributes by breaking text into compact symbols or groups, sometimes sacrificing precise spacing or ordering for brevity, as in web markup where tokens approximately represent repeated patterns. Approximate matching further improves efficiency by identifying and substituting similar substrings or tags with shorter equivalents, which is particularly effective for repetitive XML structures where minor variations do not alter data interpretation. These strategies allow for controlled fidelity, with tools letting users adjust the aggressiveness of data discard to balance size against potential artifacts.[110][111]

Prominent examples include specialized HTML minifiers such as htmlcompressor, a tool that applies semantic pruning to remove comments and unnecessary whitespace and to shorten attribute values in HTML, CSS, and JavaScript, often yielding 20-30% size reductions on its own before further lossless layering. These minification tools, while not traditional codecs, function as lossy preprocessors in compression pipelines for web content. Such approaches typically offer 10-20% extra reduction over pure lossless methods by tolerating approximation, with the trade-off of reduced source editability or debugging ease, mitigated through configurable parameters.[112]

Applications of lossy text codecs are concentrated in web serving, where they minimize transfer sizes for faster rendering; mobile applications, to conserve data usage and battery; and embedded systems, to fit constraints in IoT or firmware updates. In practice, fidelity controls ensure that changes remain imperceptible to end users, such as unaltered DOM output in browsers, making these techniques viable for production environments despite their rarity compared to lossless alternatives.[108][107]
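A toy version of this minification-style lossy preprocessing is shown below: strip HTML comments and collapse runs of whitespace before handing the result to a lossless compressor. The regular expressions are illustrative assumptions and are far less careful than real minifiers such as htmlcompressor; the transformation is irreversible but does not change how a browser renders the page.

```python
import re
import zlib

def minify_html(markup):
    """Lossy preprocessing: drop comments and collapse whitespace (irreversible)."""
    markup = re.sub(r"<!--.*?-->", "", markup, flags=re.S)   # remove comments
    markup = re.sub(r">\s+<", "><", markup)                  # whitespace between tags
    return re.sub(r"\s+", " ", markup).strip()               # collapse remaining runs

page = """<!-- header -->
<html>   <body>
    <p>Hello,    world</p>
</body>   </html>"""

small = minify_html(page)
print(len(page), "->", len(small), "characters after minification")
print(len(zlib.compress(page.encode())), "->", len(zlib.compress(small.encode())),
      "bytes after DEFLATE")   # lossy pruning stacks with lossless compression
```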
Video Codecs
Uncompressed Video Formats
Uncompressed video formats store sequences of individual frames as raw pixel data in color spaces such as RGB or YUV, without applying any intra-frame or inter-frame compression to eliminate redundancy. This approach ensures complete fidelity to the source material, with each frame comprising full-resolution luma (Y) and chroma (U/V) components in YUV formats, or direct red, green, and blue values in RGB. Chroma-subsampled layouts such as YUV 4:2:2 are still conventionally treated as uncompressed, since no transform or entropy coding is applied even though chroma resolution is reduced, which makes them well suited to intermediate processing stages. These formats are stored in simple raw files or embedded within containers such as AVI or QuickTime MOV without codec-based alteration.[113][114]

Prominent examples include raw YUV files (often with .yuv extensions), which consist of planar or packed frame data where the Y, U, and V planes are arranged sequentially without headers or metadata beyond basic dimensions. Uncompressed AVI or MOV files similarly encapsulate these raw streams, supporting bit depths up to 16 bits per channel for professional use. Apple's ProRes 4444 serves as a near-uncompressed variant, carrying 12-bit 4:4:4:4 image-plus-alpha data with visually lossless compression, enabling efficient handling of high-dynamic-range content while approximating raw quality.[113][115][116]

The storage requirements for these formats are substantial due to their unaltered nature. For a YUV 4:2:2 frame, the size in bytes is $ \text{width} \times \text{height} \times \frac{\text{bit depth}}{8} \times 2 $, reflecting full luma sampling plus two half-horizontal chroma components. The total video size is this frame size multiplied by the frame rate and the duration in seconds; for instance, a 1920×1080 10-bit 4:2:2 stream works out to approximately 5.18 MB per frame, or about 2.49 Gbps at 60 fps. Uncompressed formats originated in late-1980s broadcast standards, exemplified by the Serial Digital Interface (SDI), standardized by SMPTE in 1989 to transmit unaltered digital video over coaxial cables, replacing analog workflows. By the 2010s, they extended to 4K and 8K production pipelines, supporting resolutions up to 8192×4320 at 60 fps for demanding real-time playback.[117][118][119][120]

In applications, uncompressed video excels in film post-production for precise editing and compositing, where multigenerational workflows demand no quality loss. CGI rendering pipelines rely on these formats to transfer high-fidelity assets between software packages, ensuring seamless integration of rendered elements. Scientific fields, including medical videoconferencing and high-speed imaging, use them for analysis requiring exact pixel values, as seen in uncompressed HD systems outperforming compressed alternatives in detail preservation. While suited to production, they contrast with the compressed formats reserved for final distribution due to bandwidth constraints.[115][121][122]
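The frame-size and data-rate arithmetic above can be captured in a short sketch; the function names are illustrative, not part of any standard.

```python
def yuv422_frame_bytes(width, height, bit_depth):
    """Uncompressed 4:2:2 frame: full-resolution luma plus two half-horizontal chroma planes."""
    return width * height * (bit_depth / 8) * 2

def data_rate_gbps(width, height, bit_depth, fps):
    """Sustained throughput needed to play the stream in real time."""
    return yuv422_frame_bytes(width, height, bit_depth) * fps * 8 / 1e9

frame = yuv422_frame_bytes(1920, 1080, 10)
print(f"{frame / 1e6:.2f} MB per frame")                      # ~5.18 MB
print(f"{data_rate_gbps(1920, 1080, 10, 60):.2f} Gbps at 60 fps")  # ~2.49 Gbps
```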
Analog Video Formats
Analog video formats represent the foundational standards for television broadcasting and recording prior to the widespread adoption of digital technologies, primarily utilizing continuous electrical waveforms to convey luminance and chrominance information. Composite video, the most basic type, combines luminance (brightness) and chrominance (color) into a single signal, as seen in the NTSC standard developed in the United States in 1953, which allocates a bandwidth of 4.2 MHz for the video signal, employs 525 interlaced scan lines at 30 frames per second, and maintains a 4:3 aspect ratio.[123] Similarly, the PAL standard, introduced in Europe during the 1960s, uses 625 interlaced lines at 25 frames per second with the same 4:3 aspect ratio, modulating chrominance on a subcarrier to achieve color encoding while preserving compatibility with monochrome receivers.[123] These composite formats, while efficient for broadcast transmission over limited bandwidth channels of 6 MHz, suffered from cross-color and cross-luminance artifacts due to the intertwined signals.[124]

Component video formats addressed these limitations by separating the signals into distinct channels, such as Y (luminance), Pb (blue-difference), and Pr (red-difference), originating in the 1950s as an intermediate processing step in early color television systems to maintain higher fidelity in professional environments.[125] S-Video, a simplified component variant, separates luminance (Y) from chrominance (C) into two channels, offering improved color resolution over composite without the full separation of YPbPr, and became prominent in consumer devices from the late 1980s.[123] Interlacing, a common characteristic across these formats, alternates odd and even scan lines between fields to reduce bandwidth demands while achieving effective motion portrayal, though it introduced potential artifacts like flicker in static images.[123] Aspect ratios were standardized at 4:3 to match the era's display conventions, limiting horizontal resolution to approximately 330-440 TV lines depending on the signal's modulation.[124]

The transition from analog to digital involved digitizing these waveforms using standards like ITU-R BT.601, established in 1982, which defines sampling parameters for component video (13.5 MHz for luminance and 6.75 MHz for each chrominance difference signal) to enable accurate conversion of NTSC, PAL, and similar sources into digital form with minimal aliasing.
This facilitated the development of codecs for analog capture, such as DV, introduced in 1995 by a consortium including Sony and Panasonic, which employs intra-frame compression to digitize and store video from analog sources like camcorders or tape decks, ensuring frame-independent editing suitable for legacy material transfer.[126] Professional examples include Sony's Betacam format, launched in 1982, a component analog system recording Y, Pb, and Pr signals on 1/2-inch tape with FM modulation for high-bandwidth studio use, supporting up to 90 minutes of recording.[127] For consumer analog sources like VHS, which used composite signals, digitization often relied on intra-frame codecs such as MJPEG, applying JPEG compression per frame to capture and archive footage without inter-frame dependencies that could exacerbate generational loss.[128]

In the legacy context, widespread digitization projects in the 2000s preserved deteriorating analog tapes through systematic conversion, as exemplified by institutional efforts to migrate broadcast and home video collections to digital files under ITU-R BT.601 guidelines, preventing signal degradation from magnetic decay and ensuring long-term accessibility.[129] These initiatives bridged the pre-digital era to modern uncompressed digital formats, which serve as direct successors by maintaining raw signal fidelity post-conversion.[130]
Lossless Video Codecs
Lossless video codecs enable frame-accurate compression of video data without any degradation in quality, reconstructing the original pixels bit-for-bit upon decoding. This reversibility relies on mathematical techniques that eliminate redundancies while preserving all information, making them indispensable for workflows requiring unaltered fidelity, such as professional editing and long-term storage. Unlike lossy alternatives, these codecs prioritize exact reconstruction over aggressive size reduction, typically achieving modest efficiency gains suitable for intermediate production stages.[131]

The core methods in lossless video codecs involve intra-frame compression, treating each frame as an independent image and applying techniques akin to PNG's predictive differential encoding to exploit spatial redundancies within frames. More sophisticated approaches incorporate inter-frame prediction to capture temporal similarities between consecutive frames, followed by entropy coding such as arithmetic or Huffman methods to represent the prediction errors compactly. Median prediction, which estimates each sample as the median of the left neighbor, the top neighbor, and their sum minus the top-left neighbor, is commonly used, paired with context-adaptive arithmetic range coding for efficient bit allocation (a sketch of this predictor follows the table below). These reversible processes discard no information and support diverse pixel formats and colorspaces.[131][132]

Prominent examples include FFV1, an open-source intra-frame codec developed within the FFmpeg project in 2003 and maintained by Michael Niedermayer since 2004. Standardized in RFC 9043 (2021), FFV1 employs median prediction with either Golomb-Rice or range (arithmetic) coding, supporting bit depths from 8 to 16 bits per sample and alpha channels via a dedicated transparency plane. It handles various chroma subsamplings (e.g., 4:4:4 for high-fidelity content) and is noted for its efficiency in preservation tasks. Huffyuv, created around 2000 by Ben Rudiak-Gould, focuses on speed for capture scenarios, using Huffman coding on intra-frame prediction errors (e.g., gradient or left predictors) for YCbCr or RGB data at 8-bit depths, though it lacks native alpha support. Lossless JPEG (LJPEG), introduced in 1993 as an extension to the ISO/IEC 10918-1 JPEG standard, applies predictive coding with Huffman or arithmetic entropy coding per frame, commonly in motion-JPEG streams, supporting up to 16-bit depths but without standard alpha integration for video. These codecs exemplify the balance between speed, efficiency, and compatibility in lossless design.[133][131][134][135]

Historically, lossless video codecs arose to address escalating storage demands in the 1990s and early 2000s, particularly in visual effects (VFX) pipelines where repeated decoding and re-encoding could accumulate errors in lossy formats. The VFX industry, including studios such as Industrial Light & Magic, drove adoption through its need for high-bit-depth intermediates in compositing and rendering, often using frame-sequence approaches but extending to video codecs for streamlined workflows. FFV1's evolution, from version 0 (2006) to version 3 (2013) with added CRC checksums for integrity, reflects a growing emphasis on archival reliability, while Huffyuv targeted real-time capture to supplant uncompressed formats.
LJPEG's integration into early digital video standards facilitated its use in professional tools from the outset.[133][131]

In practice, these codecs serve archival purposes, as endorsed by the Library of Congress for long-term audiovisual preservation in containers such as Matroska, ensuring data integrity for future access. They function as editing proxies in post-production to avoid generational loss during cuts and effects application, and in broadcast contribution feeds for transmitting raw material prior to final encoding. Support for alpha channels (e.g., in FFV1) aids transparency handling in VFX compositing, while high bit depths (up to 16 bits) preserve subtle gradients in scientific imaging or HDR workflows. Compression ratios generally range from 2-3x relative to uncompressed sources, with FFV1 averaging 2.7-3.1x across RGB, YUY2, and YV12 colorspaces, Huffyuv around 2-2.2x, and LJPEG about 1.5-2x, depending on content complexity, providing meaningful storage savings without compromising quality.[133][136][131]

| Codec | Typical Compression Ratio | Key Features Supported | Primary Strengths |
|---|---|---|---|
| FFV1 | 2.7-3.1x | Alpha channels, 8-16 bit depths, intra-frame | Archival efficiency, standardization |
| Huffyuv | 2-2.2x | 8-bit depths, RGB/YCbCr | Encoding speed, capture use |
| LJPEG | 1.5-2x | Up to 16-bit depths, intra-frame | Compatibility with JPEG tools |
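The median prediction described above can be sketched directly. The following computes the prediction for each pixel from its left, top, and top-left neighbours and returns the residuals an entropy coder would compress; it is a simplified illustration of the predictor, not FFV1's full context modelling or bitstream.

```python
def median_predict(left, top, top_left):
    """FFV1/JPEG-LS-style median predictor: median(left, top, left + top - top_left)."""
    return sorted([left, top, left + top - top_left])[1]

def predict_residuals(frame):
    """Per-pixel prediction errors for one 8-bit plane (list of rows); the entropy
    coder would compress these residuals instead of the raw sample values."""
    residuals = []
    for y, row in enumerate(frame):
        res_row = []
        for x, value in enumerate(row):
            left = row[x - 1] if x > 0 else 0
            top = frame[y - 1][x] if y > 0 else 0
            top_left = frame[y - 1][x - 1] if x > 0 and y > 0 else 0
            res_row.append(value - median_predict(left, top, top_left))
        residuals.append(res_row)
    return residuals

plane = [[10, 12, 12],
         [11, 13, 14]]
print(predict_residuals(plane))  # small residuals cluster near zero and compress well
```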