Erasure code
In coding theory, an erasure code is a forward error correction (FEC) code under the assumption of bit erasures (rather than bit errors), which transforms a message of k symbols into a longer message (code word) with n symbols such that the original message can be recovered from a subset of the n symbols. The fraction r = k/n is called the code rate. The fraction k’/k, where k’ denotes the number of symbols required for recovery, is called reception efficiency. The recovery algorithm expects that it is known which of the n symbols are lost.
History
Erasure coding was invented by Irving Reed and Gustave Solomon in 1960.[1]
There are many different erasure coding schemes. The most popular are Reed-Solomon coding, low-density parity-check (LDPC) codes, and turbo codes.[1]
As of 2023, modern data storage systems can be designed to tolerate the complete failure of a few disks without data loss, using one of 3 approaches:[2][3][4]
- Replication
- RAID
- Erasure Coding
While technically RAID can be seen as a kind of erasure code,[5] "RAID" is generally applied to an array attached to a single host computer (which is a single point of failure), while "erasure coding" generally implies multiple hosts,[3] sometimes called a Redundant Array of Inexpensive Servers (RAIS). The erasure code allows operations to continue when any one of those hosts stops.[4][6]
Compared to block-level RAID systems, object storage erasure coding has some significant differences that make it more resilient.[7][8][9][10][11]
Optimal erasure codes
Optimal erasure codes have the property that any k out of the n code word symbols are sufficient to recover the original message (i.e., they have optimal reception efficiency). Optimal erasure codes are maximum distance separable codes (MDS codes).
Parity check
Parity check is the special case where n = k + 1. From a set of k values v_1, ..., v_k, a checksum is computed and appended to the k source values:

v_{k+1} = -(v_1 + v_2 + ... + v_k)

The set of k + 1 values is now consistent with regard to the checksum: the values sum to zero. If one of these values, v_e, is erased, it can be easily recovered by summing the remaining values:

v_e = -(v_1 + ... + v_{e-1} + v_{e+1} + ... + v_{k+1})
RAID 5 is a widely used application of the parity check erasure code using XOR.[1]
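As a rough illustration of the XOR construction (a minimal sketch, not a production RAID implementation), one parity block computed over k data blocks lets any single lost block be rebuilt from the survivors:

```python
# Minimal sketch: XOR parity over equal-length byte blocks, as used
# conceptually in RAID 4/5.

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [b"ABCD", b"EFGH", b"IJKL"]      # k = 3 data blocks
parity = xor_blocks(data)               # one parity block, so n = k + 1

# Suppose block 1 is erased; XOR of the survivors and the parity restores it.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```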
Polynomial oversampling
Example: Err-mail (k = 2)
In the simple case where k = 2, redundancy symbols may be created by sampling different points along the line between the two original symbols. This is pictured with a simple example, called err-mail:
Alice wants to send her telephone number (555629) to Bob using err-mail. Err-mail works just like e-mail, except
- About half of all the mail gets lost.
- Messages longer than 5 characters are illegal.
- It is very expensive (similar to air-mail).
Instead of asking Bob to acknowledge the messages she sends, Alice devises the following scheme.
- She breaks her telephone number up into two parts a = 555, b = 629, and sends 2 messages – "A=555" and "B=629" – to Bob.
- She constructs a linear function, f(i) = a + (b - a)(i - 1), in this case f(i) = 555 + 74(i - 1), such that f(1) = 555 and f(2) = 629.
- She computes the values f(3), f(4), and f(5), and then transmits three redundant messages: "C=703", "D=777" and "E=851".
Bob knows that f(i) has the form a + (b - a)(i - 1), where a and b are the two parts of the telephone number. Now suppose Bob receives "D=777" and "E=851".
Bob can reconstruct Alice's phone number by computing the values of a and b from the values (f(4) and f(5)) he has received. Bob can perform this procedure using any two err-mails, so the erasure code in this example has a rate of 40%.
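A small sketch of the err-mail scheme, using the same numbers as above (the function names are made up for this illustration):

```python
# Illustrative sketch of the err-mail scheme: f(i) = a + (b - a) * (i - 1),
# so f(1) = a and f(2) = b, and any two points determine the line.

def encode(a, b, n=5):
    return {i: a + (b - a) * (i - 1) for i in range(1, n + 1)}

def decode(received):
    """Recover (a, b) from any two points (i, f(i)) of the line."""
    (i1, y1), (i2, y2) = sorted(received.items())[:2]
    slope = (y2 - y1) // (i2 - i1)          # this is b - a
    a = y1 - slope * (i1 - 1)
    return a, a + slope

messages = encode(555, 629)                 # {1: 555, 2: 629, 3: 703, 4: 777, 5: 851}
assert decode({4: 777, 5: 851}) == (555, 629)   # Bob's case: only D and E arrive
```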
Note that Alice cannot encode her telephone number in just one err-mail, because it contains six characters, and that the maximum length of one err-mail message is five characters. If she sent her phone number in pieces, asking Bob to acknowledge receipt of each piece, at least four messages would have to be sent anyway (two from Alice, and two acknowledgments from Bob). So the erasure code in this example, which requires five messages, is quite economical.
This example is a little bit contrived. For truly generic erasure codes that work over any data set, we would need something other than the f(i) given.
General case
The linear construction above can be generalized to polynomial interpolation. Additionally, points are now computed over a finite field.
First we choose a finite field F with order of at least n, but usually a power of 2. The sender numbers the data symbols from 0 to k − 1 and sends them. He then constructs a (Lagrange) polynomial p(x) of degree less than k such that p(i) is equal to data symbol i. He then sends p(k), ..., p(n − 1). The receiver can now also use polynomial interpolation to recover the lost packets, provided he receives k symbols successfully. If the order of F is less than 2^b, where b is the number of bits in a symbol, then multiple polynomials can be used.
The sender can construct symbols k to n − 1 'on the fly', i.e., distribute the workload evenly between transmission of the symbols. If the receiver wants to do his calculations 'on the fly', he can construct a new polynomial q, such that q(i) = p(i) if symbol i < k was received successfully and q(i) = 0 when symbol i < k was not received. Now let r(i) = p(i) − q(i). Firstly we know that r(i) = 0 if symbol i < k has been received successfully. Secondly, if symbol i ≥ k has been received successfully, then r(i) = p(i) − q(i) can be calculated. So we have enough data points to construct r and evaluate it to find the lost packets. So both the sender and the receiver require O(n (n − k)) operations and only O(n − k) space for operating 'on the fly'.
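A minimal sketch of this construction, assuming a prime field GF(257) for readability (practical implementations usually work over GF(2^b), as noted above). The polynomial is interpolated through the data symbols, so the first k code symbols are the data itself:

```python
P = 257  # a prime larger than the symbol values and evaluation points used below

def lagrange_at(points, x):
    """Interpolate the unique polynomial of degree < len(points) through
    `points` and evaluate it at x, with all arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

k, n = 4, 7
data = [12, 34, 56, 78]                                      # k data symbols
data_points = list(enumerate(data))                          # p(i) = data symbol i, for i < k
codeword = [lagrange_at(data_points, x) for x in range(n)]   # first k symbols are the data

received = [(x, codeword[x]) for x in (1, 3, 5, 6)]          # any k symbols survive
restored = [lagrange_at(received, x) for x in range(n)]
assert restored == codeword and restored[:k] == data
```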
Real world implementation
This process is implemented by Reed–Solomon codes, with code words constructed over a finite field using a Vandermonde matrix.
Most practical erasure codes are systematic codes—each one of the original k symbols can be found copied, unencoded, as one of the n message symbols.[12] (Erasure codes that support secret sharing never use a systematic code).
Near-optimal erasure codes
Near-optimal erasure codes require (1 + ε)k symbols to recover the message (where ε > 0). Reducing ε can be done at the cost of CPU time. Near-optimal erasure codes trade correction capabilities for computational complexity: practical algorithms can encode and decode with linear time complexity.
Fountain codes (also known as rateless erasure codes) are notable examples of near-optimal erasure codes. They can transform a k symbol message into a practically infinite encoded form, i.e., they can generate an arbitrary amount of redundancy symbols that can all be used for error correction. Receivers can start decoding after they have received slightly more than k encoded symbols.
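A toy sketch of the rateless idea, assuming a crude hand-picked degree distribution rather than the carefully designed distributions real fountain codes (LT, Raptor) use. The receiver simply keeps collecting XOR combinations until a peeling decoder resolves all k source symbols:

```python
# Toy rateless code: each encoded symbol is the XOR of a random subset of
# source symbols, transmitted together with the subset's indices. This is NOT
# a real LT or Raptor code, just an illustration of the "keep generating
# symbols until the receiver can decode" idea.
import random

def encode_stream(source, rng):
    k = len(source)
    while True:
        degree = rng.choice([1, 2, 2, 3, 4])           # crude stand-in for a degree distribution
        idxs = rng.sample(range(k), min(degree, k))
        value = 0
        for i in idxs:
            value ^= source[i]
        yield set(idxs), value

def peel_decode(packets, k):
    """Repeatedly resolve packets that have exactly one unknown source symbol."""
    decoded = {}
    packets = [(set(idxs), val) for idxs, val in packets]
    progress = True
    while progress and len(decoded) < k:
        progress = False
        for idxs, val in packets:
            unknown = idxs - decoded.keys()
            if len(unknown) == 1:
                i = unknown.pop()
                v = val
                for j in idxs - {i}:
                    v ^= decoded[j]
                decoded[i] = v
                progress = True
    return decoded if len(decoded) == k else None

rng = random.Random(1)
source = [rng.randrange(256) for _ in range(20)]        # k = 20 source symbols
stream = encode_stream(source, rng)
collected = []
while True:
    collected.append(next(stream))                      # receive one more encoded symbol
    result = peel_decode(collected, len(source))
    if result:
        break
print(len(collected), "encoded symbols collected for k =", len(source))
assert [result[i] for i in range(len(source))] == source
```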
Regenerating codes address the issue of rebuilding (also called repairing) lost encoded fragments from existing encoded fragments. This issue occurs in distributed storage systems where communication to maintain encoded redundancy is a problem.[12]
Applications of erasure coding in storage systems
Erasure coding is now standard practice for reliable data storage.[13][14][15] In particular, various implementations of Reed-Solomon erasure coding are used by Apache Hadoop, the RAID-6 built into Linux, Microsoft Azure, Facebook cold storage, and Backblaze Vaults.[15][12]
The classical way to recover from failures in storage systems was to use replication. However, replication incurs significant overhead in terms of wasted bytes. Therefore, increasingly large storage systems, such as those used in data centers, use erasure-coded storage. The most common form of erasure coding used in storage systems is the Reed-Solomon (RS) code, which enables regeneration of missing data from pieces of known data together with added redundancy blocks, called parity blocks. In a (k, r) RS code, a given set of k data blocks, called "chunks", is encoded into (k + r) chunks. The total set of chunks comprises a stripe. The coding is done such that as long as at least k out of the (k + r) chunks are available, the entire data can be recovered. This means a (k, r) RS-encoded storage can tolerate up to r failures. (This differs from the standard RS(n, k) notation, where n = k + r.)
RS(10,4)
Example: In the RS(10, 4) code, which is used by Facebook for their HDFS (now part of Apache Hadoop), 10 MB of user data is divided into ten 1 MB blocks. Then, four additional 1 MB parity blocks are created to provide redundancy. This can tolerate up to 4 concurrent failures. The storage overhead here is 14/10 = 1.4×.[16]
In the case of a fully replicated system, the 10 MB of user data will have to be replicated 4 times to tolerate up to 4 concurrent failures. The storage overhead in that case will be 50/10 = 5 times.
This gives an idea of the lower storage overhead of erasure-coded storage compared to full replication, and thus its attraction in today's storage systems.
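The arithmetic behind this comparison, as a tiny sketch:

```python
# Back-of-the-envelope numbers from the example above: protecting 10 MB of
# user data against 4 concurrent failures.
data_mb = 10
rs_total = 10 + 4          # ten 1 MB data blocks + four 1 MB parity blocks (RS(10, 4))
replicated_total = 10 * 5  # five full copies are needed to survive 4 failures

print(rs_total / data_mb)          # 1.4 (x overhead)
print(replicated_total / data_mb)  # 5.0 (x overhead)
```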
The Hitchhiker scheme can be combined with RS-coding to reduce the amount of computation and data-transfer required for data block reconstruction. It is also implemented as an HDFS codec, though a policy will need to be manually defined for it to be used.[12]
Hot data
Initially, erasure codes were used to reduce the cost of storing "cold" (rarely-accessed) data efficiently; but erasure codes can also be used to improve performance serving "hot" (more-frequently-accessed) data compared to simpler redundancy schemes (mirroring).[12]
The classic example of erasure coding improving performance is RAID 5, which provides the same single-drive failure protection while requiring fewer hard drives compared to RAID 1. The extra hard drives may then be used to store more data and make use of the improved read/write-speed multiplier on RAID 5. This also applies to RAID 6 (double-redundancy: one parity and one erasure code), assuming processing power is sufficient.[1] Generalized RAID can work with any number of redundancy drives. There are two notations for generalized RAID: RAID7.x refers to a system with x redundancy drives, allowing recovery when as many as x drives fail.[17] Alternatively, RAID N+M refers to N regular data drives with M redundancy drives, being able to recover all the data when any M drives fail.[1]
A more advanced example is EC-Cache, a cluster cache, i.e. a cache distributed among several nodes. Such systems can suffer from load imbalance when one node happens to host more popular items and a common method to address this issue is selective replication, i.e. create more replicas for more popular objects. However, this method is limited by the available amount of memory. By individually dividing every object into k splits and r redundancy units, perfect load balancing can instead be achieved with minimal waste of memory.[12]
Examples
Here are some examples of implementations of the various codes:
Near optimal erasure codes
Near optimal fountain (rateless erasure) codes
Optimal erasure codes
- XOR Parity, adding one erasable symbol. Used in RAID4, RAID5.
- Reed–Solomon codes. It is optimal when used in erasure mode, allowing k erasures for k added symbols. When used in error-correction mode, it is also optimal, allowing floor(k/2) errors.
- Parchive file format 1.0, 2.0, and (open) 3.0 uses RS. 3.0 also supports other linear codes.
- RAID6 uses a variety of optimal erasure codes, but RS is a common choice.
- Tahoe-LAFS includes zfec, an implementation of Classic RS.
- Erasure Resilient Systematic Code, an MDS code allowing for more redundant packets than Reed-Solomon at the same symbol size, see RS(4,2) with 2 bits or RS(9,2) with 3 bits
- Regenerating Codes[18][19]
- any other maximum distance separable code
See also
- Forward error correction codes.
- Secret sharing (differs in that the original secret is encrypted and obscured until the decode quorum is reached)
- Spelling alphabet
- Binary erasure channel
References
- ^ a b c d e "RAID vs. Erasure Coding. What's the Difference? | Blog | Xinnor". The Fastest and Most Reliable Software RAID | Xinnor. 2023-09-03. Retrieved 2024-09-18.
- ^ "Ceph.io — Erasure Coding in Ceph". ceph.io. 2014-04-07. Retrieved 2024-09-18.
- ^ a b Lee, Brandon (2023-12-26). "RAID vs Erasure Coding vs Replication". BDRSuite. Retrieved 2024-09-18.
- ^ a b O'Reilly, Jim. "RAID Vs. Erasure Coding". www.networkcomputing.com. Retrieved 2024-09-18.
- ^ Dimitri Pertin, Alexandre van Kempen, Benoît Parrein, Nicolas Normand. "Comparison of RAID-6 Erasure Codes". The third Sino-French Workshop on Information and Communication Technologies, SIFWICT 2015, Jun 2015, Nantes, France. ffhal-01162047f
- ^ "Understanding IBM Spectrum Scale Erasure Code Edition fault tolerance". www.ibm.com. Retrieved 2024-09-18.
- ^ "Object Storage Erasure Coding vs. Block Storage RAID". MinIO Blog. 2021-07-27. Retrieved 2024-09-18.
- ^ "Erasure coding vs Raid as a data protection method | Computer Weekly". ComputerWeekly.com. Retrieved 2024-09-18.
- ^ Kruth, Peter (2023-10-04). "Erasure Code: RAID As It Should Be – Huawei BLOG". Archived from the original on 2023-10-04. Retrieved 2024-09-18.
- ^ "Erasure Coding 101". MinIO Blog. 2022-04-25. Retrieved 2024-09-18.
- ^ Bhaskaran, Dinesh Kumar (July 6, 2018). "Why erasure coding is the future of data resiliency". Archived from the original on August 7, 2020.
- ^ a b c d e f Rashmi Vinayak. "Erasure Coding for Big-data Systems: Theory and Practice". 2016. p. 2: section "Abstract". p. 9: section "Systematic codes". p. 12: section "Regenerating codes".
- ^ "Erasure Encoding—Practice and Principles". 2016.
- ^ Matt Sarrel. "Erasure Coding 101". 2022.
- ^ a b Brian Beach. "Backblaze Open-sources Reed-Solomon Erasure Coding Source Code". 2015.
- ^ Xia, Mingyuan; Saxena, Mohit; Blaum, Mario; Pease, David A. (2015). A Tale of Two Erasure Codes in HDFS. FAST '15. pp. 213–226. ISBN 978-1-931971-20-1.
- ^ Leventhal, Adam (December 2009). "Triple-Parity RAID and Beyond: As hard-drive capacities continue to outpace their throughput, the time has come for a new level of RAID". ACM Queue. 7 (11): 30–39. doi:10.1145/1661785.1670144.
- ^ Dimakis, Alexandros G.; Godfrey, P. Brighten; Wu, Yunnan; Wainwright, Martin J.; Ramchandran, Kannan (September 2010). "Network Coding for Distributed Storage Systems". IEEE Transactions on Information Theory. 56 (9): 4539–4551. arXiv:cs/0702015. Bibcode:2010ITIT...56.4539D. CiteSeerX 10.1.1.117.6892. doi:10.1109/TIT.2010.2054295. S2CID 260559901.
- ^ "home [Erasure Coding for Distributed Storage Wiki]". 2017-07-31. Archived from the original on 2017-07-31. Retrieved 2023-08-20.
External links
- Jerasure is a Free Software library implementing Reed-Solomon and Cauchy erasure code techniques with SIMD optimisations.
- Software FEC in computer communications by Luigi Rizzo describes optimal erasure correction codes
- Feclib is a near optimal extension to Luigi Rizzo's work that uses band matrices. Many parameters can be set, like the size of the width of the band and size of the finite field. It also successfully exploits the large register size of modern CPUs. How it compares to the near optimal codes mentioned above is unknown.
- Coding for Distributed Storage wiki for regenerating codes and rebuilding erasure codes.
- ECIP "Erasure Code Internet Protocol", developed in 1996, was the first use of FEC "Forward Error Correction" on the Internet. It was first used commercially to stream live video of Sir Arthur C. Clarke in Sri Lanka to UIUC in Illinois.
Erasure code
Fundamentals
Definition and Principles
Erasure codes are a subclass of error-correcting codes designed specifically to protect against data erasures, where the positions of lost data are known, enabling the recovery of the original data from a sufficient subset of the remaining encoded fragments.[3] In this framework, k original data symbols are encoded into n total symbols, with k ≤ n, such that any k out of the n encoded symbols can reconstruct the original data exactly.[3] The redundancy is provided by the n − k additional symbols, which allow the system to tolerate up to n − k erasures without data loss.[3] Unlike general error-correcting codes, which must detect and correct arbitrary errors in unknown positions within noisy channels, erasure codes exploit the knowledge of erasure locations to simplify decoding and achieve higher efficiency in recovery.[3] This distinction makes erasure decoding computationally easier, as it involves solving a system of equations from the known surviving symbols rather than searching for error patterns.[3]

A fundamental limit on erasure code performance is given by the Singleton bound, which states that the minimum distance d of an (n, k) code satisfies d ≤ n − k + 1, with codes achieving equality known as maximum distance separable (MDS) codes.[4] Erasure codes offer significant storage efficiency compared to replication methods, requiring less overhead while providing comparable fault tolerance; for instance, a code with rate k/n = 5/6 incurs only a 1.2× storage multiplier versus 2× for simple mirroring.[5] This efficiency stems from distributing redundancy across multiple fragments, allowing recovery from any k survivors even after n − k failures.[5]

A simple illustrative example is a parity code over GF(2), where two data blocks d1 and d2 are encoded into three blocks, with the third being the parity p = d1 ⊕ d2; the original data can then be recovered from any two of the three blocks, such as (d1, p) or (d2, p).[3]

Mathematical Foundations
Erasure codes are constructed as linear codes over finite fields, also known as Galois fields GF(q), where q is typically a power of 2 for computational efficiency in data storage applications. In such fields, addition is performed using bitwise XOR operations, which correspond to vector addition modulo 2. Multiplication is more involved and is defined relative to an irreducible primitive polynomial of degree b for GF(2^b); for byte-level coding, GF(256) or GF(2^8) is commonly used, with multiplication implemented via precomputed lookup tables for logarithms and antilogarithms to enable fast exponentiation-based computation.

Linear erasure codes are defined by a generator matrix G, which is a k × n matrix over GF(q), where k is the dimension (number of information symbols) and n is the block length (total symbols including redundancy). A codeword is formed by multiplying a k-dimensional message vector m by G, yielding the n-dimensional codeword c = m G. In systematic form, G is structured as [I_k | P], where I_k is the k × k identity matrix and P is a k × (n - k) parity submatrix; this allows the encoded block to be E = [D | P_D], with D the k data symbols and P_D = D P the (n - k) parity symbols. The parity-check matrix H is an (n - k) × n matrix satisfying G H^T = 0, used in decoding to compute the syndrome s = r H^T for a received vector r, where non-zero syndromes indicate erasures or errors.

A fundamental limit on the performance of block codes is the Singleton bound, which states that for a code of block length n, dimension k, and minimum distance d over an alphabet of size q, the distance satisfies d ≤ n - k + 1.[6] Codes achieving equality in this bound are known as maximum distance separable (MDS) codes, which provide the optimal trade-off between redundancy and error/erasure correction capability for a given rate.[6] The efficiency of an erasure code is characterized by its rate R = k/n, which represents the fraction of information symbols in the encoded block, and the corresponding storage overhead factor of 1/R = n/k, quantifying the redundancy introduced.

Decoding erasure codes involves solving a linear system derived from the known symbol positions using the parity-check matrix or generator matrix. For up to (n - k) erasures in a systematic code, this reduces to inverting a k × k submatrix via Gaussian elimination over the finite field, with computational complexity O(k^3) in the number of arithmetic operations.
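A sketch of the table-driven GF(2^8) arithmetic described above, assuming the primitive polynomial 0x11d (x^8 + x^4 + x^3 + x^2 + 1), which is a common choice in Reed-Solomon implementations but not mandated by the text:

```python
# GF(2^8) arithmetic with log/antilog tables. Addition is XOR; multiplication
# uses the precomputed tables, as described in the paragraph above.

PRIM = 0x11D             # assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1
EXP = [0] * 512          # antilog table, doubled so lookups never need "mod 255"
LOG = [0] * 256

x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1              # multiply by the generator element 2
    if x & 0x100:
        x ^= PRIM        # reduce modulo the primitive polynomial
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf_add(a, b):
    return a ^ b         # addition in GF(2^8) is bitwise XOR

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

# sanity checks: multiplication is nonzero on nonzero inputs and associative
assert gf_mul(0x53, 0xCA) != 0
assert gf_mul(7, gf_mul(9, 13)) == gf_mul(gf_mul(7, 9), 13)
```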
Historical Development
Early Innovations
The concept of parity checks emerged in the 1950s as a fundamental mechanism for achieving simple redundancy in early computing systems, enabling basic error detection and correction.[7] These checks involved adding redundant bits to data to verify integrity, laying groundwork for erasure coding principles by allowing recovery from single failures through parity computations. In 1950, Richard Hamming formalized error-correcting codes in his seminal work on parity-check-based schemes, initially focused on detecting and correcting transmission errors in computing machines, which influenced later erasure-tolerant designs.[8]

A significant practical application arrived in 1988 with the introduction of Redundant Arrays of Inexpensive Disks (RAID) by David A. Patterson, Garth Gibson, and Randy H. Katz, who proposed using XOR-based parity schemes for disk arrays to tolerate disk failures.[9] RAID levels 5 and 6 specifically employed these erasure codes, distributing parity information across drives to recover from one or two simultaneous failures, respectively, thereby enhancing reliability in storage systems without excessive redundancy.[9]

In 1960, Irving S. Reed and Gustave Solomon developed Reed-Solomon codes, a class of non-binary cyclic error-correcting codes based on polynomial evaluations over finite fields, capable of correcting multiple symbol erasures or errors.[10] These codes found early real-world use in deep-space communication, notably in NASA's Voyager missions launched in 1977, where a concatenated (255,223) Reed-Solomon code over GF(256) was implemented to protect telemetry data against channel noise and erasures during transmission from billions of kilometers away.[11]

During the 1990s, erasure codes began appearing in academic prototypes for distributed systems, exploring fault tolerance beyond traditional replication. The OceanStore project, initiated in 2000 by John Kubiatowicz and colleagues, integrated Byzantine fault-tolerant erasure coding into its global-scale persistent storage architecture, using techniques like Reed-Solomon variants to ensure data durability across untrusted, wide-area networks while minimizing storage overhead.[12]

Modern Advancements
In the 2000s, academic research increasingly applied erasure codes to distributed storage systems, emphasizing scalability for wide-area networks. The Total Recall system, introduced in 2004, used erasure codes to store large files across peer-to-peer nodes, automating availability management by detecting and repairing failures through redundant fragments. Similarly, DHash++, developed in 2006, integrated erasure coding into a distributed hash table architecture to enhance lookup efficiency and fault tolerance in wide-area environments, reducing the overhead of traditional replication.[13]

Production adoption accelerated in the 2010s as large-scale systems sought cost savings over replication. Google's transition from the Google File System to Colossus around 2010 incorporated Reed-Solomon codes, such as the RS(6,3) configuration, to achieve efficient redundancy with a storage overhead of 1.5x while supporting high availability across clusters.[14] In parallel, Apache Hadoop's HDFS RAID, released in 2012, applied erasure codes to convert replicated cold data into parity-protected stripes, reducing storage requirements by up to 50% without compromising durability.[15]

Efficiency breakthroughs addressed repair bottlenecks in erasure-coded systems. Minimum Storage Regenerating (MSR) codes, proposed by Dimakis et al. in 2007, minimized both per-node storage and repair bandwidth by leveraging network coding principles, enabling optimal trade-offs for distributed repair scenarios. Building on this, Locally Repairable Codes (LRCs), introduced in 2012 for Microsoft Azure Storage, localized single-failure repairs to a small subset of nodes, reducing repair traffic by up to 75% compared to standard Reed-Solomon codes while maintaining low storage overhead.[16]

By the 2020s, erasure codes had become integral to cloud infrastructure. Microsoft Azure continued deploying LRCs for blob storage, achieving multi-rack fault tolerance with repair times under 10% of global reconstruction costs.[17] Recent advancements include maximally recoverable LRCs, which enhance resilience in multi-failure scenarios and have been further deployed in Azure as of 2022.[18] A survey of 285 papers from 2002 to 2023 highlights the field's maturity, with 76.1% addressing code constructions, algorithmic optimizations, and adaptations for emerging architectures like non-volatile memory.[13]

Standardization efforts facilitated broader deployment. The IETF's Forward Error Correction Framework (RFC 6363, 2011) and Reed-Solomon specifications (RFC 5510, 2009) provided protocols for erasure codes in IP networks, influencing storage implementations.[19] OpenStack Swift integrated erasure code support from the 2014 Juno release onward, enabling policy-based use of codes like Reed-Solomon for object storage durability.[20]

Optimal Erasure Codes
Parity Check Codes
Parity check codes represent a foundational class of optimal erasure codes constructed using linear operations over the binary field GF(2), where addition corresponds to the exclusive-OR (XOR) function. These codes generate m parity symbols from k data symbols to form an n = k + m symbol codeword, achieving maximum distance separable (MDS) properties for small m by ensuring that any k symbols suffice for data recovery. The encoding process relies on a parity-check matrix H of dimensions m × n, where the code satisfies H c = 0 over GF(2), with c the systematic codeword vector containing data followed by parities. In simple constructions, each row of H places a 1 in the corresponding parity column and 1's across specific data columns to define the linear dependencies, enabling efficient XOR-based computation while maintaining invertibility of relevant submatrices for MDS guarantees.[21]

For the basic case of m = 1, the construction yields a single even parity symbol P computed as the XOR of all k data symbols d_1, ..., d_k:

P = d_1 ⊕ d_2 ⊕ ... ⊕ d_k

This corresponds to the parity-check matrix row [1 1 ... 1 | 1], where the final 1 enforces the even parity condition. Such codes, exemplified by RAID-5 arrays, tolerate one erasure by recomputing the lost symbol via XOR of the remaining n - 1 symbols, making them computationally lightweight with O(n) operations for encoding and decoding.[9]

A prominent extension to m = 2 appears in the EVENODD code, which provides an MDS construction tolerating two erasures with minimal redundancy. Designed for k data symbols where k is an odd prime, EVENODD arranges the information symbols into a (k-1) × k array and computes two parity columns: the first (horizontal parity column) as the row-wise XOR of all k data symbols in each row, and the second (diagonal parity column) along wrapped diagonals of the array, with an adjustment for even/odd parity on a special diagonal to ensure systematic form and MDS distance 3. Encoding requires O(k^2) XORs, but leverages RAID-5 hardware for efficiency. Decoding for single or double erasures uses syndrome equations solved via recursive XORs on available symbols, recovering lost data in O(k) to O(k^2) time depending on failure pattern. This achieves optimality per the Singleton bound, using only two parities for distance 3, outperforming general Reed-Solomon codes in XOR count for small fields (e.g., 50% fewer operations for n = 15).[22]

These codes exhibit MDS properties for small m, such as m = 1 in RAID-5 or m = 2 in EVENODD, allowing recovery from any m erasures through simple XOR combinations of surviving symbols without interpolation. However, a key limitation is high repair bandwidth: repairing one erased symbol requires downloading approximately k symbols from other nodes, as the linear dependencies necessitate accessing nearly the full data set. Additionally, the quadratic computational complexity in k for encoding, decoding, and updates renders them non-scalable for large n in distributed systems, where prime constraints on k further restrict flexibility.[23]

In practice, parity check codes underpin RAID-6 deployments, which employ two parities for double-failure tolerance and optimize computations using Cauchy matrices to minimize XOR operations (e.g., reducing encoding to O(k) via sparse, invertible structures over GF(2^b)).[24]

Reed-Solomon Codes
Reed-Solomon codes are a class of optimal maximum distance separable (MDS) erasure codes that achieve the Singleton bound, allowing recovery of the original data from any k out of n encoded symbols, where n - k is the number of parity symbols.[10] These codes operate over finite fields, typically Galois fields GF(q), and are constructed using polynomial interpolation to ensure the MDS property.[10]

In the construction of a Reed-Solomon code RS(n, k), the k data symbols are treated as the coefficients of a polynomial f(x) of degree less than k over GF(q), where q ≥ n to provide n distinct evaluation points α_i.[10] The encoded symbols consist of the n evaluations of f(x) at these distinct points α_1, α_2, ..., α_n in the field, with the first k evaluations representing the systematic data and the remaining n - k serving as parities.[10] This polynomial-based approach ensures that any n - k erasures can be tolerated, as the minimum distance d = n - k + 1 matches the MDS requirement.[10] The encoding formula for each symbol is given by c_i = f(α_i) for i = 1 to n, where f(x) = d_0 + d_1 x + ... + d_{k-1} x^{k-1} and the d_j are the data symbols.[10] This evaluation can be computed directly using Horner's method for efficiency, resulting in an overall encoding complexity of O(n k) field operations.[2]

Decoding in the erasure-only case involves recovering f(x) from any k of the received symbols using polynomial interpolation.[10] Lagrange interpolation provides a direct method to reconstruct the polynomial from the k points, while the Berlekamp-Massey algorithm offers an efficient alternative by solving for the shortest linear feedback shift register that generates the received sequence, enabling recovery in O(k^2) time.[25] These methods handle up to n - k erasures by interpolating the unique polynomial of degree less than k that passes through the available points.[2]

A simple example is the (3,2) Reed-Solomon code, often illustrated for basic error-resilient messaging like "Err-mail." Consider data symbols d_0 and d_1 in GF(3), forming f(x) = d_0 + d_1 x; evaluate at α_1 = 0, α_2 = 1, α_3 = 2 to get c_1 = d_0, c_2 = d_0 + d_1, c_3 = d_0 + 2 d_1. If c_3 is erased, interpolation from c_1 and c_2 recovers f(x) and thus the data.[2]

Reed-Solomon codes exhibit O(n k) encoding and O(k^2) decoding complexity in their standard forms, making them suitable for practical implementations.[2] They have been widely adopted, such as in compact discs (CDs) for error correction since the 1980s using cross-interleaved (32,28) and (28,24) codes over GF(2^8),[26] and in QR codes for data recovery with varying error correction levels based on Reed-Solomon parities.[27] A key limitation is the field size constraint, requiring q ≥ n, which bounds the code length to the field order and necessitates sub-packetization—dividing data into smaller symbols—for large-scale applications where n exceeds practical q values like 2^8 or 2^16.
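The (3, 2) toy code from the paragraph above can be worked out directly; a minimal sketch over GF(3):

```python
# f(x) = d0 + d1*x evaluated at alpha = 0, 1, 2, with all arithmetic mod 3.

def encode(d0, d1):
    return [(d0 + d1 * alpha) % 3 for alpha in (0, 1, 2)]

def decode(two_points):
    """Recover (d0, d1) from any two (alpha, value) pairs by interpolation."""
    (x1, y1), (x2, y2) = two_points
    d1 = (y2 - y1) * pow(x2 - x1, -1, 3) % 3
    d0 = (y1 - d1 * x1) % 3
    return d0, d1

c = encode(2, 1)                                  # -> [2, 0, 1]
assert decode([(0, c[0]), (1, c[1])]) == (2, 1)   # c_3 erased, as in the text
assert decode([(1, c[1]), (2, c[2])]) == (2, 1)   # any two survivors work
```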
Near-Optimal Erasure Codes
Locally Repairable Codes
Locally repairable codes (LRCs) are a class of erasure codes designed to minimize the amount of data read during the repair of a single failed node in distributed storage systems, achieving near-optimal tradeoffs between storage efficiency and repair bandwidth. Unlike maximum distance separable (MDS) codes, LRCs relax the strict MDS property to enable local repairability, where each data symbol can be recovered using a small, fixed number r of other symbols within a local parity group. The construction typically divides the n coded symbols into g local groups, each consisting of r data symbols plus one local parity symbol, resulting in g local parities. Additional h global parities are then computed across all data and local parities, with the total redundancy m = g + h - 1, as the sum of local parities serves as an implicit global parity to avoid redundancy. This structure ensures locality r for repairing any single erasure using only r other symbols from its local group.

A representative example is a (10, 6, 3) LRC, featuring two local groups each of size 4 (three data symbols and one local parity computed via XOR or Reed-Solomon within the group) for a total of six data symbols and two local parities, augmented by two global parities. In this setup, repairing a single failed data symbol requires accessing only three other symbols from its local group, significantly reducing I/O compared to reading from k systematic symbols in an MDS code. The minimum distance d satisfies the near-MDS bound d = n - k + 1 - (g - 1) = 10 - 6 + 1 - (2 - 1) = 4, allowing correction of up to three erasures globally while maintaining local repair efficiency; repair operations scale as O(r) in bandwidth, versus O(k) for traditional Reed-Solomon codes.[30]

Encoding in LRCs involves first computing local parities independently for each group, often using simple XOR for binary fields or higher-order Reed-Solomon codes for enhanced local distance, followed by global parities derived from a systematic overall parity-check matrix that incorporates both data and local parities. This hierarchical approach ensures the code remains linear and efficient for encoding/decoding. Tradeoffs include slightly higher storage overhead, such as 1.5× replication factor versus 1.33× for an equivalent MDS code tolerating three failures, in exchange for 2-3× faster single-node repairs by limiting reads to the local group. Microsoft Azure Storage deploys an LRC with parameters (16, 12, 2), using 12 data symbols, two local parities (g = 2 groups of six data each plus locals), and two globals (h = 2), achieving a 1.33× overhead while halving repair bandwidth over Reed-Solomon baselines.[16]
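A toy sketch of the local-repair path only, assuming a much smaller layout than the codes above (six data symbols in two XOR-parity groups; the global parities are omitted here):

```python
# Illustration of LRC-style local repair, not any deployed code: a lost data
# symbol is rebuilt from its local group alone, instead of from k survivors.
from functools import reduce

data = [3, 14, 15, 92, 65, 35]
groups = [[0, 1, 2], [3, 4, 5]]                     # local groups of r = 3 data symbols
local_parity = [reduce(lambda a, b: a ^ b, (data[i] for i in g)) for g in groups]

def repair(lost_index):
    """Rebuild one lost data symbol from its local group only."""
    g = next(grp for grp in groups if lost_index in grp)
    survivors = [data[i] for i in g if i != lost_index]
    value = local_parity[groups.index(g)]
    for s in survivors:
        value ^= s
    return value

assert repair(4) == data[4]    # reads 3 symbols (2 survivors + 1 local parity), not k = 6
```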
Regenerating Codes

In distributed storage systems, repairing a failed node using traditional erasure codes like Reed–Solomon requires downloading a full node's worth of data from each of k surviving nodes, amounting to the size of the entire original file in total, which incurs significant network bandwidth costs.[31] Regenerating codes address this by enabling a new node to download a reduced amount of data from d helper nodes (k ≤ d < n), where n is the total number of nodes, while ensuring any k nodes can reconstruct the original file of size M.[31] This trade-off between storage per node α and repair bandwidth γ = dβ (with β symbols downloaded from each helper) minimizes communication overhead in failure-prone environments like cloud storage.[32]

The cut-set bound, derived from network coding principles, establishes the theoretical minimum for repair bandwidth: the parameters must satisfy M ≤ Σ_{i=0}^{k-1} min{α, (d - i)β}, assuming the file is divided into symbols and helpers contribute equally.[31] This bound arises from information flow graph analysis, where the minimum cut across nodes must support file reconstruction, and repair flows from helpers must suffice for recovery without exceeding storage constraints.[32] Codes achieving this bound are optimal, balancing α and γ along the storage-bandwidth trade-off curve.[33]

Minimum Storage Regenerating (MSR) codes achieve the cut-set bound at the minimum storage point, where α = M/k (matching MDS codes like Reed–Solomon for space efficiency) and γ = Md/(k(d - k + 1)).[33] Exact-regenerating MSR constructions recover the exact lost data using interference alignment techniques, where helper contributions are linearly combined to cancel interference and isolate desired symbols.[34] These codes maintain MDS properties for reconstruction while reducing repair traffic, with explicit constructions available for suitable parameter ranges.[33]

Minimum Bandwidth Regenerating (MBR) codes prioritize bandwidth minimization at γ = 2Md/(k(2d - k + 1)), at the cost of higher per-node storage α = γ.[32] This point on the trade-off curve allows simpler encoding but increases per-node capacity, making MBR suitable for bandwidth-constrained networks.[33] Like MSR, MBR codes can be exactly repaired using interference alignment for feasible parameters.[34]

A representative example is the (4,2) MSR code with d = 3 helpers and sub-packetization α = 2, where the file of M = 4 sub-packets is split into smaller sub-packets to enable exact repair; each helper contributes β = 1 symbol, yielding γ = 3 total symbols—less than the 4 symbols required by traditional MDS repair—while α = M/k = 2.[33] This construction uses product-matrix methods to align interferences across sub-packets, demonstrating bandwidth savings in small-scale systems.[34]

As of 2025, extensions like piggybacking codes, introduced in 2013, enable lazy repair by precomputing functions of data during initial writes, further reducing effective bandwidth in MSR setups without altering core parameters.[35] These codes remain primarily in research prototypes, such as experimental distributed file systems, due to sub-packetization complexities limiting practical deployment.[13] Regenerating codes differ from locally repairable codes by focusing on global bandwidth reduction across helpers rather than locality for fast single-node fixes.[32]
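A quick numeric check of the two trade-off points, assuming the formulas as commonly stated in the regenerating-codes literature (α = M/k and γ = Md/(k(d - k + 1)) at MSR; α = γ = 2Md/(k(2d - k + 1)) at MBR):

```python
# Storage (alpha) and repair bandwidth (gamma) at the two extreme points of
# the regenerating-code trade-off curve, under the assumed standard formulas.

def msr_point(M, k, d):
    alpha = M / k                      # per-node storage at the MDS level
    gamma = M * d / (k * (d - k + 1))  # total data downloaded to repair one node
    return alpha, gamma

def mbr_point(M, k, d):
    gamma = 2 * M * d / (k * (2 * d - k + 1))
    return gamma, gamma                # at MBR, per-node storage equals repair bandwidth

# The (4, 2) example above: d = 3 helpers, file of M = 4 sub-packets.
print(msr_point(4, 2, 3))  # (2.0, 3.0): store 2, download 3 (vs. 4 for naive MDS repair)
print(mbr_point(4, 2, 3))  # (2.4, 2.4)
```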
Applications
Distributed Storage Systems
Erasure codes have been integrated into the Hadoop Distributed File System (HDFS) since the release of Apache Hadoop 3.0 in 2017, with initial proposals and development efforts beginning around 2015 through collaborations between Intel and Cloudera.[36] Note: Erasure code parameters are denoted as RS(n,k), where n is the total number of symbols and k is the number of data symbols. HDFS supports configurable erasure coding policies, such as Reed-Solomon (RS) codes with parameters like RS(9,6) or RS(14,10), which replace triple replication for certain data sets, particularly cold or archival data.[37] For example, an RS(14,10) configuration encodes 10 data blocks into 14 total blocks (10 data + 4 parity), enabling automatic tiering where frequently accessed hot data remains replicated while less-accessed data uses erasure coding to optimize space.[38] This integration allows administrators to apply policies at the directory level, balancing durability with storage efficiency without altering core HDFS workflows.

In cloud storage systems, erasure codes are prominently deployed in Google's Colossus file system and Microsoft Azure Blob Storage. Google's Colossus, the successor to the Google File System (GFS), employs RS(9,6) codes, which provide a redundancy factor of 1.5 while tolerating up to 3 simultaneous failures across distributed clusters.[13] Similarly, Microsoft Azure utilizes locally repairable codes (LRCs), such as LRC(12,2,2) with 12 data fragments, 2 local parities, and 2 global parities (total 16 fragments), to enhance repair efficiency in large-scale object storage.[16] These LRCs, which build on principles from regenerating codes, reduce the data read during single-fragment repairs from 12 blocks (as in standard RS codes) to as few as 3, achieving approximately 50% lower repair times compared to equivalent RS configurations.[16]

The primary benefits of erasure codes in distributed storage include significant cost savings through improved storage efficiency—up to 3 times better than traditional 3-way replication for low-overhead schemes like RS(13,10)—and robust fault tolerance against 10-20% node failures in large clusters, depending on parity configuration.[13] For instance, in hyperscale environments, this translates to handling petabyte-scale data with minimal overhead, reducing hardware requirements and operational expenses while maintaining high availability. However, challenges persist, including encoding and decoding latency, which can increase read/write times over replication due to computational overhead; these are often mitigated using hardware accelerators like Intel's ISA-L library.[39] Additionally, straggler effects during distributed repairs—where slow nodes delay overall recovery—can extend mean time to repair (MTTR) in heterogeneous clusters, though techniques like partial parallel decoding help reduce this by up to 59%.[40]

As of 2025, erasure codes are widely adopted in hyperscale distributed storage systems, powering platforms like HDFS, Colossus, and Azure for zettabyte-scale deployments.[13] Open-source libraries such as OpenEC facilitate this adoption by providing a unified framework for implementing and configuring various erasure codes, including RS and LRC, directly into systems like Ceph and Swift, enabling customizable integration without proprietary dependencies.[41]
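A back-of-the-envelope comparison of the configurations named above (the failure-tolerance figure for the LRC assumes its usual guarantee of any three arbitrary fragment losses):

```python
# Storage multiplier and per-stripe fault tolerance for the configurations above.
configs = {
    "3-way replication": (1, 2, 2),   # (data units, redundancy units, failures tolerated)
    "RS(9,6)":           (6, 3, 3),
    "RS(14,10)":         (10, 4, 4),
    "LRC(12,2,2)":       (12, 4, 3),  # 2 local + 2 global parities; any 3 failures (assumed)
}
for name, (k, r, f) in configs.items():
    print(f"{name:17} overhead {(k + r) / k:.2f}x, tolerates {f} failures")
```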
Networking and Communication

Erasure codes play a crucial role in networking and communication by enabling reliable data transmission over unreliable channels, where packet losses occur due to congestion, interference, or other impairments. In such scenarios, packets are encoded prior to transmission to include redundancy, allowing receivers to recover lost data without retransmission requests, which is particularly beneficial in multicast or broadcast settings. For instance, Tornado codes, introduced in 1998, were specifically developed for efficient reliable multicast of large files, achieving near-linear time encoding and decoding complexity while outperforming traditional Reed-Solomon codes in bandwidth efficiency for erasure channels.[42]

A key advancement in this domain is the development of fountain codes, which are rateless erasure codes capable of generating an unlimited number of parity packets from a fixed set of source symbols. Luby Transform (LT) codes, proposed in 2002, form the foundational class of practical fountain codes, employing a robust degree distribution to ensure efficient performance; the encoding process involves randomly selecting source symbols based on this distribution to create output symbols. Building on LT codes, Raptor codes, introduced in 2006, incorporate a low-density parity-check precode followed by an LT-like fountain stage, enabling linear-time encoding and decoding complexity, which significantly reduces computational overhead for large-scale transmissions. As an illustrative example, for an LT code with 1000 source symbols, a receiver typically collects around 1050 encoded packets—incurring a modest 5% overhead—to achieve successful decoding with high probability using the belief propagation algorithm, which iteratively resolves symbols starting from degree-one outputs.[43]

These codes find widespread application in video streaming and satellite communications, where packet loss is common and retransmission-based automatic repeat request (ARQ) protocols introduce unacceptable delays. In video streaming, fountain codes protect against erasures in real-time delivery, maintaining playback quality without buffering interruptions; for example, they have been integrated into protocols for wireless multicast scenarios to minimize retransmissions. In satellite links, characterized by high latency and variable loss rates, erasure codes enhance throughput by correcting erasures proactively, with empirical studies demonstrating latency reductions of 20-30% compared to ARQ approaches in lossy environments.

As of 2025, erasure codes are increasingly integrated into 5G and 6G networks to support ultra-reliable low-latency communication (URLLC) services, such as industrial automation and autonomous vehicles, by providing robust FEC without feedback overhead. The IETF's FECFRAME framework further standardizes these mechanisms, defining architectures for applying FEC, including fountain-based schemes, across IP multicast and unicast flows to ensure interoperability and efficiency.[44][45][19]

Emerging Uses
Advanced Storage Architectures
In flash and solid-state drive (SSD) storage, erasure codes are adapted to address the unique constraints of multilevel cells, such as limited write cycles and wear leveling. Write-Once Memory (WOM) codes enable multiple writes to flash cells without erasure by leveraging the asymmetric programming property, where cells can only increase in charge level. These codes are particularly suited for multilevel cells (MLC) in NAND flash, allowing data to be rewritten up to t times per block while minimizing wear. For instance, rank modulation schemes represent data as permutations of cell charge levels rather than absolute values, facilitating rewriting without full block erasures and integrating with erasure-correcting mechanisms to tolerate erasures and deletions caused by wear or errors. This approach extends flash lifetime by reducing erasure frequency, with rank-modulation rewriting codes achieving close to 2 bits per cell per write while preserving reliability.[46][47]

In in-memory storage systems using volatile RAM clusters, erasure codes provide fault tolerance with lower overhead than traditional replication. EC-Cache, an online erasure coding scheme for cluster caching, splits objects into k data and r parity units during writes, enabling reconstruction from any k units. It employs (10,6) codes, which store data across 16 units with 1.6× storage overhead, reducing replication overhead by over 3× compared to selective 3-replication while using similar memory. By reading k+Δ units (e.g., Δ=1) to handle stragglers, EC-Cache improves load balancing by more than 3× and reduces median read latency by up to 2.64× and tail latency (99.9th percentile) by 1.79× for large objects like 40 MB, scaling effectively in RAM-based clusters.[48]

Disaggregated storage architectures separate compute and memory resources into pooled systems, where erasure codes enhance efficiency in memory pooling. MicroEC optimizes erasure coding for disaggregated memory by reusing auxiliary coding data and pipelining encoding with RDMA transmissions via an exponential pipeline, supporting operations like writes, reads, degraded reads, and recoveries. This reduces write latency by up to 44.35% compared to baseline erasure coding and 42.14% versus 3-way replication, while improving write throughput by 1.80× over erasure coding and 1.73× over replication for objects of 1 MB or larger. Complementing this, Hydra applies online erasure coding to individual remote memory pages in disaggregated setups, achieving single-digit microsecond access latencies for fault-tolerant remote memory without full replication. It uses configurable coding groups placed via the CodingSets algorithm for balanced load and high availability, tolerating node failures in pooled environments.[49][50]

Optimizations in these architectures further mitigate traditional erasure code limitations. Stripeless placement schemes, such as Nos, eliminate stripe alignment by having nodes independently replicate and XOR-encode data into parities based on symmetric balanced incomplete block designs (SBIBD), avoiding centralized coordination and stripe-induced overheads during writes. This enables fault tolerance for up to p node failures while simplifying placement in in-memory systems. Evaluations show Nos-based systems achieving 1.61× to 2.60× higher throughput than stripe-based baselines, with comparable or lower latencies.
Additionally, hardware accelerations like GPUs speed up encoding; for example, G-CRS uses GPU parallelism for Cauchy Reed-Solomon codes, delivering up to 10× higher throughput than CPU-based libraries for large-scale operations.[51][52] These advancements enable erasure codes to support scalability to petabyte-scale nodes in advanced architectures, such as those using NVMe-over-Fabrics (NVMe-oF) for high-speed interconnects. In NVMe-oF fabrics, erasure codes provide robust fault tolerance against multiple node or drive failures, as seen in systems like IBM Spectrum Scale Erasure Code Edition, which uses Reed-Solomon codes for declustered protection across petabyte clusters with 1-3 node tolerance depending on configuration (e.g., 8+3 parity). This ensures data availability in disaggregated, high-density environments without excessive replication overhead.[53]

Integration with AI and DNA Storage
Erasure codes have been integrated into distributed machine learning frameworks to enhance resilience during training processes, particularly by mitigating the impact of stragglers—slow or failed compute nodes—in gradient descent computations. In coded computing approaches, data and computations are encoded such that the results from a subset of nodes suffice for full recovery, tolerating up to a specified number of stragglers without halting the entire process; for instance, nested gradient codes allow flexible straggler mitigation by concatenating schemes for varying failure counts, reducing training time in large-scale setups. Recent advancements, such as ECRec for deep learning recommendation models, couple erasure coding with model-specific traits to achieve fault tolerance while minimizing overhead in distributed training environments. Additionally, machine learning models predict file access patterns or failure probabilities to enable adaptive erasure coding, as demonstrated in Azure Storage systems where deep learning optimizes code parameters for dynamic workloads, improving efficiency in AI-driven data management.

In DNA storage, erasure codes address inherent biological errors during synthesis and sequencing, including 6.2% deletion rates and 5.7% insertion rates observed in photolithographic methods, by encoding data into redundant oligonucleotides that allow reconstruction from partial reads. Fountain codes, adapted for DNA's constraints like variable oligo recovery and biochemical limits, enable rateless encoding where sufficient fragments—regardless of dropouts—decode the original information, as pioneered in the DNA Fountain architecture that stores gigabytes scalably. This integration supports ultra-high densities, with theoretical capacities reaching up to 33 zettabytes in volumes comparable to a ping-pong ball, far surpassing traditional media and positioning DNA as a viable archival solution for exascale data. An example is the ELECT system, which applies erasure coding tiering in edge-cloud LSM-tree stores to balance redundancy and performance, extendable to bio-hybrid scenarios for reliable DNA data handling.

As of 2025, trends emphasize hybrid erasure codes combining replication and coding for mixed failure modes in AI and DNA contexts, as surveyed in comprehensive reviews of storage systems, with applications in adaptive AI training and error-prone bio-storage. The market for erasure coding technologies is projected to experience significant growth by 2031, fueled by these interdisciplinary integrations in resilient computing and archival technologies. Challenges persist, including high decoding error rates in DNA due to indel dominance, necessitating advanced codes like HEDGES for direct correction, and computational overhead in AI distributed training, where encoding/decoding adds latency that coded schemes aim to offset through optimized designs.

References
- https://ntrs.nasa.gov/api/citations/19900019023/downloads/19900019023.pdf
- https://arxiv.org/abs/1206.3804


