Recent from talks
Nothing was collected or created yet.
Run-length limited
View on WikipediaThis article needs additional citations for verification. (June 2023) |
Run-length limited (RLL) is a line coding technique that is used to send arbitrary data over a communications channel with bandwidth limits. RLL codes are defined by four main parameters: m, n, d, k. The first two, m/n, refer to the rate of the code, while the remaining two specify the minimal d and maximal k number of zeroes between consecutive ones. This is used in both telecommunication and storage systems that move a medium past a fixed recording head.[1]
Specifically, RLL bounds the length of stretches (runs) of repeated bits during which the signal does not change. If the runs are too long, clock recovery is difficult; if they are too short, the high frequencies might be attenuated by the communications channel. By modulating the data, RLL reduces the timing uncertainty in decoding the stored data, which would lead to the possible erroneous insertion or removal of bits when reading the data back. This mechanism ensures that the boundaries between bits can always be accurately found (preventing bit slip), while efficiently using the media to reliably store the maximal amount of data in a given space.
Early disk drives used very simple encoding schemes, such as RLL (0,1) FM code, followed by RLL (1,3) MFM code, which were widely used in hard disk drives until the mid-1980s and are still used in digital optical discs such as CD, DVD, MD, Hi-MD and Blu-ray. Higher-density RLL (2,7) and RLL (1,7) codes became the de facto industry standard for hard disks by the early 1990s.
Need for RLL coding
[edit]On a hard disk drive, information is represented by changes in the direction of the magnetic field on the disk, and on magnetic media, the playback output is proportional to the density of flux transition. In a computer, information is represented by the voltage on a wire. No voltage on the wire in relation to a defined ground level would be a binary zero, and a positive voltage on the wire in relation to ground represents a binary one. Magnetic media, on the other hand, always carries a magnetic flux – either a "north" pole or a "south" pole. In order to convert the magnetic fields to binary data, some encoding method must be used to translate between the two.
One of the simplest practical codes, modified non-return-to-zero-inverted (NRZI), simply encodes a 1 as a magnetic polarity transition, also known as a "flux reversal", and a zero as no transition. With the disk spinning at a constant rate, each bit is given an equal time period, a "data window", for the magnetic signal that represents that bit, and the flux reversal, if any, occurs at the start of this window. (Note: older hard disks used one fixed length of time as the data window over the whole disk, but modern disks are more complicated; for more on this, see zoned bit recording.)
This method is not quite that simple, as the playback output is proportional to the density of ones, a long run of zeros means no playback output at all.
In a simple example, consider the binary pattern 101 with a data window of 1 ns (one nanosecond, or one billionth of a second). This will be stored on the disk as a change, followed by no change, and then another change. If the preceding magnetic polarity was already positive, the resulting pattern might look like this: −−+. A value of 255, or all binary ones, would be written as −+−+−+−+ or +−+−+−+−. A zero byte would be written as ++++++++ or −−−−−−−−. A 512-byte sector of zeros would be written as 4096 sequential bits with the same polarity.
Since a disk drive is a physical piece of hardware, the rotational speed of the drive can change slightly, due to a change in the motor speed or thermal expansion of the disk platter. The physical media on a floppy disk can also become deformed, causing larger timing errors, and the timing circuit on the controller itself may have small variations in speed. The problem is that, with a long string of zeros, there's no way for the disk drive's controller to know the exact position of the read head, and thus no way to know exactly how many zeros there are. A speed variation of even 0.1%, which is more precise than any practical floppy drive, could result in 4 bits being added to or removed from the 4096-bit data stream. Without some form of synchronization and error correction, the data would become completely unusable.
The other problem is due to the limits of magnetic media itself: it is only possible to write so many polarity changes in a certain amount of space, so there's an upper limit to how many ones can also be written sequentially, this depends on the linear velocity and the head gap.
To prevent this problem, data is coded in such a way that long repetitions of a single binary value do not occur. By limiting the number of zeros written consecutively to some maximum k, this makes it possible for the drive controller to stay synchronized. By limiting the number of zeros written in a row to some minimum d between each and every one, the overall frequency of polarity changes is reduced, allowing the drive to store more data in the same amount of space, resulting in either a smaller package for the same amount of data or more storage in the same size package.
History
[edit]
All codes used to record on magnetic disks have limited the length of transition-free runs and can therefore be technically characterized as RLL codes. The earliest and simplest variants were given specific names, such as modified frequency modulation (MFM), and the name "RLL" is commonly used only for the more complex variants not given such specific names, but the term technically applies to them all.
Outside of this simplest version, the first RLL code used in hard drives was RLL (2,7), developed by IBM engineers and first used commercially in 1979 on the IBM 3370 DASD,[2][3][4] for use with the 4300 series mainframe. During the late 1980s, PC hard disks began using RLL proper (i.e. variants more complex than those that had received their own proper names, such as MFM). RLL codes have found almost universal application in optical-disc recording practice since 1980. In consumer electronics, RLLs like the EFM code (rate = 8/17, d = 2, k = 10) are employed in the Compact Disc (CD) and MiniDisc (MD), and the EFMPlus code (rate = 8/16, d = 2, k = 10) used in the DVD. Parameters d and k are the minimal and maximal allowed run lengths. For more coverage on the storage technologies, the references cited in this article are useful.[5][6]
Technical overview
[edit]Generally run length is the number of bits for which signal remains unchanged. A run length of 3 for bit 1, represents a sequence 111. For instance, the pattern of magnetic polarizations on the disk might be +−−−−++−−−++++++, with runs of length 1, 4, 2, 3, and 6. However, run-length limited coding terminology assumes NRZI encoding, so 1 bits indicate changes and 0 bits indicate the absence of change, the above sequence would be expressed as 11000101001000001, and only runs of zero bits are counted.
Somewhat confusingly, the run length is the number of zeros (0, 3, 1, 2 and 5 in the preceding) between adjacent ones, which is one less than the number of bit times the signal actually remains unchanged. Run-length limited sequences are characterized by two parameters, d and k, which stipulate the minimal and maximal zero-bit run length that can occur in the sequence. So RLL codes are generally specified as (d,k) RLL, e.g.: (1,3) RLL.
Coding
[edit]In the encoded format a "1" bit indicates a flux transition, while a "0" indicates that the magnetic field on the disk does not change for that time interval.
FM: (0,1) RLL
[edit]Generally, the term "RLL code" is used to refer to more elaborate encodings, but the original Frequency Modulation code, also called differential Manchester encoding, can be seen as a simple rate-1/2 RLL code. The added 1 bits are referred to as clock bits.
| Data | Encoded |
|---|---|
| 0 | 10 |
| 1 | 11 |
Example:
Data: 0 0 1 0 1 1 0 1 0 0 0 1 1 0 Encoded: 1010111011111011101010111110 Clock: 1 1 1 1 1 1 1 1 1 1 1 1 1 1
GCR: (0,2) RLL
[edit]By extending the maximal run length to 2 adjacent 0 bits, the data rate can be improved to 4/5. This is the original IBM group coded recording variant.
|
|
Where possible (11 out of 16 codes), the bit pattern abcd is encoded by prefixing it with the complement of a: aabcd. In the 5 cases where this would violate one of the rules (000d or ab00), a code beginning with 11 is substituted (11bea, where e = a ∨ d).
Example:
Data: 0010 1101 0001 1000 Encoded: 10010011011101111010
Note that to meet the definition of (0,2) RLL, it is not sufficient only that each 5-bit code contain no more than two consecutive zeros, but it is also necessary that any pair of 5-bit codes as a combined sequentially not contain more than two consecutive zeros. That is, there must not be more than two zeros between the last one bit in the first code and the first one bit in the second code, for any two arbitrarily chosen codes. This is required because for any RLL code, the run-length limits – 0 and 2 in this case – apply to the overall modulated bitstream, not just to the components of it that represent discrete sequences of plain data bits. (This rule must hold for any arbitrary pair of codes, without exception, because the input data may be any arbitrary sequence of bits.) The IBM GCR code above meets this condition, since the maximal run length of zeros at the beginning of any 5-bit code is one, and likewise the maximal run length at the end of any code is one, making a total run length of two at the junction between adjacent codes. (An example of the maximal run length occurring between codes can be seen in the example given above, where the code for the data "0010" ends with a zero and the code for the next data, "1101", begins with a zero, forming a run of two zeros at the junction of these two 5-bit codes.)
MFM: (1,3) RLL
[edit]Modified frequency modulation begins to get interesting, because its special properties allow its bits to be written to a magnetic medium with twice the density of an arbitrary bit stream. There is a limit to how close in time flux transitions can be for reading equipment to detect them, and that constrains how closely bits can be recorded on the medium: In the worst case, with an arbitrary bit stream, there are two consecutive ones, which produces two consecutive flux transitions in time, so bits must be spaced far enough apart that there would be sufficient time between those flux transitions for the reader to detect them. But this code imposes a constraint of d = 1, i.e. there is a minimum of one zero between each two ones. This means that in the worst case, flux transitions are two bit times apart, so the bits can be twice as close together as with the arbitrary bit stream without exceeding the reader's capabilities.
This doubled recording density compensates for the 1/2 coding rate of this code (it takes two bits to represent one bit of real information) and makes it equivalent to a rate-1 code.
The encoding is very similar to the FM encoding.
| Data | Encoded |
|---|---|
| 0 | x0 |
| 1 | 01 |
Where "x" is the complement of the stream's previously encoded bit.
Except for the clock bits not always being one, this is the same as the FM table, and that is how this code gets its name. The inserted clock bits are 0 except between two 0 data bits.
When combined with the previous n-1 bit, the resulting encoding table for each data bit n effectively becomes.
| Data (n) | Data (n-1) | Encoded (n) |
|---|---|---|
| 0 | 0 | 10 |
| 1 | 00 | |
| 1 | 0 | 01 |
| 1 | 01 |
Example:
Data: 0 0 1 0 1 1 0 1 0 0 0 1 1 0 Encoded: x010010001010001001010010100 Clock: x 1 0 0 0 0 0 0 0 1 1 0 0 0
(1,7) RLL
[edit](1,7) RLL maps 2 bits of data onto 3 bits on the disk, and the encoding is done in 2- or 4-bit groups. The encoding rules are: (x, y) becomes (NOT x, x AND y, NOT y), except (x, 0, 0, y) becomes (NOT x, x AND y, NOT y, 0, 0, 0).[7] When encoding according to the table below, the longest (last in the table) match must be used; those are exceptions handling situations where applying the earlier rules would lead to a violation of the code constraints.
| Data | Encoded |
|---|---|
| 00 | 101 |
| 01 | 100 |
| 10 | 001 |
| 11 | 010 |
| 00 00 | 101 000 |
| 00 01 | 100 000 |
| 10 00 | 001 000 |
| 10 01 | 010 000 |
Example:
Data: 0 0 1 0 1 1 0 1 0 0 0 1 1 0 Encoded: 101 001 010 100 100 000 001
(2,7) RLL
[edit](2,7) RLL is rate-1⁄2 code, mapping n bits of data onto 2n bits on the disk, like MFM, but because the minimal run length is 50% longer (3 bit times instead of 2), the bits can be written faster, achieving 50% higher effective data density. The encoding is done in 2-, 3- or 4-bit groups.
Western Digital WD5010A, WD5011A, WD50C12
| Data | (2,7) RLL encoded |
|---|---|
| 11 | 1000 |
| 10 | 0100 |
| 000 | 100100 |
| 010 | 000100 |
| 011 | 001000 |
| 0011 | 00001000 |
| 0010 | 00100100 |
Seagate ST11R, IBM
| Data | (2,7) RLL encoded |
|---|---|
| 11 | 1000 |
| 10 | 0100 |
| 000 | 000100 |
| 010 | 100100 |
| 011 | 001000 |
| 0011 | 00001000 |
| 0010 | 00100100 |
Perstor Systems ADRC
| Data | (2,7) RLL encoded |
|---|---|
| 11 | 1000 |
| 10 | 0100 |
| 000 | 100100 |
| 010 | 000100 |
| 001 | 001000 |
| 0111 | 00001000 |
| 0110 | 00100100 |
The encoded forms begin with at most 4, and end with at most 3 zero bits, giving the maximal run length of 7.
Example:
Data: 1 1 0 1 1 0 0 1 1 Encoded: 1000 001000 00001000
HHH(1,13)
[edit]The HHH(1,13) code is a rate-2/3 code developed by three IBM researchers (Hirt, Hassner, and Heise) for use in the 16 Mbit/s IrDA VFIR physical layer.[8] Unlike magnetic encoding, this is designed for an infrared transmitter, where a 0 bit represents "off" and a 1 bit represents "on". Because 1 bits consume more power to transmit, this is designed to limit the density of 1 bits to less than 50%. In particular, it is a (1,13|5) RLL code, where the final 5 indicates the additional constraint that there are at most 5 consecutive "10" bit pairs.
| Data | Encoded |
|---|---|
| 00 | 010 |
| 01 | 001 |
| 10 | 100 |
| 11 | 101 |
| 01 10 | 001 000 |
| 01 11 | 010 000 |
| 11 10 | 101 000 |
| 11 11 | 100 000 |
| 00 11 00 | 010 000 000 |
| 00 11 01 | 001 000 000 |
| 10 11 00 | 100 000 000 |
| 10 11 01 | 101 000 000 |
| 00 11 10 11 | 010 000 000 000 |
| 10 11 10 11 | 100 000 000 000 |
The first eight rows describe a standard (1,7)-RLL code. The additional six exceptions increase the maximal run of zeros to 13 (in the legal pattern 100 000 000 000 001, which represents 10 11 10 11, followed by 01), but limit the maximal average ones density to 1⁄3. The longest run of 1–0 pairs is 000 101 010 101 000.
This code limits the ones density to between 1⁄12 and 1⁄3, with an average of 25.8%.
Examples
[edit]For example, let us encode the bit sequence 10110010 with different encodings
| Encoding | Data | Encoded |
|---|---|---|
| RLL(0,1) | 10110010 | 1110111110101110 |
| RLL(0,2) | 1011 0010 | 01011 10010 |
| RLL(1,3) | 10110010 | 0100010100100100 |
| RLL(1,7) | 10 11 00 10 | 001 010 101 001 |
| RLL(2,7) | 10 11 0010 | 0100 1000 00100100 |
| HHH(1,13) | 10 11 00 10 | 100 000 000 100 |
Densities
[edit]Suppose a magnetic tape can contain up to 3200 flux reversals per inch. A modified frequency modulation, or (1,3) RLL encoding, stores each data bit as two bits on tape, but since there is guaranteed to be one 0 (no flux reversal) bit between any 1 (flux reversal) bits, then it is possible to store 6400 encoded bits per inch on the tape, or 3200 data bits per inch. A (1,7) RLL encoding can also store 6400 encoded bits per inch on the tape, but since it only takes 3 encoded bits to store 2 data bits, this is 4267 data bits per inch. A (2,7) RLL encoding takes 2 encoded bits to store each data bit, but since there is guaranteed to be two 0 bits between any 1 bits, then it is possible to store 9600 encoded bits per inch on the tape, or 4800 data bits per inch.
The flux-reversal densities on hard drives are significantly greater, but the same improvements in storage density are seen by using different encoding systems.
See also
[edit]- 8b/10b encoding
- Bit slip
- Eight-to-fourteen modulation and EFMplus are DC-free (2,10) RLL codes used on CDs and DVDs, respectively.
- Error correcting codes
- Line code
- Modulation
- Physical layer
- PRML
- Run-length encoding
- Self-synchronizing code and bit synchronization
- Source coding
References
[edit]- ^ Kees Schouhamer Immink (October 2022). "Innovation in Constrained Codes". IEEE Communications Magazine. 60 (10): 20–24. doi:10.1109/MCOM.002.2200249.
A constrained system is defined by a constrained set of 'good' or 'allowable' sequences to be recorded or transmitted. Constrained coding focuses on the analysis of constrained systems and the design of efficient encoders and decoders that transform arbitrary user sequences into constrained sequences.
- ^ A Quarter Century of Disk File Innovation, IBM Journal of Research and Development.
- ^ P. A. Franaszek (1972), “Run-Length-Limited Variable Length Coding with Error Propagation Limitation”, U.S. patent 3,689,899.
- ^ Five decades of disk drive industry firsts, DISK/TREND, Inc., publisher of market studies of the worldwide disk drive and data storage industries. web.archive.org.
- ^ Kees Schouhamer Immink (December 1990). "Runlength-Limited Sequences". Proceedings of the IEEE. 78 (11): 1745–1759. doi:10.1109/5.63306.
A detailed description is furnished of the limiting properties of runlength limited sequences.
- ^ Kees A. Schouhamer Immink (November 2004). Codes for Mass Data Storage Systems (Second fully revised ed.). Eindhoven, The Netherlands: Shannon Foundation Publishers. ISBN 90-74249-27-2. Retrieved 2015-08-23.
- ^ Mee, C. Denis; Daniel, Eric D. (1996). Magnetic Storage Handbook (2nd ed.). McGraw Hill. ISBN 0-07-041275-8.
- ^ Hirt, Walter; Hassner, Martin; Heise, Nyles (February 2001), "IrDA-VFIr (16 Mb/s): modulation code and system design", IEEE Personal Communications, 8 (1): 58–71, doi:10.1109/98.904900.
External links
[edit]Run-length limited
View on GrokipediaIntroduction and Motivation
Definition and Purpose
Run-length limited (RLL) coding is a constrained line coding method applied to binary data sequences, where the lengths of runs of consecutive identical bits—typically zeros—are restricted to improve signal reliability in communications and storage systems. In an RLL code, denoted as (d, k), the parameter d specifies the minimum number of zeros that must separate any two consecutive ones, while k defines the maximum allowable run length of zeros between ones, ensuring that no run exceeds this limit. This constraint prevents the formation of excessively long or short sequences of identical bits in the encoded output, which could otherwise degrade performance in physical channels.[6] The primary purpose of RLL coding is to facilitate robust clock recovery at the receiver or reader, as the bounded run lengths provide regular transitions in the signal waveform that enable synchronization without relying on external clocks. By limiting maximum run lengths (via k), RLL avoids prolonged absences of transitions that could cause timing jitter or loss of phase lock; conversely, the minimum run length (d) ensures sufficient separation between ones to mitigate intersymbol interference (ISI), where adjacent symbols overlap and distort detection in bandwidth-constrained environments such as magnetic or optical recording channels. These mechanisms collectively enhance error resilience and overall system capacity by optimizing the spectral properties of the encoded signal for the channel's characteristics.[7][8] The efficiency of an RLL code is quantified by its rate, expressed as m/n, where m represents the number of input data bits encoded into n output channel bits; higher rates approach the channel's theoretical capacity while adhering to the (d, k) constraints, allowing more user data to be stored or transmitted per unit of channel resource. RLL techniques originated in the context of magnetic recording to address density limitations, but their principles extend to various digital storage and transmission applications requiring precise timing and minimal interference.[9]Need for RLL Coding
In magnetic and optical storage systems, run-length limited (RLL) coding addresses fundamental physical limitations by constraining the lengths of consecutive zeros in the encoded bitstream, thereby preventing extended periods without signal transitions. In non-return-to-zero inverted (NRZI) encoding, commonly used in these media, a logical '1' is represented by a flux reversal (in magnetic recording) or a pit/land edge transition (in optical recording), while a '0' produces no change; unconstrained binary data can thus generate arbitrarily long runs of zeros, resulting in the absence of flux transitions or edges that are essential for detecting bit boundaries.[10] This absence disrupts clock synchronization, as readback circuits rely on periodic transitions to maintain timing accuracy, leading to potential data corruption if the maximum run length exceeds the channel's timing recovery window.[11] RLL parameters (d, k), where d is the minimum and k the maximum number of consecutive zeros, ensure transitions occur frequently enough to support reliable synchronization while avoiding excessive ones that could overload the channel.[10] Early floppy disks and hard drives faced significant challenges from unconstrained data patterns, including bit shift—where nonlinear magnetization effects cause readback pulses from adjacent bits to overlap and displace in time—and equalization errors, as analog filters struggle to sharpen signals distorted by intersymbol interference (ISI) from sparse transitions.[12] In these systems, long zero runs exacerbate ISI, shifting peak positions outward and complicating the equalization process needed to isolate individual bits, which increases error rates at higher linear densities.[11] Without constraints, variable run lengths from random data lead to unpredictable transition intervals, making it difficult for inductive heads in magnetic media or laser detectors in optical media to accurately sample the signal, often resulting in timing jitter or lost synchronization during read operations.[10] By imposing RLL constraints, storage systems can maximize areal density—the bits stored per unit area—through closer bit packing, as controlled run lengths minimize ISI and timing errors without requiring wider guard bands or slower data rates.[12] This allows engineers to reduce the physical spacing between bits while keeping error rates low, enabling higher capacities in both magnetic and optical formats by optimizing the trade-off between transition frequency and channel bandwidth limitations.[10]Historical Development
Early Codes and Origins
The origins of run-length limited (RLL) codes trace back to the 1950s in magnetic tape recording systems, where early encoding schemes imposed constraints on the lengths of consecutive zeros (or "runs") between magnetic flux transitions to facilitate reliable clock synchronization and mitigate intersymbol interference.[10] These constraints were essential for maintaining timing accuracy in data retrieval, as long runs of identical bits could cause clock drift in the absence of periodic transitions.[10] IBM's pioneering work in this era laid the groundwork for such run-length controls in commercial computing applications.[13] In the 1960s, frequency modulation (FM) encoding, a (0,1) RLL code with a rate of 1/2, emerged as a foundational method for magnetic disk and early floppy disk systems, providing self-clocking through guaranteed transitions at every bit interval.[10] IBM introduced FM with the 1311 Disk Storage Drive in 1962, using double-frequency encoding to encode data bits as either a clock pulse alone (for 0) or a clock pulse plus an additional transition (for 1), thereby ensuring periodic flux changes for timing recovery without external clocks.[14] This approach doubled the effective density compared to earlier non-clocking schemes while limiting maximum run lengths to one zero, addressing synchronization challenges in removable disk packs.[14] Practical RLL block codes for noiseless channels with run-length constraints were first developed by D. T. Tang and L. R. Bahl in 1970.[1] By the late 1970s, group code recording (GCR), a (0,2) RLL code with a rate of 4/5, advanced these principles for higher-density floppy disk applications, selecting from constrained 5-bit codewords to avoid runs longer than two zeros.[10] Developed by Steve Wozniak for the Apple II's Disk II drive in 1977, GCR enabled efficient encoding on 5.25-inch media by mapping 4 data bits to 5-bit groups that inherently supported clock recovery through frequent transitions.[15] This milestone built on IBM's foundational run-length research, including early contributions from inventors like F.K. Bowers and E.R. Kretzmer, who explored constrained binary sequences for interference-limited channels in the 1950s and 1960s.[10]Evolution in Storage Technologies
In the early 1980s, Modified Frequency Modulation (MFM), a (1,3) run-length limited code, became widely adopted in personal computer hard disk drives and floppy disks, enabling approximately double the storage density compared to the earlier Frequency Modulation (FM) scheme by optimizing magnetic transitions on the recording medium.[16][17] This advancement supported the growing demand for affordable mass storage in PCs, such as those from IBM and Compaq, where MFM controllers like the Western Digital WD1002 facilitated capacities up to tens of megabytes in standard 5.25-inch form factors.[16] A pivotal refinement occurred in 1979 when IBM introduced the 3370 Direct Access Storage Device, the first commercial disk drive to incorporate (2,7) RLL coding alongside thin-film inductive heads, which allowed for higher linear recording densities in Winchester-style sealed drives by enforcing a minimum of two zero bits between ones (d=2 constraint).[18] This innovation significantly boosted areal densities in enterprise storage systems, paving the way for subsequent RLL implementations in the industry and enabling capacities exceeding 500 MB per unit.[18] As magnetic storage matured, RLL techniques extended to optical media in the 1980s and 1990s, with the Compact Disc (CD) format adopting an Eight-to-Fourteen Modulation (EFM) variant of (2,10) RLL coding starting in 1982 to balance data rate, error resilience, and low-frequency suppression on laser-read discs.[19][20] Developed jointly by Philips and Sony, EFM converted 8 data bits to 14 channel bits with additional merging bits, supporting reliable playback of 74 minutes of digital audio on 12 cm discs and influencing later optical standards like DVD precursors.[19] Standardization efforts in the late 1980s further entrenched RLL in storage interfaces, exemplified by the ANSI X3T9 committee's work on SCSI protocols. The (1,7) RLL encoding became widely used in drives compatible with SCSI-1 systems around 1987, enhancing data throughput and density in peripheral connections for hard drives. This adoption supported efficient 1.6 GB drives by the early 1990s while maintaining compatibility with existing magnetic hardware.[21]Technical Overview
Parameters and Constraints
Run-length limited (RLL) codes are defined by parameters (d, k), where a (d, k)-RLL sequence is a binary string over {0,1} in which every pair of consecutive 1s is separated by at least d and at most k consecutive 0s.[22] This constraint ensures that runs of 0s between 1s satisfy d ≤ run length ≤ k, while the leading and trailing runs of 0s may be shorter than d.[22] The validity of (d, k)-RLL sequences is determined by avoiding forbidden patterns, such as runs of fewer than d 0s or more than k 0s between consecutive 1s; for example, in a (1,3)-RLL code, patterns like "11" (fewer than 1 zero) and "0000" (more than 3 zeros) are prohibited.[22] These forbidden sub-sequences define a finite-type constraint, where a sequence belongs to the constrained set if it contains none of the prohibited words from a finite list.[22] State diagrams provide a graphical representation of (d, k)-RLL constraints using a directed graph with states typically labeled from 0 to k, where transitions labeled 0 increment the state (from i to i+1 for i = 0 to k-1) to track consecutive 0s, and transitions labeled 1 reset to state 0 only from states j ≥ d (to enforce the minimum run length).[23] This irreducible, deterministic graph, known as the Shannon cover, models all valid sequences as paths through the states, ensuring no forbidden patterns occur.[22][23] The Shannon capacity of a (d, k)-RLL channel, which quantifies the maximum achievable encoding rate in bits per symbol, is given by where denotes the number of valid (d, k)-RLL sequences of length L.[22] For irreducible constraints like (d, k)-RLL, this limit equals , where is the largest (Perron-Frobenius) eigenvalue of the graph's adjacency matrix.[22][23] Standard (d, k)-RLL imposes hard constraints, strictly prohibiting forbidden patterns to guarantee sequence validity without errors.[22] In contrast, variants with soft constraints allow occasional violations of the d or k limits, typically compensated by error-correcting codes to tolerate noise while approximating the run-length restrictions.[22]Encoding and Decoding Principles
Run-length limited (RLL) codes employ a block coding approach to map unconstrained input data into constrained output sequences that adhere to specified (d,k) run-length constraints, where d represents the minimum number of consecutive zeros between ones, and k the maximum. In this method, the input stream is partitioned into fixed-length blocks of m information bits, each of which is encoded into a longer block of n binary symbols forming a codeword from a predefined codebook C of valid sequences satisfying the constraints. This ensures that the encoded sequence avoids prohibited patterns, such as runs of zeros shorter than d or exceeding k, thereby mitigating issues like clock drift or intersymbol interference in storage channels. The construction of such block codes involves enumerating all permissible n-bit sequences under the constraints and selecting a subset of size 2^m to form an injective mapping, often optimized for maximum rate while maintaining decodability.90440-5) The rate R of an RLL block code quantifies its efficiency and is given by where |C| denotes the cardinality of the codebook, representing the number of distinct m-bit inputs that can be uniquely encoded into n-bit codewords. This formula derives from information theory principles, balancing the constraint-imposed reduction in sequence space against the need to transmit m bits reliably per n symbols. Optimal block codes approach the channel capacity, which is the logarithm of the largest eigenvalue of the constraint graph's adjacency matrix, but practical designs prioritize simple encoding tables over asymptotic optimality.[24] For real-time applications, state-machine encoding via finite-state transducers (FSTs) provides an alternative to block coding, enabling continuous processing of input bits without large buffering. An FST for RLL constraints is modeled as a directed graph with states tracking the current run length of zeros (up to k), where each input bit triggers a transition that appends output symbols while enforcing the (d,k) limits—e.g., prohibiting transitions that would create fewer than d zeros after a one. These encoders operate at a fixed rate p/q, producing q output bits per p input bits, and support sliding-block decodability, limiting error propagation to a finite window. This approach is particularly suited for hardware implementations in storage devices, as it allows deterministic or nondeterministic transitions with finite anticipation for decoding.[25] Decoding RLL-encoded sequences typically relies on maximum-likelihood sequence detection to recover the original data from noisy channel outputs, accounting for the constraints to improve reliability. The Viterbi algorithm is commonly applied, treating the RLL constraint as a finite-state Markov model where states correspond to possible run lengths, and it computes the most probable input sequence by dynamically minimizing the cumulative metric (e.g., Euclidean distance) over a trellis of constrained paths. This method inherently corrects errors by selecting the globally optimal path, outperforming threshold detection in partial-response channels, and integrates well with error-correcting codes for enhanced performance. In practice, the algorithm's complexity scales with the number of states (proportional to k), but pruning techniques reduce computational overhead without significant loss in accuracy.[26]Specific RLL Codes
Frequency Modulation (FM) and Group Code Recording (GCR)
Frequency Modulation (FM) represents one of the earliest and simplest run-length limited (RLL) codes, characterized by parameters (d=0, k=1), which allow no minimum run of zeros between ones while limiting the maximum to one consecutive bit cell without a data transition in the encoded sequence. In FM encoding, each input data bit is mapped to a two-bit channel code: a logical 1 is encoded as 10, and a logical 0 is encoded as 00. This scheme doubles the effective bit rate compared to non-clocked methods by embedding clock information directly into the data stream, ensuring a transition (clock pulse) occurs at the start of every bit cell and an optional mid-cell transition for data ones. The result is self-clocking operation, where the receiver can synchronize without a separate clock track, as a clock transition appears every two channel bits.[27][28] The truth table for FM encoding is straightforward, reflecting its binary mapping:| Input Data Bit | Encoded Channel Bits |
|---|---|
| 0 | 00 |
| 1 | 10 |
| 5-Bit Input (Decimal) | 8-Bit Output (Binary) |
|---|---|
| 0 (00000) | [selected to enforce (0,2)] |
| 1 (00001) | [selected to enforce (0,2)] |
| ... (up to 31) | ... (selected to enforce (0,2)) |
Modified Frequency Modulation (MFM) and (1,7) RLL
Modified Frequency Modulation (MFM) is a rate-1/2 (1,3) run-length limited code widely adopted for data encoding on magnetic storage media, enforcing a minimum of one zero between consecutive ones (d=1) and a maximum run of three zeros (k=3) to balance spectral properties and timing reliability. The encoding process operates via a state-dependent mechanism, where the output depends on the previous encoded bit to suppress unnecessary transitions and avoid invalid run lengths. Specifically, a data bit of 1 is encoded as 10 if the preceding encoded bit was 0 (inserting a clock transition), or as 11 if the preceding encoded bit was 1 (merging with the prior clock). A data bit of 0 is encoded as 00 only if the preceding encoded bit was 1; otherwise, the clock bit is omitted, resulting in no additional output for that cell to prevent excessive zeros.[30] This approach achieves approximately 91% efficiency relative to the channel capacity, improving density over frequency modulation schemes by eliminating redundant clock bits in certain positions. The following table illustrates basic MFM encoding examples, showing input data sequences and their corresponding codewords, assuming an initial state with a preceding 0:| Data Bits | Encoded Bits | Explanation |
|---|---|---|
| 0 | (no output, suppress clock) | Omitted if prior was 0; assumes merge with previous. |
| 1 | 10 | 1 with inserted clock 0. |
| 00 | 1000 | First 0 suppressed, second 0 as 00 after 1's 10. |
| 01 | 100 | 0 suppressed after 10, 1 as 10 but merged. |
| 10 | 110 | 1 as 11 (after prior 1 implied), 0 as 00 merged to 0. |
| 11 | 1011 | First 1 as 10, second 1 as 11 after 0. |
(2,7) RLL and Higher-Order Codes
The (2,7) RLL code is a rate-1/2 partial-response code designed by IBM engineers and first implemented commercially in 1979 on the IBM 3370 direct access storage device. This code enforces a minimum of two zeros and a maximum of seven zeros between consecutive ones in the channel bits to mitigate intersymbol interference and clock recovery issues in high-density magnetic recording, achieving approximately 50% higher linear density compared to earlier schemes like MFM. It employs a fixed-length block encoding approach, where pairs of data bits are mapped to four-bit codewords (e.g., "00" to "1001", and other pairs to similar 4-bit patterns ensuring the constraints), combined with state merging in a finite-state machine to satisfy the (d,k) constraints while maintaining time-invariance. The partial-response aspect shapes the spectrum to match the channel's response, reducing low-frequency content and enabling better signal detection. Encoding algorithms typically use a sliding-block decoder with look-up tables for constraint satisfaction, ensuring non-catastrophic sequences and efficient merging of code blocks to avoid violations at boundaries.[6] Building on such designs, the Eight-to-Fourteen Modulation (EFM) code, adopted for compact disc (CD) audio storage, operates as a (2,10) RLL code with an effective rate of 8/17 through an 8/14 block mapping augmented by merging bits. Developed by Philips researchers, EFM translates each 8-bit data word into one of 256 possible 14-bit codewords selected from a lookup table to ensure at least two and at most ten consecutive zeros between ones, thereby limiting DC content and facilitating pit/land transitions in optical recording. Three additional merging bits (chosen from {001, 000, 010, 011, 100, 101} to avoid long runs) are inserted between adjacent 14-bit blocks, resulting in 17 channel bits per 8 data bits and maintaining the overall (2,10) constraint across frame boundaries. The encoding algorithm involves a nonlinear optimization to select codewords and merging bits that minimize run lengths while preserving the lookup table's fixed mapping, with decoding achieved via a state machine that verifies run-length compliance and extracts data. This structure supports reliable clock extraction and error detection in the CD's constant linear velocity format.[37] For advanced partial-response maximum-likelihood (PRML) channels, the HHH(1,13) code represents a rate-2/3 matched spectral-line RLL variant developed by IBM researchers Hirt, Hassner, and Heise for high-speed infrared communications, such as the 16 Mbit/s IrDA Very Fast Infrared (VFIR) physical layer. This code constrains runs of zeros to between one and thirteen, with a spectral shape engineered to nullify low-frequency components (around DC) while boosting mid-band energy to align with the PRML channel's frequency response, typically exhibiting a null at f=0 and enhanced gain at higher frequencies for improved signal-to-noise ratio in partial-response signaling like duobinary or PR4. The matched spectral design minimizes baseline wander and optimizes power spectral density for channels with transfer functions H(f) ≈ sin(πfT) or cos(πfT), yielding up to 1-2 dB coding gain over uncoded systems. Encoding relies on a time-varying finite-state encoder that maps 2-bit data symbols to 3-bit codewords using a composite structure of fixed and variable blocks, ensuring constraint satisfaction through careful state transitions and avoidance of catastrophic error propagation; decoding integrates Viterbi algorithms tailored to the PRML detector for joint constraint and channel decoding.Applications and Examples
Encoding Illustrations
To illustrate the encoding process in run-length limited (RLL) codes, consider the binary data string 10001111, which will be transformed using frequency modulation (FM), modified frequency modulation (MFM), and (2,7) RLL schemes. These examples demonstrate how each code maps the input to a channel bit stream, where 1s represent flux reversals (transitions) and 0s represent no transition, with run lengths referring to the number of consecutive 0s between 1s. The goal is to maintain the specified minimum (d) and maximum (k) run lengths while maximizing density by minimizing unnecessary transitions.[38] For FM encoding, a (0,1) RLL code, each data bit is paired with a leading clock bit of 1. A data 0 becomes 10 (one transition followed by no transition), and a data 1 becomes 11 (two transitions). Starting with the input 1 0 0 0 1 1 1 1, the pairs are: 11 (for first 1), 10 (second 0), 10 (third 0), 10 (fourth 0), 11 (fifth 1), 11 (sixth 1), 11 (seventh 1), 11 (eighth 1). The concatenated channel bit stream is 1110101010111111, with 13 transitions (1s). The run lengths are multiple instances of 0 zeros (adjacent 1s) and 1 zero, satisfying d=0 (allowing adjacent transitions for 1s) and k=1 (no more than one 0 between 1s, aided by clock bits). This results in an average of 1.625 transitions per data bit, limiting density.[39][40] For MFM encoding, a (1,3) RLL code, the process inserts variable clock bits to self-clock without fixed 1s, encoding each data bit into two channel bits while ensuring at least one 0 between 1s (d=1) and no more than three 0s (k=3). A data 1 is always encoded as 01 (no transition followed by transition). A data 0 is encoded as 00 (two no transitions) if the previous channel bit ended with a transition (i.e., after a 1, no clock), or 10 (transition followed by no) if after no transition (i.e., after a 0, with clock). For the input 10001111, the pairs are: 01 (first 1), 00 (second 0 after 1), 10 (third 0 after 0), 10 (fourth 0 after 0), 01 (fifth 1), 01 (sixth 1), 01 (seventh 1), 01 (eighth 1), yielding the channel bit stream 0100101001010101 with 7 transitions. The run lengths are 2,1,2,1,1,1,1 zeros, avoiding violations and achieving an average of 0.875 transitions per data bit—roughly half that of FM for doubled density.[41][38] For (2,7) RLL encoding, a rate-1/2 constrained code (d=2, k=7), groups of 2–4 data bits are mapped to 4–8 channel bits using a lookup table to ensure 2–7 zeros between 1s. The table includes mappings like 10 → 1000 (run of 3 0s), 00 → 000100 (runs of 3 and 2 0s after previous 1), with state-based merging to prevent violations at boundaries. For 10001111, the encoder groups as (10)(00)(1111), mapping to codewords that concatenate without violation: e.g., 1000 (for 10), 000100 (for 00, extending run safely), then a longer group for 1111 as 0001000 (adjusted for merging). One efficient resulting stream is 100000100010000 (16 bits, 3 1s), with run lengths of 5,3 zeros (within 2–7), averaging ~0.375 transitions per data bit for higher density than MFM.[40][38]| Input Data | FM Channel Bits (16 bits, 13 1s) | FM Run Lengths | MFM Channel Bits (16 bits, 7 1s) | MFM Run Lengths | (2,7) RLL Channel Bits (16 bits, 3 1s) | (2,7) RLL Run Lengths |
|---|---|---|---|---|---|---|
| 10001111 | 1110101010111111 | 0,1,0,1,0,1,0 | 0100101001010101 | 2,1,2,1,1,1,1 | 100000100010000 | 5,3 |
Use in Storage Media
Run-length limited (RLL) codes have been integral to magnetic and optical storage media, enabling efficient data encoding while adhering to physical constraints of recording technologies. In early floppy disk systems, frequency modulation (FM), equivalent to (0,1) RLL encoding, was employed in 8-inch disks to ensure reliable clock synchronization and data recovery on single-density media.[40] This approach placed a clock pulse after every data bit, limiting consecutive zeros to a maximum of one, which suited the lower densities of the era but limited capacity.[40] Subsequent advancements saw modified frequency modulation (MFM), a (1,3) RLL code, adopted for higher-density floppy disks, including the standard 3.5-inch formats. MFM improved efficiency by allowing up to three consecutive zeros between ones, roughly doubling the storage capacity over FM without requiring hardware changes, and became ubiquitous in personal computing floppy drives.[42] In hard disk drives, (2,7) RLL encoding emerged in the 1980s as a standard for IDE/ATA interfaces, permitting up to seven zeros between ones to achieve approximately 50% greater density than MFM while maintaining signal integrity. For enterprise storage, (1,7) RLL variants were utilized to support higher linear densities in server-oriented drives, balancing run-length constraints with error rates in multi-user environments. Optical media leveraged RLL principles through eight-to-fourteen modulation (EFM), a (2,10) code implemented in compact discs (CDs) to merge bits and control DC content, ensuring pit lengths between 3T and 11T for laser readability.[43] DVDs extended this with eight-to-sixteen (EFM+) modulation, a (2,10) RLL code with rate 8/16, reducing the need for merging bits compared to EFM for higher effective density.[44] By the 1990s, hard disk drives transitioned from peak-detection RLL methods to partial-response maximum-likelihood (PRML) detection, which integrated RLL precoding to model intersymbol interference and achieve bit error rates below 10^{-15} when combined with error-correcting codes.[45] IBM pioneered PRML in production HDDs in 1990, marking a shift that boosted areal densities and paved the way for modern recording channels.[45]Performance Metrics
Recording Densities
Run-length limited (RLL) codes enhance recording densities in magnetic storage by imposing constraints on the minimum (d) and maximum (k) run lengths of consecutive zeros in the encoded channel bits, which minimizes intersymbol interference (ISI) while maximizing the usable frequency spectrum of the recording channel. This allows for denser packing of flux transitions on the medium without requiring hardware changes. Linear recording density, typically measured in bits per inch (bpi), is fundamentally tied to the rate of flux changes per inch (fcpi); higher densities are achieved by encoding more data bits into fewer channel bits or by supporting shorter channel bit periods.[12] For example, frequency modulation (FM) encoding, an early RLL variant with a (0,1) constraint, supports linear densities around 2000 bpi on floppy disks and early hard drives, as it requires a flux change for every clock bit and data bit, resulting in frequent transitions that limit fcpi. Modified frequency modulation (MFM), a (1,3) RLL code with a more efficient mapping, doubles this to approximately 4000 bpi by eliminating unnecessary clock transitions, effectively halving the average fcpi while maintaining self-clocking properties.[12] The (2,7) RLL code further improves upon MFM by enforcing a minimum run of two zeros (d=2), which spaces transitions farther apart to combat ISI at higher densities, yielding a 1.5× areal density gain—combining linear and track densities—on the same medium, often reaching 6000 bpi or more in practical implementations.[12][10] The effective linear density achievable with an RLL code is given by , where is the code rate (data bits per channel bit) and is the maximum transition frequency supported by the recording channel, determined by medium and head characteristics.[10] For instance, while MFM and FM both have , MFM's design permits a higher due to better ISI management, explaining its density advantage. A key trade-off arises with the k parameter: larger k values enable higher code rates (approaching the channel capacity for fixed d), supporting greater densities, but they permit longer maximum runs of zeros, increasing the risk of clock drift and timing errors during readback.[10] This balance is critical in optimizing areal density, defined as the product of linear bpi and tracks per inch (tpi), where RLL constraints directly influence both components without altering the physical medium.[12]Capacity and Efficiency
The capacity of a run-length limited (RLL) channel with parameters (d, k) is defined as the maximum achievable rate for encoding binary sequences subject to the constraint that no fewer than d and no more than k consecutive zeros separate any two ones, measured in bits per symbol. This capacity, denoted C(d, k), is given by the base-2 logarithm of the largest eigenvalue of the transition matrix associated with the finite-state shift register model of the constraint. For binary alphabets, the unconstrained channel capacity is 1 bit/symbol, so the efficiency η of an (d, k)-RLL code is simply η = C(d, k), representing the fraction of the ideal rate that can be achieved while satisfying the run-length constraints.[46] Approximate values of C(d, k) for common (d, k) pairs used in storage applications are summarized in the following table, highlighting how increasing d reduces capacity by enforcing sparser transitions, while k bounds the maximum run length to prevent timing issues:| (d, k) | C(d, k) (bits/symbol) |
|---|---|
| (0, ∞) | 1.0000 |
| (1, ∞) | 0.6942 |
| (1, 7) | 0.6793 |
| (2, ∞) | 0.5515 |
| (2, 7) | 0.5174 |
| (3, ∞) | 0.4650 |
| (4, ∞) | 0.4057 |
