Move-to-front transform
View on WikipediaThis article needs additional citations for verification. (May 2011) |
The move-to-front (MTF) transform is an encoding of data (typically a stream of bytes) designed to improve the performance of entropy encoding techniques of compression. When efficiently implemented, it is fast enough that its benefits usually justify including it as an extra step in data compression algorithm.
This algorithm was first published by Boris Ryabko under the name of "book stack" in 1980.[1] Subsequently, it was rediscovered by J.K. Bentley et al. in 1986,[2] as attested in the explanatory note.[3]
The transform
[edit]The main idea is that each symbol in the data is replaced by its index in the stack of “recently used symbols”. For example, long sequences of identical symbols are replaced by as many zeroes, whereas when a symbol that has not been used in a long time appears, it is replaced with a large number. Thus at the end the data is transformed into a sequence of integers; if the data exhibits a lot of local correlations, then these integers tend to be small.
Let us give a precise description. Assume for simplicity that the symbols in the data are bytes. Each byte value is encoded by its index in a list of bytes, which changes over the course of the algorithm. The list is initially in order by byte value (0, 1, 2, 3, ..., 255). Therefore, the first byte is always encoded by its own value. However, after encoding a byte, that value is moved to the front of the list before continuing to the next byte.
An example will shed some light on how the transform works. Imagine instead of bytes, we are encoding values in a–z. We wish to transform the following sequence:
bananaaa
By convention, the list is initially (abcdefghijklmnopqrstuvwxyz). The first letter in the sequence is b, which appears at index 1 (the list is indexed from 0 to 25). We put a 1 to the output stream:
1
The b moves to the front of the list, producing (bacdefghijklmnopqrstuvwxyz). The next letter is a, which now appears at index 1. So we add a 1 to the output stream. We have:
1,1
and we move the letter a back to the top of the list. Continuing this way, we find that the sequence is encoded by:
1,1,13,1,1,1,0,0
| Iteration | Sequence | List |
|---|---|---|
| bananaaa | 1 | (abcdefghijklmnopqrstuvwxyz) |
| bananaaa | 1,1 | (bacdefghijklmnopqrstuvwxyz) |
| bananaaa | 1,1,13 | (abcdefghijklmnopqrstuvwxyz) |
| bananaaa | 1,1,13,1 | (nabcdefghijklmopqrstuvwxyz) |
| bananaaa | 1,1,13,1,1 | (anbcdefghijklmopqrstuvwxyz) |
| bananaaa | 1,1,13,1,1,1 | (nabcdefghijklmopqrstuvwxyz) |
| bananaaa | 1,1,13,1,1,1,0 | (anbcdefghijklmopqrstuvwxyz) |
| bananaaa | 1,1,13,1,1,1,0,0 | (anbcdefghijklmopqrstuvwxyz) |
| Final | 1,1,13,1,1,1,0,0 | (anbcdefghijklmopqrstuvwxyz) |
It is easy to see that the transform is reversible. Simply maintain the same list and decode by replacing each index in the encoded stream with the letter at that index in the list. Note the difference between this and the encoding method: The index in the list is used directly instead of looking up each value for its index.
i.e. you start again with (abcdefghijklmnopqrstuvwxyz). You take the "1" of the encoded block and look it up in the list, which results in "b". Then move the "b" to front which results in (bacdef...). Then take the next "1", look it up in the list, this results in "a", move the "a" to front ... etc.
Implementation
[edit]Details of implementation are important for performance, particularly for decoding. For encoding, no clear advantage is gained by using a linked list, so using an array to store the list is acceptable, with worst-case performance O(nk), where n is the length of the data to be encoded and k is the number of values (generally a constant for a given implementation).
The typical performance is better because frequently used symbols are more likely to be at the front and will produce earlier hits. This is also the idea behind a Move-to-front self-organizing list.
However, for decoding, we can use specialized data structures to greatly improve performance.[example needed]
Python
[edit]This is a possible implementation of the move-to-front algorithm in Python.
from collections.abc import Generator, Iterable
class MoveToFront:
"""
>>> mtf = MoveToFront()
>>> list(mtf.encode("Wikipedia"))
[87, 105, 107, 1, 112, 104, 104, 3, 102]
>>> mtf.decode([87, 105, 107, 1, 112, 104, 104, 3, 102])
'Wikipedia'
>>> list(mtf.encode("wikipedia"))
[119, 106, 108, 1, 113, 105, 105, 3, 103]
>>> mtf.decode([119, 106, 108, 1, 113, 105, 105, 3, 103])
'wikipedia'
"""
def __init__(self, common_dictionary: Iterable[int] = range(256)):
"""
Instead of always transmitting an "original" dictionary,
it is simpler to just agree on an initial set.
Here we use the 256 possible values of a byte.
"""
# consume the iterable so it can be used multiple times
self.common_dictionary = list(common_dictionary)
def encode(self, plain_text: str) -> Generator[int]:
# Changing the common dictionary is a bad idea. Make a copy.
dictionary = list(self.common_dictionary)
# Read in each character
for c in plain_text.encode("latin-1"): # Change to bytes for 256.
# Find the rank of the character in the dictionary [O(k)]
rank = dictionary.index(c) # the encoded character
yield rank
# Update the dictionary [Θ(k) for insert]
dictionary.pop(rank)
dictionary.insert(0, c)
def decode(self, compressed_data: Iterable[int]) -> str:
"""
Inverse function that recover the original text
"""
dictionary = list(self.common_dictionary)
plain_text = []
# Read in each rank in the encoded text
for rank in compressed_data:
# Remove the character of that rank from the dictionary
e = dictionary.pop(rank)
plain_text.append(e)
# Insert the character at the beginning of dictionary
dictionary.insert(0, e)
return bytes(plain_text).decode("latin-1") # Return original string
In this example we can see the MTF code taking advantage of the three repetitive i's in the input word. The common dictionary here, however, is less than ideal since it is initialized with more commonly used ASCII printable characters put after little-used control codes, against the MTF code's design intent of keeping what's commonly used in the front. If one rotates the dictionary to put the more-used characters in earlier places, a better encoding can be obtained:
from itertools import chain
def block32(x):
return range(x, x+32)
class MoveToFrontMoreCommon(MoveToFront):
"""
>>> mtf = MoveToFrontMoreCommon()
>>> list(mtf.encode("Wikipedia"))
[55, 10, 12, 1, 17, 9, 9, 3, 7]
"""
def __init__(self):
super().__init__(
chain( # Sort the ASCII blocks:
block32(ord("a") - 1), # first lowercase,
block32(ord("A") - 1), # then uppercase,
block32(ord("!") - 1), # punctuation/number,
block32(0), # control codes,
range(128, 256), # and finally the non-ASCII stuff
)
)
if __name__ == "__main__":
import doctest
doctest.testmod()
Use in practical data compression algorithms
[edit]The MTF transform takes advantage of local correlation of frequencies to reduce the entropy of a message.[clarification needed] Indeed, recently used letters stay towards the front of the list; if use of letters exhibits local correlations, this will result in a large number of small numbers such as "0"'s and "1"'s in the output.
However, not all data exhibits this type of local correlation, and for some messages, the MTF transform may actually increase the entropy.
An important use of the MTF transform is in Burrows–Wheeler transform based compression. The Burrows–Wheeler transform is very good at producing a sequence that exhibits local frequency correlation from text and certain other special classes of data. Compression benefits greatly from following up the Burrows–Wheeler transform with an MTF transform before the final entropy-encoding step.
Example
[edit]This following section may be confusing or unclear to readers. In particular, What is the method used to arrive at these numbers?. (February 2011) |
As an example, imagine we wish to compress Hamlet's soliloquy (To be, or not to be...). We can calculate the size of this message to be 7033 bits. Naively, we might try to apply the MTF transform directly. The result is a message with 7807 bits (higher than the original). The reason is that English text does not in general exhibit a high level of local frequency correlation. However, if we first apply the Burrows–Wheeler transform, and then the MTF transform, we get a message with 6187 bits. Note that the Burrows–Wheeler transform does not decrease the entropy of the message; it only reorders the bytes in a way that makes the MTF transform more effective.
One problem with the basic MTF transform is that it makes the same changes for any character, regardless of frequency, which can result in diminished compression as characters that occur rarely may push frequent characters to higher values. Various alterations and alternatives have been developed for this reason. One common change is to make it so that characters above a certain point can only be moved to a certain threshold. Another is to make some algorithm that runs a count of each character's local frequency and uses these values to choose the characters' order at any point. Many of these transforms still reserve zero for repeat characters, since these are often the most common in data after the Burrows–Wheeler Transform.
Move-to-front linked-list
[edit]- The term Move To Front (MTF) is also used in a slightly different context, as a type of a dynamic linked list. In an MTF list, each element is moved to the front when it is accessed.[4] This ensures that, over time, the more frequently accessed elements are easier to access.
References
[edit]- ^ Ryabko, Boris Yakovlevich [in Russian] (1980). "Data compression by means of a "book stack"" (PDF). Problems of Information Transmission. 16 (4): 265–269. Zbl 0466.94007.
- ^ Bentley, Jon Louis; Sleator, Daniel Dominic Kaplan; Tarjan, Robert Endre; Wei, V. K. (1986). "A Locally Adaptive Data Compression Scheme". Communications of the ACM. 29 (4): 320–330. CiteSeerX 10.1.1.69.807. doi:10.1145/5684.5688. S2CID 5854590.
- ^ Ryabko, Boris Yakovlevich [in Russian]; Horspool, R. Nigel; Cormack, Gordon Villy (1987). "Comments to: "A locally adaptive data compression scheme" by J. L. Bentley, D. D. Sleator, R. E. Tarjan and V. K. Wei". Comm. ACM. 30 (9): 792–794. doi:10.1145/30401.315747. S2CID 16138142.
- ^ Rivest, Ronald Linn (1976). "On self-organizing sequential search heuristics". Communications of the ACM. 19 (2): 63–67. doi:10.1145/359997.360000. S2CID 498886.
External links
[edit]Move-to-front transform
View on GrokipediaHistory and Overview
Origins and Development
The move-to-front (MTF) transform, initially introduced as the "book stack" method, was first published by Boris Ryabko in 1980 as a technique for data compression. In this work, Ryabko described a reversible transformation that reorganizes a sequence of symbols based on their recency of appearance, aiming to reduce the entropy of the data for more efficient encoding. The approach was motivated by the need for adaptive methods to handle non-stationary sources in information transmission, drawing from early ideas in statistical testing and compression.[4] The MTF transform was independently rediscovered and popularized six years later by Jon Louis Bentley, Daniel D. Sleator, Robert E. Tarjan, and Victor K. Wei in their 1986 paper on a locally adaptive data compression scheme. This publication framed the transform within the context of self-organizing lists, adapting heuristics originally developed for optimizing search algorithms in dynamic data structures to create a preprocessing step for entropy coding. The authors demonstrated its effectiveness in reducing redundancy for real-world data streams, establishing it as a practical tool for universal compression.[5] Following its 1986 introduction, the MTF transform saw widespread adoption in the data compression literature, particularly as a post-processing step in Burrows-Wheeler transform (BWT)-based schemes during the 1990s.[6] It became integral to tools like bzip2, released in 1996, where it enhances run-length encoding and arithmetic coding for improved lossless compression ratios.[7] Additionally, MTF appeared in studies on universal lossless source coding, contributing to asymptotically optimal schemes for arbitrary sources by facilitating better symbol probability estimation.[8]Relation to Self-Organizing Structures
Self-organizing lists are dynamic data structures that maintain a permutation of items and reorder them based on access patterns to minimize the average time for future searches.[9] These lists adapt by applying heuristics after each access, such as swapping or shifting accessed items toward the front, thereby exploiting patterns like temporal locality where recently accessed items are more likely to be requested again.[10] The move-to-front (MTF) method is a prominent heuristic within self-organizing lists, where an accessed item is immediately relocated to the head of the list, shifting all preceding items backward by one position.[10] This contrasts with other heuristics like the transpose method, which swaps the accessed item with the one immediately before it, or the move-to-middle method, which repositions it at the list's center; MTF tends to perform better in scenarios with strong locality as it more aggressively promotes recent items.[10] Under the independent reference model (IRM), where each access is probabilistically independent and follows a fixed stationary distribution, MTF achieves optimality in expected search cost among the move-ahead-k family of heuristics.[9] In the context of data compression, the MTF transform adapts these self-organizing principles by treating the symbol sequence as an access pattern, reordering a prediction list after each symbol to encode its position rather than the symbol itself. This reordering leverages temporal locality to make frequently recurring symbols likely to appear early in the list, thereby producing a transformed sequence with clustered low values that enhances the effectiveness of subsequent entropy coding.Algorithm Description
Encoding Process
The move-to-front (MTF) encoding process transforms an input sequence of symbols into a sequence of indices by maintaining an ordered list of symbols and dynamically reordering it based on symbol occurrences. The list is typically initialized with all possible symbols from a fixed alphabet in a predefined order, such as the sorted order of extended ASCII characters (0 to 255) for byte-oriented data or the alphabetical order [a, b, c, ..., z] for text.[11][12] For each input symbol $ s $ in the sequence, the encoder locates the current position $ i $ of $ s $ in the list (using 0-based indexing, where the front is index 0), outputs $ i $ (often encoded with a variable-length code to exploit the bias toward low indices), and then moves $ s $ to the front of the list. This movement is commonly implemented by swapping $ s $ with each preceding symbol until it reaches the front, ensuring that recently encountered symbols have lower indices in subsequent steps. In standard MTF with a fixed alphabet, all input symbols are assumed to be present in the initial list; if an unknown symbol is encountered, it may cause an error or require preprocessing to map it to the alphabet. In adaptive variants starting with an empty list, upon encountering a new symbol not in the current list of N symbols, output N+1 (using 1-based indexing) followed by the raw symbol, then insert it at the front of the list. This ensures decoder synchronization without transmitting the final list state.[5] To illustrate, consider encoding the string "bananaaa" with an initial list [a, b, c, ..., z] (positions 0 for a, 1 for b, ..., 25 for z):- First symbol 'b' (position 1): output 1; move 'b' to front → list = [b, a, c, ..., z].
- Second symbol 'a' (now position 1): output 1; move 'a' to front → list = [a, b, c, ..., z].
- Third symbol 'n' (position 13): output 13; move 'n' to front → list = [n, a, b, c, ..., m, o, ..., z].
- Fourth symbol 'a' (now position 1): output 1; move 'a' to front → list = [a, n, b, c, ..., m, o, ..., z].
- Fifth symbol 'n' (now position 1): output 1; move 'n' to front → list = [n, a, b, c, ..., m, o, ..., z].
- Sixth symbol 'a' (now position 1): output 1; move 'a' to front → list = [a, n, b, c, ..., m, o, ..., z].
- Seventh symbol 'a' (position 0): output 0; move 'a' to front → list unchanged.
- Eighth symbol 'a' (position 0): output 0; move 'a' to front → list unchanged.
Decoding Process
The decoding process of the move-to-front (MTF) transform recovers the original symbol sequence from the encoded stream of indices by performing the inverse operations, ensuring that the decoder's symbol list remains synchronized with the encoder's through identical reordering rules.[5][12] For fixed alphabets, such as in applications with the Burrows-Wheeler transform, the decoder initializes a list containing all possible symbols in a predefined order, typically sorted alphabetically (e.g., 'a' at position 0, 'b' at 1, up to 'z' at 25 for lowercase English letters).[12] For each index $ i $ in the input stream (0-based positioning), the decoder retrieves the symbol at position $ i $ in the current list, outputs it as the next symbol in the reconstructed sequence, and then moves that symbol to the front of the list (position 0), shifting the preceding symbols rightward.[5] This process repeats for all indices, yielding the original sequence since the list updates mirror those during encoding.[12] In dynamic alphabet scenarios, where the symbol set may grow during processing, the list starts empty, and the decoder reads an integer $ p $ for each step. If $ p $ equals the current list size $ N $ plus one, it reads a new raw symbol, inserts it at the front of the list, and outputs it; otherwise, if $ p \leq N $, it outputs the symbol at position $ p $ (1-based in this formulation) and moves it to the front.[5] This handles unseen symbols without prior knowledge of the full alphabet, maintaining list synchronization as long as the encoder and decoder process the same stream.[5] Consider the encoded indices [1, 1, 13, 1, 1, 1, 0, 0] derived from the sequence "bananaaa" over a fixed lowercase English alphabet, initialized as ['a', 'b', 'c', ..., 'z']:- Index 1: Retrieve 'b' (position 1), output 'b'; list becomes ['b', 'a', 'c', ..., 'z'].
- Index 1: Retrieve 'a' (position 1), output 'a'; list becomes ['a', 'b', 'c', ..., 'z'].
- Index 13: Retrieve 'n' (position 13), output 'n'; list becomes ['n', 'a', 'b', ..., 'm', 'o', ..., 'z'].
- Index 1: Retrieve 'a' (position 1), output 'a'; list becomes ['a', 'n', 'b', ..., 'm', 'o', ..., 'z'].
- Index 1: Retrieve 'n' (position 1), output 'n'; list becomes ['n', 'a', 'b', ..., 'm', 'o', ..., 'z'].
- Index 1: Retrieve 'a' (position 1), output 'a'; list becomes ['a', 'n', 'b', ..., 'm', 'o', ..., 'z'].
- Index 0: Retrieve 'a' (position 0), output 'a'; list remains ['a', 'n', 'b', ..., 'm', 'o', ..., 'z'].
- Index 0: Retrieve 'a' (position 0), output 'a'; list remains ['a', 'n', 'b', ..., 'm', 'o', ..., 'z'].
Mathematical Formulation
Position Encoding and List Reordering
The move-to-front transform processes an input sequence where each and is a finite alphabet of size . At step , a dynamic list represents a permutation of the symbols in .[5] The initial list is typically initialized as the symbols of in sorted order (e.g., ascending ASCII values for byte alphabets), though some implementations augment it with an end-of-file (EOF) symbol to handle block boundaries in compression pipelines.[12] For each , the encoding step identifies the 0-based index of in , where , and outputs this integer. The list is then reordered via the move-to-front operation: move-to-front, which places at the front while preserving the relative order of all other symbols.[5][12] This reordering can be expressed formally as:Information-Theoretic Properties
The move-to-front (MTF) transform reorders the symbol list to prioritize recently occurring symbols at positions 0, 1, and so on, thereby assigning low-index outputs to symbols with temporal locality. This shifts probability mass toward smaller integers in the output sequence, reducing the expected code length for subsequent entropy encoding, particularly for data where symbol frequencies exhibit local correlations rather than global stationarity. In sequences with runs of repeated symbols, the MTF output tends to cluster positions near 0, as each recurrence immediately yields index 0 after the initial access. This concentration enables approximate savings of bits per symbol relative to encoding over a uniform alphabet , since arithmetic or Huffman coders can exploit the skewed distribution with very short codes for low indices.[6] For non-i.i.d. sources, the joint entropy equals due to invertibility, but the empirical marginals become more compressible, yielding average code lengths approaching bits per symbol under discrete memoryless assumptions.[1] Theoretical bounds confirm that MTF provides no compression gain in the worst case, such as uniformly random i.i.d. inputs, but it achieves near-optimality for access patterns modeled by the independent reference model, where access probabilities to list positions decline as . In this setting, MTF minimizes expected access costs among permutation-based self-organizing strategies, with stationary probabilities aligning closely to the harmonic distribution.[13] When paired with entropy coders, MTF acts as an adaptive zeroth-order preprocessor, converting local dependencies into global frequency biases that higher-order models would otherwise capture explicitly; this locality-based adaptation often bounds the output size to within a small constant factor of the input's -th order empirical entropy for any .[14]Implementation
Data Structures and Efficiency
The basic implementation of the move-to-front (MTF) transform uses an array to represent the ordered list of symbols from the alphabet , where . Access to the position of a symbol is achieved in time by maintaining a parallel array mapping each symbol to its current index in the list. However, moving the accessed symbol to the front requires shifting the preceding elements, incurring time per operation. For a sequence of length , the total encoding time is thus .[5][15] To optimize position lookups in scenarios with non-integer or sparse symbols, a hash table can map symbols to their positions, achieving average-case access time, while the list itself may still use an array or linked structure for reordering. Updates remain due to the need to adjust positions for shifted elements. The original MTF scheme notes that such hashing provides average operations for word lookups in adaptive compression contexts.[5] For further efficiency in large alphabets, dynamic rank structures like Fenwick trees (binary indexed trees) can maintain the order implicitly, enabling position queries and updates in time by treating the list as a sequence of ranks and performing range updates. This approach integrates well with MTF in probability table maintenance for subsequent entropy coders, reducing average memory references per symbol compared to naive methods (e.g., approximately 18 references/symbol versus 33 for basic MTF on benchmark texts).[16] Space requirements are for both the symbol list and position mapping, which is efficient for fixed alphabets such as ASCII (), fitting comfortably in modern memory constraints without additional overhead. Decoding mirrors encoding efficiency, using the same initial list and reversible updates: each code outputs the -th list element and moves it to front, with identical time complexities. A key pitfall is potential list desynchronization if encoder and decoder initials differ or transmission errors occur, though this is mitigated in lossless settings by ensuring identical setups.[5][16] Arrays are preferred for small, dense due to constant-time access and simplicity, while linked lists suit sparse or dynamic alphabets by enabling moves with direct node pointers (though position queries degrade to without augmentation); further adaptations are explored in variants.[5]Programming Example
The move-to-front (MTF) transform can be implemented efficiently in Python using list operations to manage the symbol table. For simplicity in the following example with the string "bananaaa", we use a 26-letter alphabet of lowercase letters 'a' to 'z'. The encoder iterates over the input string, locates each character's ordinal value in the list, outputs its index, and moves it to the front. Similarly, the decoder processes the sequence of indices to reconstruct the original string by retrieving and reordering symbols.[17] Here is a Python implementation of the encoder:def mtf_encode(text):
# Initialize symbol table with lowercase letters a-z
symbols = [ord(c) for c in 'abcdefghijklmnopqrstuvwxyz']
encoded = []
for char in text:
idx = symbols.index(ord(char))
encoded.append(idx)
# Move to front: remove and insert at position 0
del symbols[idx]
symbols.insert(0, ord(char))
return encoded
This function assumes input characters are lowercase letters; it yields a list of indices representing the transformed sequence.[17]
The corresponding decoder reverses the process:
def mtf_decode(encoded):
# Initialize symbol table with lowercase letters a-z
symbols = [ord(c) for c in 'abcdefghijklmnopqrstuvwxyz']
decoded = []
for idx in encoded:
char_ord = symbols[idx]
decoded.append(chr(char_ord))
# Move to front: remove and insert at position 0
del symbols[idx]
symbols.insert(0, char_ord)
return ''.join(decoded)
This decoder outputs the original string, ensuring invertibility as long as the symbol table matches the encoder's initialization.[17]
To illustrate, consider encoding and decoding the string "bananaaa":
- Input: "bananaaa"
- Encoded indices: [1, 1, 13, 1, 1, 1, 0, 0]
- Decoded output: "bananaaa"
mtf_decode(mtf_encode("bananaaa")) yields "bananaaa", verifying the round-trip correctness. This example demonstrates how repeated symbols like 'a' produce low indices (0 or 1) after initial occurrences, aiding compression.[17]
For extensions beyond basic alphabets, the symbol table can be adjusted to a custom list of Unicode code points or a specific alphabet (e.g., symbols = list(range(256)) for full ASCII), provided both encoder and decoder use the same initialization to maintain compatibility.[15]