Viterbi algorithm
The Viterbi algorithm is a dynamic programming algorithm that finds the most likely sequence of hidden events that would explain a sequence of observed events. The result of the algorithm is often called the Viterbi path. It is most commonly used with hidden Markov models (HMMs). For example, if a doctor observes a patient's symptoms over several days (the observed events), the Viterbi algorithm could determine the most probable sequence of underlying health conditions (the hidden events) that caused those symptoms.
The algorithm has found universal application in decoding the convolutional codes used in both CDMA and GSM digital cellular, dial-up modems, satellite, deep-space communications, and 802.11 wireless LANs. It is also commonly used in speech recognition, speech synthesis, diarization,[1] keyword spotting, computational linguistics, and bioinformatics. For instance, in speech-to-text (speech recognition), the acoustic signal is the observed sequence, and a string of text is the "hidden cause" of that signal. The Viterbi algorithm finds the most likely string of text given the acoustic signal.
History
The Viterbi algorithm is named after Andrew Viterbi, who proposed it in 1967 as a decoding algorithm for convolutional codes over noisy digital communication links.[2] It has, however, a history of multiple invention, with at least seven independent discoveries, including those by Viterbi, Needleman and Wunsch, and Wagner and Fischer.[3] It was introduced to natural language processing as a method of part-of-speech tagging as early as 1987.
Viterbi path and Viterbi algorithm have become standard terms for the application of dynamic programming algorithms to maximization problems involving probabilities.[3] For example, in statistical parsing a dynamic programming algorithm can be used to discover the single most likely context-free derivation (parse) of a string, which is commonly called the "Viterbi parse".[4][5][6] Another application is in target tracking, where the track is computed that assigns a maximum likelihood to a sequence of observations.[7]
Algorithm
Given a hidden Markov model with a set of hidden states $S$ and a sequence of $T$ observations $o_0, o_1, \dots, o_{T-1}$, the Viterbi algorithm finds the most likely sequence of states that could have produced those observations. At each time step $t$, the algorithm solves the subproblem where only the observations up to $o_t$ are considered.
Two matrices of size $T \times |S|$ are constructed:
- $P_{t,s}$ contains the maximum probability of ending up at state $s$ at observation $t$, out of all possible sequences of states leading up to it.
- $Q_{t,s}$ tracks the previous state that was used before $s$ in this maximum probability state sequence.
Let $\pi_s$ and $a_{r,s}$ be the initial and transition probabilities respectively, and let $b_{s,o}$ be the probability of observing $o$ at state $s$. Then the values of $P$ are given by the recurrence relation[8]
$$P_{t,s} = \begin{cases} \pi_s \cdot b_{s,o_t} & \text{if } t = 0, \\ \max_{r \in S} \left( P_{t-1,r} \cdot a_{r,s} \cdot b_{s,o_t} \right) & \text{if } t > 0. \end{cases}$$
The formula for $Q_{t,s}$ is identical for $t > 0$, except that $\max$ is replaced with $\arg\max$, and $Q_{0,s} = 0$. The Viterbi path can be found by selecting the maximum of $P$ at the final timestep, and following $Q$ in reverse.
Pseudocode
    function Viterbi(states, init, trans, emit, obs) is
        input states: S hidden states
        input init: initial probabilities of each state
        input trans: S × S transition matrix
        input emit: S × O emission matrix
        input obs: sequence of T observations
        prob ← T × S matrix of zeroes
        prev ← empty T × S matrix
        for each state s in states do
            prob[0][s] ← init[s] * emit[s][obs[0]]
        for t = 1 to T - 1 inclusive do // t = 0 has been dealt with already
            for each state s in states do
                for each state r in states do
                    new_prob ← prob[t - 1][r] * trans[r][s] * emit[s][obs[t]]
                    if new_prob > prob[t][s] then
                        prob[t][s] ← new_prob
                        prev[t][s] ← r
        path ← empty array of length T
        path[T - 1] ← the state s with maximum prob[T - 1][s]
        for t = T - 2 to 0 inclusive do
            path[t] ← prev[t + 1][path[t + 1]]
        return path
    end
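The pseudocode above translates almost line for line into Python. The following is a minimal sketch, assuming the model is stored in nested dictionaries keyed by state and observation names; the two-state model at the bottom (`"A"`, `"B"`, symbols `"x"`, `"y"`) is a made-up illustration and not part of the article.

```python
def viterbi(states, init, trans, emit, obs):
    """Most likely state sequence for obs, following the pseudocode above."""
    T = len(obs)
    prob = [{s: 0.0 for s in states} for _ in range(T)]   # T x S matrix of zeroes
    prev = [{s: None for s in states} for _ in range(T)]  # empty T x S matrix

    for s in states:
        prob[0][s] = init[s] * emit[s][obs[0]]

    for t in range(1, T):                                 # t = 0 is already done
        for s in states:
            for r in states:
                new_prob = prob[t - 1][r] * trans[r][s] * emit[s][obs[t]]
                if new_prob > prob[t][s]:
                    prob[t][s] = new_prob
                    prev[t][s] = r

    # Backtrack from the best final state.
    path = [None] * T
    path[T - 1] = max(states, key=lambda s: prob[T - 1][s])
    for t in range(T - 2, -1, -1):
        path[t] = prev[t + 1][path[t + 1]]
    return path


# Hypothetical toy model, chosen only to exercise the function.
toy_states = ["A", "B"]
toy_init = {"A": 0.5, "B": 0.5}
toy_trans = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
toy_emit = {"A": {"x": 0.8, "y": 0.2}, "B": {"x": 0.2, "y": 0.8}}
toy_path = viterbi(toy_states, toy_init, toy_trans, toy_emit, ["x", "x", "y", "y"])
print(toy_path)
```

Like the pseudocode, this sketch multiplies raw probabilities, so it is only suitable for short sequences; it also leaves a backpointer as `None` for any state that no path reaches with non-zero probability.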
The time complexity of the algorithm is $O(T \times |S|^2)$. If it is known which state transitions have non-zero probability, an improved bound can be found by iterating over only those $r$ which link to $s$ in the inner loop. Then using amortized analysis one can show that the complexity is $O(T \times (|S| + |E|))$, where $|E|$ is the number of edges in the graph, i.e. the number of non-zero entries in the transition matrix.
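The sparse variant can be sketched by precomputing, for each state, the list of predecessors with non-zero transition probability, so that the inner loop touches each edge once per time step. The function name `viterbi_sparse`, the `edges` dictionary format and the three-state left-to-right chain are assumptions made for this illustration.

```python
def viterbi_sparse(states, init, edges, emit, obs):
    """Viterbi over an explicit edge list: edges maps (r, s) -> P(s | r) > 0."""
    # Group incoming edges by target state once, outside the time loop.
    preds = {s: [] for s in states}
    for (r, s), p in edges.items():
        preds[s].append((r, p))

    T = len(obs)
    prob = [{s: 0.0 for s in states} for _ in range(T)]
    prev = [{} for _ in range(T)]
    for s in states:
        prob[0][s] = init[s] * emit[s][obs[0]]

    for t in range(1, T):
        for s in states:
            e = emit[s][obs[t]]
            for r, p in preds[s]:          # only predecessors that link to s
                cand = prob[t - 1][r] * p * e
                if cand > prob[t][s]:
                    prob[t][s] = cand
                    prev[t][s] = r

    path = [max(states, key=lambda s: prob[T - 1][s])]
    for t in range(T - 1, 0, -1):
        path.append(prev[t][path[-1]])
    path.reverse()
    return path


# Hypothetical left-to-right chain a -> b -> c with self-loops (5 edges, not 9).
chain_states = ["a", "b", "c"]
chain_init = {"a": 1.0, "b": 0.0, "c": 0.0}
chain_edges = {("a", "a"): 0.5, ("a", "b"): 0.5,
               ("b", "b"): 0.5, ("b", "c"): 0.5, ("c", "c"): 1.0}
chain_emit = {"a": {"x": 0.8, "y": 0.1, "z": 0.1},
              "b": {"x": 0.1, "y": 0.8, "z": 0.1},
              "c": {"x": 0.1, "y": 0.1, "z": 0.8}}
chain_path = viterbi_sparse(chain_states, chain_init, chain_edges,
                            chain_emit, ["x", "y", "z"])
print(chain_path)
```

Per time step the work is one pass over the states plus one pass over the listed edges, which is where the $|S| + |E|$ factor comes from.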
Example
A doctor wishes to determine whether patients are healthy or have a fever. The only information the doctor can obtain is by asking patients how they feel. The patients may report that they either feel normal, dizzy, or cold.
It is believed that the health condition of the patients operates as a discrete Markov chain. There are two states, "healthy" and "fever", but the doctor cannot observe them directly; they are hidden from the doctor. On each day, the chance that a patient tells the doctor "I feel normal", "I feel cold", or "I feel dizzy", depends only on the patient's health condition on that day.
The observations (normal, cold, dizzy) along with the hidden states (healthy, fever) form a hidden Markov model (HMM). From past experience, the probabilities of this model have been estimated as:
    init = {"Healthy": 0.6, "Fever": 0.4}
    trans = {
        "Healthy": {"Healthy": 0.7, "Fever": 0.3},
        "Fever": {"Healthy": 0.4, "Fever": 0.6},
    }
    emit = {
        "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
        "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
    }
In this code, init represents the doctor's belief about how likely the patient is to be healthy initially. Note that the particular probability distribution used here is not the equilibrium one, which would be {'Healthy': 0.57, 'Fever': 0.43} according to the transition probabilities. The transition probabilities trans represent the change of health condition in the underlying Markov chain. In this example, a patient who is healthy today has only a 30% chance of having a fever tomorrow. The emission probabilities emit represent how likely each possible observation (normal, cold, or dizzy) is, given the underlying condition (healthy or fever). A patient who is healthy has a 50% chance of feeling normal; one who has a fever has a 60% chance of feeling dizzy.
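The equilibrium distribution quoted above can be checked numerically. The sketch below repeatedly applies the transition probabilities to a starting distribution until it settles; the helper name `stationary` and the fixed iteration count are choices made for this illustration.

```python
trans = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}


def stationary(trans, iterations=200):
    """Approximate the equilibrium distribution by power iteration."""
    states = list(trans)
    dist = {s: 1.0 / len(states) for s in states}   # start from uniform
    for _ in range(iterations):
        # One step of the chain: new mass at s is the inflow from every r.
        dist = {s: sum(dist[r] * trans[r][s] for r in states) for s in states}
    return dist


equilibrium = stationary(trans)
print({s: round(p, 2) for s, p in equilibrium.items()})
```

For this two-state chain the exact values are 4/7 for "Healthy" and 3/7 for "Fever", which round to the 0.57 and 0.43 given in the text.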

A particular patient visits three days in a row, and reports feeling normal on the first day, cold on the second day, and dizzy on the third day.
Firstly, the probabilities of being healthy or having a fever on the first day are calculated. The probability that a patient will be healthy on the first day and report feeling normal is $0.6 \times 0.5 = 0.3$. Similarly, the probability that a patient will have a fever on the first day and report feeling normal is $0.4 \times 0.1 = 0.04$.
The probabilities for each of the following days can be calculated from the previous day directly. For example, the highest chance of being healthy on the second day and reporting to be cold, following reporting being normal on the first day, is the maximum of $0.3 \times 0.7 \times 0.4 = 0.084$ and $0.04 \times 0.4 \times 0.4 = 0.0064$. This suggests it is more likely that the patient was healthy for both of those days, rather than having a fever and recovering.
The rest of the probabilities are summarised in the following table:
| Day | 1 | 2 | 3 |
|---|---|---|---|
| Observation | Normal | Cold | Dizzy |
| Healthy | 0.3 | 0.084 | 0.00588 |
| Fever | 0.04 | 0.027 | 0.01512 |
From the table, it can be seen that the patient most likely had a fever on the third day. Furthermore, there exists a sequence of states ending on "fever", of which the probability of producing the given observations is 0.01512. This sequence is precisely (healthy, healthy, fever), which can be found by tracing back which states were used when calculating the maxima (which happens to be the best guess from each day but will not always be). In other words, given the reported symptoms, the patient was most likely to have been healthy on the first day and also on the second day (despite feeling cold that day), and only to have contracted a fever on the third day.
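The table and the traced-back sequence can be reproduced with a few lines of Python using the model given earlier. This is a sketch for this specific example; the variable names `table`, `back` and `path` are illustrative.

```python
init = {"Healthy": 0.6, "Fever": 0.4}
trans = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
obs = ["normal", "cold", "dizzy"]
states = list(init)

# Day 1: initial probability times emission probability.
table = [{s: init[s] * emit[s][obs[0]] for s in states}]
back = [{}]

# Later days: best predecessor for each state, then apply the emission.
for o in obs[1:]:
    row, ptr = {}, {}
    for s in states:
        best = max(states, key=lambda r: table[-1][r] * trans[r][s])
        row[s] = table[-1][best] * trans[best][s] * emit[s][o]
        ptr[s] = best
    table.append(row)
    back.append(ptr)

# Trace back from the most probable state on the last day.
path = [max(states, key=lambda s: table[-1][s])]
for ptr in reversed(back[1:]):
    path.append(ptr[path[-1]])
path.reverse()

for day, row in enumerate(table, start=1):
    print(day, {s: round(p, 5) for s, p in row.items()})
print(path)
```

The rounded rows match the "Healthy" and "Fever" rows of the table, and the recovered path is healthy, healthy, fever.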
The operation of Viterbi's algorithm can be visualized by means of a trellis diagram. The Viterbi path is essentially the shortest path through this trellis.
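The shortest-path reading works because maximizing a product of probabilities is the same as minimizing a sum of negative log-probabilities. The sketch below checks this by brute force on the doctor example, enumerating all eight state sequences; exhaustive enumeration is used only because the example is tiny, and the helper names `joint` and `cost` are illustrative.

```python
import itertools
import math

init = {"Healthy": 0.6, "Fever": 0.4}
trans = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
obs = ["normal", "cold", "dizzy"]
states = list(init)


def joint(seq):
    """Probability of the state sequence together with the observations."""
    p = init[seq[0]] * emit[seq[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= trans[seq[t - 1]][seq[t]] * emit[seq[t]][obs[t]]
    return p


def cost(seq):
    """Path length in the trellis when each edge weighs -log(probability)."""
    c = -math.log(init[seq[0]] * emit[seq[0]][obs[0]])
    for t in range(1, len(obs)):
        c += -math.log(trans[seq[t - 1]][seq[t]] * emit[seq[t]][obs[t]])
    return c


candidates = list(itertools.product(states, repeat=len(obs)))
most_probable = max(candidates, key=joint)
shortest = min(candidates, key=cost)
print(most_probable, shortest)
```

Both criteria select the same sequence, and exponentiating the negated shortest-path cost recovers its probability.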
Extensions
A generalization of the Viterbi algorithm, termed the max-sum algorithm (or max-product algorithm), can be used to find the most likely assignment of all or some subset of latent variables in a large number of graphical models, e.g. Bayesian networks, Markov random fields and conditional random fields. The latent variables need, in general, to be connected in a way somewhat similar to a hidden Markov model (HMM), with a limited number of connections between variables and some type of linear structure among the variables. The general algorithm involves message passing and is substantially similar to the belief propagation algorithm (which is the generalization of the forward-backward algorithm).
With an algorithm called iterative Viterbi decoding, one can find the subsequence of an observation that matches best (on average) to a given hidden Markov model. This algorithm was proposed by Qi Wang et al. to deal with turbo codes.[9] Iterative Viterbi decoding works by iteratively invoking a modified Viterbi algorithm, reestimating the score for a filler until convergence.
An alternative algorithm, the Lazy Viterbi algorithm, has been proposed.[10] For many applications of practical interest, under reasonable noise conditions, the lazy decoder (using the Lazy Viterbi algorithm) is much faster than the original Viterbi decoder (using the Viterbi algorithm). While the original Viterbi algorithm calculates every node in the trellis of possible outcomes, the Lazy Viterbi algorithm maintains a prioritized list of nodes to evaluate in order, and the number of calculations required is typically fewer (and never more) than the ordinary Viterbi algorithm for the same result. However, it is harder to parallelize in hardware.
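The idea of evaluating trellis nodes in priority order can be illustrated with a Dijkstra-style search over negative log-probabilities. The sketch below is not the published Lazy Viterbi algorithm of [10], only a simplified best-first search under that same general idea; the function name `best_first_viterbi` and the use of Python's `heapq` are choices made for this illustration, and the model is the doctor example from earlier in the article.

```python
import heapq
import itertools
import math


def best_first_viterbi(states, init, trans, emit, obs):
    """Expand trellis nodes in order of increasing -log probability."""
    def nlog(p):
        return -math.log(p) if p > 0 else math.inf

    T = len(obs)
    tie = itertools.count()          # tiebreaker so the heap never compares states
    heap = []
    for s in states:
        c = nlog(init[s] * emit[s][obs[0]])
        if c < math.inf:
            heapq.heappush(heap, (c, next(tie), 0, s, None))

    parent = {}                      # settled node (t, s) -> predecessor state
    while heap:
        cost, _, t, s, r = heapq.heappop(heap)
        if (t, s) in parent:
            continue                 # already settled with a cost at least as low
        parent[(t, s)] = r
        if t == T - 1:               # first final node popped is the cheapest one
            path = [s]
            while t > 0:
                s = parent[(t, s)]
                t -= 1
                path.append(s)
            return path[::-1], len(parent)
        for nxt in states:
            step = nlog(trans[s][nxt] * emit[nxt][obs[t + 1]])
            if step < math.inf and (t + 1, nxt) not in parent:
                heapq.heappush(heap, (cost + step, next(tie), t + 1, nxt, s))
    return None, len(parent)


init = {"Healthy": 0.6, "Fever": 0.4}
trans = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
         "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
        "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
lazy_path, settled = best_first_viterbi(list(init), init, trans, emit,
                                        ["normal", "cold", "dizzy"])
print(lazy_path, settled)
```

Because every edge cost is non-negative, the first node popped at the last time step lies on a minimum-cost (maximum-probability) path, so the search may stop before every trellis node has been settled.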
Soft output Viterbi algorithm
The soft output Viterbi algorithm (SOVA) is a variant of the classical Viterbi algorithm.
SOVA differs from the classical Viterbi algorithm in that it uses a modified path metric which takes into account the a priori probabilities of the input symbols, and produces a soft output indicating the reliability of the decision.
The first step in the SOVA is the selection of the survivor path, passing through one unique node at each time instant, t. Since each node has 2 branches converging at it (with one branch being chosen to form the survivor path, and the other being discarded), the difference in the branch metrics (or cost) between the chosen and discarded branches indicates the amount of error in the choice.
This cost is accumulated over the entire sliding window (usually at least five constraint lengths long), to indicate the soft output measure of reliability of the hard bit decision of the Viterbi algorithm.
References
- [1] Xavier Anguera et al., "Speaker Diarization: A Review of Recent Research", IEEE TASLP. Archived 2016-05-12 at the Wayback Machine. Retrieved 19 August 2010.
- [2] G. David Forney Jr. (29 April 2005). The Viterbi Algorithm: A Personal History.
- [3] Daniel Jurafsky; James H. Martin. Speech and Language Processing. Pearson Education International. p. 246.
- [4] Schmid, Helmut (2004). Efficient parsing of highly ambiguous context-free grammars with bit vectors (PDF). Proc. 20th Int'l Conf. on Computational Linguistics (COLING). doi:10.3115/1220355.1220379.
- [5] Klein, Dan; Manning, Christopher D. (2003). A* parsing: fast exact Viterbi parse selection (PDF). Proc. 2003 Conf. of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (NAACL). pp. 40–47. doi:10.3115/1073445.1073461.
- [6] Stanke, M.; Keller, O.; Gunduz, I.; Hayes, A.; Waack, S.; Morgenstern, B. (2006). "AUGUSTUS: Ab initio prediction of alternative transcripts". Nucleic Acids Research. 34 (Web Server issue): W435–W439. doi:10.1093/nar/gkl200. PMC 1538822. PMID 16845043.
- [7] Quach, T.; Farooq, M. (1994). "Maximum Likelihood Track Formation with the Viterbi Algorithm". Proceedings of 33rd IEEE Conference on Decision and Control. Vol. 1. pp. 271–276. doi:10.1109/CDC.1994.410918.
- [8] Xing E, slide 11.
- [9] Qi Wang; Lei Wei; Rodney A. Kennedy (2002). "Iterative Viterbi Decoding, Trellis Shaping, and Multilevel Structure for High-Rate Parity-Concatenated TCM". IEEE Transactions on Communications. 50: 48–55. doi:10.1109/26.975743.
- [10] Feldman J; Abou-Faycal I; Frigo M (December 2002). A fast maximum-likelihood decoder for convolutional codes (PDF). Vehicular Technology Conference. pp. 371–375. doi:10.1109/VETECF.2002.1040367.
General references
- Viterbi AJ (April 1967). "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm". IEEE Transactions on Information Theory. 13 (2): 260–269. doi:10.1109/TIT.1967.1054010. (Note: the Viterbi decoding algorithm is described in section IV.)
- Feldman J, Abou-Faycal I, Frigo M (2002). "A fast maximum-likelihood decoder for convolutional codes". Proceedings IEEE 56th Vehicular Technology Conference. Vol. 1. pp. 371–375. CiteSeerX 10.1.1.114.1314. doi:10.1109/VETECF.2002.1040367. ISBN 978-0-7803-7467-6. S2CID 9783963.
- Forney GD (March 1973). "The Viterbi algorithm". Proceedings of the IEEE. 61 (3): 268–278. doi:10.1109/PROC.1973.9030.
- Press, WH; Teukolsky, SA; Vetterling, WT; Flannery, BP (2007). "Section 16.2. Viterbi Decoding". Numerical Recipes: The Art of Scientific Computing (3rd ed.). New York: Cambridge University Press. ISBN 978-0-521-88068-8. Archived from the original on 2011-08-11. Retrieved 2011-08-17.
- Rabiner LR (February 1989). "A tutorial on hidden Markov models and selected applications in speech recognition". Proceedings of the IEEE. 77 (2): 257–286. CiteSeerX 10.1.1.381.3454. doi:10.1109/5.18626. S2CID 13618539. (Describes the forward algorithm and Viterbi algorithm for HMMs).
- Shinghal, R. and Godfried T. Toussaint, "Experiments in text recognition with the modified Viterbi algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, April 1979, pp. 184–193.
- Shinghal, R. and Godfried T. Toussaint, "The sensitivity of the modified Viterbi algorithm to the source statistics," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-2, March 1980, pp. 181–185.
External links
- Implementations in Java, F#, Clojure, C# on Wikibooks
- Tutorial on convolutional coding with viterbi decoding, by Chip Fleming
- A tutorial for a Hidden Markov Model toolkit (implemented in C) that contains a description of the Viterbi algorithm
- Viterbi algorithm by Dr. Andrew J. Viterbi (scholarpedia.org).
Implementations
- Mathematica has an implementation as part of its support for stochastic processes
- The Susa signal processing framework provides a C++ implementation for forward error correction codes and channel equalization.
- C++
- C#
- Java
- Java 8
- Julia (HMMBase.jl)
- Perl
- Prolog
- Haskell
- Go
- SFIHMM includes code for Viterbi decoding.
Viterbi algorithm
Introduction and Background
Overview
The Viterbi algorithm is a dynamic programming algorithm designed to determine the most likely sequence of hidden states, referred to as the Viterbi path, in a Hidden Markov Model (HMM) given an observed sequence.[9] This approach addresses the challenge of decoding by identifying the path that maximizes the joint probability of the hidden states and the corresponding observations.[9]
At its core, the algorithm employs a trellis structure to systematically explore possible state transitions, keeping at each step only the most probable path into each state and so avoiding the computational explosion of exhaustive enumeration.[3] This method ensures an optimal solution without evaluating every conceivable sequence, providing a balance between accuracy and feasibility in probabilistic modeling tasks.[9]
A primary advantage of the Viterbi algorithm is its computational efficiency, with a time complexity of $O(T N^2)$, where $T$ denotes the length of the observation sequence and $N$ the number of possible states, significantly outperforming brute-force alternatives that scale exponentially.[9] Originally developed for error correction in communication systems, it was introduced by Andrew Viterbi in 1967.[3]
Hidden Markov Models
A hidden Markov model (HMM) is a statistical model that represents a system as a Markov chain where the states are hidden from observation, and only emissions dependent on those states are directly observable. The model consists of a finite set of hidden states $S = \{s_1, \dots, s_N\}$, where $N$ is the number of states, and a sequence of observations $O = (O_1, \dots, O_T)$ drawn from an observation alphabet $\mathcal{V} = \{v_1, \dots, v_M\}$, with $M$ possible symbols. The HMM is fully specified by three sets of parameters: the state transition probability matrix $A = \{a_{ij}\}$, where $a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$ for $1 \le i, j \le N$ and $q_t$ denoting the state at time $t$; the emission (or observation) probability distribution $B = \{b_j(k)\}$, where $b_j(k) = P(O_t = v_k \mid q_t = s_j)$ for $1 \le j \le N$ and $1 \le k \le M$; and the initial state probability distribution $\pi = \{\pi_i\}$, where $\pi_i = P(q_1 = s_i)$ for $1 \le i \le N$. Collectively, these parameters are denoted as $\lambda = (A, B, \pi)$.[10]
The HMM relies on two key assumptions. First, the first-order Markov property for the hidden states, which states that the probability of transitioning to the next state depends only on the current state: $P(q_{t+1} \mid q_t, q_{t-1}, \dots, q_1) = P(q_{t+1} \mid q_t)$. Second, the observations are conditionally independent given the state sequence, meaning that each observation depends solely on the current state and not on previous or future observations or states: $P(O_t \mid q_1, \dots, q_T, O_1, \dots, O_{t-1}, O_{t+1}, \dots, O_T) = P(O_t \mid q_t)$. These assumptions simplify the modeling of sequential data where direct state information is unavailable.[10]
Given a state sequence $Q = (q_1, \dots, q_T)$ and observation sequence $O = (O_1, \dots, O_T)$, the joint probability under the model is
$$P(O, Q \mid \lambda) = \pi_{q_1} \, b_{q_1}(O_1) \prod_{t=2}^{T} a_{q_{t-1} q_t} \, b_{q_t}(O_t),$$
which factors according to the Markov and independence assumptions. Common notations in the literature include uppercase letters for random variables (e.g., $Q_t$ for the state at time $t$) and lowercase for realizations (e.g., $q_t$), with the model $\lambda$ encapsulating all probabilistic dependencies. While the standard formulation assumes discrete emissions, extensions to continuous observations replace the discrete $b_j(k)$ with continuous probability density functions, such as finite mixtures of Gaussians, to handle real-valued data like acoustic features in speech recognition.[10] The Viterbi algorithm finds the most likely state sequence for decoding in HMMs.[10]
Historical Development
Origins and Invention
The Viterbi algorithm was invented by Andrew J. Viterbi in 1967 while he was a faculty member in the School of Engineering and Applied Science at the University of California, Los Angeles (UCLA).[11] Originally developed as a method for maximum-likelihood decoding of convolutional codes transmitted over noisy digital communication channels, it addressed the need for computationally efficient error correction in bandwidth-limited systems.[12] Viterbi, an Italian-American electrical engineer, formulated the algorithm during his research on coding theory, drawing on principles of dynamic programming to find the most probable sequence of code symbols given a received signal corrupted by noise.[13] The algorithm's foundational ideas were detailed in Viterbi's seminal paper, "Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm," published in the IEEE Transactions on Information Theory in April 1967.[12] In this work, Viterbi not only introduced the decoding procedure but also derived asymptotic error bounds for convolutional codes, demonstrating that the algorithm achieves near-optimal performance as signal-to-noise ratios improve.[3] The motivation stemmed from pressing challenges in space communications during the 1960s, where missions to planets like Venus and Mars required robust error-correcting codes to combat high noise levels in deep-space channels, yet traditional sequential decoding methods demanded excessive computational resources impractical for real-time ground station processing.[14] This need was particularly acute for NASA's early planetary explorations, which relied on convolutional encoding but lacked efficient decoders until Viterbi's innovation.[15] Early recognition of the algorithm's potential came swiftly within the aerospace community. 
By the late 1960s, prototypes based on the Viterbi decoder were developed under NASA contracts, enabling practical implementations for satellite and deep-space telemetry.[13] In the 1970s, NASA adopted Viterbi decoding for key missions, including the Voyager spacecraft launched in 1977, which used a rate-1/2, constraint-length-7 convolutional code decoded via the algorithm to achieve reliable data recovery from billions of miles away.[15] This integration extended to international standards, with the Consultative Committee for Space Data Systems (CCSDS) incorporating Viterbi-based convolutional coding into its recommendations for deep-space telemetry by the early 1980s, building on NASA's prior implementations.[16]
Key Milestones
In the 1970s, the Viterbi algorithm gained traction in digital communications following its formalization through the trellis structure introduced by G. David Forney in 1973, which provided a graphical representation that simplified implementation and analysis for convolutional code decoding.[17] This advancement enabled efficient hardware realizations and contributed to its adoption in early satellite and spacecraft systems, such as those developed by NASA and military applications, marking its transition from theory to practical use in noisy channels.[13] During the 1980s, the algorithm expanded into speech recognition, notably integrated into IBM's Tangora system, a speaker-dependent isolated-utterance recognizer that scaled to 20,000-word vocabularies using hidden Markov models (HMMs) for real-time processing.[18] Early applications also emerged in bioinformatics for sequence analysis, leveraging HMMs to model probabilistic alignments in biological data.[19] Additionally, ideas for parallelizing the algorithm to suit hardware constraints were explored, as in J.K. Wolf's 1978 work on efficient decoding architectures, paving the way for high-throughput implementations. 
In the 1990s and 2000s, the Viterbi algorithm became standardized in GSM mobile networks for channel decoding of convolutional codes, underpinning error correction in second-generation cellular systems and enabling reliable voice and data transmission worldwide.[13] It also found application in GPS signal processing, where it decodes the convolutional encoding of navigation messages to improve accuracy in low-signal environments.[20] Open-source tools further democratized its use, such as the Hidden Markov Model Toolkit (HTK) released in 1995, which incorporated Viterbi decoding for HMM training and sequence inference in speech and beyond.[21]
Post-2010 developments have extended the algorithm to quantum computing for error correction, including quantum variants applied to quantum low-density parity-check (qLDPC) codes as surveyed in 2015, enhancing fault-tolerant quantum information processing.[22] Hybrid approaches integrating neural networks with Viterbi decoding have also proliferated in AI, such as convolutional neural network-HMM systems for improved sequence recognition since the early 2010s.[23]
Core Algorithm
Description
The Viterbi algorithm employs a trellis diagram as its graphical foundation, representing the Hidden Markov Model (HMM) over time steps $t = 1$ to $T$ on the horizontal axis and the $N$ possible hidden states on the vertical axis, with edges between states at consecutive time steps weighted by the product of transition probabilities $a_{ij}$ and emission probabilities $b_j(O_t)$.[10] The algorithm proceeds via dynamic programming to compute the most likely state sequence, denoted as the Viterbi path, that maximizes the joint probability of the observed sequence and the hidden states given the HMM parameters.[10]
The process begins with initialization at time $t = 1$: for each state $i = 1$ to $N$, set the Viterbi probability $V_1(i) = \pi_i \, b_i(O_1)$, where $\pi_i$ is the initial state probability, and initialize the backpointer $\psi_1(i) = 0$.[10] This step establishes the probability of starting in each state and emitting the first observation $O_1$.
In the recursion phase, for each time step $t = 2$ to $T$ and each state $j = 1$ to $N$,
$$V_t(j) = \max_{1 \le i \le N} \left[ V_{t-1}(i) \, a_{ij} \right] b_j(O_t),$$
with the corresponding backpointer
$$\psi_t(j) = \arg\max_{1 \le i \le N} \left[ V_{t-1}(i) \, a_{ij} \right].$$[10]
These recursions propagate the maximum probability paths forward through the trellis, selecting at each node the predecessor state that yields the highest probability up to time $t-1$, scaled by the emission probability for observation $O_t$. To mitigate numerical underflow from repeated multiplications of small probabilities, a common variant computes in the log-probability domain, replacing products with sums and using $\log V_t(j) = \max_{1 \le i \le N} \left[ \log V_{t-1}(i) + \log a_{ij} \right] + \log b_j(O_t)$.[1]
At termination, after processing all observations, the maximum path probability is $P^* = \max_{1 \le i \le N} V_T(i)$, and the final state is $q_T^* = \arg\max_{1 \le i \le N} V_T(i)$.[10] The optimal state sequence, or Viterbi path, is then recovered via backtracking: for $t = T-1$ down to $1$, set $q_t^* = \psi_{t+1}(q_{t+1}^*)$.[10] This yields the complete sequence $Q^* = (q_1^*, \dots, q_T^*)$ that maximizes the probability.
The algorithm's optimality follows from the dynamic programming principle applied to the acyclic trellis graph: the maximum-probability path to any node at time $t$ is the maximum over all incoming paths from time $t-1$, ensuring global optimality without exhaustive search.[3]
Pseudocode
The Viterbi algorithm for Hidden Markov Models (HMMs) can be implemented using dynamic programming to compute the most likely state sequence given an observation sequence. The algorithm maintains a trellis of probabilities and backpointers to track the optimal path.
The inputs to the algorithm are an observation sequence $O = (O_1, \dots, O_T)$, where each $O_t$ is a discrete observation symbol, and the HMM model parameters $\lambda = (A, B, \pi)$, consisting of the state transition probability matrix $A$ (where $a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$), the observation emission probability matrix $B$ (where $b_j(k) = P(O_t = v_k \mid q_t = s_j)$ for observation symbols $v_k$), and the initial state probability distribution $\pi$ (where $\pi_i = P(q_1 = s_i)$). The output is the most likely state sequence $Q^* = (q_1^*, \dots, q_T^*)$ that maximizes $P(Q \mid O, \lambda)$. This formulation assumes a discrete emission HMM, as originally applied in contexts like speech recognition.[9]
The following pseudocode outlines the core procedure, assuming $N$ hidden states and using a 2D array $V$ to store the Viterbi probabilities ($V[t][i]$ is the probability of the most likely path ending in state $i$ at time $t$) and a corresponding array $\mathit{backpointer}$ to record the previous state for path reconstruction. Initialization sets the probabilities for the first observation, recursion computes paths for subsequent observations by maximizing over previous states, termination identifies the best ending state, and backtracking reconstructs the full path.[9]
    function Viterbi(O, λ = (A, B, π)):
        T ← length(O)
        N ← number of states
        // Initialization
        for i = 1 to N:
            V[1][i] ← π_i * b_i(O_1)
            backpointer[1][i] ← 0 // No previous state
        // Recursion
        for t = 2 to T:
            for j = 1 to N:
                temp ← -∞
                argmax_i ← 0
                for i = 1 to N:
                    prob ← V[t-1][i] * a_{i j}
                    if prob > temp:
                        temp ← prob
                        argmax_i ← i
                V[t][j] ← temp * b_j(O_t)
                backpointer[t][j] ← argmax_i
        // Termination
        bestpathprob ← max_{i=1 to N} V[T][i]
        bestpathendstate ← argmax_{i=1 to N} V[T][i]
        // Path backtracking
        Q ← array of length T
        Q[T] ← bestpathendstate
        for t = T-1 downto 1:
            Q[t] ← backpointer[t+1][Q[t+1]]
        return Q
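The log-probability variant mentioned in the description can be sketched in Python as follows. The code uses 0-based indices and plain nested lists for $A$, $B$ and $\pi$, both of which are choices made for this illustration; the numbers are the two-state doctor model from the earlier article in this document, with states 0 = healthy, 1 = fever and symbols 0 = normal, 1 = cold, 2 = dizzy.

```python
import math


def viterbi_log(O, A, B, pi):
    """Viterbi in the log domain; returns (state sequence, log path probability)."""
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    N, T = len(pi), len(O)
    V = [[-math.inf] * N for _ in range(T)]
    backpointer = [[0] * N for _ in range(T)]

    # Initialization
    for i in range(N):
        V[0][i] = log(pi[i]) + log(B[i][O[0]])

    # Recursion: sums of logs replace products of probabilities.
    for t in range(1, T):
        for j in range(N):
            best_i = max(range(N), key=lambda i: V[t - 1][i] + log(A[i][j]))
            V[t][j] = V[t - 1][best_i] + log(A[best_i][j]) + log(B[j][O[t]])
            backpointer[t][j] = best_i

    # Termination and backtracking
    Q = [0] * T
    Q[T - 1] = max(range(N), key=lambda i: V[T - 1][i])
    for t in range(T - 2, -1, -1):
        Q[t] = backpointer[t + 1][Q[t + 1]]
    return Q, V[T - 1][Q[T - 1]]


A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]

Q, logp = viterbi_log([0, 1, 2], A, B, pi)
print(Q, math.exp(logp))

# A long sequence whose raw path probability would underflow to 0.0 in floats.
_, long_logp = viterbi_log([0, 1, 2] * 400, A, B, pi)
print(math.isfinite(long_logp))
```

On the three-symbol input the result agrees with the probability-domain computation, while on the 1200-symbol input the log score stays finite even though the corresponding probability is far below the smallest positive float.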
Worked Examples
Convolutional Code Decoding
Convolutional codes are linear time-invariant error-correcting codes generated by a finite-state shift register, where the output is a linear combination of the input bits and the contents of the register, defined by generator polynomials. A simple rate-1/2 convolutional code with constraint length 3 (memory of 2 bits) uses generator polynomials $g_1(D) = 1 + D^2$ and $g_2(D) = 1 + D + D^2$, producing two output bits for each input bit through modulo-2 addition in the shift register. This code has a 4-state trellis, with states representing the content of the two memory elements: 00, 01, 10, and 11.
The transmission occurs over a binary symmetric channel (BSC) with crossover probability $p$, where each transmitted bit is independently flipped with probability $p$, resulting in the received sequence being a noisy version of the transmitted codeword with possible bit errors. The Viterbi algorithm decodes by finding the most likely transmitted sequence given the received bits, using branch metrics based on Hamming distance for hard-decision decoding in the BSC model.
Consider an example with the 4-state trellis for the rate-1/2 code. The input bit sequence $u$ (with terminating zero) is encoded to the codeword $v$. The received sequence is $r$, which differs from $v$ in one bit position (the last pair has a single flip from 01 to 11), corresponding to one error in the BSC. The trellis branches are labeled with the input bit and the corresponding output pair; for instance, transitions from each state split into two branches (for input 0 or 1), with outputs determined by the generator polynomials. The following table summarizes the branch labels for the states (state = s1 s2, outputs v1 = u ⊕ s2, v2 = u ⊕ s1 ⊕ s2; next state = u s1):
| Current State | Input u | Output | Next State |
|---|---|---|---|
| 00 | 0 | 00 | 00 |
| 00 | 1 | 11 | 10 |
| 01 | 0 | 11 | 00 |
| 01 | 1 | 00 | 10 |
| 10 | 0 | 01 | 01 |
| 10 | 1 | 10 | 11 |
| 11 | 0 | 10 | 01 |
| 11 | 1 | 01 | 11 |
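The branch labels follow mechanically from the update rule stated above (v1 = u ⊕ s2, v2 = u ⊕ s1 ⊕ s2, next state = u s1). The sketch below applies that rule to list every transition and to encode a short input; the function names `step` and `encode` are illustrative, and the two-bit input is a made-up example rather than the article's input sequence.

```python
def step(state, u):
    """One shift-register step: returns ((v1, v2), next_state)."""
    s1, s2 = state
    v1 = u ^ s2            # v1 = u XOR s2
    v2 = u ^ s1 ^ s2       # v2 = u XOR s1 XOR s2
    return (v1, v2), (u, s1)


def encode(bits, state=(0, 0)):
    """Encode a bit list starting from the given state (all-zero by default)."""
    out = []
    for u in bits:
        pair, state = step(state, u)
        out.append(pair)
    return out, state


# Print every branch of the 4-state trellis as: state, input, output, next state.
for state in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    for u in (0, 1):
        (v1, v2), (n1, n2) = step(state, u)
        print(f"{state[0]}{state[1]} | {u} | {v1}{v2} | {n1}{n2}")

codeword, final_state = encode([1, 0])
print(codeword, final_state)
```

Each state has exactly two outgoing branches, and every state is entered by exactly two branches, which is the structure the decoder relies on when it compares competing paths.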
Decoding starts in state 00 with path metric 0. At the first step (received pair 11):
- Input 0 to state 00: expected 00, branch metric 2, path metric 2.
- Input 1 to state 10: expected 11, branch metric 0, path metric 0.
Other states have infinite metrics at this point. Survivor pointers point back to the initial state.
At the second step (received pair 01):
- To state 00: from 00 (input 0, expected 00 vs 01, metric 1), total 2 + 1 = 3; no other reachable predecessor. Path metric 3, pointer from 00.
- To state 01: from 10 (input 0, expected 01 vs 01, metric 0), total 0 + 0 = 0. Path metric 0, pointer from 10.
- To state 10: from 00 (input 1, expected 11 vs 01, metric 1), total 2 + 1 = 3. Path metric 3, pointer from 00.
- To state 11: from 10 (input 1, expected 10 vs 01, metric 2), total 0 + 2 = 2. Path metric 2, pointer from 10.
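The metric updates listed above can be reproduced with a small hard-decision decoder that uses Hamming distance as the branch metric. This is a sketch under stated assumptions: it starts in state 00, uses the update rule v1 = u ⊕ s2, v2 = u ⊕ s1 ⊕ s2 with next state u s1, processes only the two received pairs 11 and 01 that appear in the steps above, and picks the lowest-metric final state instead of forcing termination in state 00. The names `step`, `hamming` and `decode` are illustrative.

```python
INF = float("inf")
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def step(state, u):
    """Shift-register step: returns ((v1, v2), next_state)."""
    s1, s2 = state
    return (u ^ s2, u ^ s1 ^ s2), (u, s1)


def hamming(a, b):
    """Number of positions where two bit pairs differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(received):
    """Hard-decision Viterbi decoding; returns (input bits, per-step metrics)."""
    metrics = {s: (0 if s == (0, 0) else INF) for s in STATES}
    pointers, trace = [], []
    for pair in received:
        new, ptr = {s: INF for s in STATES}, {}
        for s in STATES:
            if metrics[s] == INF:
                continue                      # state not reachable yet
            for u in (0, 1):
                out, nxt = step(s, u)
                m = metrics[s] + hamming(out, pair)
                if m < new[nxt]:              # keep only the survivor path
                    new[nxt], ptr[nxt] = m, (s, u)
        metrics = new
        pointers.append(ptr)
        trace.append(new)

    # Trace back from the best final state, collecting the input bits.
    state = min(STATES, key=lambda s: metrics[s])
    bits = []
    for ptr in reversed(pointers):
        state, u = ptr[state]
        bits.append(u)
    return bits[::-1], trace


bits, trace = decode([(1, 1), (0, 1)])
for t, row in enumerate(trace, start=1):
    print(t, {f"{a}{b}": m for (a, b), m in row.items()})
print(bits)
```

After the second pair the path metrics are 3, 0, 3 and 2 for states 00, 01, 10 and 11, matching the totals in the list, and the zero-metric survivor corresponds to input bits 1, 0.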
