Bigram
A bigram or digram is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. A bigram is an n-gram for n=2.
The frequency distribution of every bigram in a string is commonly used for simple statistical analysis of text in many applications, including in computational linguistics, cryptography, and speech recognition.
Gappy bigrams or skipping bigrams are word pairs which allow gaps (perhaps avoiding connecting words, or allowing some simulation of dependencies, as in a dependency grammar).
Applications
Bigrams, along with other n-grams, are used in most successful language models for speech recognition.[1]
Bigram frequency attacks can be used in cryptography to solve cryptograms. See frequency analysis.
Bigram frequency is one approach to statistical language identification.
Some activities in logology or recreational linguistics involve bigrams. These include attempts to find English words beginning with every possible bigram,[2] or words containing a string of repeated bigrams, such as logogogue.[3]
Bigram frequency in the English language
The frequency of the most common letter bigrams in a large English corpus is:[4]

| Bigram | Frequency | Bigram | Frequency | Bigram | Frequency |
|---|---|---|---|---|---|
| th | 3.56% | of | 1.17% | io | 0.83% |
| he | 3.07% | ed | 1.17% | le | 0.83% |
| in | 2.43% | is | 1.13% | ve | 0.83% |
| er | 2.05% | it | 1.12% | co | 0.79% |
| an | 1.99% | al | 1.09% | me | 0.79% |
| re | 1.85% | ar | 1.07% | de | 0.76% |
| on | 1.76% | st | 1.05% | hi | 0.76% |
| at | 1.49% | to | 1.05% | ri | 0.73% |
| en | 1.45% | nt | 1.04% | ro | 0.73% |
| nd | 1.35% | ng | 0.95% | ic | 0.70% |
| ti | 1.34% | se | 0.93% | ne | 0.69% |
| es | 1.34% | ha | 0.93% | ea | 0.69% |
| or | 1.28% | as | 0.87% | ra | 0.69% |
| te | 1.20% | ou | 0.87% | ce | 0.65% |
References
- ^ Collins, Michael John (1996-06-24). "A new statistical parser based on bigram lexical dependencies". Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. pp. 184–191. arXiv:cmp-lg/9605012. doi:10.3115/981863.981888. S2CID 12615602. Retrieved 2018-10-09.
- ^ Cohen, Philip M. (1975). "Initial Bigrams". Word Ways. 8 (2). Retrieved 11 September 2016.
- ^ Corbin, Kyle (1989). "Double, Triple, and Quadruple Bigrams". Word Ways. 22 (3). Retrieved 11 September 2016.
- ^ "English Letter Frequency Counts: Mayzner Revisited or ETAOIN SRHLDCU". norvig.com. Retrieved 2019-10-28.
Bigram
Fundamentals
Definition
A bigram is a contiguous sequence of two adjacent elements drawn from a larger sequence, such as symbols, letters, or words in text or speech.[1] This concept captures pairwise dependencies within the sequence, serving as a fundamental unit for analyzing patterns in linguistic data. For instance, in the word "the," the bigrams are "th" and "he"; in the sentence "the cat sat," a word-level bigram would be "the cat."[1] The term "bigram" originates from the prefix "bi-" (meaning two) and "gram" (referring to a written symbol or unit), and it specifically denotes the n=2 case of the more general n-gram framework.[1]

The concept of such pairwise sequences was systematically explored by Andrey Markov in 1913 using letter transitions in text, and later applied by Claude Shannon in his 1951 work on estimating the entropy of printed English, where he used digram frequencies to approximate language redundancy.[4][2] The broader term "n-gram" was introduced in the late 1950s within information theory, building on Shannon's foundations. Bigrams have since become essential in natural language processing for tasks like language modeling.[1]

Mathematical Representation
In statistical natural language processing, a bigram is formally represented through counts derived from a corpus. The unigram count $C(w_{i-1})$ denotes the number of occurrences of a single word in the corpus, while the bigram count $C(w_{i-1}, w_i)$ represents the frequency of the consecutive pair in which $w_{i-1}$ is followed by $w_i$.[5][1] The probability of a bigram is derived from conditional probability principles, approximating the likelihood of the next word given the previous one via maximum likelihood estimation. Specifically, $P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}, w_i)}{C(w_{i-1})}$, which estimates the conditional probability as the relative frequency of the bigram divided by the unigram count of the preceding word.[5][1] This formulation stems from the chain rule of probability, where the joint probability of a sequence is factored as $P(w_1, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1}) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})$ under the bigram (first-order Markov) assumption, and each term is computed from empirical counts to maximize the likelihood of the observed data.[5]

Raw maximum likelihood estimates often fail due to sparse data, resulting in zero probabilities for unseen bigrams, which can lead to invalid models.[1] To address this, smoothing techniques redistribute probability mass; a simple and widely used method is Laplace (add-one) smoothing, which adds a pseudocount of 1 to every possible bigram count. The smoothed bigram probability is then given by $P_{\text{Laplace}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}, w_i) + 1}{C(w_{i-1}) + V}$, where $V$ is the size of the vocabulary.[5][1] This ensures that every possible bigram receives a small positive probability, preventing zero estimates while preserving the relative frequencies of observed events.

Bigrams also play a central role in measuring the uncertainty of language models through entropy. The entropy of a bigram model quantifies the average information content per word and is computed as the expected conditional entropy: $H(w_i \mid w_{i-1}) = -\sum_{w_{i-1}} P(w_{i-1}) \sum_{w_i} P(w_i \mid w_{i-1}) \log_2 P(w_i \mid w_{i-1})$. This expression averages the negative log-probabilities over the unigram distribution and the conditional bigram probabilities, providing a theoretical measure of predictability.[5][1] Lower entropy values indicate more predictable sequences, with bigram models typically yielding higher entropy than higher-order n-grams due to their limited context.[1]
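To make these definitions concrete, here is a minimal Python sketch (not taken from the cited sources) that collects unigram and bigram counts from a toy token list and computes the Laplace-smoothed conditional probability defined above; the function names and the toy corpus are illustrative.

```python
from collections import Counter

def train_bigram_counts(tokens):
    """Collect unigram and bigram counts from a token sequence."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def laplace_bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    """Add-one smoothed estimate: (C(w_prev, w) + 1) / (C(w_prev) + V)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

tokens = "the cat sat on the mat".split()          # toy corpus
unigrams, bigrams = train_bigram_counts(tokens)
V = len(unigrams)                                  # vocabulary size
print(laplace_bigram_prob("the", "cat", unigrams, bigrams, V))  # observed pair
print(laplace_bigram_prob("the", "dog", unigrams, bigrams, V))  # unseen pair, still > 0
```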
Types
Word Bigrams
Word bigrams consist of pairs of consecutive words extracted from tokenized text, serving as a fundamental unit for modeling local dependencies in natural language. The formation process begins with tokenization, where raw text is split into words using whitespace and punctuation as delimiters; punctuation marks are typically removed or treated as separate tokens to avoid fragmenting meaningful units, while the text is often converted to lowercase to normalize for case insensitivity and reduce vocabulary size.[6]

For instance, in the sentence "The quick brown fox jumps.", tokenization yields the sequence ["the", "quick", "brown", "fox", "jumps"], excluding the period as non-word content.[1] From this tokenized sequence, bigrams are generated by sliding a window of size two across the tokens, producing pairs such as ("the", "quick"), ("quick", "brown"), ("brown", "fox"), and ("fox", "jumps"). To account for sentence boundaries, special tokens such as <s> (sentence start) and </s> (sentence end) are commonly added before pairing, so that the first and last words of a sentence also appear in bigrams with an explicit boundary context.
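A small illustrative sketch of this extraction step, under the simplifying assumption that tokens are lowercase alphabetic strings and that <s>/</s> serve as boundary markers (the regex and helper name are not from the source):

```python
import re

def word_bigrams(sentence, bos="<s>", eos="</s>"):
    """Lowercase, strip punctuation, tokenize on word characters, then pad with
    sentence-boundary markers before pairing adjacent tokens."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    padded = [bos] + tokens + [eos]
    return list(zip(padded, padded[1:]))

print(word_bigrams("The quick brown fox jumps."))
# [('<s>', 'the'), ('the', 'quick'), ('quick', 'brown'),
#  ('brown', 'fox'), ('fox', 'jumps'), ('jumps', '</s>')]
```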
Character Bigrams
Character bigrams consist of pairs of consecutive characters extracted from untokenized text, typically after normalization such as converting to lowercase and removing or ignoring spaces and punctuation to focus on alphabetic sequences.[8] For instance, the text "The quick" might yield bigrams like ("t","h"), ("h","e"), and ("q","u"), preserving the sequential order without word boundaries. This formation allows for straightforward computation over raw strings, enabling analysis of subword structures.[9]

These bigrams capture digraphs, which are common letter pairs in a language, such as "th" (occurring in about 1.52% of English digraph positions), "he" (1.28%), and "in" (0.94%), reflecting orthographic patterns tied to phonetics and spelling conventions.[8] In English, digraphs like "th" often represent specific sounds (/θ/ or /ð/), aiding in the study of how letter combinations encode phonological information and influence spelling regularity. Their frequencies reveal linguistic redundancies, such as repeated pairs indicating morphological or phonetic constraints.[10]

Compared to word bigrams, character bigrams offer advantages in handling out-of-vocabulary (OOV) words and languages lacking explicit spaces, like Chinese, where they avoid the need for segmentation by treating continuous character streams as fixed-length units.[11] This makes them robust for proper nouns, neologisms, or agglutinative scripts, as no dictionary or tokenizer is required, enhancing applicability in diverse NLP tasks.[12]

Consider the word "hello": after normalization, it produces the bigrams ("h","e"), ("e","l"), ("l","l"), and ("l","o"). The repeated ("l","l") highlights redundancy in English spelling, where such pairs underscore letter frequencies and potential for predictive modeling in text generation or error correction.[9]
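The character-level case can be sketched in a few lines; here normalization is assumed to mean keeping only lowercase letters, so pairs may span original word boundaries once spaces are removed (the helper name is illustrative, not from the source):

```python
def char_bigrams(text):
    """Normalize to lowercase letters only, then pair adjacent characters."""
    letters = [c for c in text.lower() if c.isalpha()]
    return list(zip(letters, letters[1:]))

print(char_bigrams("hello"))
# [('h', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o')]
```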
Other Variants
Phonetic bigrams, also known as phone bigrams, consist of pairs of adjacent phonetic units or phones, such as the sequence /k/ followed by /æ/ in the pronunciation of "cat." These bigrams model transitions between sounds in spoken language and are employed in speech recognition systems to capture phonetic dependencies and improve decoding accuracy.[13][14]

Byte bigrams refer to consecutive pairs of bytes in binary data, utilized in computing for tasks such as file type identification and data compression. In file analysis, byte bigram frequencies help classify fragmented or unknown files by generating feature vectors from their binary content, distinguishing formats like executables from documents.[15] For compression, byte pair encoding (BPE) iteratively identifies and replaces the most frequent byte bigrams with a single byte symbol, reducing redundancy in text or binary streams as originally proposed for general-purpose data compression.[16]

Skip bigrams extend the bigram concept to non-adjacent pairs, such as $(w_i, w_{i+2})$, where one or more intervening elements are skipped, allowing capture of longer-range dependencies while contrasting with standard contiguous bigrams that require immediate adjacency. This variant is applied in natural language processing to enhance event prediction and statistical modeling by incorporating gapped sequences.[17]

In bioinformatics, domain-specific bigrams include DNA bigrams, which are pairs of adjacent nucleotides like "AT" or "CG" in genomic sequences, used to analyze patterns, classify DNA texts over the four-letter alphabet {A, C, G, T}, and extract features for predicting protein-DNA interactions. These bigrams facilitate motif detection and sequence similarity assessments in large-scale genomic studies.[18][19]
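As a rough illustration of the skip-bigram variant only (the other variants are not sketched here), the hypothetical helper below emits pairs that allow up to `max_skip` intervening tokens; setting `max_skip=0` recovers ordinary contiguous bigrams.

```python
def skip_bigrams(tokens, max_skip=1):
    """Return pairs (tokens[i], tokens[j]) with at most max_skip tokens
    skipped between them; max_skip=0 gives contiguous bigrams."""
    pairs = []
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs

print(skip_bigrams("the cat sat on the mat".split(), max_skip=1))
# includes ('the', 'cat') as well as the gapped pair ('the', 'sat')
```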
Applications
Natural Language Processing
In natural language processing (NLP), bigrams serve as a foundational tool for language modeling, where they estimate the probability of a word given its immediate predecessor to predict the next word in a sequence. This approach, known as a bigram model, approximates the joint probability of word sequences by assuming Markovian dependence, enabling tasks like next-word prediction. A seminal exposition of bigram models highlights their use in calculating conditional probabilities from training corpora, which underpins probabilistic text prediction.[1]

The effectiveness of bigram language models is often evaluated using perplexity, a metric that quantifies predictive uncertainty as $\mathrm{PP} = 2^{H}$, where $H$ represents the cross-entropy of the model with respect to the test data. Lower perplexity indicates better modeling of language patterns, with bigrams typically achieving reasonable scores on short-range dependencies in English text, though performance degrades on longer sequences.[1]

Bigrams find practical applications in spell-checking, where context-aware systems leverage bigram probabilities to rank correction candidates by their likelihood in surrounding text. In machine translation, bigram models contribute to language modeling and alignment by scoring co-occurrence probabilities. Additionally, bigrams support text generation by sampling subsequent words based on prior ones, producing coherent short phrases in early probabilistic systems.[1]

Despite their utility, bigram models exhibit limitations in capturing long-range dependencies, as they ignore context beyond the immediate prior word, often resulting in sparse probabilities and poor generalization for complex sentences; this has motivated extensions to higher-order n-grams like trigrams for improved modeling.[1]

A notable case study involves bigram-based sentiment analysis, where phrase-level bigrams detect polarity shifts from negation or intensifiers. For example, the bigram "not good" conveys negative sentiment despite the positive valence of "good," whereas "very good" amplifies positivity, allowing models to classify reviews more accurately than unigram approaches by incorporating local word interactions.[20]
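A self-contained sketch of the perplexity computation described above, assuming an add-one-smoothed bigram model and tiny illustrative training and test strings (none of this is from the cited source):

```python
import math
from collections import Counter

def bigram_perplexity(train_tokens, test_tokens):
    """Perplexity = 2 ** H, where H is the average negative log2 probability
    per word under an add-one-smoothed bigram model trained on train_tokens."""
    unigrams = Counter(train_tokens)
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    V = len(unigrams)
    pairs = list(zip(test_tokens, test_tokens[1:]))
    log_prob = 0.0
    for prev, cur in pairs:
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + V)
        log_prob += math.log2(p)
    H = -log_prob / len(pairs)          # cross-entropy in bits per word
    return 2 ** H

train = "the cat sat on the mat the cat ran".split()   # toy corpus
test = "the cat sat on the mat".split()
print(bigram_perplexity(train, test))
```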
Cryptanalysis
Bigrams play a crucial role in frequency analysis for cryptanalyzing monoalphabetic substitution ciphers, where the distribution of letter pairs in the ciphertext is compared to expected plaintext frequencies to map substitutions. For instance, the high frequency of the bigram "th" in English (occurring approximately 3.6% of the time) allows cryptanalysts to identify likely mappings for frequently appearing pairs in the encrypted text, facilitating partial or full recovery of the plaintext. This method extends single-letter frequency analysis by providing additional structural clues, making it more effective against ciphers that preserve bigram statistics despite letter scrambling.[21]

Historical applications of bigram-based techniques trace back to early Islamic cryptology, where Arab scholars like Al-Kindi in the 9th century pioneered frequency analysis for individual letters in his treatise A Manuscript on Deciphering Cryptographic Messages, laying the groundwork for later adaptations to bigrams that exploited pairwise letter dependencies in languages like Arabic. By the 19th century, these ideas evolved into more systematic approaches, such as the Kasiski examination published by Friedrich Kasiski in 1863, which detects the key length in polyalphabetic ciphers like the Vigenère by measuring distances between repeated bigrams in the ciphertext; for example, multiple occurrences of a bigram like "QU" separated by intervals that are multiples of the key length reveal its periodicity.[22][23]

During World War II, bigram analysis contributed to Allied efforts in breaking Enigma machine variants, particularly naval versions where German operators used bigram substitution tables to encode message keys and settings, concealing them from direct frequency attacks; cryptanalysts at Bletchley Park exploited recovered bigram tables and known-plaintext cribs to streamline decryption processes.[24]
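The Kasiski-style step of collecting distances between repeated ciphertext bigrams can be sketched as follows; the ciphertext string is a contrived toy example, and the greatest common divisor of the gaps is only a heuristic hint at the key length:

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def repeated_bigram_distances(ciphertext):
    """Collect gaps between repeated bigrams; shared factors of these gaps
    are candidate key lengths in a Kasiski-style examination."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - 1):
        positions[ciphertext[i:i + 2]].append(i)
    distances = []
    for locs in positions.values():
        distances.extend(b - a for a, b in zip(locs, locs[1:]))
    return distances

# Toy ciphertext with a fragment deliberately repeated 12 positions apart.
dists = repeated_bigram_distances("LXFOPVEFRNHRLXFOPV")
print(dists, reduce(gcd, dists))   # gaps of 12 suggest a key length dividing 12
```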
Text Compression
Bigrams play a key role in lossless text compression by capturing dependencies between consecutive symbols, enabling probabilistic models that exploit redundancy to encode text more efficiently than independent symbol treatments. In arithmetic coding paired with bigram models, the probability of each subsequent character is estimated based on the preceding one, allowing the encoder to represent the entire message as a single fractional number within a [0,1) interval; narrower intervals for high-probability bigrams reduce the average bits required per character, approaching the Shannon entropy limit.[25][26]

A prominent application is Prediction by Partial Matching (PPM), an adaptive algorithm that incorporates bigram (order-1) contexts alongside higher orders, using partial string matching to predict symbols when exact contexts are unseen; this method dynamically updates probability estimates and employs arithmetic coding for entropy encoding, forming the basis for variants in compression software such as 7-Zip's PPMd filter.[26] In PPM, bigram probabilities guide the selection of escape mechanisms to lower-order models, balancing adaptation and compression effectiveness.[25]

For instance, in compressing the sequence "the the," a bigram model recognizes the high probability of "the" following "the " (due to common repetitions in English), assigning it a short code interval in arithmetic coding and thus requiring fewer bits than encoding each word independently.[25]

Bigram models with arithmetic coding typically achieve around 1.5 bits per character for English text, a notable improvement over unigram Huffman coding's approximately 4.7 bits per character, as demonstrated in trained PPM variants on literary corpora. This performance underscores bigrams' utility in nearing English's estimated entropy of 1-1.3 bits per character without higher-order extensions.
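As a back-of-the-envelope illustration (not a real compressor), the sketch below measures the average bits per character that an ideal arithmetic coder would spend under an add-one-smoothed order-1 character model; because the model is trained and evaluated on the same short string, the figure is optimistic compared to a practical scheme such as PPM.

```python
import math
from collections import Counter

def bigram_bits_per_char(text):
    """Average ideal code length (bits/char) under an add-one-smoothed
    order-1 (bigram) character model estimated from `text` itself."""
    chars = list(text)
    unigrams = Counter(chars)
    bigrams = Counter(zip(chars, chars[1:]))
    V = len(unigrams)
    total_bits = 0.0
    for prev, cur in zip(chars, chars[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + V)
        total_bits += -math.log2(p)
    return total_bits / (len(chars) - 1)

sample = "the theory of the thing the thin man thought through"
print(f"{bigram_bits_per_char(sample):.2f} bits/char")
```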
Statistical Properties
Frequency Estimation
Frequency estimation for bigrams involves deriving probabilities from observed data in a corpus, addressing challenges such as data sparsity where many possible bigrams do not appear, leading to zero-probability assignments, and high variance in estimates for rare events.[1] The maximum likelihood estimation (MLE) provides a baseline approach by computing the raw relative frequency of a bigram given its context. For a bigram $(w_{i-1}, w_i)$, the MLE probability is given by $P_{\text{MLE}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}, w_i)}{C(w_{i-1})}$, where $C(\cdot)$ denotes the count in the training corpus. This method maximizes the likelihood of the training data but suffers from high variance for bigrams with low counts and assigns zero probability to unseen bigrams, causing issues in applications requiring smooth probability distributions.[1][27]

To handle sparsity, particularly for unseen bigrams, the Good-Turing estimator adjusts counts based on the frequency of frequencies observed in the corpus. It estimates the probability mass for unseen events as $P_0 = \frac{N_1}{N}$, where $N_1$ is the number of bigrams seen exactly once and $N$ is the total number of observed bigrams. For seen bigrams occurring $c$ times, the smoothed count is $c^{*} = (c+1)\frac{N_{c+1}}{N_c}$, with $N_c$ being the number of distinct bigrams occurring $c$ times; this redistributes probability from higher-frequency bigrams to lower or unseen ones, improving estimates for rare events without assuming a parametric form. An enhanced version incorporates predictors like expected independent frequencies to refine adjustments for different bins of unseen bigrams.[28][27]

Backoff and interpolation methods mitigate sparsity by blending bigram estimates with lower-order models, such as unigrams, when bigram counts are insufficient. A prominent interpolation technique is Jelinek-Mercer smoothing, which linearly combines the bigram MLE with the unigram probability: $P_{\text{JM}}(w_i \mid w_{i-1}) = \lambda\, P_{\text{MLE}}(w_i \mid w_{i-1}) + (1 - \lambda)\, P(w_i)$, where $\lambda$ (between 0 and 1) controls the interpolation weight and is typically tuned on held-out data to maximize likelihood on a validation set. Backoff methods, in contrast, use the full bigram probability if the count exceeds a threshold, otherwise falling back to the lower-order model with a discount factor. These approaches ensure non-zero probabilities for all bigrams while preserving information from observed data.[27]

Evaluation of bigram frequency estimators focuses on their predictive performance on unseen data, commonly using log-likelihood (or cross-entropy) on a held-out test set, which measures how well the model assigns high probability to actual sequences. This is often reported as perplexity, the exponential of the average negative log-likelihood per word, where lower values indicate better estimation; for example, bigram models typically achieve perplexities around 100-200 on news corpora, compared to over 900 for unigrams. Goodness-of-fit can also be assessed via chi-squared tests, comparing observed bigram frequencies in test data against those expected under the estimated model to detect systematic deviations.[1][27]
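A minimal sketch of Jelinek-Mercer interpolation as defined above, with a fixed illustrative lambda rather than one tuned on held-out data (the identifiers and toy corpus are assumptions, not from the cited sources):

```python
from collections import Counter

def jelinek_mercer_prob(w_prev, w, unigrams, bigrams, total_tokens, lam=0.7):
    """Interpolated estimate: lam * P_MLE(w | w_prev) + (1 - lam) * P(w).
    In practice lam is tuned on held-out data."""
    p_bigram = bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
    p_unigram = unigrams[w] / total_tokens
    return lam * p_bigram + (1 - lam) * p_unigram

tokens = "the cat sat on the mat the cat ran".split()   # toy corpus
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
print(jelinek_mercer_prob("the", "cat", unigrams, bigrams, len(tokens)))  # seen bigram
print(jelinek_mercer_prob("the", "ran", unigrams, bigrams, len(tokens)))  # unseen bigram, backed off to unigram mass
```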
Bigram Frequencies in English
In English text, character bigrams display characteristic frequency distributions derived from large corpora, revealing patterns shaped by phonetic, morphological, and syntactic structures. Analysis of the Google Books English corpus (approximately 744 billion words from the 1800s to 2000s), as compiled by Peter Norvig, identifies "th" as the most prevalent bigram at 3.56%, reflecting its role in common function words and affixes like "the" and "-th". The second most frequent is "he" at 3.07%, often appearing in pronouns and articles. These frequencies underscore the dominance of a small set of bigrams in everyday language use.[29]

The following table summarizes the top 10 character bigrams from Norvig's analysis, with relative frequencies indicating their proportional occurrence among all possible pairs (normalized across alphabetic characters, excluding spaces and punctuation):

| Rank | Bigram | Frequency (%) |
|---|---|---|
| 1 | th | 3.56 |
| 2 | he | 3.07 |
| 3 | in | 2.43 |
| 4 | er | 2.05 |
| 5 | an | 1.99 |
| 6 | re | 1.85 |
| 7 | on | 1.76 |
| 8 | at | 1.49 |
| 9 | en | 1.45 |
| 10 | nd | 1.35 |
