Index of coincidence


In cryptography, coincidence counting is the technique (invented by William F. Friedman) of putting two texts side-by-side and counting the number of times that identical letters appear in the same position in both texts. This count, either as a ratio of the total or normalized by dividing by the expected count for a random source model, is known as the index of coincidence, or IC or IOC or IoC for short.
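Coincidence counting as described above can be sketched in a few lines of Python. This is only an illustration: the function name `count_coincidences`, the two sample strings, and the choice to drop non-letters and fold case are assumptions made here, not part of the original description.

```python
def count_coincidences(text_a, text_b):
    """Return (matches, positions compared) for two texts laid side by side."""
    # Assumption: keep letters only and fold case, so 'A' and 'a' count as one letter.
    a = [ch for ch in text_a.upper() if ch.isalpha()]
    b = [ch for ch in text_b.upper() if ch.isalpha()]
    # zip stops at the shorter text, so only overlapping positions are compared.
    pairs = list(zip(a, b))
    matches = sum(1 for x, y in pairs if x == y)
    return matches, len(pairs)


# Hypothetical sample texts, chosen only for illustration.
matches, compared = count_coincidences("ATTACK AT DAWN", "ATTEND AT NOON")

# The count as a ratio of the total number of positions compared.
ratio = matches / compared

# The count divided by the number of matches expected from a uniform
# 26-letter random source (compared / 26).
normalized = matches / (compared / 26)
```

For these two strings the texts agree in 6 of 12 positions, so the ratio is 0.5 and the normalized value is 13. Texts this short are far too small for the figure to be statistically meaningful.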

Because letters in a natural language are not distributed evenly, the IC is higher for such texts than it would be for uniformly random text strings. What makes the IC especially useful is the fact that its value does not change if both texts are scrambled by the same single-alphabet substitution cipher, allowing a cryptanalyst to quickly detect that form of encryption.
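The invariance under a shared single-alphabet substitution can be checked directly. The sketch below assumes uppercase A to Z input, a randomly shuffled key, and two made-up sample strings; the helper name `coincidences` is hypothetical.

```python
import random
import string


def coincidences(a, b):
    # Count positions where the two aligned strings hold the same letter.
    return sum(x == y for x, y in zip(a, b))


# Build an arbitrary substitution key: a permutation of the alphabet.
rng = random.Random(0)  # fixed seed so the run is repeatable
key = "".join(rng.sample(string.ascii_uppercase, 26))
table = str.maketrans(string.ascii_uppercase, key)

# Hypothetical sample texts, already stripped to uppercase letters.
text_a = "ATTACKATDAWN"
text_b = "ATTENDATNOON"

# Encipher both texts with the same key.
cipher_a = text_a.translate(table)
cipher_b = text_b.translate(table)

before = coincidences(text_a, text_b)
after = coincidences(cipher_a, cipher_b)
```

Because the key maps distinct letters to distinct letters, two positions match after encryption exactly when they matched before, so `before` and `after` are equal for any key.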

The index of coincidence measures how likely it is to draw two matching letters when selecting two letters at random from a given text. The chance of drawing a given letter is (number of times that letter appears) / (length of the text). The chance of drawing that same letter again, without replacement, is (appearances − 1) / (text length − 1). The product of these two values is the chance of drawing that letter twice in a row. Computing this product for each letter that appears in the text and summing the products gives the chance of drawing two of a kind. This probability can then be normalized by multiplying it by some coefficient, typically 26 in English:

$$\mathrm{IC} = c \times \left( \frac{n_a}{N} \cdot \frac{n_a - 1}{N - 1} + \frac{n_b}{N} \cdot \frac{n_b - 1}{N - 1} + \cdots + \frac{n_z}{N} \cdot \frac{n_z - 1}{N - 1} \right)$$

where $c$ is the normalizing coefficient (26 for English), $n_a$ is the number of times the letter "a" appears in the text, and $N$ is the length of the text.

We can express the index of coincidence IC for a given letter-frequency distribution as a summation:

$$\mathrm{IC} = \frac{\displaystyle\sum_{i=1}^{c} n_i (n_i - 1)}{N (N - 1) / c}$$

where $N$ is the length of the text and $n_1$ through $n_c$ are the frequencies (as integers) of the $c$ letters of the alphabet ($c = 26$ for monocase English). The sum of the $n_i$ is necessarily $N$.
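A minimal sketch of this calculation follows. It takes the IC to be the sum of $n_i (n_i - 1)$ over all letters divided by $N (N - 1) / c$. The function name `index_of_coincidence`, the restriction to the letters A to Z, and the error raised for very short input are assumptions of this sketch.

```python
from collections import Counter


def index_of_coincidence(text, c=26):
    """Normalized index of coincidence of a single text."""
    # Assumption: monocase English, so fold case and keep only A-Z.
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n_total = len(letters)
    if n_total < 2:
        raise ValueError("need at least two letters to form a pair")
    counts = Counter(letters)  # n_i for every letter that occurs
    # Ordered same-letter pairs: each of the n_i occurrences pairs with
    # the remaining n_i - 1 occurrences of that letter.
    same_letter_pairs = sum(n_i * (n_i - 1) for n_i in counts.values())
    # Dividing by N(N-1)/c is the same as multiplying the ratio by c.
    return c * same_letter_pairs / (n_total * (n_total - 1))
```

Two boundary cases help sanity-check the function: a text made of one repeated letter such as `"AAAA"` gives the maximum value $c = 26$, and a text with no repeated letter such as `"ABCD"` gives 0.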

The products $n_i (n_i - 1)$ count the number of combinations of $n_i$ elements taken two at a time. (Actually this counts each pair twice; the extra factors of 2 occur in both numerator and denominator of the formula and thus cancel out.) Each of the $n_i$ occurrences of the $i$-th letter matches each of the remaining $n_i - 1$ occurrences of the same letter. There are a total of $N (N - 1)$ letter pairs in the entire text, and $1/c$ is the probability of a match for each pair, assuming a uniform random distribution of the characters (the "null model"; see below). Thus, this formula gives the ratio of the total number of coincidences observed to the total number of coincidences that one would expect from the null model.

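The expected average value for the IC can be computed from the relative letter frequencies $f_i$ of the source language:

$$\mathrm{IC}_{\text{expected}} = \frac{\displaystyle\sum_{i=1}^{c} f_i^{2}}{1/c} = c \sum_{i=1}^{c} f_i^{2}$$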
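As a rough sketch, the expected value can be computed as $c$ times the sum of the squared relative frequencies, which is the usual form of this calculation and is assumed here. The function name `expected_ic` is hypothetical, and the four-letter distribution is a made-up toy example, not measured language data.

```python
import math


def expected_ic(freqs, c=None):
    """Expected normalized IC for a source with the given relative frequencies."""
    freqs = list(freqs)
    if c is None:
        c = len(freqs)  # assumption: one frequency per letter of the alphabet
    if not math.isclose(sum(freqs), 1.0, rel_tol=1e-9):
        raise ValueError("relative frequencies must sum to 1")
    # sum(f_i^2) is the chance that two independent draws match;
    # dividing by 1/c (multiplying by c) normalizes against the null model.
    return c * sum(f * f for f in freqs)


# Null model: 26 equally likely letters, so the result is 1 up to rounding.
uniform = expected_ic([1 / 26] * 26)

# Toy four-letter source with a skewed distribution (illustrative only).
skewed = expected_ic([0.5, 0.25, 0.125, 0.125])
```

The uniform source gives a value of 1, and any uneven distribution gives a value above 1; the toy distribution here yields 1.375.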
