Index of coincidence computation.

207 Views Asked by At

I have the following definition for the index of coincidence for breaking Vignère cipher where $n_c$ is the number of indices $i$ for which $x_i = c$. enter image description here

I don't really know how to deduce the right-hand side exprexion. I have some notes from class but they are not complete something like:

$Pr(x_I = x_J : I < J ) = \sum_{c \in Z} Pr(x_I = x_J =c : I <J) = \sum_{c \in Z} \frac{{n_c} \choose {2}}{\frac{n}{c}}$.

Do you know the deduction of this expression?

2

There are 2 best solutions below

0
On BEST ANSWER

$$\Pr(x_I = x_J : I < J ) ~=~ \sum_{c \in Z} Pr(x_I = x_J =c : I <J) ~=~ \sum_{c \in Z} \frac{{n_c} \choose {2}}{{n}\choose{2}} ~=~ \sum_{c\in Z} \dfrac{n_c(n_c-1)}{n(n-1)}$$

By reason that: The probability that two distinct members in the sequence have the same value is determined by:

  • There are $n_c\choose 2$ ways to choose two of the $n_c$ members and assign them to indices $I,J$.; $n_c$ is the count of members in the sequence that have value of $c$.
  • There are $n\choose 2$ ways to choose 2 members from all $n$ members in the sequence.
  • Divide and sum over all $c$ that there are in $Z$; the values of members in the sequence.

That is all.

0
On

From Wikipedia, "The index of coincidence provides a measure of how likely it is to draw two matching letters by randomly selecting two letters from a given text."

This is what the right-hand size of the expression looks like expanded:

Where c is the normalizing coefficient (26 for English), n_a is the number of times the letter "a" appears in the text, and N is the length of the text.

So, why does this work? From Wiki again, "The chance of drawing a given letter in the text is (number of times that letter appears / length of the text). The chance of drawing that same letter again (without replacement) is (appearances - 1 / text length - 1). The product of these two values gives you the chance of drawing that letter twice in a row. One can find this product for each letter that appears in the text, then sum these products to get a chance of drawing two of a kind. This probability can then be normalized by multiplying it by some coefficient, typically 26 in English."

Incidence of Coincidence Wiki Article