Statistics/combinatorics question relating to Enigma rotors

88 Views Asked by At

I would be grateful of some advice on what I hope is a straightforward matter for mathematicians with a sound knowledge of statistics and probability/combinatorics.

Context

I am investigating Enigma rotors, which cause the 26 letters of the alphabet to become permuted between input and output. There are therefore 26! possible rotors. But what seems to be important in terms of the Enigma producing random permutations is that the number of permutation offsets is as large as possible. By permutation offset, I mean the difference in position between the output letter and the input letter. So for example, if A permutes to E, then the resulting difference is 5-1=4. If H permutes to D then the resulting difference is 4-8=-4=22mod26. So collectively there are 26 permutation offsets for an Enigma rotor. However, it can be shown that it is impossible for all 26 different offsets i.e. the set of integers from 0-25 inclusive to be present. So I am interested in estimating how many different rotors are possible with 25 different offsets.

Computationally, I don't think it is possible to assess all 26! permutations and keep a count of the ones that result in 25 offsets. So I have resorted to creating random permutations using the Python random.shuffle function. Over 5 billion random samples (which takes 12 hours to run on my admittedly old PC), I have measured anything between 48 and 75 of these permutations, so we are talking order of 10-15 per billion random permutations. So, these permutations are very rare, but in the context of 26!, there are also a lot of them (order 10^18)

So my question is: how many random samples would I need to test before I could be confident that my estimate of the frequency of these permutations is correct within +/-5%? What about +/-1%? I have tried dividing the sample space into smaller intervals eg of 100 million and then using the resulting data to produce a confidence interval. But I doubt that approach has any mathematical validity.

Any help or advice gratefully appreciated. I suspect this is a fairly generic problem in statistics, but I haven't been able to find an answer using Google.

1

There are 1 best solutions below

4
On

The 95% confidence interval for a binomial proportion estimate is $$ \hat p\pm 1.96\sqrt{\hat p(1-\hat p)/n}\\\approx \hat p\pm 1.96\sqrt{\hat p/n}, $$ where $\hat p$ is the sample probability that a permutation has twenty-five different offsets. I am using the approximation $1-\hat p\approx 1$.

If you want to be confident about the mean within 5% relative error, the half-width of this interval must be at most $0.05\hat p$. From $1.96\sqrt{\hat p/n}\le 0.05\hat p$, we get $$ n\gtrsim 1537/\hat p $$ Since your data suggests that $p\approx 10^{-8}$, this means you would require more than $1.5\times 10^{11}$ samples to get $5\%$ relative error. To get $1\%$ error, you would need to scale up that number by twenty-five.

In general, binomial confidence intervals are not feasible when estimating rare probabilities, unfortunately.