Question regarding monkeys and probabilities.

85 Views Asked by At

So this question is derived from problem 4 in chapter 2 of Kittel & Kroemer's Thermal Physics. To be upfront: I am confused about how to properly compute the probabilities from a canonical point of view. Given slightly different parameters, the answer I "derived" implies that I can have a probability greater than one.

So the question as presented is given $10^{10}$ monkeys randomly typing away on a typewriter with 44 keys (no capitals, just 26 letters and 18 special characters) at a rate of $10$ keys per second for $10^{18}$ seconds, what is the probability that a specific sequence of $10^5$ characters was typed by any one of the monkeys.

So the probability that a sequence of $10^5$ characters typed at random is the specific sequence (of monkey-Hamlet) is trivial: $$ P(\text{monkey-}\textit{Hamlet})=(1/44)^{10^5} $$

Since this is supposed to be a continuous sequence typed onto an infinite sheet of paper wherein the sequence of monkey-Hamlet can be found anywhere, then there are $N-n+1$ possible places to find the exact sequence of $n$ characters within a larger sequence of $N$ total characters. This is derived by considering a smaller sequence of, say, 6 characters and counting the possible ways of finding 2, 3, etc. of characters within the larger sequence. For 6 characters and a sub-sequence of length 4, there are 6-4+1=3 possible places to find the sub-sequence, the first four, the middle four, and the last four characters.

A single monkey types out $10*10^{18}=10^{19}$ total characters, and so there are $10^{19}-10^5+1$ places to find monkey-Hamlet on the sequence. So the probability of finding monkey-Hamlet is $$ P(\text{monkey-}\textit{Hamlet}\text{ for one monkey})=(1/44)^{10^5}*\left(10^{19}-10^5+1\right) $$ and for $10^{10}$ monkeys the above is multiplied again by $10^{10}$. This probability is still nearly zero, and so the a population of monkeys only slightly larger than the current human population randomly typing away for longer than the age of the universe will, with almost complete certainty, never type Hamlet.

This got me full credit for the problem, but I have an ongoing problem with my 'derivation'. If the monkeys typed faster, say instead of 10 characters per second but at a rate of approximately $44^{10^5}$ keys per second or more, then it would be possible to get a probability greater than one. Regardless of how fast or how long the monkeys type for, if the sequence we search for the randomly typed text is sufficiently large, the probability grows arbitrarily large.

So the repeating the question: How to do this calculation "properly"/formally?

1

There are 1 best solutions below

1
On BEST ANSWER

Consider the a string of characters of size N and a sub-string of size n. Then there are $N-n+1$ possible positions to find the substring. The probability that a string of size n is NOT in the first position is $1-p$, where $p$ is the probability of randomly typing the string. The probability of not typing the string in any of the positions is $(1-p)^{N-n+1}$.

Thus the probability of typing the string is $1-(1-p)^{N-n+1}$. This is for one monkey. The total number of positions for all monkeys (where $M$ number of monkeys) is $M(N-n+1)$ if they all type the same amount. Thus the probability is $$ P(\text{monkey-}\textit{Hamlet})=1-(1-p)^{M(N-n+1)} =1-\sum_{k=0}^{M(N-n+1)}\begin{pmatrix}M(N-n+1)\\k\end{pmatrix}(-p)^k $$

To get an approximation for $P$, simply take the binomial expansion to first order in $p$. This is a good approximation since $p\ll 1$, and so $p^k\approx0,\forall k\ge2$. $$ P(\text{monkey-}\textit{Hamlet})\simeq 1-\left[ 1(-p)^0 + M(N-n+1)(-p)^1 \right] = 1-1+M(N-n+1)p \\ = M(N-n+1)p $$ which is the quantity I computed in the question. Further, since the quantity $1-p$ is always strictly less than one, then the probability of $(1-p)^x$ for all $x$ is always less than one as well. Thus regardless of the size of the set of characters $N$, the quantity will always approach one. The only consideration to be made here is that for sufficiently large $N$ or $M$, higher order binomial expansions must be taken to properly compute $P$.