I'm wondering how to define the probability of a long string LS (using $26$ letters alphabet) to contain a smaller string ss.
Right know I have something like this.
Number of LS containing ss:
$26^{( length(LS) - length(ss) ) \times ( length(LS) - length(ss) )}$
the first term of the multiplications represents how many different strings can be assembled using the 26 letters alphabet with the part of LS which doesn't contain ss. Then, the second part is at how many places I can fit ss inside LS.
This expression obviously has some problems, as the number of sequences with $length(ss)=0$ is larger than the number of combinations $26^{length(LS)}$.
But I can't figure out how to fix it and I would appreciate to get a push on that matter.
Thanks
The answer is it depends in the sequence. Suppose we want to find the probability that the word 11 is inside a 3-letter word in an alphabet of 3 letters.There are 5 such words that contain it. Now lets look at the probability the word 12 is inside a 3-letter word. there are 6 words that contain it. Not all words have the same probability.
Now for a fixed word:
let w be your m-letter string in a k letter alphabet. Let $x_n$ be the number of words of lenth $n$ (also in x letter alphabet) containing w.
At first sight we would consider the following recurrence relation: $x_n=kx_{n-1}+k^{n-m}-x_{n-m}$ Since every word of length n can be created in the following way: for a n-1 word that contains w just add any of the k letters at the end $kx_{n-1}$. For a $k-n$length word just add w at the end to get one that does work. This seems really great. There's just one problem:
Sometimes when you add w at the end of a n-k word that doesn't contain w you get a word that contains w even if you remove some of the last digits. Let me explain: for example consider $w=1,2,1$ then the word $3,1,2$ clearly doesn't contain $w$ so we add w to this letter to get $3,1,2,1,2,1$ notice that this word is obtained through both the process of taking an (n-1) word that contains w and taking and n-k word that does not. Thus we are counting it twice? Can you refine the recursion so it works?