Thanks for reading. Would be lovely if somebody could help me out on this (but not just post the answer) but also how you got there. I'm a programmer and I've ran across this problem which I can't solve. Nobody taught me mathematical probability so I really only know the self-taught basics, please keep that in mind.
What is the probability of finding a binary word of length N within another binary word of length M where of course M $\geq$ N?
Under the assumption that N = 1, I've came up with this solution $p = 1-(\frac{1}{ 2})^M$ .. yeah very impressive.
How do I incoorporate the length of the first word N into this or do I have a completely incorrect approach?
[Criteria]
Lets assume that X can be either 1 or 0 then XX has $2^2=4$ combinations {(0,0), (0,1), (1,0), (1,1)}. The probability of finding either 1 or 0 in this word would be 0.75 (so it doesn't matter if it's found in the first or the second character). The probability of finding 11 for example would be 0.25
What matters is that the first word has to exist continually within the second. For instance we match 11 with a word of length 3 (XXX) with 8 combinations, matches would be at (011) (110) and (111) so the probability would be 0.375.
However match(11, 101) this is not allowed due to the zero in between, although (1, 101) is allowed.
Hope I've explained this clear enough without knowing the adequate terminology. P.S. since N > M makes no sense I assume a fraction to appear somewhere by sheer intuition.
Sorry but I found that the solution presented here isn't correct. I did try to simulate and confirm the formula proposed by user @Logophobic using random trials of a M bits vector.
Here what I did:
I found that simulated results are not exact but have a good precision. For M>30 it would need more trials to avoid larger errors.
I also verified some results by direct brute force all possible 5 bits pattern occurences inside a M bit vector to verify my simulation for M<=15.
Simulation results
where $K=10^6$ is the number of trials used in each simulation. The probability is simply $p= HitCount * (\frac{1}{2})^K$, where $HitCount$ is the number of times that a pattern is found in a vector.
Formula results
(obtained with user @Logophobic formula)
$1-(1-\frac{1}{2^N})^{M-N+1}$, with $N=5$
So I would like to ask if someone has a formula that matches my simulation results (as I am confident that it is the correct answer for N=5). I also compared my simulations with some brute force simulation to try all possible combinations for M <=18 (larger values are impraticable to simulate).
Brute force
To confirm this results $prob = HitCount*p^M$ where $p=1/2$ (probability of a single bit is uniform).
All simulations are made considering N=5.
Some graphs:
