I've been trying to work out a formula for the probability of a specified sequence of upper-case English characters of length $n$ appearing in a random sequence of upper-case English characters of length $X$ (which is $\ge n$).
I thought I had it with:
$$\frac{X-(n-1)}{26^n}$$
But that doesn't work for $n=1$.
If you can provide an answer, I'd very much appreciate you also showing (as far as you can) how you arrived at it, so I can try to understand what is, at the moment, beyond my own math.
It depends on how well the string can overlap with itself. You can model this with a Markov chain whose states are (Start) and the possible prefixes of your target string, each state recording how much of the target has just been matched. Thus for ABA the states are (Start), A, AB and ABA, and the transition matrix is $$ P = \pmatrix{25/26 & 1/26 & 0 & 0\cr 24/26 & 1/26 & 1/26 & 0\cr 25/26 & 0 & 0 & 1/26\cr 0 & 0 & 0 & 1\cr}$$ where e.g. the second row entries mean that in state A, if the next character is A you stay in state A, if it is B you go to state AB, and anything else returns you to (Start). The last state, ABA, is absorbing: once the target has appeared, it has appeared. The probability you're looking for is the probability, starting in the (Start) state, of absorption within $X$ steps (one step per character of the random string), which is $(P^X)_{1,4}$.
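To make the construction concrete, here is a minimal Python sketch that builds the same kind of chain for an arbitrary target and computes the absorption probability exactly. The helper names (`next_state`, `transition_matrix`, `hit_probability`) are made up for this illustration, and it assumes each character is drawn uniformly and independently from the 26 upper-case letters.

```python
from fractions import Fraction
import string

ALPHABET = string.ascii_uppercase  # assumption: 26 equally likely letters


def next_state(pattern, state, ch):
    """State reached from `state` (number of matched characters) on reading `ch`."""
    if state == len(pattern):
        return state  # the full target has appeared: absorbing state
    seen = pattern[:state] + ch
    # Longest prefix of the target that is a suffix of what was just read.
    for k in range(min(len(seen), len(pattern)), -1, -1):
        if seen.endswith(pattern[:k]):
            return k
    return 0


def transition_matrix(pattern):
    """Rows and columns are indexed by the number of matched characters."""
    m = len(pattern)
    p = Fraction(1, len(ALPHABET))
    P = [[Fraction(0)] * (m + 1) for _ in range(m + 1)]
    for state in range(m + 1):
        for ch in ALPHABET:
            P[state][next_state(pattern, state, ch)] += p
    return P


def hit_probability(pattern, X):
    """Probability that `pattern` occurs somewhere in X random letters."""
    P = transition_matrix(pattern)
    size = len(P)
    dist = [Fraction(0)] * size
    dist[0] = Fraction(1)  # start in (Start)
    for _ in range(X):
        # One step of the chain: dist <- dist * P
        dist = [sum(dist[i] * P[i][j] for i in range(size)) for j in range(size)]
    return dist[-1]  # mass in the absorbing state, i.e. (P^X)_{1,last}


print(transition_matrix("ABA")[1])      # the row for state A
print(float(hit_probability("ABA", 100)))
```

For `"ABA"` the matrix produced this way is the $P$ above. Multiplying a row vector by $P$ repeatedly avoids forming $P^X$ explicitly, and `Fraction` keeps the result exact; for very large $X$ floating point would be faster.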
The characteristic polynomial of $P$ is $C(\lambda) = \left( \lambda-1 \right) \left(\lambda^{3}- \lambda^{2}+\frac{26}{17576}\,\lambda-\frac{25}{17576} \right)$. If the matrix is diagonalized as $P = S \Lambda S^{-1}$, where $\Lambda$ is a diagonal matrix with diagonal entries $\lambda_i$ (the roots of $C(\lambda)$), then $P^k = S \Lambda^k S^{-1}$ for any number of rounds $k$, and so $$(P^k)_{1,4} = \sum_{i=1}^4 S_{1,i}\, (S^{-1})_{i,4}\, \lambda_i^k.$$
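If you would rather avoid computing eigenvectors, the cubic factor of the characteristic polynomial gives an equivalent recurrence. Write $q_k$ for the probability that ABA has not appeared in the first $k$ characters. The $3\times 3$ block of $P$ on the non-absorbing states satisfies its own characteristic equation (Cayley-Hamilton), which gives $q_k = q_{k-1} - \frac{1}{676}\,q_{k-2} + \frac{25}{17576}\,q_{k-3}$ for $k \ge 3$, with $q_0 = q_1 = q_2 = 1$. A small sketch of this, specific to the target ABA (the function name `aba_probability` is made up for illustration):

```python
from fractions import Fraction


def aba_probability(k):
    """Probability that ABA appears within k random upper-case letters."""
    # q[j] = probability of NOT having seen ABA after j letters.
    q = [Fraction(1), Fraction(1), Fraction(1)]  # fewer than 3 letters: impossible
    for _ in range(3, k + 1):
        # Recurrence from the cubic factor (note 26/17576 = 1/676).
        q.append(q[-1] - q[-2] / 676 + q[-3] * Fraction(25, 17576))
    return 1 - q[k]


print(aba_probability(3))  # 1/17576
print(aba_probability(4))  # 1/8788
```

The value for $k=4$ can be checked by hand: ABA can start at position 1 or 2, each case fixes three letters and leaves one free, and the two cases cannot both occur, so the probability is $2\cdot 26/26^4 = 1/8788$.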