Suppose we generate "random strings" over an $m$-letter alphabet, and look for the first occurrence of $k$ consecutive identical digits. I was with some effort able to find that the random variable $X$, denoting the time until we see $k$ consecutive digits, has probability generating function
$$P(z) = \sum_{n \ge 0} \Pr(X=n)z^n = \frac{(m-z)z^k}{m^k(1-z) + (m-1)z^k}$$
This correctly gives the expected time until we see $k$ consecutive $1$s, as
$$\operatorname{E}[X] = P'(1) = \frac{m^k - 1}{m - 1}$$
(For example, in a random stream of decimal digits, we expect to see $10$ consecutive identical digits after an expected number of $111111111$ steps.)
Using Sage to compute the coefficients of this generating function for $m=10$ and various small $k$, I was able to find the exact (smallest) $N_1(m, k)$ for which $\Pr(X \le N_1) \ge \frac12$, and $N_2(m, k)$ for which $\Pr(X \le N_2) \ge \frac9{10}$ (i.e. by which time we can be "quite" sure of having seen $k$ consecutive digits):
- $2$ consecutive digits: half at $N_1 = 8$, "quite" at $N_2 = 23$.
- $3$ consecutive digits: $N_1 = 78$ and $N_2 = 252$.
- $4$ consecutive digits: $N_1 = 771$ and $N_2 = 2554$.
- $5$ consecutive digits: $N_1 = 7703$ and $N_2 = 25578$.
- $6$ consecutive digits: $N_1 = 77018$ and $N_2 = 255835$.
- [$7$ consecutive digits: hit limitations of my computer (or programming skills).]
There is clearly a pattern there, and I'd like to be able to calculate (either exactly or approximately) the value of $N_1(m, k)$ and $N_2(m, k)$ for larger values. Is there a technique that would give the (possibly asymptotic) values of $N_2(m, k)$ say?
Since the generating function is rational, we can say a few things. First, the sequence of probabilities satisfies a linear recurrence. Specifically, $$m^k p_n - m^k p_{n-1} + (m-1)p_{n-k} = 0,$$ which we may prefer to write as $$p_n = p_{n-1} - \frac{m-1}{m^k} p_{n-k}.$$ The linear recurrence is not memory intensive, but I don't think it does significantly better than Sage. My machine finds (via a short Python script) 770164 and 2558418 for "half" and "quite" for $m=10$ and $k = 7$ in about 8 seconds.
By Theorem 4.1.1 of Stanley's Enumerative Combinatorics (Volume 1), if the denominator of your generating function has no repeated roots, we expect $p_n$ to be asymptotic to $A\cdot \left(\frac{1}{\alpha}\right)^n$, where $\alpha$ has the smallest modulus among the roots (for some constant $A$). The sum of the first $n$ probabilities should be approximately $1 - C\cdot\left(\frac{1}{\alpha}\right)^n$ for some $C$. Upon finding $\alpha$ and $C$, these approximations will be much faster to compute than exact values.
It appears to be difficult to find a nice expression $\alpha$, however, $$\alpha = 1 + \frac{m-1}{m^k - k(m-1)}$$ is a reasonably good approximation.
Upon finding $C$ and $\alpha$, we have approximations for $N_1(m,k)$ and $N_2(m,k)$ by solving $1 - C(\frac{1}{\alpha})^n = p$. The general solution is $$N_p(m,k) \approx \frac{\ln C -\ln(1-p)}{\ln(\alpha)}.$$For convenience, I used $C = 1$ to check against your values. This gave approximations of $N_1 = 7, 76, 768, 7699, 77013$ and $N_2 = 23,251,2551,25574,255831$ when $m = 10$ and $2 \leq k \leq 6$.
Finally, observe that transitioning from $k$ to $k + 1$ resulted in the $N$'s being multiplied by $m$ (approximately). Again, the binomial approximation is suggestive whenever $\alpha \approx 1 + \frac{m-1}{m^k}$: $$\left(1 + \frac{m-1}{m^k}\right)^n \approx \left(1 + \frac{n(m-1)}{m^k}\right) = \left(1 + \frac{nm(m-1)}{m\cdot m^k}\right) \approx \left(1 + \frac{m-1}{m^{k+1}}\right)^{nm}$$