Finding $1$, $2$, $3$, $\ldots$, $n$ in a random string of $x$ digits

178 Views Asked by At

A weird but an interesting question I thought of ...


Consider this random 100-digit integer that I generated using random.org

$$9771602964370316251552537368279107346523948777589513994616004391156991741564023185294939440725424639$$

This number will most likely have $1,2,3..$ upto $9$ in them. But when we further look into we can also find $10$, (i.e $1$ then $0$ right after) and also $11$. But there's no $12$ in them (i.e no $1$ then a $2$ right after).

So what I want to find out is what will be the approximation of $n$, such that there will be $1,2,3,...,(n-2),(n-1),n$ integers that can be seen inside a random x-digit. Also note that $(n+1)$ can never be found in a random x-digit.

If you say that the 'n' value will change randomly and there cannot be any asymptotic or approximations of that. You're kind of wrong. See, I did this with 9 other random 100-digit numbers and also the 100 decimal places of the famous constants $\pi,e$ and $\phi$. The results are pretty close to $11$. Here are them; (scroll right to see the corresponding $n-values$)

$$\begin{array} {|r|r|}\hline Sl. No. & 100-Digit Integer & n-value \\ \hline 1 & 9771602964370316251552537368279107346523948777589513994616004391156991741564023185294939440725424639 & 11 \\ \hline 2 & 5933798531736924945959336111487355483181228762319970946972332473586986158448822614193445434946714049 & 9 \\ \hline 3 & 0748006010018385701305255708540275105163777493821503992289412752198434315022707697359036044788900096 & 10 \\ \hline 4 & 8895177092192596874320826762636449513223106780359367744719402357085023730983517329137267473940940810 & 10 \\ \hline 5 & 4921918080733246056200145588743524919616898465382373461652822273753544023393089458416653653351525383 & 9 \\ \hline 6 & 6386614054105909456194325500596228770058096441918965347422338768650600045406879715848359336828536144 & 10 \\ \hline 7 & 8599263737601951772329677396723688342052445287633091648167742178784189581850841772769226465303321562 & 9 \\ \hline 8 & 0660623044515625611084768937485195387862394776640249639087054616789402273433078233077669093361145164 & 11 \\ \hline 9 & 8747230800762985911116404157733707704590034002122077023051291424897238915375434573976156208269944527 & 9 \\ \hline 10 & 1177720655605689615530670722522895831819411456667323162088720568188745848931083487687853204730731860 & 11 \\ \hline π & 1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679 & 11 \\ \hline e & 7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274 & 9 \\ \hline φ & 6180339887498948482045868343656381177203091798057628621354486227052604628189024497072072041893911374 & 9 \\ \hline \end{array}$$

Now take the average of all the n-values and you'll get

$n ≈ 9.8$, for $x=100$

I'm starting to work on $x=1000$ and for a hint $n ≈ 99$ or $100$ and this is just like approximate value of an approximate value, I'll find a good approximate value soon. So what are your thoughts on this? What will be a good expression that could take in the $x$ value and give a good approximate?

I'm got a formula, a crude, anecdotal that could give a value close to what is got. I'll post it soon after a few modifications.


EDIT

Some observations that I did.

$$n ≈ \frac{x}{10}\tag{1}\label{1}$$

Or I can say that for a $x$, the $n$ value is; (Big thanks to @glowstonetrees for an inspiration.) $$n≈9*10^{m−3}\tag{2}\label{2}$$ Where $m$ is the no. of digits of $x$. But we know $x$ is the no. of digits of the given $x$-digit value. So $m$ is like no. of digits of the no. of digits of $x$-digit integer

Now for my own anecdotal estimate of this;

$$n≈\pi^{2\left(\log\left(x\right)-1\right)}\tag{3}\label{3}$$

The reason I chose log is that it is a "slow-moving" function so it doesn't change rapidly. And don't ask me why $\pi$ is there, I don't know I kinda like it. But I also edited it and made it more efficient; so it is;

$$n≈\pi^{2\left(\log\left(x\right)-\sin\left(79.4\right)\right)}\tag{4}\label{4}$$

The reason I chose $\sin(79.4)$ is that $$\sin(79.4)=0.9829353491$$ So I didn't want an exact $1$ as in Eq.3 but a rather a number close to 0.98. And I believe this is an irrational number and $sin(79.8)$ is just an estimate for this.

**

Now to test all these expressions with $x=10000$, from data I got that $n≈999$

$$n≈\frac{10000}{10}=1000\tag{By Eq.1}\label{5}$$ $$n≈9*10^{5−3}\tag{By Eq.2}\label{6}=900$$ $$n≈\pi^{2\left(\log\left(10000\right)-1\right)}≈961.38\tag{By Eq.3}\label{7}$$ $$n≈\pi^{2\left(\log\left(10000\right)-\sin\left(79.4\right)\right)}≈999.6\tag{By Eq.4}\label{8}$$

So yeah even know my equation, i.e Eq 4 did well; I have to say Eq 1 of $n≈\frac{x}{10}$ is really the best bet for simplicity and accuracy.

2

There are 2 best solutions below

1
On

These are just some thoughts, not a complete answer. But since the comment section is getting kinda long, I decided to post this here.

There are $9 \cdot 10^{x-1}$ $x$-digit integers.

The integers with $n$-value $=0$ are those that do not contain the digit $1$. There are $8\cdot 9^{x-1}$ such integers.

The integers with $n$-value $=1$ are those that have the digit $1$, but do not contain a single $2$. There are $8 \cdot 9^{x-1} - 7 \cdot 8^{x-1}$ such integers (those with no $2$'s minus those with no $2$'s and no $1$'s).

Similarly, the integers with $n$-value $=2$ are those that have $1$ and $2$ but no $3$. There are $8 \cdot 9^{x-1} - 2 \cdot 7 \cdot 8^{x-1} + 6 \cdot 7^{x-1}$.

And so on...

\begin{array} {|r|r|}\hline n\text{-value} & \text{Number of $x$-digit integers with this $n$-value} \\ \hline 0 & 8\cdot 9^{x-1} \\ \hline 1 & 8\cdot 9^{x-1} - 7 \cdot 8^{x-1} \\ \hline 2 & 8\cdot 9^{x-1} - 2\cdot 7 \cdot 8^{x-1} + 6 \cdot 7^{x-1} \\ \hline 3 & 8\cdot 9^{x-1} - 3\cdot 7 \cdot 8^{x-1} + 3\cdot 6 \cdot 7^{x-1} - 5 \cdot 6^{x-1} \\ \hline 4 & \sum_{k=0}^4 C^4_k\cdot (8-k) \cdot (9-k)^{x-1} \\ \hline 5 & \sum_{k=0}^5 C^5_k\cdot (8-k) \cdot (9-k)^{x-1} \\ \hline 6 & \sum_{k=0}^6 C^6_k\cdot (8-k) \cdot (9-k)^{x-1} \\ \hline 7 & \sum_{k=0}^7 C^7_k\cdot (8-k) \cdot (9-k)^{x-1} \\ \hline 8 & \sum_{k=0}^8 C^8_k\cdot (8-k) \cdot (9-k)^{x-1} \\ \hline \end{array}

For convenience, I shall call the numbers on the right column of the table $n_0, n_1, n_2, \dots, n_8$.

The remaining $9\cdot 10^{x-1} - (n_0 + \cdots + n_8)$ integers have all the digits $1$ through $9$, so their $n$-values are $\geq 9$.

It follows that a lower bound for $n$ is

$$\frac{0n_0 + 1n_1 + 2n_2 + \cdots + 8n_8 + 9\big[9\cdot 10^{x-1} - (n_0 + \cdots + n_8)\big]}{9\cdot 10^{x-1}}$$

Note that this is a pretty tight lower bound, since $\frac{n_0}{9\cdot 10^{x-1}} = 0.8$ so that $80\%$ of the integers have already been accounted for in the first item.

6
On

Given a random sequence of $x$ digits in base $b$, the probability that a given $m$-digit string does not appear is $$S(m) := \left(1 - \frac{1}{b^m}\right)^x \approx \exp\left(-\frac{x}{b^m}\right),$$ and the approximation is good even for modest $m$. Whether a given $m$-digit string appears is close to independent from whether another given $m$-digit string appears, so the probability that all $(b - 1) \cdot b^{m - 1}$ strings of length $2$ (not starting with $0$) appear is approximately $$P(m) := \left[1 - \exp\left(-\frac{x}{b^m}\right) \right]^{(b - 1) \cdot b^{m - 1}} \approx \exp \left[-(b - 1) \cdot b^{m - 1} \exp \left(-\frac{x}{b^m}\right)\right] .$$

For small $m$ this probability is close to $1$ and for large $m$ this probability is close to $0$. To find the $m$ values at which the critical behavior occurs---that is, for which the probability is not extremely close to $0$ or $1$, we find where the probability is about $e^{-1}$. Solving, we find that $$m \approx \log_b x - \log_b W \left( x \left(1 - \frac{1}{b}\right) \right) = \log_b x + O(\log \log x), $$ where $W$ is the Lambert $W$-function. Informally (when this quantity is nonintegral) we expect most of the interest behavior to occur when $m$ is the integer $a$ preceding $m$ and the integer $a + 1$ succeeding $m$, and so expect the expected value $m$ will be within an order of magnitude of this quantity.

The probability of exactly $k$ strings of length $a$ being missing is about $P (1 - P)^k$, where $P := P(a)$, and the expected value of the minimum of $k$ positive integers with $a$ or fewer digits is $\frac{1}{k + 1} \cdot (b^a - 1)$, so the expected value of the first missing number, contingent on that that number having $a$ or fewer digits is $$(b^a - 1)\sum_{k = 0}^\infty \frac{P (1 - P)^k}{k + 1} = (b^a - 1) \frac{P \log P}{P - 1} = (b^a - 1) \log P + O\left(\frac{\log P}{P}\right) .$$

Now, consider the case in which no strings of length $a$ are missing. The probability that any given string of length $a + 1$ occurs is about $1 - S$, $S := S(a + 1)$, so the probability that exactly the first $r$ strings of length $a + 1$ all occur is $S (1 - S)^r$, and so (unless $S$ is really close to $1$) the expected value of the number of strings $10\ldots00, 10\ldots01, \ldots$ that occurs is about $$\sum_{k = 0}^{\infty} k S (1 - S)^k = \frac{1}{S} - 1.$$

This gives for the expected value of $n$ the approximation $$\Bbb E[n] \approx (b^a - 1) \frac{P \log P}{P - 1} + \frac{1}{S} - 1 = (b^a - 1) \log P + \frac{1}{S} + O(1) .$$

For the example in the comments, where $x = 10^5$ and $b = 10$, we find $a = 4$, then $P = 0.664\!\ldots$ (so all strings of length $4$ in a string of length $10^5$ about $2 / 3$ of the time), $S = e^{-1}$, and an expected value of $$8.1 \cdot 10^3 .$$ A Monte Carlo simulation with $2 \cdot 10^5$ trials gives an average value of $\approx 8383$, so the relative error at least in this case is modest.

For $b = 10$, the estimates using this method for the expected value of $n$ are (to $2$ significant figures)

$$\begin{array}{rrr} x & \Bbb E[n] \\ \hline 10^{2\phantom{.5}} & 11 \\ 10^{2.5} & 32 \\ 10^{3\phantom{.5}} & 100 \\ 10^{3.5} & 120 \\ 10^{4\phantom{.5}} & 980 \\ 10^{4.5} & 1020 \\ 10^{5\phantom{.5}} & 8100 \\ 10^{5.5} & 10000 \\ 10^{6\phantom{.5}} & 32000 \\ \end{array}$$

Remark If we take $x$ to be some constant multiple $\lambda \cdot b^m$ of $b^m$, this approximation simplifies some to $$\exp\left(-(b - 1) \cdot b^{m - 1} e^{-\lambda}\right) ,$$ but this quantity approaches $0$ as $m \to \infty$, so for large $m$ the probability that all strings of length $m$ occur among $\lambda \cdot b^m$ digits approaches $0$. So, as $x \to \infty$ the expected number $n$ of numbers that occur in a string of length $x$ grows more slowly than any constant multiple of $x$, and in particular (for $b = 10$) the approximation $\Bbb E[n] \approx \frac{x}{10}$ cannot be valid for large $m$. We can already see this in the table above, where for $x = 10^6$ we have $\Bbb E[n] \approx \frac{x}{100}$.