I recently read about an idea: instead of storing the actual data, convert the data to a string of digits and then store the index at which this pattern occurs in some number, for example $\pi$. The idea is that the index of the data would take up less storage space than the actual data.
Of course, we don't know whether $\pi$ is a normal number and hence we do not know if every finite decimal pattern occurs, but let's assume for the moment that it does (or one simply changes to some proven normal number, like the Copeland-Erdős constant).
The thing that struck me was whether the index of the data might actually be a larger number than the data itself. Does there exist some measure of the probability of finding a decimal sequence of length $n$ before the $m$th decimal place? For $\pi$ in this case, I doubt there's a general formula. Would it depend on the base?
Information or references to other, similar ideas are also very welcome.
(Yes, I understand that this method is very impractical for everyday use; I just found the idea intriguing.)
Working in base 10 and assuming (a big assumption) that the digits of $\pi$ are random (uniformly distributed, iid), the probability of finding a given number with $k$ decimal digits before some position $n$ is difficult to compute in general (see eg here), and it might depend on the number itself.
A simplifying assumption is to treat all match attempts as independent (no overlapping). This is obviously false, but in many asymptotics it is a fair approximation. We would then have a geometric random variable with success probability $p=10^{-k}$, whose expected value is $1/p = 10^{k}$. That is the same order of magnitude as the number itself. Hence, under this very coarse approximation, the "index" is on average of the same magnitude as the number itself.
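As a rough sanity check of this estimate, here is a minimal Python sketch. It does not use the digits of $\pi$; it assumes uniformly random iid digits from a seeded pseudo-random generator as a stand-in. The function names (`first_index`, `mean_first_index`, `prob_found_before`) and the parameters `trials`, `seed` and `max_digits` are made up for illustration. `prob_found_before` is just the independent-tries approximation, in which each position matches with probability $10^{-k}$.

```python
import random


def first_index(pattern, rng, max_digits=10**6):
    """0-based index where `pattern` first starts in a stream of random digits.

    Returns None if the pattern does not show up within `max_digits` digits.
    """
    k = len(pattern)
    window = ""
    for i in range(max_digits):
        # Keep only the last k digits generated so far.
        window = (window + str(rng.randrange(10)))[-k:]
        if window == pattern:
            return i - k + 1
    return None


def mean_first_index(pattern, trials=2000, seed=0):
    """Average first-occurrence index over `trials` independent digit streams."""
    rng = random.Random(seed)
    hits = [first_index(pattern, rng) for _ in range(trials)]
    hits = [h for h in hits if h is not None]
    return sum(hits) / len(hits)


def prob_found_before(n, k):
    """Independent-tries approximation: chance that a k-digit pattern
    is matched at least once in n tries, each with probability 10**-k."""
    return 1 - (1 - 10 ** -k) ** n


if __name__ == "__main__":
    for pattern in ("7", "12", "314"):
        k = len(pattern)
        print(pattern, mean_first_index(pattern), 10 ** k,
              prob_found_before(10 ** k, k))
```

Under these assumptions the simulated mean index should come out on the order of $10^k$ for a $k$-digit pattern, so storing the index needs about as many digits as storing the pattern. Patterns that overlap with themselves (such as `11`) wait slightly longer on average than ones that do not, which is one way the answer depends on the number itself.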