Is there a measure on $\mathbb{Z}$ giving integers independent binary digits?

81 Views Asked by At

An integer $x\in\mathbb{Z}$ can be identified with its binary expansion, $x = \ldots b_3 b_2 b_1b_0$ where $b_i \in \{0,1\}$ for each $i\geq 0$ and $b_i = 0$ for all sufficiently large $i$. Fixing a probability measure on $\mathbb{Z}$, if we select $x$ randomly according to this measure, then we have a random selection of bits given by the binary expansion of $x$. My question is: which measures on $\mathbb{Z}$ make the bits $b_i$ independent?

I think that it may be easier to answer this question if we think about it in the reverse direction: we can start with a stochastic process $(b_i)_i$, then $x = \sum_i 2^i b_i$ is an integer-valued random variable which thus defines a measure on $\mathbb{Z}$. If $(b_i)_i$ is an independent process, then (I believe) it is determined by the probabilities $p_i = P(b_i = 1)$.

A natural choice would be $p_i = 1/2$ for every $i$, but the problem with that is then $b_i = 1$ for infinitely many $i$ with probability $1$, meaning $x$ is almost surely infinite. Then I have two associated questions:

  1. What conditions on $(p_i)_i$ make $x$ almost surely finite? (I think $\lim_i p_i = 0$ is sufficient, but is it necessary?)

  2. Subject to these conditions, what measure does $x$ impose on $\mathbb{Z}$? What is its pdf, $f(n) = P(x = n)$?


Edit: Thanks all for your input. The first of my questions has been settled: $x = \sum_i 2^i b_i$ is almost surely finite iff $\sum_i p_i < \infty$. I would like now to characterize the pdf $f(n) = P(x = n)$, analytically if possible (though I think this is unlikely). Is there, for instance, a "natural" choice of $(p_i)_i$ making $f(n)$ especially simple to describe/well behaved (e.g., positive and decreasing?)

To start off, let's assume $0 < p_i < 1$ for each $i$, since it wouldn't make sense for any bit to be almost surely constant. Then if $s = \sum_i p_i$ it holds that $f(0) = \prod_i(1-p_i) < e^{-s}$ (since $\ln(1-x) < -x$). Then using the independence of bits, we can compute the following values:

  • $f(1) = \frac{p_0}{1-p_0}f(0)$
  • $f(2) = \frac{p_1}{1-p_1}f(0)$
  • $f(3) = \frac{p_1}{1-p_1}\frac{p_0}{1-p_0}f(0)$
  • $f(4) = \frac{p_2}{1-p_2}f(0)$

A pattern emerges:

$$ f(n) = f(0) \prod_j \frac{p_j}{1-p_j}$$

where the product is taken over the indices $j$ with $b_j = 1$ in the binary expansion of $n$.

If $f(n)$ is positive and decreasing, then one observes two necessary conditions: $\frac{p_i}{1-p_i} < 1$ (equiv.: $p_i < 1/2$) for each $i$, and

$$\frac{p_0}{1-p_0}\frac{p_1}{1-p_1}\cdots \frac{p_{n-1}}{1-p_{n-1}} > \frac{p_n}{1-p_n}$$

I now have a precise question:

  1. Is there a natural or obvious example (or any example) of a sequence $(p_i)_i$ of probabilities satisfying these four criteria?:

(i) $0 < p_i < 1/2$ for every $i$

(ii) $\sum_i p_i < \infty$

(iii) $\prod_i (1-p_i) > 0$

(iv) $\prod_{j=0}^{n-1} \frac{p_j}{1-p_j} > \frac{p_n}{1-p_n}$

3

There are 3 best solutions below

0
On BEST ANSWER

There is a natural and elementary example: the geometric distribution.

Let $a \in (0,1)$ be fixed and choose $p_i = \frac{a^{2^i}}{1+a^{2^i}}$. Then $\frac{p_i}{1-p_i} = a^{2^i}$, in which case we have the beautiful coincidence:

$$f(n) = f(0)\prod_j a^{2^j} = f(0)a^{\sum_j 2^j} = f(0)a^n$$

since $j$ ranges over precisely the indices where $b_j = 1$ in the binary expansion of $n$. And since $\sum_n f(n) = 1$ we must have $f(0) = 1-a$, therefore $f(n) = (1-a)a^n$ for each $n$.

The four properties for $(p_i)_i$ are also easily checked, with property (iv) simplifying to the identity $1 + 2 + 2^2 + 2^3 + \cdots + 2^{n-1} <2^n$.

I have never seen the geometric distribution motivated/discovered by way of asserting that the digits of a random integer must be independent. This is of course already known (mentioned under general properties of the distribution on Wikipedia) and astonishingly it works not only in base $2$ but in any base. Truly remarkable.

1
On

The probability "$\sum_{i=0}^\infty 2^i b_i$ is finite" equals the probability $P_f$ that the $(b_i)_i$ sequence has finite support:

$$ P(\forall i \geq k : b_i = 0) = \prod_{i=k}^\infty (1-p_i) $$ is a non-decreasing function of $k$, bounded above by $1$. $$ P_f = P(\exists k : \forall i \geq k : b_i = 0) = \lim_{k \to \infty} \prod_{i=k}^\infty (1-p_i) $$

I think $\lim_i p_i = 0$ is sufficient, but is it necessary?

It is necessary: Suppose $P_f = 1$, and let $\epsilon$ be any positive number. Define the set of indices $I = \{i \mid p_i \geq \epsilon\}$. If $I$ is infinite, then for every $k \geq 0$ there is a $j \geq k$ such that $p_j \geq \epsilon$. This implies $\prod_{i=k}^\infty (1-p_i) \leq (1-p_j) \leq 1-\epsilon$, and therefore $P_f \leq 1-\epsilon$; contradiction. So $I$ is finite, and there is an index $k$ larger than every element of $I$ (vacuously true if $I$ is empty). This means $i \geq k$ implies $p_i < \epsilon$. Therefore $\lim_{i \to \infty} p_i = 0$.

It is not sufficient. Let $p_i = 1-e^{-1/i}$. ($i$ is still the index; everything here is real.) It's true $\lim_{i \to \infty} p_i = 0$. But

$$ \prod_{i=k}^n (1-p_i) = \prod_{i=k}^n e^{-1/i} = \exp\left(-\sum_{i=k}^n \frac 1 i \right) $$

As $n$ goes to infinity, the sum $\sum_{i=k}^n \frac 1 i$ approaches $+\infty$, so the $\exp$ value and the product approach zero. That is, $\prod_{i=k}^\infty (1-p_i) = 0$ for every $k$. Then $P_f = 0$ also.

This counterexample hints at a condition which is necessary and sufficient: If the infinite series $\sum_{i=0}^\infty \ln(1-p_i)$ converges, then $P_f = 1$. If the series diverges, then $P_f = 0$. And $0 < P_f < 1$ is not possible at all. If the reasoning here or proof of this isn't clear, comment and I may add more later.

0
On
  • The Borel-Cantelli lemma tells you that if $\sum\limits_{i=0}^\infty p_i$ (i.e. the expected number of $1$s) is finite then the numbers of $1$s and so $X$ will be almost surely finite. So this is a sufficient condition.

  • Since each digit is independent of the others, the converse or second Borel–Cantelli lemma tells you that that if $\sum\limits_{i=0}^\infty p_i$ is infinite then the number of $1$s in $X$ will be almost surely infinite.

This means that $\sum\limits_{i=0}^\infty p_i$ being finite is a necessary and sufficient condition.

It also means that $\lim\limits_{i\to \infty} p_i = 0$ is a necessary but not sufficient condition.

This still gives you a wide possibility of possibilities of distributions and probability mass functions. You just need to choose a suitable sequence for $p_i$ meeting the condition. For example, one such possibility is $p_i=\frac{1}{(i+2)^2}$; another is $p_i=\frac{1}{2^{i+1}}$, and there are many more.

Assuming all the $p_i < 1$, you can say $f(n) = \mathbb P(X=n) = \dfrac{p_{\lfloor \log_2(n) \rfloor}^{\,}}{1-p_{\lfloor \log_2(n) \rfloor}^{\,}}f\left(n-2^{\lfloor \log_2(n) \rfloor}\right)$ starting with $f(0)=\prod\limits_{i=0}^\infty(1-p_i)$. This may help in generating the start of the probability mass functions, and the two possibilities mentioned above would give the following probabilities for the two examples above:

 n    1st example     2nd example          
 0    0.5             0.2887880951
 1    0.1666666667    0.2887880951
 2    0.0625          0.0962626984
 3    0.0208333333    0.0962626984
 4    0.0333333333    0.0412554422
 5    0.0111111111    0.0412554422
 6    0.0041666667    0.0137518141
 7    0.0013888889    0.0137518141
 8    0.0208333333    0.0192525397
 9    0.0069444444    0.0192525397
10    0.0026041667    0.0064175132
11    0.0008680556    0.0064175132
12    0.0013888889    0.0027503628
13    0.0004629630    0.0027503628
14    0.0001736111    0.0009167876
15    0.0000578704    0.0009167876
16    0.0142857143    0.0093157450

Plotting from $f(0)$ through to $f(64)$, these would look like this (with the first example using blue circles and the second using red pluses):

pmf

but too many values are close to $0$ the see the pattern, so using a log scale

pmf log scale

and you might get broadly the same kind of patterns with other choices for the $p_i$, at least if they are all positive but less than $\frac12$.