Entropy of fair but correlated coin flips


Consider the joint distribution, $p(\xi_1,...\xi_N)$, with components defined as $\xi_i=\mathrm{sign}(x_i)$, with $(x_1,...,x_N)\sim\mathcal{N}(0,\Sigma)$ with $ \Sigma_{ij}=\delta_{ij}+(1-\delta_{ij})\tilde{\rho} $, i.e. all off-diagonal entries of $\Sigma$ are $\tilde{\rho}$. Since all components $x_i$ have unit variance (i.e. diagonal entries of $\Sigma$ are $1$), $\tilde{\rho}$ is then the correlation of any pair $(x_i,x_j)$. Note that

  1. all single-component marginals are uniform, $p(\xi_i=\pm 1)=1/2$, i.e. the coins are unbiased;
  2. we can always set the value of $\tilde{\rho}$ so that any pair $(\xi_i,\xi_j)$ has the desired correlation of $\rho$.

What is the formula for the entropy of $p(\xi_1,...\xi_N)$, denoted $H_N(\rho)$?

The parametrization of $\tilde{\rho}$ by $\rho$ depends on $N$ and is obtained by calculating the pair marginal probability $p(\xi_i=1,\xi_j=1)$, denoted $p_{11}$. Let $f_N(\tilde{\rho})$ be the function of $\tilde{\rho}$ that gives this probability. The formula for the correlation between two binary random variables, $$ \rho=\frac{p_{11}-p_{-1}p_1}{\sqrt{p_{-1}(1-p_{-1})p_1(1-p_1)}}=4p_{11}-1\;, $$ with $p_{\pm1}$ denoting $p(\xi_i=\pm 1)=p(\xi_j=\pm 1)=1/2$, gives the desired parametrization, $$ \tilde{\rho}_N(\rho):=f^{-1}_N\left(\frac{\rho+1}{4}\right)\;.\;\;\;\;(\mathrm{eq}.1) $$

Solution for $N=2$:

Computing the 2D Gaussian integral in polar coordinates gives $\tilde{\rho}_{N=2}(\rho)=\sin(\frac{\pi}{2}\rho)$. In this special case, $(\mathrm{eq}.1)$ and the symmetry constraints completely specify the distribution: $p(\xi_1,\xi_2)=(1+\xi_1\xi_2\rho)/4$. Calculating the entropy from its definition gives

$$ H_{N=2}(\rho)=H(\xi_1,\xi_2)=-2\,\frac{1+\rho}{4}\log_2\!\frac{1+\rho}{4}\;-\;2\,\frac{1-\rho}{4}\log_2\!\frac{1-\rho}{4}. $$

We have $H_{N=2}(0)=2$ bits and $H_{N=2}(1)=1$ bit. In fact, we know $H_N(\rho=0)=N$ and $H_N(\rho=1)=1$ for all $N$.
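The closed form above is easy to check directly. A minimal sketch (the helper name `h2` is mine):

```python
import math

def h2(rho):
    # entropy (bits) of two fair coins with p(xi1, xi2) = (1 + xi1*xi2*rho)/4
    H = 0.0
    for eta in ((1 + rho) / 4, (1 - rho) / 4):
        if eta > 0:
            H += 2 * (-eta * math.log2(eta))
    return H

print(h2(0.0))  # 2.0 bits: independent fair coins
print(h2(1.0))  # 1.0 bit: perfectly correlated coins
```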

For $N>2$:

There are many Gaussian integrals to do. Due to the permutation symmetry of the indices, there are only $N$ distinct integral values among the $2^N$ terms in the entropy. The terms can be grouped by how many $1$s they contain; the singleton group of the all-$(-1)$ tuple can be merged with the singleton group of the all-$1$ tuple, thanks to the reflection symmetry through the plane normal to the main diagonal. We just need to compute the multiplicity and the integral for each of the $N$ terms.

E.g., for $N=2$ there are 2 terms (listed above, each with multiplicity 2). For $N=3$ there are three terms: one for tuples containing a single $1$, one for tuples containing two $1$s, and the extreme group containing $(-1,-1,-1)$ and $(1,1,1)$.
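The grouping and multiplicities can be sketched in a few lines (an illustrative sketch; `term_groups` is a name I'm introducing, and groups are keyed by the number of $1$s):

```python
from math import comb

def term_groups(N):
    # group the 2^N sign patterns by their number of +1s, then merge the
    # all-(+1) class with the all-(-1) class (same probability by reflection)
    mult = {k: comb(N, k) for k in range(N + 1)}
    mult[0] += mult.pop(N)  # merge the two extreme singletons
    return mult             # N groups; multiplicities sum to 2^N

print(term_groups(3))  # {0: 2, 1: 3, 2: 3}
```

For $N=2$ this gives `{0: 2, 1: 2}`, matching the 2 terms with multiplicity 2 listed above.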

In section 3.2 of this paper, Six gives a recursive solution to the distribution. This could be used to compute the entropy and the parametrization function $\tilde{\rho}_N(\rho)$.

Accepted Answer: In the end, an alternative approach based on expressing $x_i=\sqrt{1-\tilde{\rho}}y_i+\sqrt{\tilde{\rho}}s$, with $y_i,s\sim\mathcal{N}(0,1)$, seems to be more straightforward and is the accepted answer below. That answer also indicates that $f_N(\tilde{\rho})$ does not, after all, seem to depend on $N$.


Accepted answer:

This is not an additional answer but a summary of the results found in the references, particularly this one.

Let $${\bar {\bf z}} =a {\bar {\bf r}}+ b\, {\bf s} \, {\bar u} \tag 1$$

where $a,b$ are positive constants, ${\bar u}$ is a vector of $n$ ones, and $s$ and ${\bar {\bf r}}$ are independent standard Gaussians (scalar and $n$-multivariate, resp.). Then ${\bar {\bf z}} \sim \mathcal N(0,\Sigma)$ with $\Sigma = a^2 I +b^2 U$, where $U$ is the all-ones matrix. Hence, by setting $b=\sqrt{\tilde \rho}$ and $a = \sqrt{1-\tilde \rho}$, ${\bar {\bf z}}$ models our (positively) equicorrelated Gaussian variables.
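A quick Monte Carlo check of this construction (a sketch, with arbitrarily chosen $n=4$ and $\tilde\rho=0.3$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho_t = 4, 0.3                       # rho_t plays the role of tilde-rho
a, b = np.sqrt(1 - rho_t), np.sqrt(rho_t)

# z = a*r + b*s*u, with r ~ N(0, I_n) and a scalar s ~ N(0, 1) shared by all components
T = 200_000
r = rng.standard_normal((T, n))
s = rng.standard_normal((T, 1))
z = a * r + b * s                       # broadcasting adds b*s to every column

C = np.cov(z, rowvar=False)             # should approximate (1 - rho_t)*I + rho_t*U
print(np.round(C, 2))
```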

Let ${\bar {\bf x}}$, with components ${\bf x}_i={\mathbf 1}_{z_i\ge 0}$, be our correlated coins, and let ${\bf y} =\sum_i {\bf x}_i \in \{0,1,\dots,n\}$ be their sum.

Notice that, conditioned on $s$, the components of ${\bar {\bf z}}$ are iid, with $z_i | s \sim \mathcal N(bs, a^2)$.

Also $x_i | s \sim \mathcal B_e(1 - \Phi(-sb/a))=\mathcal B_e(\Phi(sb/a))$ and $y | s \sim \mathcal B(n,\Phi(sb/a))$, where $\Phi$ is the standard Gaussian CDF and $\mathcal B_e$, $\mathcal B$ denote the Bernoulli and Binomial distributions.

Now, noting that $H({ \bf y} | {\bar {\bf x}})=0$, we get

$$ \begin{align} H({\bar {\bf x}})&=H({ \bf y})+H({\bar {\bf x}}|{ \bf y})\\ &=H({ \bf y})+ \sum_y P({ \bf y}=y) H({\bar {\bf x}}|{ \bf y}=y) \\ &=H({ \bf y}) + \sum_y P({ \bf y}=y) \log \binom{n}{y}\\ &= \sum_y P({ \bf y}=y) \log \frac{\binom{n}{y}}{P({ \bf y}=y)} \tag 2\\ &= n-D( p({\bf y}) || {\mathcal B}(n,1/2)) \end{align}$$

where the third line uses that, by exchangeability, ${\bar {\bf x}}$ is uniform over the $\binom{n}{y}$ arrangements once ${\bf y}=y$ is given.

And we can compute $P({ \bf y}=y)$ by integrating:

$$ P({\bf y}=y) = \int_{-\infty}^{\infty} P(y|s)\, \phi(s)\, ds = \int_{-\infty}^{\infty} \binom{n}{y}\, \Phi(sb/a)^y \left(1-\Phi(sb/a)\right)^{n-y} \phi(s)\, ds \tag 3$$
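Eq. (3) together with eq. (2) can be evaluated by simple quadrature. A numerical sketch (function names are mine; $\Phi$ is implemented via `math.erf`):

```python
import math
from math import comb
import numpy as np

def Phi(x):
    # standard Gaussian CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def py_dist(n, rho_t, grid=2001, lim=8.0):
    # p(y), y = 0..n, via eq. (3): integrate Binomial(n, Phi(s*b/a)) against phi(s)
    a, b = math.sqrt(1 - rho_t), math.sqrt(rho_t)
    s = np.linspace(-lim, lim, grid)
    ds = s[1] - s[0]
    phi = np.exp(-s**2 / 2) / math.sqrt(2 * math.pi)
    p = np.array([Phi(v * b / a) for v in s])
    return np.array([np.sum(comb(n, y) * p**y * (1 - p)**(n - y) * phi) * ds
                     for y in range(n + 1)])

def entropy_bits(n, rho_t):
    # eq. (2): H = sum_y p(y) * log2( C(n, y) / p(y) )
    P = py_dist(n, rho_t)
    return sum(P[y] * math.log2(comb(n, y) / P[y]) for y in range(n + 1) if P[y] > 0)

print(round(entropy_bits(2, 0.0), 3))  # 2.0: two independent fair coins
# with rho_t = sin(pi/4), i.e. rho = 0.5, this should match the N=2 closed form (~1.811)
print(round(entropy_bits(2, math.sin(math.pi / 4)), 3))
```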

This already allows us to compute the entropy numerically. For large $n$, we can instead write

$$ H({\bar {\bf x}}) = H({\bar {\bf x}}|s) + I({\bar {\bf x}}; s) \tag 4$$

where $I()$ is the mutual information (some care is needed here because ${\bar {\bf x}}$ is discrete and $s$ is continuous, but the equation is justified nevertheless). The first term is linear in $n$ (conditioning on $s$ makes the components $x_i$ iid) and the second term is $O(\log n)$, because

$$ I({\bar {\bf x}}; s) \le I({\bar {\bf z}}; s) = h({\bar {\bf z}})-h({\bar {\bf z}}|s)= \frac12 (\log |\Sigma| - \log |a^2 I|) = \frac12 \log\left(1+\frac{b^2 n}{a^2}\right) \tag 5$$

Then $$H({\bar {\bf x}}) = n \int \phi(s) h_b \left(\Phi(s \,b/a)\right) ds+ O(\log n) \tag 6$$

where $h_b(p)=-p \log p -(1-p)\log(1-p)$ is the binary entropy function.
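The coefficient of $n$ in eq. (6), i.e. the per-coin entropy in the large-$n$ limit, is easy to evaluate numerically. A sketch (function names are mine; `Phi` implements the Gaussian CDF via `math.erf`):

```python
import math
import numpy as np

def Phi(x):
    # standard Gaussian CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def hb(p):
    # binary entropy in bits
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def entropy_rate(rho_t, grid=4001, lim=8.0):
    # integral of phi(s) * h_b(Phi(s*b/a)) ds, the coefficient of n in eq. (6)
    a, b = math.sqrt(1 - rho_t), math.sqrt(rho_t)
    s = np.linspace(-lim, lim, grid)
    ds = s[1] - s[0]
    return sum(math.exp(-v**2 / 2) / math.sqrt(2 * math.pi) * hb(Phi(v * b / a))
               for v in s) * ds

print(round(entropy_rate(0.0), 4))        # 1.0: independent coins carry one bit each
print(entropy_rate(0.5) < 1.0)            # True: correlation reduces the rate
```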

The same approach can be used to map $\tilde \rho \to \rho$:

$$\begin{align} \rho &= 4 P(x_1=1,x_2=1)-1 \\ &= 4 \int P(x_1=1,x_2=1|s) p(s)ds -1 \\ &= 4 \int \Phi^2(sb/a) \phi(s) ds -1 \\ &= 4 \int \Phi^2\left(s \sqrt{\frac{\tilde \rho}{1-\tilde \rho}}\right) \phi(s)\, ds -1 \\ &= \frac{2}{\pi} \sin^{-1}({\tilde \rho}) \end{align} $$
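The last equality (the arcsine law) can be checked numerically against the integral above. A sketch (names are mine; `Phi` is the Gaussian CDF via `math.erf`):

```python
import math
import numpy as np

def Phi(x):
    # standard Gaussian CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rho_of(rho_t, grid=4001, lim=8.0):
    # rho = 4 * int Phi(s * sqrt(rho_t/(1-rho_t)))^2 phi(s) ds - 1
    c = math.sqrt(rho_t / (1 - rho_t))
    s = np.linspace(-lim, lim, grid)
    ds = s[1] - s[0]
    val = sum(Phi(v * c)**2 * math.exp(-v**2 / 2) / math.sqrt(2 * math.pi)
              for v in s) * ds
    return 4 * val - 1

# the two columns should agree: quadrature vs (2/pi) * arcsin(rho_t)
for rt in (0.1, 0.5, 0.9):
    print(round(rho_of(rt), 4), round(2 / math.pi * math.asin(rt), 4))
```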

Another answer:

Of course, the assumption $H(\xi_i|\xi_{i+1},...,\xi_N)=H(\xi_i|\xi_{i+1})$ seems wrong, even as an approximation. I doubt that there is a simple solution.

Here are at least some numerical results from simulations. The horizontal axis is $\rho$; the vertical axis is the estimated entropy, normalized to $[0,1]$: $H'=(H-1)/(N-1)$.

[Figure: estimated normalized entropy $H'$ vs $\rho$]

As you mention, evaluating the entropy amounts to integrating a multidimensional Gaussian (albeit zero-mean and symmetric w.r.t. the components) over all the orthants. The answers to this question have some pointers, e.g. this one.

The most specific reference I found is this one.

Another interesting one: https://arxiv.org/abs/1009.2855


Added: from the same simulations, here are the conditional entropies $H(\xi_i|\xi_{i-1},...,\xi_1)$ for different values of $\rho$. I wonder if there is a non-zero limit for $N\to \infty$...

[Figure: estimated conditional entropies $H(\xi_i|\xi_{i-1},\dots,\xi_1)$ for several $\rho$]

In case someone is interested, here's the Python code

import numpy as np
import math

def hb(p):
    # binary entropy in bits
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p) if 0 < p < 1 else 0

def calca(nn, r):
    # invert calcrho: the aa for which gen1's samples have pairwise correlation r
    return (math.sqrt((1 - nn) * r**2 + (nn - 2) * r + 1) - r + 1) / r

def calcrho(nn, aa):
    # pairwise correlation of the components returned by gen1
    return (nn + 2 * aa) / ((aa + 1) ** 2 + nn - 1)

def gen1(nn, aa):
    # one sample of nn equicorrelated standard Gaussians:
    # the shared term sum(y) induces the positive correlation
    y = np.random.normal(0, 1, nn)
    return (np.sum(y) * np.ones(nn) + aa * y) / math.sqrt((aa + 1) ** 2 + (nn - 1))

def tryn(nn, rho, t):
    aa = calca(nn, rho)
    Pk = np.zeros((nn + 1, nn + 1))
    PJk = np.zeros((nn + 1, nn + 1))  # joint prob
    PCk = np.zeros((nn + 1, nn + 1))  # cond prob
    for _ in range(t):
        y = gen1(nn, aa)
        z = (y > 0).astype(int)  # coin outcomes xi in {0, 1}
        c = 0  # running count of 1s among the previous coins z[0..k-1]
        for k in range(nn):
            c += z[k - 1] if k > 0 else 0
            Pk[k][c] += 1
            if z[k] == 1:
                PJk[k][c] += 1
    # normalize probabilities
    for k in range(nn):
        Pk[k] /= t
        PJk[k] /= t
    for k in range(nn):
        for c in range(k + 1):
            PCk[k][c] = PJk[k][c] / Pk[k][c] if Pk[k][c] > 0 else 0
    r = calcrho(nn, aa)
    print(f"{nn=} {r=} {aa=} {t=}")
    # print(np.round(Pk[:5,:5],5))
    # print(np.round(PJk[:5,:5],5))
    # print(np.round(PCk,5))
    # H[k] estimates H(xi_{k+1} | xi_1, ..., xi_k); by exchangeability the
    # conditional distribution depends only on the count c of previous 1s
    H = np.zeros(nn)
    for k in range(nn):
        for c in range(k + 1):
            H[k] += hb(PCk[k][c]) * Pk[k][c]
    print('h', np.round(H, 5))

rho = 0.7
nn = 12
tryn(nn, rho, 800000)