What can we say about the distribution of bits in a binary representation of Gaussian variables?

Over on Computer Science, we got a question on compressing the concatenation of binary representations of highly correlated Gaussian variables, specifically several hundred realizations of $\mathcal{N}(\mu, \Sigma)$ with $\mu = (0,0)$ and $\Sigma = \left(\begin{matrix}1 & 0.9 \\ 0.9 & 1\end{matrix}\right)$. My gut feeling is that even though $X_i$ and $Y_i$ are strongly correlated, their binary representations need not be. For instance, $255 \approx 256$, but 011111111 and 100000000 are as different as can be. I would like to solidify or refute this intuition.
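The 255 vs. 256 example can be checked directly (a quick sketch using 9-bit unsigned representations):

```python
# Neighbouring integers whose binary representations share no bit:
a, b = 255, 256
print(format(a, "09b"))  # 011111111
print(format(b, "09b"))  # 100000000

# Hamming distance between the two 9-bit strings:
hamming = bin(a ^ b).count("1")
print(hamming)  # 9 -- every one of the 9 bit positions differs
```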

Say we are using a finite-precision encoding $\operatorname{enc}$ such as IEEE 754 floats. The question is: for i.i.d. $(X_1, Y_1), \dots, (X_N, Y_N) \sim \mathcal{N}(\mu, \Sigma)$ and

$\qquad\displaystyle Z = Z_1 \cdots Z_M = \operatorname{enc}(X_1) \cdot \operatorname{enc}(Y_1) \cdot \dots \cdot \operatorname{enc}(X_N) \cdot \operatorname{enc}(Y_N) \in \{0,1\}^M$

with $\cdot$ the string concatenation, what can we say about the distribution(s) of the $Z_i$? Specifically, how strongly are they correlated? Alternatively, can we derive bounds on the entropy of $Z$?
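For the entropy part, a standard back-of-the-envelope estimate may help (this uses the usual fine-quantization approximation from information theory rather than the exact IEEE encoding): quantizing each coordinate to bins of width $\Delta$ gives

$\qquad\displaystyle H\big([X]_\Delta, [Y]_\Delta\big) \approx h(X, Y) + 2\log_2\frac{1}{\Delta}, \qquad h(X, Y) = \tfrac{1}{2}\log_2\big((2\pi e)^2 \det\Sigma\big).$

With $\det\Sigma = 1 - 0.9^2 = 0.19$ this is $h(X,Y) \approx 2.90$ bits per pair, versus $2 \cdot \tfrac{1}{2}\log_2(2\pi e) \approx 4.09$ bits for two independent standard normals, i.e. the correlation saves $-\tfrac{1}{2}\log_2(1-\rho^2) \approx 1.20$ bits per pair regardless of $\Delta$.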

Since the $\operatorname{enc}(\cdot_i)$ for different $i$ are independent (the pairs $(X_i, Y_i)$ are i.i.d.), we can restrict ourselves to investigating how strongly (the bits of) $\operatorname{enc}(X_i)$ and $\operatorname{enc}(Y_i)$ are correlated.
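Here is a quick empirical sketch of exactly this question (not a definitive answer: it assumes single-precision IEEE 754 encodings and estimates the per-position bit correlations from samples):

```python
import random
import struct

random.seed(0)
rho = 0.9
N = 50_000

def bits32(v):
    # The 32 bits of the IEEE 754 single-precision encoding, MSB first
    # (bit 0 = sign, bits 1-8 = exponent, bits 9-31 = mantissa).
    (u,) = struct.unpack(">I", struct.pack(">f", v))
    return [(u >> (31 - k)) & 1 for k in range(32)]

def corr(a, b):
    # Pearson correlation of two equal-length 0/1 sequences.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

bx, by = [], []
for _ in range(N):
    # (X, Y) ~ N(0, Sigma) via the Cholesky factor of Sigma:
    # Y = rho*X + sqrt(1 - rho^2)*W with W independent of X.
    x = random.gauss(0.0, 1.0)
    y = rho * x + (1 - rho ** 2) ** 0.5 * random.gauss(0.0, 1.0)
    bx.append(bits32(x))
    by.append(bits32(y))

# Correlation between bit k of enc(X_i) and bit k of enc(Y_i).
for k in range(32):
    print(k, round(corr([r[k] for r in bx], [r[k] for r in by]), 3))
```

On a run like this the sign bit comes out strongly correlated (near the theoretical value $2\arcsin(\rho)/\pi \approx 0.71$ for sign indicators of a bivariate normal), while the low-order mantissa bits come out essentially uncorrelated: only the high-order structure of the encoding survives.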

If the weirdness of IEEE floats makes things unnecessarily complicated, I'll be more than happy about answers that investigate comparable situations with integer-valued distributions.
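For the integer-valued variant, the same experiment can be run with a hypothetical 8-bit fixed-point encoding (the scale factor 16 and the offset-binary format are arbitrary choices for illustration, not part of the question):

```python
import random

random.seed(1)
rho = 0.9
N = 50_000

def enc8(v):
    # Quantize to an integer in [-128, 127] (scale 16 is an arbitrary
    # choice) and encode in 8-bit offset binary, MSB first.
    q = max(-128, min(127, round(16 * v)))
    u = q + 128
    return [(u >> (7 - k)) & 1 for k in range(8)]

def corr(a, b):
    # Pearson correlation of two equal-length 0/1 sequences.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

bx, by = [], []
for _ in range(N):
    x = random.gauss(0.0, 1.0)
    y = rho * x + (1 - rho ** 2) ** 0.5 * random.gauss(0.0, 1.0)
    bx.append(enc8(x))
    by.append(enc8(y))

# High-order bits (the MSB acts like a sign bit here) correlate
# strongly; the lowest-order bits come out close to uncorrelated.
for k in range(8):
    print(k, round(corr([r[k] for r in bx], [r[k] for r in by]), 3))
```

This already shows the same qualitative picture as the float case, without any IEEE-specific weirdness.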