Why is the distribution of values of this random matrix product (seemingly) independent of dimension?


I'm investigating the behavior of the value $|x^\dagger A y|$, where $x, y \in \mathbb{C}^N$ have unit 2-norm and are sampled uniformly from the unit sphere, and $A \in \mathbb{C}^{N \times N}$ has i.i.d. elements sampled from the standard complex normal distribution.

When I plot an empirical histogram of the magnitude of this product for varying dimension $N$, it seems that the distribution of values is independent of $N$. For example, in Julia:

julia> using Random, Plots, LinearAlgebra
julia> CF = Complex{Float32}
julia> histogram(norm.([normalize(randn(CF, n))' * randn(CF, (n, n)) * normalize(randn(CF, n)) for _ in 1:2000, n in [1, 8, 64, 512]]), layout = (4, 1))

gives this result (image not reproduced here: four stacked histograms, one per $N \in \{1, 8, 64, 512\}$).

Is there a proof, or at least an intuition, for why this is the case? I tried reading a bit about random matrix theory, but it was too advanced for me, and I suspect that what's going on here isn't that complicated.


There are 2 best solutions below

0
On

Double-check my arithmetic, but here's the basic reason for what you observed.

Write$$x=a+ib,\,y=c+id,\,A=U+iV$$with $a,\,b,\,c,\,d\in\Bbb R^N$ and $U,\,V\in\Bbb R^{N\times N}$, so that $a^2+b^2=c^2+d^2=1$, where vectors in $\Bbb R^N$ are squared in the sense $z^2:=z\cdot z$. It suffices to show that, if we fix $a,\,b,\,c,\,d$, the conditional distribution of$$|x^\dagger Ay|^2=[\color{red}{a^TUc+b^TUd-a^TVd+b^TVc}]^2+[\color{blue}{a^TUd-b^TUc+a^TVc+b^TVd}]^2$$is $N$-independent.

With a value of $\sigma^2$ that depends on how you normalize complex Gaussians (I'll leave you to work it out), the entries of $U$ and $V$ are independent $\mathcal{N}(0,\,\sigma^2)$ variables. The red term is a linear combination of these entries, so it is $\mathcal{N}(0,\,R\sigma^2)$-distributed, with$$R:=\operatorname{tr}[(ac^T+bd^T)(ca^T+db^T)+(bc^T-ad^T)(cb^T-da^T)].$$Expanding the trace, the cross terms $\pm 2(a\cdot b)(c\cdot d)$ cancel, which leaves $R=(a^2+b^2)(c^2+d^2)=1$. You can treat the blue term similarly: it has the same variance, and it is uncorrelated with the red term, hence independent of it because the two are jointly Gaussian.

So $|x^\dagger Ay|$ is $\sigma$ times a $\chi_2$-distributed variable, whatever $N$ and $a,\,b,\,c,\,d$ are. Its density peaks at $\sigma$ (the $\chi_2$ density itself peaks at $1$), roughly in line with your empirical results. Since the conditional distribution does not depend on $a,\,b,\,c,\,d$, the fact that they are stochastic does not change it. (A scaled $\chi_2$ distribution is also known as the Rayleigh distribution.)
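As a quick sanity check on the arithmetic, here is a minimal Julia sketch that evaluates the trace expression for $R$ numerically at a few dimensions. The helper names `trace_R` and `unit_parts` are made up for this illustration, and the seed is arbitrary; if the expansion is right, every printed value should be close to $1$.

```julia
using LinearAlgebra, Random

Random.seed!(1)  # arbitrary seed, only for reproducibility

# Evaluate R from the trace formula for real vectors a, b, c, d.
function trace_R(a, b, c, d)
    M1 = a * c' + b * d'   # coefficients multiplying U in the red term
    M2 = b * c' - a * d'   # coefficients multiplying V in the red term
    return tr(M1 * M1' + M2 * M2')
end

# Draw a random complex unit vector and split it into real and imaginary parts.
function unit_parts(n)
    z = normalize(randn(ComplexF64, n))
    return real(z), imag(z)
end

# One value of R per dimension, each with freshly drawn x = a + ib and y = c + id.
Rs = map([1, 8, 64]) do n
    a, b = unit_parts(n)
    c, d = unit_parts(n)
    trace_R(a, b, c, d)
end

println(Rs)
```

Note that Julia's `randn` for complex types draws values with total variance $1$, so in the question's experiment each real or imaginary part has variance $\sigma^2 = 1/2$.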

0
On

I think I've got it.

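The value $x^\dagger A y$ is equal to $\sum_{i,j \in [N]} \overline{x_i} A_{i,j} y_j$. The elements of $A$ are independent draws from the standard complex normal distribution, so for fixed $x$ and $y$ this sum is a linear combination of independent complex Gaussians. Hence $x^\dagger A y$ is a zero-mean circularly symmetric complex normal random variable, and only its variance remains to be found.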

We can consider the individual elements of $x$ and $y$ as random variables as well. The constraint that $\|x\| = \|y\| = 1$ means that, by symmetry, their squared magnitudes have expectations $\mathbb{E}[|x_i|^2] = \mathbb{E}[|y_j|^2] = \frac{1}{N}$, and $x$ and $y$ being independent means $\mathbb{E}[|x_i y_j|^2] = \frac{1}{N^2}$. Heuristically, then, each of the $N^2$ terms contributes a variance of $\frac{1}{N^2}$. In fact no averaging is needed: the variances of independent terms add, and the total is exactly $1$ for every unit $x$ and $y$.

So, with an independent $\mathcal{CN}(0, 1)$ variable in each term,

$x^\dagger A y \sim \sum_{i,j \in [N]} \overline{x_i} y_j \mathcal{CN}(0, 1) = \mathcal{CN}\left(0, \sum_{i,j \in [N]} |x_i|^2 |y_j|^2\right) = \mathcal{CN}(0, \|x\|^2 \|y\|^2) = \mathcal{CN}(0, 1)$

Which means $|x^\dagger A y|$ is the magnitude of a standard complex normal random variable, which follows the Rayleigh distribution (with scale parameter $1/\sqrt{2}$) and is independent of $N$.
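The Rayleigh claim can be checked empirically with a small Julia sketch. It assumes the same sampling as in the question and compares the sample mean of the magnitude against the Rayleigh mean $\sigma\sqrt{\pi/2}$ with $\sigma = 1/\sqrt{2}$, which is $\sqrt{\pi}/2$. The function name `sample_abs`, the sample count, and the seed are arbitrary choices for this illustration.

```julia
using LinearAlgebra, Random, Statistics

Random.seed!(0)  # arbitrary seed, only for reproducibility

# One draw of |x' * A * y| in dimension n, sampled as in the question.
function sample_abs(n)
    x = normalize(randn(ComplexF64, n))
    y = normalize(randn(ComplexF64, n))
    A = randn(ComplexF64, n, n)   # i.i.d. entries, each with total variance 1
    return abs(x' * A * y)
end

# Mean of a Rayleigh distribution with scale 1/sqrt(2).
rayleigh_mean = sqrt(pi) / 2

# Sample mean of the magnitude for each dimension.
means = Dict(n => mean(sample_abs(n) for _ in 1:10_000) for n in (1, 8, 64))

for n in (1, 8, 64)
    println("N = ", n, ": sample mean = ", means[n], ", Rayleigh mean = ", rayleigh_mean)
end
```

If the argument holds, the sample means should agree with the Rayleigh mean up to sampling noise for every $N$.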