Value of $E_{A,x}[\cos(A^T y,A^{-1} y)]$ for Gaussian $A,x,y=Ax$


For a $d\times d$ matrix $A$ and $x\in \mathbb{R}^d$, both with IID standard normal entries, and $y=Ax$, I'm interested in the following quantity, where $\cos(u,v)$ denotes cosine similarity:

$$E_{A,x}[\cos(A^T y,A^{-1} y)]$$

In simulations it appears to be $\frac{1}{\sqrt{2}}$; can this be proven?

Motivation: this would justify using transpose + line search to approximately solve linear systems.

randn[dims__] := RandomVariate[NormalDistribution[], {dims}];
mat = randn[1000, 1000];
{d2, d1} = Dimensions[mat];

bs = 10000;
vecsIn = randn[bs, d1];                           (* rows are x vectors *)
vecsOut = (mat . vecsIn\[Transpose])\[Transpose]; (* rows are y = A x *)
meanAlign[vec1_, vec2_] :=
  Median@MapThread[(#1 . #2/(Norm[#1] Norm[#2])) &, {vec1, vec2}];
Print["median cosine similarity: ",
 meanAlign[(mat\[Transpose] . vecsOut\[Transpose])\[Transpose], vecsIn]]
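For readers without Mathematica, here is an equivalent check in Python/numpy (my own re-implementation, not the original experiment; the dimensions are shrunk so it runs quickly):

```python
import numpy as np

# Reproduce the experiment: for Gaussian A and x, measure the cosine
# similarity between x = A^{-1} y and x_hat = A^T y, with y = A x.
rng = np.random.default_rng(0)
d, bs = 300, 2000

A = rng.standard_normal((d, d))
X = rng.standard_normal((bs, d))        # batch of random x vectors (rows)
Y = X @ A.T                             # rows are y = A x
Xhat = Y @ A                            # rows are x_hat = A^T y

# per-sample cosine similarity between x and x_hat
cos = np.sum(X * Xhat, axis=1) / (
    np.linalg.norm(X, axis=1) * np.linalg.norm(Xhat, axis=1))
print(np.median(cos))                   # close to 1/sqrt(2) ~ 0.707
```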

There are 2 answers below.

Answer 1:

Below is a proof outline. Several steps swap the order of expectation and a nonlinear function, i.e. use $E[f(x)]\approx f(E[x])$; this was confirmed to hold in numeric simulations, but I'm curious how to show it more rigorously:

Given $x\sim \mathcal{N}(0, I_d)$, define $y=Ax$ for an invertible $d\times d$ matrix $A$ and consider two solutions of this system:

  1. Exact solution $x=A^{-1}y$
  2. Approximate solution $\hat{x}=A^T y$

We seek to show that the cosine of the angle between the two solutions converges in probability to $\frac{1}{\sqrt{2}}$, where $$\cos \angle=\frac{\langle x, \hat{x}\rangle}{\|x\|\|\hat{x}\|}$$

and $A$ is a large matrix with IID Gaussian entries.

Proof parts:

  1. Define "effective rank" of a matrix $A$ as $R(A)=\|A\|_F^4/\|AA^T\|_F^2$ and show convergence in probability w.r.t. random $x$

\begin{equation} \cos \angle \xrightarrow{P} \sqrt{\frac{R(A)}{d}} \tag{0} \label{0} \end{equation}

  2. Show that $R(A)\xrightarrow{P} d/2$ for large Gaussian matrix $A$

  3. Combine these two to get convergence in probability with respect to both random $x$ and $A$
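Before proving them, both claimed limits can be sanity-checked numerically. A small numpy sketch (my own illustration, with arbitrary sizes; not part of the proof):

```python
import numpy as np

# Check that R(A) = ||A||_F^4 / ||A A^T||_F^2 is close to d/2 for Gaussian A,
# and that cos-angle for a single random x is close to sqrt(R(A)/d).
rng = np.random.default_rng(1)
d = 400
A = rng.standard_normal((d, d))

R = np.linalg.norm(A, 'fro')**4 / np.linalg.norm(A @ A.T, 'fro')**2
print(R / d)                                  # close to 0.5

x = rng.standard_normal(d)
xhat = A.T @ (A @ x)                          # x_hat = A^T y with y = A x
cos = x @ xhat / (np.linalg.norm(x) * np.linalg.norm(xhat))
print(cos, np.sqrt(R / d))                    # the two should be close
```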

Part 1

We are interested in $E_x \cos\angle$, and for that we distribute the expectation into the expression. Specifically, we must show that

\begin{equation} E_x \cos\angle = E_x \frac{\langle x, \hat{x}\rangle}{\|x\|\|\hat{x}\|}\xrightarrow{P} \frac{E_x \langle x, \hat{x}\rangle}{\sqrt{E_x\|x\|^2}\sqrt{E_x \|\hat{x}\|^2}} \tag{1}\label{1} \end{equation}

This approximation appears to hold in simulations; why?
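For what it's worth, a quick numpy check of $\eqref{1}$ for one fixed $A$ (my own illustration; sample sizes are arbitrary):

```python
import numpy as np

# Compare E_x[cos-angle] (Monte Carlo, lhs) against the ratio of the three
# separate expectations (rhs); the two should agree closely for large d.
rng = np.random.default_rng(2)
d, n = 200, 5000
A = rng.standard_normal((d, d))
M = A.T @ A                                   # x_hat = A^T A x

X = rng.standard_normal((n, d))
Xhat = X @ M                                  # M is symmetric
inner = np.sum(X * Xhat, axis=1)
nx = np.linalg.norm(X, axis=1)
nxh = np.linalg.norm(Xhat, axis=1)

lhs = np.mean(inner / (nx * nxh))             # E_x cos-angle, Monte Carlo
rhs = np.mean(inner) / np.sqrt(np.mean(nx**2) * np.mean(nxh**2))
print(lhs, rhs)                               # both near 1/sqrt(2)
```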

An earlier discussion on MathOverflow gave a hint.

Now we can analyze three expectations separately:

  1. $E_x \langle x, \hat{x}\rangle = \|A\|^2_F$
  2. $E_x \|x\|^2=d$
  3. $E_x \|\hat{x}\|^2=\|AA^T\|_F^2$

Substituting 1, 2, 3 into $\eqref{1}$ and applying the definition of $R(A)$ gives the intended result $\eqref{0}$.

Proofs of 1-3 use the cyclic property of the trace plus linearity of trace and expectation, which let us switch their order: $E_x\|x\|^2=E \operatorname{Tr}(x^T x)=E \operatorname{Tr}(x x^T) =\operatorname{Tr} E[x x^T] =\operatorname{Tr} I=d$
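The three identities are easy to confirm by Monte Carlo for a fixed $A$ (a numpy sketch of my own; constants are arbitrary):

```python
import numpy as np

# Monte Carlo check over x of identities 1-3:
#   E_x <x, x_hat> = ||A||_F^2,  E_x ||x||^2 = d,  E_x ||x_hat||^2 = ||A A^T||_F^2
rng = np.random.default_rng(3)
d, n = 50, 100000
A = rng.standard_normal((d, d))
M = A.T @ A                                  # x_hat = A^T y = M x

X = rng.standard_normal((n, d))
Xhat = X @ M                                 # M is symmetric
m1 = np.mean(np.sum(X * Xhat, axis=1))       # ~ ||A||_F^2
m2 = np.mean(np.sum(X**2, axis=1))           # ~ d
m3 = np.mean(np.sum(Xhat**2, axis=1))        # ~ ||A A^T||_F^2
print(m1, np.linalg.norm(A, 'fro')**2)
print(m2, d)
print(m3, np.linalg.norm(A @ A.T, 'fro')**2)
```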

Part 2

When $A$ is random matrix with IID standard normal entries we have

  1. $E_A\|A\|^2_F=d^2$
  2. $E_A\|AA^T\|_F^2=d^2+2 d^3$ (proof technique by Sangchul Lee here; table of related formulas here)
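Both moment formulas check out numerically (a numpy sketch of my own, with small $d$ and an arbitrary trial count):

```python
import numpy as np

# Monte Carlo check over Gaussian A of the two moment formulas:
#   E ||A||_F^2 = d^2   and   E ||A A^T||_F^2 = d^2 + 2 d^3
rng = np.random.default_rng(4)
d, trials = 30, 400
f2 = np.empty(trials)
f4 = np.empty(trials)
for t in range(trials):
    A = rng.standard_normal((d, d))
    f2[t] = np.linalg.norm(A, 'fro')**2
    f4[t] = np.linalg.norm(A @ A.T, 'fro')**2
f2m, f4m = f2.mean(), f4.mean()
print(f2m, d**2)                   # both ~ 900
print(f4m, d**2 + 2 * d**3)        # both ~ 54900
```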

If we can somehow use concentration to change the order of expectation and division, we would get the following

$$R(A)=\frac{\|A\|_F^4}{\|AA^T\|_F^2}\xrightarrow{P} \frac{(E\|A\|_F^2)^2}{E\|AA^T\|_F^2}=\frac{d^4}{d^2+2d^3}$$

Plugging this into $\eqref{0}$ we get

$$\cos \angle\xrightarrow{P} \sqrt{\frac{d^4}{d^3+2d^4}} \xrightarrow{d\to\infty} \sqrt{\frac{1}{2}}$$

Answer 2:

You're right that this converges to $1/\sqrt{2}$ as $d \rightarrow \infty$. In fact, we can explicitly find the distribution of your quantity of interest even for finite $d$. Namely, if $X = \cos(A^Ty, A^{-1}y)$, then $$ X^2 \sim \operatorname{Beta}\Bigl(\frac{d}{2}, \frac{d-1}{2}\Bigr), \quad \mathbf{E} X = \frac{\Gamma((d+1)/2)}{\Gamma(d/2)} \frac{\Gamma(d - 1/2)}{\Gamma(d)} \rightarrow \frac{1}{\sqrt{2}}. $$
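Before diving into the proof, this can be checked by simulation at small $d$ (a numpy sketch of my own; parameters are arbitrary):

```python
import numpy as np
from math import gamma

# Simulate X = cos(A^T y, A^{-1} y) at small d; X^2 should match the
# Beta(d/2, (d-1)/2) mean, and E[X] the Gamma-function formula.
rng = np.random.default_rng(5)
d, n = 6, 50000
xs = np.empty(n)
for i in range(n):
    A = rng.standard_normal((d, d))
    x = rng.standard_normal(d)
    xhat = A.T @ (A @ x)                     # A^T y with y = A x
    xs[i] = x @ xhat / (np.linalg.norm(x) * np.linalg.norm(xhat))

a, b = d / 2, (d - 1) / 2
beta_mean = a / (a + b)                      # E[Beta(a, b)]
ex_formula = gamma((d + 1) / 2) / gamma(d / 2) * gamma(d - 0.5) / gamma(d)
print(np.mean(xs**2), beta_mean)
print(np.mean(xs), ex_formula)
```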

To show this, write $$ X = \frac{x^T (A^T A) x}{\lVert A^T A x \rVert \lVert x \rVert}, $$ where $A^T A$ is a white Wishart matrix with dimension parameters $d, d$. Now, the distribution of $A^T A$ is rotation invariant, so if we take $Q$ to be an orthogonal matrix whose first column is $x / \lVert x \rVert$, then $W := Q^T A^T A Q$ is still Wishart and is still independent of $x$ (despite $Q$ depending on $x$).

But then we have that $Ax = \lVert x\rVert A Q e_1$, from which $$ X = \frac{x^T (A^T A) x}{\lVert A^T A x \rVert \lVert x \rVert} = \frac{\lVert x\rVert^2 e_1^T W e_1}{\bigl\lVert \lVert x\rVert Q W e_1\bigr\rVert \lVert x\rVert} = \frac{e_1^T W e_1}{\lVert W e_1\rVert}. $$

The problem thus reduces to finding the ratio between the $(1,1)$ entry of a Wishart matrix and the norm of its first column.
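As a sanity check of this reduction (my own sketch, $d=6$ chosen arbitrarily), sampling the Wishart statistic directly reproduces the same mean as $X$ itself:

```python
import numpy as np

# e1^T W e1 / ||W e1|| for a white Wishart W = A^T A should have the
# same distribution as X = cos(A^T y, A^{-1} y).
rng = np.random.default_rng(7)
d, n = 6, 50000
vals = np.empty(n)
for i in range(n):
    A = rng.standard_normal((d, d))
    W = A.T @ A
    vals[i] = W[0, 0] / np.linalg.norm(W[:, 0])   # e1^T W e1 / ||W e1||
print(np.mean(vals))     # matches E[X] ~ 0.725 for d = 6
```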

There are probably many ways of doing this, but one is using a result from random matrix theory known as matrix tridiagonalisation (see here for a classic paper on the subject). The upshot in this case is that there is an orthogonal matrix $U$ whose first row and column are both $e_1$ such that $$ W = U^T \tilde{W} U, $$ where $\tilde{W}$ is a matrix whose first column is $(A^2, AB, 0, \dotsc, 0)$, where $A$ and $B$ are independent Chi random variables with $A \sim \chi_d, B \sim \chi_{d-1}$.

But now, due to the special structure of $U$, we have that $$ \frac{e_1^T W e_1}{\lVert W e_1\rVert} = \frac{e_1^T U^T \tilde{W} U e_1}{\lVert U^T \tilde{W} U e_1\rVert} = \frac{e_1^T \tilde{W} e_1}{\lVert\tilde{W} e_1\rVert} = \frac{A^2}{\sqrt{A^4 + A^2 B^2}} = \biggl(\frac{A^2}{A^2 + B^2}\biggr)^{1/2}. $$

Now, since $A^2$ and $B^2$ are independent chi-squared random variables, there is a standard result that $$ X^2 = \frac{A^2}{A^2 + B^2} \sim \operatorname{Beta}\Bigl(\frac{d}{2}, \frac{d-1}{2}\Bigr). $$
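This standard fact (for independent $U\sim\chi^2_p$, $V\sim\chi^2_q$, the ratio $U/(U+V)\sim\operatorname{Beta}(p/2,q/2)$) can be confirmed by matching the first two moments (a numpy sketch of my own):

```python
import numpy as np

# U plays the role of A^2 ~ chi^2_d, V the role of B^2 ~ chi^2_{d-1};
# compare the sample mean and variance of U/(U+V) with Beta(d/2, (d-1)/2).
rng = np.random.default_rng(6)
d, n = 6, 200000
U = rng.chisquare(d, n)
V = rng.chisquare(d - 1, n)
S = U / (U + V)

a, b = d / 2, (d - 1) / 2
print(np.mean(S), a / (a + b))                           # Beta mean
print(np.var(S), a * b / ((a + b)**2 * (a + b + 1)))     # Beta variance
```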

The expectation of $X$ then follows from an elementary integration.