For a $d\times d$ matrix $A$ and $x\in \mathbb{R}^d$, both with IID standard normal entries, and $y=Ax$, I'm interested in the following quantity, where $\cos(u,v)$ denotes cosine similarity:
$$E_{A,x}[\cos(A^T y,A^{-1} y)]$$
In simulations it appears to be $\frac{1}{\sqrt{2}}$; can this be proven?
Motivation: this justifies the use of transpose + line search to approximately solve linear equations.
randn[dims__] := RandomVariate[NormalDistribution[], {dims}];
mat = randn[1000, 1000];
{d2, d1} = Dimensions[mat];
bs = 10000;
vecsIn = randn[bs, d1];
vecsOut = (mat . vecsIn\[Transpose])\[Transpose];
meanAlign[vec1_, vec2_] :=
Mean@MapThread[(#1 . #2/(Norm[#1] Norm[#2])) &, {vec1, vec2}];
Print["average cosine similarity: ",
meanAlign[(mat\[Transpose] . vecsOut\[Transpose])\[Transpose],
vecsIn]]
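For cross-checking outside Mathematica, here is a rough NumPy sketch of the same experiment (variable names are my own, and the batch size is smaller for speed):

```python
import numpy as np

rng = np.random.default_rng(0)
d, bs = 1000, 200  # dimension and number of sampled x's

A = rng.standard_normal((d, d))
X = rng.standard_normal((bs, d))   # each row is one draw of x
Y = X @ A.T                        # y = A x, row-wise
Xhat = Y @ A                       # x_hat = A^T y = A^T A x, row-wise

# cosine similarity between A^T y and A^{-1} y = x, per row
cos = np.sum(X * Xhat, axis=1) / (
    np.linalg.norm(X, axis=1) * np.linalg.norm(Xhat, axis=1))
print("average cosine similarity:", np.mean(cos))
print("1/sqrt(2) =", 1 / np.sqrt(2))
```

The two printed numbers agree closely for $d$ this large.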
Below is a proof outline. Several places switch the order of expectation and a nonlinear function, i.e. use $f(E(x))\approx E(f(x))$, which was confirmed to hold in numeric simulations, but I'm curious how to show it more rigorously:
Given $x\sim \mathcal{N}(0, I_d)$, define $y=Ax$ for an invertible $d\times d$ matrix $A$ and consider two solutions of this system: the exact solution $x=A^{-1}y$ and the transpose approximation $\hat{x}=A^T y$.
We seek to show that the angle between the two solutions satisfies $\cos \angle \xrightarrow{P} \frac{1}{\sqrt{2}}$, where $$\cos \angle=\frac{\langle x, \hat{x}\rangle}{\|x\|\|\hat{x}\|}$$
and $A$ is a large matrix with IID Gaussian entries.
Proof parts:

1. Show that \begin{equation} \cos \angle \xrightarrow{P} \sqrt{\frac{R(A)}{d}} \tag{0} \label{0} \end{equation} where $R(A)=\frac{\|A\|_F^4}{\|AA^T\|_F^2}$.
2. Show that $R(A)\xrightarrow{P} d/2$ for a large Gaussian matrix $A$.
3. Combine these two to get convergence in probability with respect to both random $x$ and $A$.
Part 1
We are interested in $E_x \cos\angle$, and for that we distribute the expectation into the expression. Specifically, we must show that
\begin{equation} E_x \cos\angle = E_x \frac{\langle x, \hat{x}\rangle}{\|x\|\|\hat{x}\|}\xrightarrow{P} \frac{E_x \langle x, \hat{x}\rangle}{\sqrt{E_x\|x\|^2}\sqrt{E_x \|\hat{x}\|^2}} \tag{1}\label{1} \end{equation}
The last step appears to hold in simulations; why?
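One way to see empirically why the step in $\eqref{1}$ is reasonable: both the numerator and the denominator concentrate around their means, so the expectation of the ratio is close to the ratio of expectations. A small NumPy check (my own sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(1)
d, trials = 1000, 1000

A = rng.standard_normal((d, d))
X = rng.standard_normal((trials, d))
Xhat = (X @ A.T) @ A                    # x_hat = A^T A x, row-wise

inner = np.sum(X * Xhat, axis=1)        # <x, x_hat>
nx2 = np.sum(X**2, axis=1)              # ||x||^2
nxh2 = np.sum(Xhat**2, axis=1)          # ||x_hat||^2

lhs = np.mean(inner / np.sqrt(nx2 * nxh2))                # E_x[cos]
rhs = np.mean(inner) / np.sqrt(np.mean(nx2) * np.mean(nxh2))
print(lhs, rhs)   # the two agree closely for large d
```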
An earlier discussion on MathOverflow gave this hint:
Now we can analyze the three expectations separately:

1. $E_x \langle x, \hat{x}\rangle = E_x\left[x^T A^T A x\right] = \operatorname{Tr}(A^T A) = \|A\|_F^2$
2. $E_x \|x\|^2 = d$
3. $E_x \|\hat{x}\|^2 = E_x \|A^T A x\|^2 = \operatorname{Tr}\!\left((A^T A)^2\right) = \|A A^T\|_F^2$
Substituting 1, 2, 3 into $\eqref{1}$ and applying the definition of $R(A)$ gives the intended result:
$$\frac{E_x \langle x, \hat{x}\rangle}{\sqrt{E_x\|x\|^2}\sqrt{E_x \|\hat{x}\|^2}} = \frac{\|A\|_F^2}{\sqrt{d\,\|A A^T\|_F^2}} = \sqrt{\frac{R(A)}{d}}$$
Proofs of 1, 2, 3 use the cyclic property of the trace plus linearity of trace and expectation, which let us switch their order, e.g. $E_x\|x\|^2=E \operatorname{Tr}(x^T x)=E \operatorname{Tr}(x x^T) =\operatorname{Tr}(E[x x^T]) =\operatorname{Tr}(I_d)=d$
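The three identities above can be sanity-checked by Monte Carlo; a NumPy sketch (smaller $d$ for speed, using $\hat{x}=A^T A x$):

```python
import numpy as np

rng = np.random.default_rng(2)
d, trials = 200, 20000

A = rng.standard_normal((d, d))
X = rng.standard_normal((trials, d))
Xhat = (X @ A.T) @ A                           # x_hat = A^T A x, row-wise

mc_inner = np.mean(np.sum(X * Xhat, axis=1))   # estimates E<x, x_hat>
mc_x2 = np.mean(np.sum(X**2, axis=1))          # estimates E||x||^2
mc_xh2 = np.mean(np.sum(Xhat**2, axis=1))      # estimates E||x_hat||^2

print(mc_inner, np.sum(A**2))                  # vs ||A||_F^2
print(mc_x2, d)                                # vs d
print(mc_xh2, np.sum((A @ A.T)**2))            # vs ||A A^T||_F^2
```

Each Monte Carlo estimate lands within a fraction of a percent of the closed form.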
Part 2
When $A$ is a random matrix with IID standard normal entries, and if we can use concentration to justify changing the order of expectation and division, we get the following:
$$R(A)=\frac{\|A\|_F^4}{\|AA^T\|_F^2}\xrightarrow{P} \frac{(E\|A\|_F^2)^2}{E\|AA^T\|_F^2}=\frac{d^4}{d^2+2d^3}$$
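Indeed, $R(A)/d$ concentrates near $1/2$ already at moderate $d$; a quick NumPy check of the ratio (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
ratios = []
for d in (100, 400, 1600):
    A = rng.standard_normal((d, d))
    R = np.sum(A**2)**2 / np.sum((A @ A.T)**2)   # ||A||_F^4 / ||A A^T||_F^2
    ratios.append(R / d)
    print(d, R / d)   # approaches 1/2 as d grows
```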
Plugging this into $\eqref{0}$ we get
$$E_x[\cos \angle]\xrightarrow{P} \sqrt{\frac{d^4}{d\,(d^2+2d^3)}}=\sqrt{\frac{d^4}{d^3+2d^4}}\ \longrightarrow\ \frac{1}{\sqrt{2}} \quad (d\to\infty)$$