In neural network initialization, we usually want a random matrix $M$ such that both $\|Mx\|_2 \approx \|x\|_2$ and $\|M^T y\|_2 \approx \|y\|_2$. In particular, taking $x$ and $y$ to be standard basis vectors, $M$ should have unit-norm columns and unit-norm rows.
One simple approach is to draw $M_{i,j} \sim N(0,1)$ i.i.d. and then repeatedly normalize the rows and columns:
- Step 1: $M_{i,*} \leftarrow M_{i,*} / \|M_{i,*}\|_2$ for every row $i$.
- Step 2: $M_{*,j} \leftarrow M_{*,j} / \|M_{*,j}\|_2$ for every column $j$.
- Step 3: Repeat.
Ideally this would converge to an $M$ with the properties we desire.
This is of course not possible for all matrices: take for example the $1 \times 4$ matrix $M = [[1,1,1,1]]$, which alternates forever between $[[1/2,1/2,1/2,1/2]]$ (after row normalization) and $[[1,1,1,1]]$ (after column normalization).
However, for square Gaussian matrices the procedure appears, experimentally, to converge quickly in every case.
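For concreteness, here is a minimal NumPy sketch of the experiment (the function name and iteration count are my own choices, not part of any standard API):

```python
import numpy as np

def alternate_normalize(M, iters=1000):
    """Alternately rescale rows then columns of M to unit Euclidean norm."""
    for _ in range(iters):
        # Step 1: divide each row by its L2 norm.
        M = M / np.linalg.norm(M, axis=1, keepdims=True)
        # Step 2: divide each column by its L2 norm.
        M = M / np.linalg.norm(M, axis=0, keepdims=True)
    return M

rng = np.random.default_rng(0)
M = alternate_normalize(rng.standard_normal((50, 50)))

# After the final column step, column norms are exactly 1;
# if the iteration has converged, row norms should also be close to 1.
print(np.abs(np.linalg.norm(M, axis=1) - 1.0).max())
print(np.abs(np.linalg.norm(M, axis=0) - 1.0).max())
```

Note that each step is a diagonal rescaling of $M$, so the iteration only ever moves within the set $\{D_1 M D_2\}$ for positive diagonal matrices $D_1, D_2$.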
My questions are: Can we prove this conjecture (that the procedure converges for square Gaussian matrices)? Does the limit have a nice description as a distribution? Has this procedure appeared previously in the random matrix literature?