I am solving the exercises from chapter 1 of the book "Linear Algebra and Optimization for Machine Learning". I am stuck on question 18, which states:
Consider a case where a $d \times k$ matrix $P$ is initialized by setting all values randomly to either $-1$ or $+1$ with equal probability, and then dividing all entries by $\sqrt{d}$. Discuss why the columns of $P$ will be (roughly) mutually orthogonal for large values of $d$ of the order of $10^6$.
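Before proving anything, the claim can be checked numerically. This is a minimal sketch assuming NumPy is available; the helper name `random_sign_matrix`, the choice $k = 5$ and `seed=0` are illustrative assumptions, not part of the exercise. It builds $P$ as described and inspects the Gram matrix $P^TP$, whose off-diagonal entries are the dot products between different columns.

```python
import numpy as np

def random_sign_matrix(d: int, k: int, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: d x k matrix of i.i.d. +1/-1 entries (prob. 1/2 each), scaled by 1/sqrt(d)."""
    rng = np.random.default_rng(seed)
    signs = rng.choice(np.array([-1.0, 1.0]), size=(d, k))
    return signs / np.sqrt(d)

P = random_sign_matrix(d=10**6, k=5)
G = P.T @ P  # Gram matrix: G[a, b] is the dot product of columns a and b
off_diag = G[~np.eye(G.shape[0], dtype=bool)]  # dot products between different columns

print(np.diag(G))              # each column has squared norm 1 (up to floating-point rounding)
print(np.abs(off_diag).max())  # typically on the order of 1/sqrt(d) = 1e-3
```

The diagonal is exactly $1$ in exact arithmetic, because every entry squared equals $1/d$; only the off-diagonal entries are random.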
Could I get a hint on how to approach this problem?
By the construction you describe, every entry of $P$ has the form $\varepsilon/\sqrt d$, so for any two columns $C_1, C_2$ of the $d\times k$ matrix taken from different positions, you have $$C_1^TC_2 =\sum_{i=1}^dC_{1,i}C_{2,i}=\frac1 d\sum_{i=1}^d\varepsilon_{1,i}\,\varepsilon_{2,i},$$ where all the $\varepsilon$ are independent Rademacher random variables (i.e. each takes the value $+1$ or $-1$ with probability $1/2$).
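The identity can be checked numerically. The sketch below assumes NumPy; the function name `column_dot` and the seed are hypothetical choices for illustration. It builds two columns as in the exercise, confirms that their dot product equals the sample average of the products $\varepsilon_{1,i}\varepsilon_{2,i}$, and prints the result for a few values of $d$.

```python
import numpy as np

def column_dot(d: int, rng: np.random.Generator) -> float:
    """Hypothetical helper: dot product of two random columns built as in the exercise."""
    # Independent Rademacher vectors: each entry is -1 or +1 with probability 1/2
    eps1 = rng.choice(np.array([-1.0, 1.0]), size=d)
    eps2 = rng.choice(np.array([-1.0, 1.0]), size=d)
    c1, c2 = eps1 / np.sqrt(d), eps2 / np.sqrt(d)
    dot = float(c1 @ c2)
    # The same number written as a sample average: (1/d) * sum_i eps1_i * eps2_i
    assert np.isclose(dot, np.mean(eps1 * eps2))
    return dot

rng = np.random.default_rng(1)
results = {d: column_dot(d, rng) for d in (10**2, 10**4, 10**6)}
for d, dot in results.items():
    # |C1^T C2| typically shrinks roughly like 1/sqrt(d)
    print(f"d={d:>8}: C1^T C2 = {dot:+.5f}   1/sqrt(d) = {1 / np.sqrt(d):.5f}")
```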
This dot product looks an awful lot like a sample average, and indeed, if you can show that the terms $(\varepsilon_{1,i}\,\varepsilon_{2,i})$ are i.i.d. and find their expectation, you can apply the law of large numbers to conclude that $C_1^TC_2$ is close to that expectation when $d$ is large.
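Carrying the hint through, here is a sketch of the calculation. It uses two standard facts not stated above: the product of two independent Rademacher variables is again Rademacher (so the terms are identically distributed, and they are independent across $i$ because they involve disjoint sets of $\varepsilon$'s), and Chebyshev's inequality.
$$\mathbb E[\varepsilon_{1,i}\varepsilon_{2,i}]=\mathbb E[\varepsilon_{1,i}]\,\mathbb E[\varepsilon_{2,i}]=0,\qquad \operatorname{Var}(\varepsilon_{1,i}\varepsilon_{2,i})=\mathbb E\big[(\varepsilon_{1,i}\varepsilon_{2,i})^2\big]=1,$$
$$\mathbb E[C_1^TC_2]=0,\qquad \operatorname{Var}(C_1^TC_2)=\frac{1}{d^2}\sum_{i=1}^d\operatorname{Var}(\varepsilon_{1,i}\varepsilon_{2,i})=\frac1d,$$
$$\Pr\big(|C_1^TC_2|\ge t\big)\le\frac{1}{d\,t^2}.$$
For $d=10^6$ and $t=0.01$ the bound is $10^{-2}$, and the typical size of $C_1^TC_2$ is its standard deviation $1/\sqrt d=10^{-3}$. Since each column satisfies $C_1^TC_1=\frac1d\sum_{i=1}^d\varepsilon_{1,i}^2=1$, the dot product is exactly the cosine of the angle between the two columns, so that angle is very close to $90^\circ$.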