Mean and Std of of the dot product between any pair of columns

51 Views Asked by At

I am solving exercises from chapter 1 of the book called "Linear Algebra and Optimization for Machine Learning”. I have a serious problem with question 18 which states:
Consider a case where a $d × k$ matrix $P$ is initialized by setting all values randomly to either $−1$ or $+1$ with equal probability, and then dividing all entries by $\sqrt{d}$. Discuss why the columns of $P$ will be (roughly) mutually orthogonal for large values of $d$ of the order of $10^6?$.

Could I get any help or hint how to approach that particular problem?

2

There are 2 best solutions below

0
On

By the definition you give, for any two columns $C_1\ne C_2$ of the $d\times k$ matrix, you have $$C_1^TC_2 =\sum_{i=1}^dC_{1,i}C_{2,i}=\frac1 d\sum_{i=1}^d\varepsilon_{1,i}\,\varepsilon_{2,i}$$ Where all the $\varepsilon$ are independent Rademacher random variables (i.e. take value $+1$ or $-1$ with probability $1/2$).

This dot product looks an awful lot like a sample average, and indeed, if you can show that the terms $(\varepsilon_{1,i}\cdot\varepsilon_{2,i})$ are i.i.d and find their expectation, you can apply the law of large numbers and get your desired conclusion.

1
On

Hint: If $P_{i,k}$ denotes the $(i,j)$ entry of $P$ then notice that each $P_{i,k}$ is a random variable with distribution: $$ \mathbb{P}(P_{i,j} = ± 1/\sqrt{d}) = 1/2 $$ This is a Bernoulli random variable. Given $s \neq q$ and $s,q \in \{1,..k\}$ We are looking for the probablity: \begin{equation*} \tag{1} \mathbb{P} \left(\sum_{l=1}^d P_{l,s}P_{l,q} = 0 \right) \end{equation*} That is, we need to find the distribution of the product of Bernoulli random variables, and then the distribution of their sum. Given two Bernoulli, can you find the distribution of their product? Can you recognize the distribution of the sum of Bernoulli random variables?