Question about information loss and projection


I'm trying to clarify some things about information theory and dimensionality reduction.
Consider a random vector $x \in \mathbb{R}^n$ and a set of unit vectors $a_i \in \mathbb{R}^n$, $i \in \{1,2,\ldots,d\}$ $(d<n)$, onto which we project $x$, i.e. $s = x^T[a_1,a_2,\ldots,a_d]$.
Our goal is to preserve as much information as possible. If the angle between $a_1$ and $a_2$ is close to 0 or 180 degrees (i.e., $a_1^Ta_2$ is close to 1 or -1), then $x^Ta_2$ contains much of the same information as $x^Ta_1$, so intuitively we want the $a_i$'s to be pairwise orthogonal.
If $a_1^Ta_2=0$, we would have no redundant information about $x^Ta_1$ in the representation $x^Ta_2$ (at least when $x$ has isotropic covariance).
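To make the intuition concrete, here is a small sketch (my own illustration, not from a textbook) for the special case $x \sim N(0, I_n)$: the two projections $(x^Ta_1, x^Ta_2)$ are then jointly Gaussian with unit variances and correlation $\rho = a_1^Ta_2$, so their mutual information is $I = -\tfrac{1}{2}\log(1-\rho^2)$, which is zero exactly when the directions are orthogonal. The choice of dimension, angle, and sample size below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Two unit vectors with a controlled inner product rho = cos(theta).
theta = np.deg2rad(60)  # angle between a1 and a2
a1 = np.zeros(n); a1[0] = 1.0
a2 = np.zeros(n); a2[0] = np.cos(theta); a2[1] = np.sin(theta)
rho = a1 @ a2

# For x ~ N(0, I_n), (x^T a1, x^T a2) is bivariate normal with
# unit variances and correlation rho, so the mutual information is
#   I(x^T a1; x^T a2) = -1/2 * log(1 - rho^2)   (in nats)
mi_theory = -0.5 * np.log(1 - rho**2)

# Monte Carlo check that the empirical correlation matches rho.
x = rng.standard_normal((200_000, n))
s1, s2 = x @ a1, x @ a2
rho_hat = np.corrcoef(s1, s2)[0, 1]

print(f"rho = {rho:.3f}, empirical rho = {rho_hat:.3f}, MI = {mi_theory:.4f} nats")
```

With $\theta = 90°$ the mutual information vanishes, matching the intuition that orthogonal directions add no redundant information (for isotropic Gaussian $x$).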
Is there a mathematical theorem, based on information-theoretic quantities such as entropy or mutual information, supporting the claim that if we want to preserve the most information, the basis we select should be orthogonal?
Any help would be appreciated!