Pseudo-whitening problem of matrix-based data


Whitening: When you have a set of vectorial observations, i.e. $\{\boldsymbol{v}_i\in\mathbb{R}^n:i\in[1,N]\}$, you can standardize and mutually decorrelate the dimensions $d\in[1,n]$ using Cholesky whitening:

$$\boldsymbol{v}_i^*:=\boldsymbol{L}^{-1}\boldsymbol{v}_i$$

where $\boldsymbol{L}\boldsymbol{L}^T=\boldsymbol{C}$ is the Cholesky factorization ($\boldsymbol{L}$ lower triangular) and $\boldsymbol{C}\in\mathbb{R}^{n\times n}$ is the covariance matrix of your data. This way the covariance matrix of the (centered) dataset $\{\boldsymbol{v}_i^*\}$ becomes the identity matrix.
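A minimal NumPy sketch of this (with made-up synthetic data) confirming that the whitened dataset has identity covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 10_000, 3

# generate correlated observations (rows are the vectors v_i)
M = rng.standard_normal((n, n))
V = rng.standard_normal((N, n)) @ M.T

C = np.cov(V, rowvar=False, bias=True)   # sample covariance, 1/N convention
L = np.linalg.cholesky(C)                # lower triangular, C = L @ L.T

# whiten: v_i* = L^{-1} (v_i - mean)
V_star = np.linalg.solve(L, (V - V.mean(axis=0)).T).T

C_star = np.cov(V_star, rowvar=False, bias=True)
assert np.allclose(C_star, np.eye(n))    # covariance is now the identity
```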

The problem: I have run into a somewhat similar, but "reversed", problem: given a set of matrices $\{\boldsymbol{A}_i\in\mathbb{R}^{n\times m}\}$, we are looking for a vector $\boldsymbol{v}$ such that the dataset after the transformation, $\{\boldsymbol{v}_i^*:=\boldsymbol{A}_i\boldsymbol{v}\}$, has an identity covariance matrix (if that is possible), or minimizes $\|\boldsymbol{C}-\boldsymbol{I}\|^2$ (if it is not). How can we find this vector (or one of the vectors, if there are multiple)?

My thoughts so far: I could take a random vector $\boldsymbol{v}$, build a dataset with it, and apply the whitening transformation to those vectors: $\{\boldsymbol{v}_i^*:=\boldsymbol{L}^{-1}\boldsymbol{A}_i\boldsymbol{v}\}$. However, I suspect that the results will differ for different choices of $\boldsymbol{v}$, since the matrices $\boldsymbol{A}_i$ and $\boldsymbol{L}^{-1}$ do not necessarily commute (or do they?).

Another approach I considered was to transform this back to the original problem: for each product $\boldsymbol{A}_i\boldsymbol{v}$ there is an equivalent matrix multiplication $\boldsymbol{A}_i\boldsymbol{v}=\boldsymbol{V}\boldsymbol{a}_i$, where the matrix $\boldsymbol{V}$ contains the values of $\boldsymbol{v}$ (and zeros), and the vector $\boldsymbol{a}_i$ is the "flattened" version of the matrix $\boldsymbol{A}_i$. This seemed fine, until I realized that $\boldsymbol{V}$ is not a square matrix, so it cannot be the result of the Cholesky decomposition of the inverse covariance matrix.
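For concreteness, this reformulation can be written with a Kronecker product: $\boldsymbol{V}=\boldsymbol{v}^T\otimes\boldsymbol{I}_n$ and $\boldsymbol{a}_i=\text{vec}(\boldsymbol{A}_i)$ (column-major flattening). A small NumPy check, with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4
A = rng.standard_normal((n, m))
v = rng.standard_normal(m)

V = np.kron(v, np.eye(n))        # shape (n, n*m): [v_1 I | v_2 I | ... | v_m I]
a = A.flatten(order="F")         # stack the columns of A into one vector

assert np.allclose(V @ a, A @ v)
```

Note that `V` has shape `(n, n*m)`, which makes the non-squareness concrete: it can only be square in the degenerate case $m=1$.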

Edit: Under the term "covariance matrix of a dataset $\{\boldsymbol{v}_i\in\mathbb{R}^n:i\in[1,N]\}$" I mean the following:

$$(\boldsymbol{C})_{jk}:=\frac{1}{N}\sum_{i=1}^N(v_{ij} - \mu_j)(v_{ik} - \mu_k)$$

where $v_{ij}$ is the value of the $j$th dimension of the $i$th vector in the dataset, and $\mu_j$ is the mean of the $j$th dimension over the dataset.
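In NumPy terms (synthetic data for illustration), this $1/N$-normalized definition matches `np.cov` with `bias=True`:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 500, 4
V = rng.standard_normal((N, n))   # rows are the vectors v_i

mu = V.mean(axis=0)
C = (V - mu).T @ (V - mu) / N     # the 1/N definition above

assert np.allclose(C, np.cov(V, rowvar=False, bias=True))
```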

Ok, so I may have solved the problem to some extent. Here is what I figured out:

The matrix multiplication $\boldsymbol{A}_i\boldsymbol{v}$ is a linear combination of the columns of the matrix $\boldsymbol{A}_i$, i.e. $\boldsymbol{A}_i\boldsymbol{v}=\sum_{j=1}^{m}v_j\boldsymbol{a}_{ij}$. So the original question, slightly reformulated, is: which vector $\boldsymbol{v}$ minimizes the expression $$ \left\|\text{cov}_i\left(\boldsymbol{A}_i \boldsymbol{v}\right)-\boldsymbol{I}\right\|^2 = \left\|\text{cov}_i\left(\sum_{j=1}^{m}v_j\boldsymbol{a}_{ij}\right)-\boldsymbol{I}\right\|^2$$ where $\text{cov}_i(\boldsymbol{x}_i)$ denotes the sample covariance matrix between the dimensions of the vectors $\boldsymbol{x}_i$, computed along the "sampling axis" $i$.

Here I assume that the column vectors $\boldsymbol{a}_{ij}$ have a joint Gaussian distribution: $$ \begin{bmatrix} \boldsymbol{a}_{i1} \\ \boldsymbol{a}_{i2} \\ \vdots \\ \boldsymbol{a}_{im} \end{bmatrix} \sim \mathcal{N} \left( \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \\ \vdots \\ \boldsymbol{\mu}_m \end{bmatrix} , \boldsymbol{C}_\text{full}:= \begin{bmatrix} \boldsymbol{C}_{11} & \boldsymbol{C}_{12} & \cdots & \boldsymbol{C}_{1m} \\ \boldsymbol{C}_{21} & \boldsymbol{C}_{22} & \cdots & \boldsymbol{C}_{2m} \\ \vdots & \vdots & \ddots & \vdots\\ \boldsymbol{C}_{m1} & \boldsymbol{C}_{m2} & \cdots & \boldsymbol{C}_{mm} \\ \end{bmatrix} \right) $$

(Note that the distribution does not depend on the index $i$, since that is the sampling axis.)
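The full covariance $\boldsymbol{C}_\text{full}$ and its blocks $\boldsymbol{C}_{jk}$ can be estimated by stacking the columns of each $\boldsymbol{A}_i$ into one long vector. A sketch with synthetic data, also checking numerically that the sample covariance of $\boldsymbol{A}_i\boldsymbol{v}$ decomposes into these blocks:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m = 2000, 3, 4
A = rng.standard_normal((N, n, m))        # synthetic dataset of matrices A_i

# stack the columns of each A_i into one long vector [a_i1; a_i2; ...; a_im]
stacked = A.transpose(0, 2, 1).reshape(N, n * m)
C_full = np.cov(stacked, rowvar=False, bias=True)

# extract the n x n blocks: C[j, k] = C_jk = cov(a_ij, a_ik)
C = C_full.reshape(m, n, m, n).transpose(0, 2, 1, 3)

# the sample covariance of the dataset {A_i v} decomposes into these blocks
v = rng.standard_normal(m)
lhs = np.cov(A @ v, rowvar=False, bias=True)
rhs = np.einsum("j,k,jkab->ab", v, v, C)  # sum_jk v_j v_k C_jk
assert np.allclose(lhs, rhs)
```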
Then, following the answer here, we know that $$\sum_{j=1}^{m}v_j\boldsymbol{a}_{ij} \sim \mathcal{N} \left( \sum_{j=1}^{m}v_j\boldsymbol{\mu}_j , \sum_{j=1}^m \sum_{k=1}^m v_j v_k \boldsymbol{C}_{jk} \right)$$

Now we have a formula for $\text{cov}_i\left(\boldsymbol{A}_i \boldsymbol{v}\right)$ in the following form: $$\text{cov}_i\left(\boldsymbol{A}_i \boldsymbol{v}\right) = \sum_{j=1}^m \sum_{k=1}^m v_j v_k \boldsymbol{C}_{jk}$$

From this I can probably use gradient descent to find the optimal values of $v_j$ minimizing the squared error above. However, if somebody has an exact solution, I am more than happy to hear it! Also, if I'm wrong, please correct me!
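A sketch of that gradient descent with synthetic data and a hand-derived gradient of $\|\sum_{jk}v_jv_k\boldsymbol{C}_{jk}-\boldsymbol{I}\|^2$; the learning rate, step count, and starting point below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m = 2000, 3, 4
A = rng.standard_normal((N, n, m))     # synthetic dataset of matrices A_i

# block covariances C[j, k] = C_jk, estimated from the stacked columns
stacked = A.transpose(0, 2, 1).reshape(N, n * m)
C = np.cov(stacked, rowvar=False, bias=True).reshape(m, n, m, n).transpose(0, 2, 1, 3)

def S(v):
    """S(v) = sum_jk v_j v_k C_jk, the covariance of the dataset {A_i v}."""
    return np.einsum("j,k,jkab->ab", v, v, C)

def loss(v):
    R = S(v) - np.eye(n)
    return np.sum(R * R)               # squared Frobenius norm

def grad(v):
    # dS/dv_p = sum_k v_k (C_pk + C_kp), hence
    # d(loss)/dv_p = 2 <S(v) - I, sum_k v_k (C_pk + C_kp)>_F
    R = S(v) - np.eye(n)
    C_sym = C + C.transpose(1, 0, 2, 3)
    return 2 * np.einsum("ab,k,pkab->p", R, v, C_sym)

v = rng.standard_normal(m)
v /= np.linalg.norm(v)                 # arbitrary unit-norm starting point
loss0 = loss(v)
for _ in range(2000):
    v -= 1e-2 * grad(v)                # plain fixed-step gradient descent
```

Since the loss is quartic in $\boldsymbol{v}$, a fixed step size is not guaranteed to converge in general; in practice a line search or a decaying step would be more robust.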