How to compute total covariance matrix from two separated datasets x and y


I have two datasets, x (m×n) and y (m×q), where m is the number of features shared by both datasets and n, q are the numbers of samples, which differ between the two.

I want to find a way to compute the covariance matrix as if the two datasets were bound together by columns rather than kept separate.

Let D be the m × (n+q) matrix that results from binding the columns of x with the columns of y:

$$\begin{bmatrix}x&y\end{bmatrix}$$

Due to privacy issues I can't merge x and y to obtain the matrix D; furthermore, I cannot use a function that takes both x and y as input.

One possible solution is to build cov(D) from cov(x) and cov(y) as a block matrix of the form:

$$\begin{bmatrix}\operatorname{cov}(x)&\operatorname{cov}(x,y)\\\operatorname{cov}(y,x)&\operatorname{cov}(y)\end{bmatrix}$$
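As a sketch of why the cross term matters, here is a minimal numpy check on hypothetical random data (assuming the covariance is taken with the columns as variables, which is the convention under which the block form holds). The block construction reproduces cov(D) exactly, but only because the cross-covariance cov(x, y) is available:

```python
import numpy as np

# Hypothetical small datasets: m shared rows, n and q columns each.
rng = np.random.default_rng(0)
m, n, q = 6, 3, 4
x = rng.standard_normal((m, n))
y = rng.standard_normal((m, q))

# What we are NOT allowed to do: bind the columns and take the covariance.
D = np.hstack([x, y])                      # m x (n + q)
cov_D = np.cov(D, rowvar=False)            # (n + q) x (n + q)

# The block construction. Note that it needs the cross-covariance
# cov(x, y), which cannot be formed from cov(x) and cov(y) alone.
cov_x = np.cov(x, rowvar=False)
cov_y = np.cov(y, rowvar=False)
xc = x - x.mean(axis=0)                    # centre each column
yc = y - y.mean(axis=0)
cov_xy = xc.T @ yc / (m - 1)               # n x q cross-covariance block

block = np.block([[cov_x, cov_xy],
                  [cov_xy.T, cov_y]])
print(np.allclose(cov_D, block))           # True: the block form matches
```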

The problem with this solution is that cov(x, y) cannot be obtained without knowing the original x and y datasets.

Summing up: I want to know whether there is a way to build cov(D) from a function that takes cov(x) and cov(y) as input, but not cov(x, y).

I hope that's clear.

1 Answer

Covariance depends not only on the variances of the individual input sets but also on the order in which their values are paired. If it is always true that $x_a > x_b$ implies $y_a > y_b$, the covariance reaches its maximum; if, on the other hand, $x_a > x_b$ implies the opposite, $y_a < y_b$, then the covariance is exactly the negative of the first case. Other orderings produce a whole range of values in between.

As a visual example, here are three data sets, each using the values $-4$ to $4$ for both $x$ and $y$. In the first, $x_a = y_a$, so the covariance is as positive as possible, $20/3$, which equals the variance of each of the two input sets; in the second, $x_a = -y_a$, so the covariance is as negative as possible, $-20/3$; in the third, the pairing is shuffled in a way that gives a covariance of exactly $0$.

[Figure: three scatter plots of the paired values, showing maximal positive, maximal negative, and zero covariance.]
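The three cases can be checked numerically. The shuffled pairing below is one concrete choice of my own (not necessarily the permutation in the figure) that lands on exactly zero; the $20/3$ figures use the population covariance, i.e. dividing by $N$:

```python
import numpy as np

x = np.arange(-4, 5)                                # the values -4 .. 4

y_same = x.copy()                                   # x_a = y_a: maximal covariance
y_neg = -x                                          # x_a = -y_a: minimal covariance
y_mix = np.array([1, -2, 3, -4, 0, 4, -3, 2, -1])   # a shuffle chosen to give 0

def pop_cov(a, b):
    # Population covariance: mean of (a - mean(a)) * (b - mean(b))
    return np.mean((a - a.mean()) * (b - b.mean()))

print(pop_cov(x, y_same))   # 20/3, equal to the variance of x
print(pop_cov(x, y_neg))    # -20/3
print(pop_cov(x, y_mix))    # 0.0
```

Since the variances of all three $y$ orderings are identical, knowing only the two marginal covariance matrices cannot pin down the cross term.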