Numerical errors in specifying covariance of two (or more) random signals


I am looking at conditioning randomly generated signals so that they have very precise first- and second-order statistics.

Let's consider two randomly generated 1-D signals $[X, Y]$ of arbitrary but finite length, and condition these signals to have zero mean and unit variance.

Let's begin with 100 elements, generated with some Python code:

import numpy as np
np.random.seed(10101)

slen = 100 #signal length

#each column of x is a signal, X = x[:,0], Y = x[:,1]
x = np.random.uniform(low=-1,high=1,size=(slen,2))

#Zero mean
x = x-np.mean(x, axis=0)
#Unit variance
x = x/np.sqrt(np.mean(x**2,axis=0))

With zero mean and unit variance, we can specify a covariance matrix $R_{ij}$ giving the desired variance of, and covariance between, our two signals. Following the standard approach, we pre-multiply our signals by the Cholesky factor $L_{ij}$ of $R_{ij}$ to impose the desired second-order statistics. We'll just use some arbitrary values here.

$R_{ij}= \begin{bmatrix} 1 & 0.25 \\ 0.25 & 1 \end{bmatrix}$

$L_{ij}= \begin{bmatrix} 1 & 0 \\ 0.25 & 0.96824584 \end{bmatrix}$

Rij = np.array([[1,0.25],[0.25,1]])
Lij = np.linalg.cholesky(Rij)
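As a quick sanity check (my addition, not part of the original procedure), the lower-triangular factor should reproduce $R_{ij}$ exactly when multiplied by its transpose:

```python
import numpy as np

Rij = np.array([[1, 0.25], [0.25, 1]])
Lij = np.linalg.cholesky(Rij)

# The Cholesky factor satisfies Lij @ Lij.T == Rij (up to floating point)
assert np.allclose(Lij @ Lij.T, Rij)
print(Lij)
```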

We can now pre-multiply our signals by $L_{ij}$, which should return two signals with the statistics specified in $R_{ij}$.

$\begin{bmatrix} x'\\ y' \end{bmatrix} = L_{ij} \begin{bmatrix} x^{T}\\ y^{T} \end{bmatrix}$

xp = np.matmul(Lij, x.T).T

If we check the stats of the actual signal, we (not surprisingly) get something close-ish.

print('Signal stats')
print('R_xx = ', np.mean(xp[:,0]**2))
print('R_xy = ', np.mean(xp[:,0]*xp[:,1]))
print('R_yy = ', np.mean(xp[:,1]**2))

Resulting in

Signal stats
R_xx =  0.9999999999999996
R_xy =  0.19102763783107032
R_yy =  0.9705138189155358

Obviously $R_{xx}$ is dead on, and $R_{xy}$ and $R_{yy}$ are clearly in the ballpark, but not as accurate as most would like. (It turns out that it is ALWAYS $R_{yy}$ that is less accurate than $R_{xx}$, more on that later).

I am fairly sure this stems from the fact that our 'randomly' generated signals of just 100 elements are not statistically converged; i.e., the conditioned $X$ and $Y$ vectors do not have the zero covariance required for this Cholesky technique to work. Indeed, we can check this:

print('Random vector stats')
print('R_xy = ', np.mean(x[:,0]*x[:,1]))

yields

Random vector stats
R_xy =  -0.06090639375114042
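In fact, the observed stats can be predicted exactly from this leftover sample covariance $r$. Since $x' = x$ and $y' = L_{21}x + L_{22}y$, with $x$ and $y$ having unit sample variance, the transformed second moments are $R'_{xy} = L_{21} + L_{22}\,r$ and $R'_{yy} = L_{21}^2 + L_{22}^2 + 2 L_{21} L_{22}\,r$. A sketch verifying this (my derivation, using the same setup and seed as above):

```python
import numpy as np

np.random.seed(10101)
x = np.random.uniform(low=-1, high=1, size=(100, 2))
x = x - np.mean(x, axis=0)
x = x / np.sqrt(np.mean(x**2, axis=0))

r = np.mean(x[:, 0] * x[:, 1])  # leftover sample covariance of the inputs
Lij = np.linalg.cholesky(np.array([[1, 0.25], [0.25, 1]]))
L21, L22 = Lij[1, 0], Lij[1, 1]

# Closed-form predictions for the transformed signals (L11 = 1)
R_xy_pred = L21 + L22 * r
R_yy_pred = L21**2 + L22**2 + 2 * L21 * L22 * r

xp = (Lij @ x.T).T
print(R_xy_pred, np.mean(xp[:, 0] * xp[:, 1]))  # identical up to floating point
print(R_yy_pred, np.mean(xp[:, 1]**2))          # identical up to floating point
```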

We can improve these results by increasing the length of the original random signals to, say, 10,000 elements:

Signal stats
R_xx =  1.000000000000001
R_xy =  0.2632783322133913
R_yy =  1.0066391661066976

The remaining error scales with the non-zero sample covariance of the input signals.

This leads to my first question:

Can I condition the original signals X and Y to truly have zero covariance and give a better level of accuracy in this procedure?
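One way to do this (a sketch of my own; the whitening step is not in the original procedure) is to pre-multiply the centered signals by the inverse Cholesky factor of their own *sample* covariance, which forces the sample covariance to be exactly the identity before the target statistics are imposed:

```python
import numpy as np

np.random.seed(10101)
x = np.random.uniform(low=-1, high=1, size=(100, 2))
x = x - np.mean(x, axis=0)

# Sample second-moment matrix of the centered signals
C = (x.T @ x) / x.shape[0]

# Whiten: solve chol(C) @ w.T = x.T, so w has exactly identity sample covariance
w = np.linalg.solve(np.linalg.cholesky(C), x.T).T

# Now impose the target covariance; the result is exact, not approximate
Rij = np.array([[1, 0.25], [0.25, 1]])
xp = (np.linalg.cholesky(Rij) @ w.T).T

print(np.mean(xp[:, 0] ** 2))        # 1.0 to machine precision
print(np.mean(xp[:, 0] * xp[:, 1]))  # 0.25 to machine precision
print(np.mean(xp[:, 1] ** 2))        # 1.0 to machine precision
```

With this preconditioning, the accuracy no longer depends on the signal length or on how well the raw random draws have converged.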

My second question comes from a practical observation using the exact same procedure with different $R_{ij}$ values.

Swapping in a smaller $R_{xy}$ value, something like

$R_{ij}= \begin{bmatrix} 1 & 0.001 \\ 0.001 & 1 \end{bmatrix}$

We get the following results

Signal stats
R_xx =  0.9999999999999996
R_xy =  -0.059906363297935916
R_yy =  0.9998781872734045

So the accuracy of the procedure also depends on the level of covariance we attempt to set.

Playing around with values of $R_{xy}$ shows a consistent pattern: smaller $R_{xy}$ yields a more accurate $R_{yy}$.

So my second question is:

What is the source of this pattern and can it be improved? (Preconditioning?)
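For what it's worth, the pattern can be probed empirically. For a target matrix $[[1, c], [c, 1]]$ the Cholesky factor has $L_{21} = c$ and $L_{22} = \sqrt{1 - c^2}$, so the error in $R_{yy}$ works out to $2c\sqrt{1-c^2}\,r$, where $r$ is the leftover sample covariance of the conditioned inputs; it shrinks roughly linearly with the target $R_{xy}$. A quick sweep (my own sketch, same seed as above):

```python
import numpy as np

np.random.seed(10101)
x = np.random.uniform(low=-1, high=1, size=(100, 2))
x = x - np.mean(x, axis=0)
x = x / np.sqrt(np.mean(x**2, axis=0))
r = np.mean(x[:, 0] * x[:, 1])  # leftover sample covariance

for c in [0.5, 0.25, 0.1, 0.001]:
    L = np.linalg.cholesky(np.array([[1, c], [c, 1]]))
    xp = (L @ x.T).T
    err = np.mean(xp[:, 1] ** 2) - 1.0
    # Measured error matches the closed form 2*c*sqrt(1-c**2)*r
    print(c, err, 2 * c * np.sqrt(1 - c**2) * r)
```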

Thanks for taking a look!

Accepted answer:

Here is the best reference on "de-correlating" (whitening) the signals that I have found:

https://courses.media.mit.edu/2010fall/mas622j/whiten.pdf