I am looking at conditioning randomly generated signals so that they have very precise first- and second-order statistics.
Let's consider two randomly generated 1-D signals $[X,Y]$ of arbitrary but finite length, and condition these signals to have zero mean and unit variance.
Let's begin with 100 elements, generated with some Python code:
import numpy as np
np.random.seed(10101)
slen = 100 #signal length
#each column of x is a signal, X = x[:,0], Y = x[:,1]
x = np.random.uniform(low=-1,high=1,size=(slen,2))
#Zero mean
x = x-np.mean(x, axis=0)
#Unit variance
x = x/np.sqrt(np.mean(x**2,axis=0))
With zero mean and unit variance, we can specify a covariance matrix $R_{ij}$ containing our desired variances and the covariance between the two signals. We then take the standard Cholesky decomposition of $R_{ij}$ and pre-multiply our signals by the resulting lower-triangular factor to give them the desired second-order statistics. I'll just use some arbitrary values here.
$R_{ij}= \begin{bmatrix} 1 & 0.25 \\ 0.25 & 1 \end{bmatrix}$
$L_{ij}= \begin{bmatrix} 1 & 0 \\ 0.25 & 0.96824584 \end{bmatrix}$
Rij = np.array([[1,0.25],[0.25,1]])
Lij = np.linalg.cholesky(Rij)
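(As a quick sanity check of the factorization, using the same $R_{ij}$ as above, the product $L_{ij}L_{ij}^{T}$ recovers $R_{ij}$ exactly:)

```python
import numpy as np

Rij = np.array([[1, 0.25], [0.25, 1]])
Lij = np.linalg.cholesky(Rij)  # lower-triangular factor

# By construction of the Cholesky decomposition, Lij @ Lij.T
# reproduces the target covariance matrix.
print(Lij @ Lij.T)
```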
We can now pre-multiply our signals by $L_{ij}$, which should return two signals with the statistics specified in $R_{ij}$.
$\begin{bmatrix} x'\\ y' \end{bmatrix} = L_{ij} \begin{bmatrix} x^{T}\\ y^{T} \end{bmatrix}$
xp = np.matmul(Lij, x.T).T
If we check the stats of the actual signal, we (not surprisingly) get something close-ish.
print('Signal stats')
print('R_xx = ', np.mean(xp[:,0]**2))
print('R_xy = ', np.mean(xp[:,0]*xp[:,1]))
print('R_yy = ', np.mean(xp[:,1]**2))
Resulting in
Signal stats
R_xx = 0.9999999999999996
R_xy = 0.19102763783107032
R_yy = 0.9705138189155358
Obviously $R_{xx}$ is dead on, and $R_{xy}$ and $R_{yy}$ are clearly in the ballpark, but not as accurate as most would like. (It turns out that it is ALWAYS $R_{yy}$ that is less accurate than $R_{xx}$, more on that later).
I am fairly sure this stems from the fact that our 'randomly' generated signal of just 100 elements is not statistically converged, i.e. the conditioned $X$ and $Y$ vectors do not have the zero covariance required for this Cholesky technique to work. Indeed, we can check this:
print('Random vector stats')
print('R_xy = ', np.mean(x[:,0]*x[:,1]))
yields
Random vector stats
R_xy = -0.06090639375114042
We can improve these results by increasing the length of the original random signals to, say, 10,000 elements, and the results do improve:
Signal stats
R_xx = 1.000000000000001
R_xy = 0.2632783322133913
R_yy = 1.0066391661066976
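The residual cross-covariance of the raw signals is the quantity shrinking here, and it can be probed directly over several lengths (a quick sketch reusing the generation code above; it should decay roughly like $1/\sqrt{N}$):

```python
import numpy as np

np.random.seed(10101)
for slen in (100, 10_000, 1_000_000):
    x = np.random.uniform(low=-1, high=1, size=(slen, 2))
    x = x - np.mean(x, axis=0)
    x = x / np.sqrt(np.mean(x**2, axis=0))
    # Residual covariance between the two conditioned columns
    print(slen, np.mean(x[:, 0] * x[:, 1]))
```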
And the error scales with this residual (non-zero) covariance of the raw signals.
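To see why (my own accounting of the terms, assuming the conditioned raw signals have exactly unit variance and residual sample covariance $\epsilon = E[xy]$, so that $x' = x$ and $y' = L_{21}x + L_{22}y$):

$E[x'^2] = E[x^2] = 1$

$E[x'y'] = L_{21}E[x^2] + L_{22}E[xy] = L_{21} + L_{22}\,\epsilon$

$E[y'^2] = L_{21}^2 E[x^2] + L_{22}^2 E[y^2] + 2L_{21}L_{22}E[xy] = 1 + 2L_{21}L_{22}\,\epsilon$

With $L_{21} = 0.25$, $L_{22} \approx 0.96825$, and the measured $\epsilon \approx -0.0609$, this gives $E[x'y'] \approx 0.1910$ and $E[y'^2] \approx 0.9705$, matching the signal stats above. It also shows why $R_{xx}$ is always exact while $R_{yy}$ is not.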
This leads to my first question:
Can I condition the original signals X and Y to truly have zero covariance and give a better level of accuracy in this procedure?
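For concreteness, here is the kind of exact decorrelation step I have in mind (a sketch along the lines of the whitening notes linked at the bottom: Cholesky-factor the empirical covariance of the raw signals and divide it out before applying $L_{ij}$):

```python
import numpy as np

np.random.seed(10101)
slen = 100
x = np.random.uniform(low=-1, high=1, size=(slen, 2))
x = x - np.mean(x, axis=0)

# Whiten: factor the empirical covariance C = x^T x / N and divide it out,
# so the conditioned signals have exactly zero covariance and unit variance.
C = (x.T @ x) / slen
Lc = np.linalg.cholesky(C)
x = np.linalg.solve(Lc, x.T).T  # empirical covariance of x is now the identity

Rij = np.array([[1, 0.25], [0.25, 1]])
Lij = np.linalg.cholesky(Rij)
xp = (Lij @ x.T).T

print('R_xx = ', np.mean(xp[:, 0]**2))
print('R_xy = ', np.mean(xp[:, 0] * xp[:, 1]))
print('R_yy = ', np.mean(xp[:, 1]**2))
```

If the algebra holds, the printed statistics should match $R_{ij}$ to machine precision, so part of my question is whether there is a catch to this kind of preconditioning.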
My second question comes from a practical observation using the exact same procedure with different $R_{ij}$ values.
Swapping in a smaller $R_{xy}$ value, something like
$R_{ij}= \begin{bmatrix} 1 & 0.001 \\ 0.001 & 1 \end{bmatrix}$
we get the following results:
Signal stats
R_xx = 0.9999999999999996
R_xy = -0.059906363297935916
R_yy = 0.9998781872734045
So the accuracy of the procedure also depends on the level of covariance we attempt to set.
Playing around with values of $R_{xy}$ shows a consistent pattern: smaller $R_{xy}$ yields a more accurate $R_{yy}$.
So my second question is:
What is the source of this pattern and can it be improved? (Preconditioning?)
Thanks for taking a look!
Here is the best link on "de-correlating" or "whitening" signals that I have found:
https://courses.media.mit.edu/2010fall/mas622j/whiten.pdf