Working out Vector Variances using index notation

I was hoping someone could provide some intuition to some proofs of vector variances. The expectation case seems rather simple but I get confused when trying to work it out for the variance.

In the case of $$Var_x(a^T x) = \int \sum_i a_i (x_i - \bar x_i) \sum_j a_j(x_j - \bar x_j)\, p(x)\, dx$$

$$= \sum_i \sum_j a_i a_j \int (x_i - \bar x_i)(x_j - \bar x_j)\, p(x)\, dx$$

The result is $$a^T V_x(x) a$$ but I do not quite see how this works.

Similarly, for the matrix case $$Var_x(Ax)$$ I don't see how to get to the result $$A \Sigma A^T$$.

I would really appreciate it if someone could explain where the transposes come from, specifically in the index-notation case, as that is the method we are following in our class.

Thanks

There are 2 best solutions below


You can write $Ax = \sum_i a_i x_i$, where the $a_i$ are the columns of $A$ and $x_i$ is the $i^{th}$ component of $x$. This means $Var(Ax) = Var(\sum_i a_i x_i) = \sum_i \sum_j a_i a_j^T \operatorname{cov}(x_i, x_j) = A\Sigma A^T$, where $\Sigma$ is the variance-covariance matrix of $x$. Note that the outer products $a_i a_j^T$ are needed here, since each $a_i$ is a column vector; this is where the transpose comes from.
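This identity is easy to sanity-check numerically. Below is a small sketch using numpy; the matrices `A` and `Sigma` are arbitrary made-up examples, not from the question. The scalar case $Var(a^T x) = a^T \Sigma a$ is just the special case where $A$ has a single row.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example: a 3-dimensional x with covariance Sigma, and a 2x3 matrix A.
A = rng.standard_normal((2, 3))
L = rng.standard_normal((3, 3))
Sigma = L @ L.T  # any positive semi-definite matrix works as a covariance

# Index-notation version: Var(Ax)_{kl} = sum_i sum_j A_{ki} Sigma_{ij} A_{lj}
var_index = np.einsum('ki,ij,lj->kl', A, Sigma, A)

# Matrix version: A Sigma A^T
var_matrix = A @ Sigma @ A.T

print(np.allclose(var_index, var_matrix))  # the two forms agree
```

The `einsum` subscripts spell out exactly the double sum in index notation: the second `A` is indexed `lj` rather than `jl`, which is precisely the transpose in $A \Sigma A^T$.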


How well-versed are you in linear algebra? What follows isn't a direct answer to your question (since balaji's answer seems satisfactory on the technical front), but understanding analogies between ordinary vectors and random variables should help you greatly when trying to figure out the role of the various pieces in what you're trying to prove, IMO.

Here is a series of analogies:

  • A random variable is a function. Function spaces are vector spaces. Random variables are thus a form of vector.

  • The covariance of two random variables is a dot product between vectors.

  • The variance of a random variable is the covariance of a random variable with itself. This makes it "the dot product of a vector with itself". This is the norm squared (also called a quadratic norm).

  • The standard deviation is the square root of the variance. The square root of a quadratic norm is a "simple" norm. The standard deviation is thus the vector norm of a random variable.

  • The Pearson correlation coefficient is defined as $\frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$, which corresponds analogically to the cosine formula $\frac{\langle u,v\rangle}{\|u\|\,\|v\|}$. The Pearson correlation coefficient is a cosine (i.e., perfect collinearity/correlation for cosine 1, no correlation for cosine 0, anticorrelation for cosine -1).

  • A random vector is a list of random variables. A matrix is a list of column vectors, representing the "target basis" of a linear transformation. When multiplying two matrices together, each coordinate of the resulting product is defined as the dot product of a row of the left operand with a column of the right operand. When multiplying two random vectors, this dot product is the covariance.

  • This makes the covariance matrix of a random vector the matrix product of a random vector with its (conjugate) transpose (which is an involution); what is usually called the gramian matrix in linear algebra. The gramian is a sort of "squared" matrix, one that generalizes to non-square matrices. I say "sort of" squared, because the product of a matrix with its transpose $A^T A$ is not the same as $A^2$. To understand this difference, a much simpler example is $z^2$ (ordinary square) vs $z \bar{z} = |z|^2$ ("sort of" square) in the complex numbers. [As a side note, the relation between a matrix, its Frobenius norm and its gramian seems relevant to understanding this analogy further, but I haven't completely figured it out myself.]

  • The gramian is a symmetric matrix, and a symmetric matrix $S$ represents a dot product of the form $u^T S v$. The covariance matrix can thus be used as a form of dot product at the "random vector" level, rather than the "random variable" level.

  • The SVD of a matrix is essentially the spectral decomposition of its gramian, and PCA is its equivalent on a covariance matrix.
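The cosine analogy for the Pearson coefficient above can be checked numerically. A small sketch with made-up data: centering the samples gives the "vectors" of the analogy, and their cosine is exactly the correlation coefficient.

```python
import numpy as np

# Made-up samples of two random variables
x = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
y = np.array([2.0, 1.0, 5.0, 4.0, 6.0])

# Center the samples: these are the "vectors" in the analogy
xc = x - x.mean()
yc = y - y.mean()

# Cosine of the angle between the centered vectors
cosine = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

# Pearson correlation coefficient from numpy's correlation matrix
pearson = np.corrcoef(x, y)[0, 1]

print(np.allclose(cosine, pearson))  # identical, up to floating point
```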
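The "covariance matrix as gramian" point is likewise easy to verify on sample data. A numpy sketch with arbitrary data: the sample covariance matrix is the gramian of the centered data, divided by $n - 1$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: n = 200 samples of a 3-dimensional random vector (rows = observations)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 3))

# Center each column (each random variable)
Xc = X - X.mean(axis=0)

# Gramian form: product of the centered data with its transpose, normalized
gram_cov = Xc.T @ Xc / (len(X) - 1)

# numpy's built-in sample covariance (rowvar=False: columns are the variables)
np_cov = np.cov(X, rowvar=False)

print(np.allclose(gram_cov, np_cov))
```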
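Finally, the SVD/PCA correspondence can be checked numerically as well (a sketch on arbitrary data): the squared singular values of the centered data, divided by $n - 1$, are the eigenvalues of its sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary data: n = 500 samples in 3 dimensions, then centered
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 3))
Xc = X - X.mean(axis=0)

# SVD of the centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigendecomposition of the sample covariance matrix (PCA)
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Squared singular values / (n - 1) match the covariance eigenvalues
print(np.allclose(np.sort(s**2 / (len(Xc) - 1)), np.sort(evals)))
```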

PS: I think expectation can be linked to covectors of probability spaces in some way, given that dot products of function spaces are often of the form $\langle f, g\rangle = \int f(t)g(t)\,dt$; but I haven't completely figured that one out either. If someone more savvy than myself can help further this series of analogies, they are more than welcome to do so. I will add their contribution to this post if it is worthwhile.