I am currently optimizing some code and want to replace an inefficient OpenCV function that calculates a covariance matrix. The thing is, I only need the trace of this covariance matrix — so, if I am not mistaken, I only need the variance.
My main problem is that each element has an x, y and z coordinate, and thus 3 dimensions. So my data looks like:
[x_1, y_1, z_1], [x_2, y_2, z_2], ... [x_n, y_n, z_n]
How can I calculate the variance of this 3D data in general - and how could I optimize it to run as fast as possible?
Thanks in advance!
Generally the variance of a random vector $X\in\mathbb R^m$ is $$ \Sigma = \operatorname{var}(X) = \operatorname{E}( (X-\mu)(X-\mu)^T) \in\mathbb R^{m\times m}\quad\text{where } \mu=\operatorname{E}(X)\in\mathbb R^m. $$ Thus $\Sigma$ is a symmetric non-negative-definite matrix. Feller's book calls this the variance; some others call it the "covariance matrix" since its entries are the covariances between components of $X$. Some of its properties are similar to those of variances of scalar-valued random variables; for example if $A\in\mathbb R^{\ell\times m}$ so that $AX$ is an $\ell\times 1$ random column vector, then $\operatorname{var}(AX) = A\Sigma A^T\in\mathbb R^{\ell\times\ell}$.
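The identity $\operatorname{var}(AX) = A\Sigma A^T$ can be checked numerically. The following is a small Monte Carlo sketch (the particular $\Sigma$ and $A$ are arbitrary choices for illustration): sample a 3-dimensional $X$ with known covariance, apply $A$, and compare the sample covariance of $AX$ against $A\Sigma A^T$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary toy covariance for a 3-dimensional X.
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
# Arbitrary linear map A from R^3 to R^2.
A = np.array([[1.0, -1.0, 0.0],
              [0.0,  2.0, 1.0]])

# Draw many samples of X; each row of X is one sample.
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
AX = X @ A.T  # rows are samples of AX

# The sample covariance of AX should approximate A Sigma A^T.
print(np.cov(AX.T, bias=True))
print(A @ Sigma @ A.T)
```

With 200,000 samples the two printed matrices agree to a couple of decimal places.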
You're talking about estimating $\Sigma$ based on a sample of size $n$, with $m=3$. Regarding your data points as column vectors $\begin{bmatrix} x_k \\ y_k \\ z_k \end{bmatrix}$, for $k=1,\ldots,n$, we can let $\begin{bmatrix} \bar x \\ \bar y \\ \bar z \end{bmatrix}$ denote the vector of componentwise sample means and then look at the matrix $$ M = \begin{bmatrix} \ldots, & x_k - \bar x, & \ldots \\ \ldots, & y_k - \bar y, & \ldots \\ \ldots, & z_k - \bar z, & \ldots \end{bmatrix} \in \mathbb R^{3\times n}. $$ Then $$ S = \frac 1 n M M^T \in \mathbb R^{3\times 3} $$ is the maximum-likelihood estimator of $\Sigma$ if the $3\times 1$ column vectors are from a $3$-dimensional normal distribution. The matrix $$ \tilde S = \frac 1 {n-1} M M^T \in \mathbb R^{3\times 3} $$ is an unbiased estimator of $\Sigma$ under far weaker assumptions.
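Since you only need the trace, note that $\operatorname{tr}(S)$ is just the sum of the three per-coordinate variances, so there is no need to form the full $3\times 3$ product $MM^T$ at all. A minimal NumPy sketch (the function name and the `(n, 3)` input layout are my assumptions, not from the question):

```python
import numpy as np

def cov_trace(points, unbiased=False):
    """Trace of the sample covariance of an (n, 3) array of points.

    Equals the sum of the per-coordinate variances, so the full
    3x3 matrix M M^T is never formed.
    """
    points = np.asarray(points, dtype=np.float64)
    n = points.shape[0]
    centered = points - points.mean(axis=0)   # rows are the columns of M
    denom = n - 1 if unbiased else n          # tilde-S vs. S normalization
    return (centered ** 2).sum() / denom

pts = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0],
                [7.0, 8.0, 9.0]])
print(cov_trace(pts))  # 18.0, matching np.trace(np.cov(pts.T, bias=True))
```

This is a single pass over the data after centering; in C++ the same idea amounts to accumulating sums and sums of squares per coordinate, which avoids the matrix multiply your OpenCV call performs.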