Measuring Variance of 2D Data Points

I have a list of weighted data points on a 2D plane in the form $(x, y)$. I believe the weighted mean can be calculated as $\left(\frac{\sum{x_i\times w_i}}{\sum w_i}, \frac{\sum{y_i\times w_i}}{\sum w_i}\right)$.
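A minimal Python sketch of that weighted mean, assuming the points arrive as a list of `(x, y)` tuples with a parallel list of non-negative weights. The function name `weighted_mean` is illustrative only, not from any library.

```python
def weighted_mean(points, weights):
    """Weighted mean of 2D points.

    points  -- sequence of (x, y) tuples
    weights -- sequence of non-negative weights, same length, positive sum
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Weighted sum of each coordinate, divided by the total weight.
    mx = sum(x * w for (x, _), w in zip(points, weights)) / total
    my = sum(y * w for (_, y), w in zip(points, weights)) / total
    return mx, my


pts = [(0.0, 0.0), (2.0, 0.0), (2.0, 4.0)]
ws = [1.0, 1.0, 2.0]
print(weighted_mean(pts, ws))  # (1.5, 2.0)
```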

What is the best way to calculate a single value which will accurately describe the spread of the data points?

Best answer:

You have computed the weighted mean $$\mathbf{\bar x}_* = \sum w_i\mathbf{x}_i$$ (assume $\sum w_i = 1$ for ease of notation). Then why not use the weighted variance, i.e. the weighted mean squared distance from that mean, $$\sum w_i|\mathbf{x}_i -\mathbf{\bar x}_*|^2$$ or, if you need an unbiased estimate, the unbiased weighted variance $$\frac{1}{1-\sum w_i^2}\sum w_i|\mathbf{x}_i -\mathbf{\bar x}_*|^2$$ (this correction treats the $w_i$ as reliability weights rather than frequency counts).

Here, $|\cdot|$ denotes the ordinary Euclidean norm $|\mathbf u| = \sqrt{u_1^2+u_2^2}$, so $|\mathbf{ u }|^2 = u_1^2+u_2^2$.
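As a rough sketch in plain Python, the weighted variance above can be computed by normalising the weights to sum to 1, taking the weighted mean, and averaging the squared Euclidean distances to it. The name `weighted_spread` and its `unbiased` flag are hypothetical; the unbiased branch applies the $1/(1-\sum w_i^2)$ factor, which assumes reliability weights.

```python
def weighted_spread(points, weights, unbiased=False):
    """Weighted mean squared Euclidean distance of 2D points from their weighted mean."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Normalise so the weights sum to 1, as the answer assumes.
    w = [wi / total for wi in weights]
    mx = sum(wi * x for wi, (x, _) in zip(w, points))
    my = sum(wi * y for wi, (_, y) in zip(w, points))
    # |u|^2 = u_1^2 + u_2^2 for each deviation from the mean.
    var = sum(wi * ((x - mx) ** 2 + (y - my) ** 2) for wi, (x, y) in zip(w, points))
    if unbiased:
        denom = 1.0 - sum(wi * wi for wi in w)
        if denom <= 0:
            raise ValueError("unbiased estimate undefined: all weight on one point")
        var /= denom
    return var


pts = [(0.0, 0.0), (2.0, 0.0), (2.0, 4.0)]
ws = [1.0, 1.0, 2.0]
print(weighted_spread(pts, ws))                 # 4.75
print(weighted_spread(pts, ws, unbiased=True))  # 7.6
```

Because the squared norm splits as $u_1^2 + u_2^2$, the value 4.75 here equals 0.75 + 4.0, the weighted variances of the $x$ and $y$ coordinates taken separately. Its square root is a root-mean-square distance in the same units as the coordinates.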

Another answer:

This expands on the comment by Dr_Zaszuś:

Generally, $$V(X + Y) = V(X) + V(Y) + \text{Cov}(X, Y) + \text{Cov}(Y, X)$$

However, according to Definition 6.10 in Mathematics for Machine Learning, if $X$ and $Y$ are independent random variables, both covariance terms vanish and the variance $V$ of their sum reduces to

$$V(X + Y) = V(X) + V(Y)$$
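A quick numerical check of both identities, written as a sketch in plain Python with population (divide-by-$n$) estimates; the helper names `mean`, `cov`, and `var` are illustrative. With correlated data the covariance terms are needed for equality, and the sample version then holds exactly. For independently generated samples the sample covariance is only close to zero, so the simplified form holds only approximately.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    # Population covariance (divide by n); note var(a) == cov(a, a).
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def var(a):
    return cov(a, a)


# Correlated data: the general formula with both covariance terms.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]
sums = [x + y for x, y in zip(xs, ys)]
lhs = var(sums)
rhs = var(xs) + var(ys) + cov(xs, ys) + cov(ys, xs)
print(lhs, rhs)  # 13.25 13.25

# Independently drawn samples: covariance is near zero, not exactly zero.
rng = random.Random(0)
u = [rng.gauss(0, 1) for _ in range(100_000)]
v = [rng.gauss(0, 2) for _ in range(100_000)]
total = [a + b for a, b in zip(u, v)]
print(var(total), var(u) + var(v))  # close to each other and to 5
```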