In short
I have 2 columns of data; quantity and time taken for that quantity of items to get from point A to B.
e.g.
| Quantity | Travel time (Days) |
|---|---|
| 725 | 10 |
| 583 | 12 |
| 609 | 11 |
| 23 | 2 |
| 447 | 13 |
| 14 | 28 |
I want to find a weighted standard deviation for this data, based on the quantities.
Intuition
The rows which have large quantities are pretty consistent, while those with small quantities vary more wildly. Therefore I want to use the low quantities to negate the effect of their respective travel time.
My approach
Let $Q_i$ and $T_i$ represent the quantity and travel time of row $i$, respectively, and assume I have $n$ data points.
Let
$$Q = \sum_{i=1}^{n}Q_i.$$
Then I calculate a weighted average
$$E(T) = \sum_{i=1}^{n}\frac{Q_i}{Q}\cdot T_i.$$
A naive calculation for standard deviation would be
$$\sigma_T = \sqrt{\sum_{i=1}^{n}\frac{Q_i}{Q}\big(T_i - E(T)\big)^{2}}$$
Applying these calculations, once to the entire dataset and a second time to the dataset with the anomalous values removed, I get the following:
As you can see, the weighted mean is quite stable, whereas the weighted standard deviation varies wildly; from $1.91$ to $1.10$.
I cannot find how to attach the .xlsx file, but if needed, I can make a download link.
I need to manipulate my calculation such that the values in F5 and F7 on the first screenshot are decreased.
One approach could be to utilize something like $E(T)^{(1-w)}$ or $E(T)^{(1-w)^{\frac{1}{2}}}$ ($w$ being the weight of a datapoint); for weights close to $0$, it would be close to the weighted average (thus decreasing their impact on the standard deviation) and for weights close to $1$, this values would also be close to $1$ (thus meaning it would have a small effect).
The problem is that any manipulation that I apply takes away from this value being the true standard deviation - i.e. I wouldn't be able to confidently say that 68% of my data falls in the range bound by the standard deviation.
Any help is greatly appreciated!
