Looking for an explanation over a standard deviation formula in C#


I found the following standard deviation formula by Victor Chen in a Stack Overflow thread. I'm trying to understand why he calculates it like this: Math.Sqrt(sumOfDerivationAverage - (average * average)). The standard deviation is usually Math.Sqrt(variance), but his formula doesn't look like that.

private double getStandardDeviation(List<double> doubleList)
{
   double average = doubleList.Average();
   double sumOfDerivation = 0;
   foreach (double value in doubleList)
   {
      sumOfDerivation += (value) * (value);
   }
   double sumOfDerivationAverage = sumOfDerivation / (doubleList.Count - 1);
   return Math.Sqrt(sumOfDerivationAverage - (average*average));
}
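To see what this snippet actually computes, I wrote a small standalone check (my own, not from the original thread) comparing it against the textbook two-pass definition. Note that dividing the sum of squares by Count - 1 while subtracting the square of the plain mean does not reproduce the sample variance, so the two results differ:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StdDevCheck
{
    // Textbook sample standard deviation: sqrt( sum((x - mean)^2) / (N - 1) )
    public static double TwoPass(List<double> xs)
    {
        double mean = xs.Average();
        double sumSquaredDeviations = xs.Sum(x => (x - mean) * (x - mean));
        return Math.Sqrt(sumSquaredDeviations / (xs.Count - 1));
    }

    // The quoted formula: sqrt( sum(x^2) / (N - 1) - mean^2 )
    public static double Quoted(List<double> xs)
    {
        double mean = xs.Average();
        double sumOfSquares = xs.Sum(x => x * x);
        return Math.Sqrt(sumOfSquares / (xs.Count - 1) - mean * mean);
    }

    public static void Main()
    {
        var data = new List<double> { 2, 4, 4, 4, 5, 5, 7, 9 };
        Console.WriteLine(TwoPass(data)); // sqrt(32/7) ≈ 2.138
        Console.WriteLine(Quoted(data));  // sqrt(57/7) ≈ 2.854 -- not the same
    }
}
```

If the quoted formula divided by Count instead of Count - 1, it would give the population standard deviation, via the identity Var = E[x²] − (E[x])².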

Formulas:

[Image: formulas for the mean, the standard deviation, and the mean absolute deviation]

I'm trying to make a class that calculates the mean, the standard deviation, and the mean absolute deviation. Looking at the formulas above, the variance calculation in the code looks totally weird to me. If the variance is calculated like this, how could the mean absolute deviation be calculated?

As far as I understand, the / _period) / (_period - 1) part just adds more digits after the decimal point without really changing the output. And what about (_totalSquares - _totalAverage * _totalAverage) / _period?

Please explain why they calculate it that way.

public sealed class MovingAveragesHelper
{
    private readonly int _period;
    private readonly List<decimal> _values;

    private decimal _totalAverage;
    private decimal _totalSquares;
    private int _index;

    public decimal Average { get; private set; }
    public decimal MeanAbsoluteDeviation { get; private set; }
    public decimal Variance { get; private set; }
    public decimal StandardDeviation => (decimal)Math.Sqrt((double)Variance);
    public bool IsReady => _index >= _period;

    public MovingAveragesHelper(int period)
    {
        _period = period;
        _values = new List<decimal>();
    }

    public void Add(decimal input)
    {
        _totalAverage += input;
        _totalSquares += input * input;

        _values.Add(input);

        if (_index >= _period - 1)
        {
            Average = _totalAverage / _period;

            Variance = (_totalSquares - _totalAverage * _totalAverage / _period) / (_period - 1);
            MeanAbsoluteDeviation = 0; // ???

            _totalAverage -= _values[_index - _period + 1];
            _totalSquares -= _values[_index - _period + 1] * _values[_index - _period + 1];
        }

        _index++;
    }
}
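For reference, here is how I imagine the missing mean absolute deviation could be filled in. This is my own sketch (the Stats class and method name are hypothetical, not from the original code): since the class already stores every value in _values, the MAD can simply be recomputed from the last _period entries as the average of |x - mean|:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Stats
{
    // Mean absolute deviation of the last `period` entries:
    // MAD = (1/N) * sum(|x_i - mean|)
    public static decimal MeanAbsoluteDeviation(IReadOnlyList<decimal> values, int period)
    {
        // Take the current window, i.e. the last `period` values.
        var window = values.Skip(values.Count - period).ToList();
        decimal mean = window.Average();
        return window.Sum(x => Math.Abs(x - mean)) / period;
    }

    public static void Main()
    {
        var values = new List<decimal> { 1m, 9m, 2m, 6m, 4m, 8m };
        // Window of the last 4 values: 2, 6, 4, 8 -> mean 5, MAD = (3+1+1+3)/4 = 2
        Console.WriteLine(MeanAbsoluteDeviation(values, 4));
    }
}
```

As far as I can tell, unlike the mean and variance there is no simple O(1) running-sum update for the MAD, because the absolute value does not expand into sums the way the square does, so recomputing over the stored window seems to be the straightforward option.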

There is 1 answer below.


My formula is straightforward. If we use $E[\cdot]$ to indicate the expectation value (note $\bar{x} = E[x]$), we get

$$ Var[x_i] = E[(x_i - \bar{x})^2] = E[x_i^2 - 2\bar{x} x_i + \bar{x}^2] = E[x_i^2] - 2\bar{x} E[x_i] + \bar{x}^2 = E[x_i^2] - \bar{x}^2 $$

Now consider what happens if you take the sum into account. You obtain my formula,

$$ E\Big[\sum_i (x_i - \bar{x})^2\Big] = \sum_i E[x_i^2] - N \bar{x}^2 $$

where I implicitly assumed that $i = 1, \ldots, N$.
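The identity above can be checked numerically; here is a quick sketch of my own (class and method names are mine) that also shows why the variance line in the question's second snippet works:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VarianceIdentity
{
    // Checks sum((x - mean)^2) == sum(x^2) - N * mean^2 on a list.
    public static bool HoldsFor(List<double> xs)
    {
        int n = xs.Count;
        double mean = xs.Average();
        double lhs = xs.Sum(x => (x - mean) * (x - mean));
        double rhs = xs.Sum(x => x * x) - n * mean * mean;
        return Math.Abs(lhs - rhs) < 1e-9;
    }

    public static void Main()
    {
        var xs = new List<double> { 1.5, 2.0, 3.25, 4.0, 5.5 };
        Console.WriteLine(HoldsFor(xs)); // True

        // The same identity is why the second snippet's variance line works:
        // _totalAverage holds a running *sum* (despite its name), so
        // _totalAverage * _totalAverage / _period equals N * mean^2.
        int n = xs.Count;
        double totalSum = xs.Sum();
        double totalSquares = xs.Sum(x => x * x);
        double variance = (totalSquares - totalSum * totalSum / n) / (n - 1);

        double mean = xs.Average();
        double twoPass = xs.Sum(x => (x - mean) * (x - mean)) / (n - 1);
        Console.WriteLine(Math.Abs(variance - twoPass) < 1e-9); // True
    }
}
```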