Normal Distribution from Standard Deviation?


So I have a data set $(x_{1},y_{1}), (x_{2},y_{2}),\dots,(x_{n},y_{n})$ and from it I have the values of $\sum x$, $\sum x^{2}$, $\sum y$, $\sum y^{2}$, $\sum xy$.

My question is, how do I find a normal distribution that best fits this data set and how do I use these values to calculate the standard deviation for the normal distribution?

Basically, given a data set, how do I find the values of the mean and standard deviation for the normal distribution of best fit? Are they the same as the mean and standard deviation of the data set?


There are 2 best solutions below

4
On

You also need $\sum x y$; otherwise you would exclude all the normal distributions in which $X$ and $Y$ are dependent.

The normal distribution that best fits the data is obtained by maximum likelihood estimation. It is the one whose mean and covariance matrix equal the empirical mean and empirical covariance matrix computed from your sums (normalized by $n$).
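As a rough illustration of the maximum likelihood fit described above, here is a minimal Python sketch that builds the mean vector and covariance matrix from the five sums. The function name `fit_bivariate_normal` and the sample data are made up for the example; the only assumption is that $n$ is known alongside the sums.

```python
def fit_bivariate_normal(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Maximum likelihood estimates (normalized by n) from summary sums.

    Hypothetical helper for illustration; returns (mean, covariance matrix).
    """
    mean_x = sum_x / n
    mean_y = sum_y / n
    # E[X^2] - E[X]^2 form of the empirical variance and covariance
    var_x = sum_x2 / n - mean_x ** 2
    var_y = sum_y2 / n - mean_y ** 2
    cov_xy = sum_xy / n - mean_x * mean_y
    mean = (mean_x, mean_y)
    cov = ((var_x, cov_xy), (cov_xy, var_y))
    return mean, cov


# Made-up data set, used only to produce the sums
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]

mean, cov = fit_bivariate_normal(
    len(xs),
    sum(xs),
    sum(x * x for x in xs),
    sum(ys),
    sum(y * y for y in ys),
    sum(x * y for x, y in zip(xs, ys)),
)
print("mean:", mean)
print("covariance matrix:", cov)
```

The fitted standard deviations are the square roots of the diagonal entries of the covariance matrix.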

2
On

You have the sufficient statistics for $\mu_X, \mu_Y, \sigma^2_X$ and $\sigma^2_Y$, so you can calculate their estimates directly using
$$ \bar{x} = \frac{1}{n}\sum_{i = 1}^n x_i, \quad \bar{y} = \frac{1}{n}\sum_{i = 1}^n y_i $$
for the sample means and
$$ \begin{aligned} s^2_x &= \frac{1}{n-1} \sum_{i=1}^n\left(x_i - \bar{x} \right)^2 = \frac{\sum_{i=1}^n x_i^2}{n-1} - \frac{n\bar{x}^2}{n-1} \\ s^2_y &= \frac{1}{n-1} \sum_{i=1}^n\left(y_i - \bar{y} \right)^2 = \frac{\sum_{i=1}^n y_i^2}{n-1} - \frac{n\bar{y}^2}{n-1} \end{aligned} $$
for the sample variances. As others have mentioned, without $\sum{xy}$ you will not be able to estimate the covariance between $X$ and $Y$, which the regression tag in your question suggests you want.
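The formulas above can be sketched in Python as follows. This is only an illustration: the helper name `sample_stats` and the data are invented, and the covariance line assumes the analogous $n-1$ form $s_{xy} = \left(\sum xy - n\bar{x}\bar{y}\right)/(n-1)$, which is not written out in the answer. The standard library's `statistics.variance` is used as a cross-check on the variance.

```python
import math
import statistics


def sample_stats(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Sample means, (n-1)-normalized variances and covariance from sums.

    Hypothetical helper for illustration only.
    """
    x_bar = sum_x / n
    y_bar = sum_y / n
    s2_x = (sum_x2 - n * x_bar ** 2) / (n - 1)
    s2_y = (sum_y2 - n * y_bar ** 2) / (n - 1)
    # Assumed analogue of the variance formula for the covariance
    s_xy = (sum_xy - n * x_bar * y_bar) / (n - 1)
    return {"x_bar": x_bar, "y_bar": y_bar,
            "s2_x": s2_x, "s2_y": s2_y, "s_xy": s_xy}


# Made-up data set, used only to produce the sums
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]

stats = sample_stats(
    len(xs),
    sum(xs),
    sum(x * x for x in xs),
    sum(ys),
    sum(y * y for y in ys),
    sum(x * y for x, y in zip(xs, ys)),
)

# Cross-check against the definition-based variance on the raw data
assert math.isclose(stats["s2_x"], statistics.variance(xs))
assert math.isclose(stats["s2_y"], statistics.variance(ys))

print("sample standard deviations:",
      math.sqrt(stats["s2_x"]), math.sqrt(stats["s2_y"]))
print("sample covariance:", stats["s_xy"])
```

Note that these $n-1$ estimates differ from the maximum likelihood ones only by the factor $n/(n-1)$, so the gap shrinks as $n$ grows.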