MLE for Gaussian, finding $\mu$ and $\sigma^2$


"Assume that a dataset $x_1,\ldots, x_N$ consisting of $N$ points was sampled from a Gaussian distribution, i.e., $X_i \sim N(\mu; \sigma^2)$ for some unknown $- \infty < \mu < \infty$ and unknown $0 < \sigma^2 < \infty$. Also, assume that the $X_i$ are independent and identically distributed (iid). Find the maximum likelihood estimate of the Gaussian mean $\mu$ and variance $\sigma^2$ (and show that the critical point obtained is, at least, a local maximum)" -exercise $2.8$, A first course in machine learning, second edition.

I'm currently trying to solve the exercise above; however, it's proving hard for me, and I would love some help / a reference solution to the exercise.

So first I'll define the log-likelihood as: $$\log L = -\frac{N}{2}\log 2\pi - N \log \sigma - \frac{1}{2\sigma^2}\sum_{n=1}^N(t_n - w^Tx_n)^2$$

I then take the derivative with respect to $w$ and set it equal to $0$:

$$\frac{\partial \log L}{\partial w} = \frac{1}{\sigma^2}\sum_{n=1}^N \left(x_n t_n - x_n x_n^T w\right)=0$$

which can then be rewritten as:

$$\frac{\partial \log L}{\partial w} = \frac{1}{\sigma^2}(X^T t - X^T X w)=0,$$ and I can then solve for $w$ to get: $$w = (X^TX)^{-1} X^T t$$
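As a sanity check (my own sketch, not part of the exercise), this closed-form solution can be verified numerically with NumPy; the names `X`, `t`, `w` mirror the notation above, and the data here is synthetic:

```python
import numpy as np

# Synthetic regression data: 50 samples, 3 features, small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
t = X @ w_true + 0.1 * rng.normal(size=50)

# Normal-equations solution w = (X^T X)^{-1} X^T t; in practice prefer
# np.linalg.solve (or lstsq) over forming the inverse explicitly.
w_hat = np.linalg.solve(X.T @ X, X.T @ t)
print(w_hat)  # should be close to w_true
```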

After this point I get stuck, unsure of how to find the MLE for $\mu$ and $\sigma^2$, as well as how to show that the critical point is at least a local maximum.

I am a bit confused - why do you have $t_n, w$ and $x_n$? I think in your case, you don't need them as both $\mu$ and $\sigma^2$ are real numbers and not vectors - that is, you have normal, not multivariate distribution.

Anyway, you are not on the right track; for this problem the log-likelihood is:

$$\ell(\mu,\sigma^2)=-\frac{n}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum(x_i-\mu)^2$$

Differentiate with respect to $\mu$ and set to $0$ (under the usual circumstances your MLE will satisfy this):

$$\frac{\sum(x_i-\hat\mu)}{\sigma^2}=0$$ which is equivalent to $$\hat\mu=\frac{\sum x_i}{n}=\bar x$$

To get the MLE for $\sigma^2$, differentiate with respect to $\sigma^2$, set to $0$, and plug in $\hat\mu$ for $\mu$ to get:

$$-\frac{n}{2\sigma^2}+\frac{1}{2\sigma^4}\sum(x_i-\hat\mu)^2=0$$

which is equivalent to:

$$\hat\sigma^2=\frac{\sum(x_i-\hat\mu)^2}{n}$$
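As for the local-maximum part of the question (this step is my addition, so check the algebra): evaluate the second derivatives of $\ell$ at the critical point. Since $\sum(x_i-\hat\mu)=0$, the mixed partial $\partial^2\ell/\partial\mu\,\partial\sigma^2=-\sum(x_i-\hat\mu)/\sigma^4$ vanishes there, and

$$\frac{\partial^2\ell}{\partial\mu^2}\bigg|_{\hat\mu,\hat\sigma^2}=-\frac{n}{\hat\sigma^2}<0,\qquad \frac{\partial^2\ell}{\partial(\sigma^2)^2}\bigg|_{\hat\mu,\hat\sigma^2}=\frac{n}{2\hat\sigma^4}-\frac{\sum(x_i-\hat\mu)^2}{\hat\sigma^6}=-\frac{n}{2\hat\sigma^4}<0,$$

using $\sum(x_i-\hat\mu)^2=n\hat\sigma^2$ in the last step. So the Hessian at the critical point is diagonal with strictly negative entries, hence negative definite, and the critical point is a local maximum.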

The answer you got is the one that arises in linear modeling, when you want the MLE for the vector of coefficients, so I suppose you've come across that before?
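The Gaussian MLE formulas above can also be sanity-checked numerically; this is just an illustrative sketch with synthetic data:

```python
import numpy as np

# Illustrative check: the sample mean and the biased (divide-by-n) sample
# variance maximize the Gaussian log-likelihood on synthetic data.
rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=1000)
n = len(x)

def loglik(mu, sigma2):
    return -n / 2 * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat) ** 2)  # divides by n, not n - 1

# Perturbing either estimate in any direction lowers the log-likelihood,
# consistent with the critical point being a (local) maximum.
best = loglik(mu_hat, sigma2_hat)
worse = [loglik(mu_hat + e, sigma2_hat) for e in (-0.1, 0.1)]
worse += [loglik(mu_hat, sigma2_hat + e) for e in (-0.1, 0.1)]
print(best, max(worse))
```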