Pattern Recognition and Machine Learning: Maximizing log likelihood with respect to Beta


On page 29 of Christopher Bishop's *Pattern Recognition and Machine Learning*, he gives the following log likelihood, equation (1.62):

$$ \ln p(t|\pmb{x}, \pmb{w}, \beta) = - \frac{\beta}{2} \sum_{n=1}^N \{ y(x_n, \pmb{w}) - t_n \}^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) $$

He then explains that maximizing this with respect to $\pmb{w}$ amounts to minimizing the sum-of-squares error, since the remaining terms do not depend on $\pmb{w}$, which makes sense:

$$ \pmb{w}_{ML} = \operatorname*{arg\,min}_{\pmb{w}} \frac{1}{2} \sum_{n=1}^N \{ y(x_n, \pmb{w}) - t_n \}^2 $$
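Because the $\beta$ terms drop out, maximizing (1.62) over $\pmb{w}$ is ordinary least squares. A minimal numerical sketch of this (assuming a toy cubic-polynomial model on synthetic sinusoidal data, not Bishop's actual dataset):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: noisy samples of sin(2*pi*x), as in Bishop's
# curve-fitting example; y(x, w) is a degree-3 polynomial in x.
N = 50
x = rng.uniform(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, N)

# Design matrix of polynomial features; least squares gives w_ML.
X = np.vander(x, 4, increasing=True)          # columns: 1, x, x^2, x^3
w_ml, *_ = np.linalg.lstsq(X, t, rcond=None)

def sum_sq_error(w):
    # The sum-of-squares error that w_ML minimizes.
    return np.sum((X @ w - t) ** 2)

# Any perturbation of w_ml increases the sum-of-squares error,
# i.e. decreases the log likelihood regardless of the value of beta.
for _ in range(100):
    w_other = w_ml + rng.normal(0, 0.1, 4)
    assert sum_sq_error(w_other) >= sum_sq_error(w_ml)
```

Note that $\pmb{w}_{ML}$ is the same for every positive $\beta$, which is why Bishop can solve for $\pmb{w}$ first and then for $\beta$.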

Immediately following this, the claim is made that maximizing with respect to $\beta$ gives

$$ \frac{1}{\beta_{ML}} = \frac{1}{N} \sum_{n=1}^N \{ y(x_n, \pmb{w}_{ML}) - t_n \}^2 $$

and I can't quite get there by the same logic. How would you arrive at this?

Best answer:

I was wrongly assuming that the $\frac{N}{2} \ln \beta$ term was part of the summand. Taking the derivative with respect to $\beta$ and setting it to zero gives the stated result:

$$ \begin{aligned} 0 &= \frac{\partial}{\partial \beta} \Big\lbrack -\frac{\beta}{2} \sum_{n=1}^N \{ y(x_n, \pmb{w}) - t_n \}^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) \Big\rbrack \\ &= -\frac{1}{2} \sum_{n=1}^N \{ y(x_n, \pmb{w}) - t_n \}^2 + \frac{N}{2\beta} \\ \frac{N}{2\beta} &= \frac{1}{2} \sum_{n=1}^N \{ y(x_n, \pmb{w}) - t_n \}^2 \\ \frac{1}{\beta} &= \frac{1}{N} \sum_{n=1}^N \{ y(x_n, \pmb{w}) - t_n \}^2 \end{aligned} $$
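The result can be sanity-checked numerically: with residuals of known noise level, $\beta_{ML}$ is the reciprocal of the mean squared residual, and it maximizes (1.62). A sketch assuming a hypothetical linear model with true noise precision $\beta = 4$ (standard deviation 0.5):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: known model outputs y(x_n, w), targets corrupted by
# Gaussian noise with true precision beta = 4 (std = 0.5).
N = 10_000
x = rng.uniform(-1, 1, N)
y = 0.3 - 1.2 * x                      # model outputs y(x_n, w)
t = y + rng.normal(0, 0.5, N)          # noisy targets

# Closed-form ML precision: 1/beta_ML is the mean squared residual.
residuals_sq = (y - t) ** 2
beta_ml = 1.0 / residuals_sq.mean()

def log_likelihood(beta):
    # Bishop's equation (1.62), evaluated at fixed w.
    return (-beta / 2 * residuals_sq.sum()
            + N / 2 * np.log(beta)
            - N / 2 * np.log(2 * np.pi))

# beta_ML should beat any nearby value of beta.
for beta in [0.5 * beta_ml, 0.9 * beta_ml, 1.1 * beta_ml, 2 * beta_ml]:
    assert log_likelihood(beta_ml) > log_likelihood(beta)

print(beta_ml)  # close to the true precision of 4
```

The check confirms the derivation: the log likelihood is concave in $\beta$ with its unique maximum at $1/\beta_{ML} = \frac{1}{N}\sum_n \{y(x_n, \pmb{w}) - t_n\}^2$.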