Derivative with Chain Rule of Sum of Squares Error Function when fitting the data using Mth order polynomial


This is from the very first exercise of Chris Bishop's Pattern Recognition and Machine Learning. The question asks us to show that minimizing the Sum-of-Squares Error Function with respect to the weights $\pmb{w}$ is the same as solving a linear set of equations he presents to us. I understand we must take the derivative of the error function with respect to $\pmb{w}$, set it to zero, and then expand it out and show it takes the form of the linear set of equations he presents. Note that $y(x, \pmb{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \displaystyle \sum_{j=0}^{M} w_j x^j$

However, I am stuck on the Chain rule portion of taking the derivative of $E(w) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n,\pmb{w}) - t_n \}^2$.

Right now, I have...

\begin{align} \frac{\partial E(\pmb{w})}{\partial \pmb{w}} = \frac{\partial}{\partial \pmb{w}} \frac{1}{2} \displaystyle \sum_{n=1}^{N} \{ y(x_n, \pmb{w}) - t_n \}^2 \end{align}

We can push the derivative through the sum, since the derivative of a sum is the sum of the derivatives. \begin{align*} & = \frac{1}{2} \displaystyle \sum_{n=1}^{N} \frac{\partial}{\partial \pmb{w}} \{ y(x_n, \pmb{w}) - t_n \}^2 \end{align*} By the chain rule, \begin{align*} & = \frac{1}{2} \displaystyle \sum_{n=1}^{N} 2 \{ y(x_n, \pmb{w}) - t_n \} \left(\frac{\partial y(x_n, \pmb{w})}{\partial \pmb{w}}\right) \end{align*}

Now my issue is evaluating $\frac{\partial y(x_n, \pmb{w})}{\partial \pmb{w}}$. This is equivalent to

\begin{align} \frac{\partial}{\partial \pmb{w}} \displaystyle \sum_{j=0}^{M} w_jx^j \end{align}

Which I see as equalling

\begin{align} = \sum_{j=1}^{M} x^j \end{align}

So then, the derivative of the Error function would be

\begin{align} \frac{1}{2} \displaystyle \sum_{n=1}^{N} 2 \{ y(x_n, \pmb{w}) - t_n \} (\sum_{j=1}^{M} x^j) \end{align}

However, all the solutions I've seen have the derivative of the Error function as being \begin{align} \frac{1}{2} \displaystyle \sum_{n=1}^{N} 2 \{ y(x_n, \pmb{w}) - t_n \} x_n^i \end{align}

Which does result in the correct answer. So my question is, where did I go wrong?
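As a sanity check (not part of Bishop's text), the per-component form $\frac{\partial E}{\partial w_i} = \sum_{n=1}^{N} \{ y(x_n, \pmb{w}) - t_n \} x_n^i$ can be compared against a finite-difference gradient of $E(\pmb{w})$ on made-up data. The data, polynomial degree, and random seed below are arbitrary assumptions for illustration:

```python
import numpy as np

# Hypothetical toy data set (an assumption for illustration).
rng = np.random.default_rng(0)
N, M = 10, 3
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)

def y(x, w):
    # y(x, w) = sum_j w_j x^j
    return sum(wj * x**j for j, wj in enumerate(w))

def E(w):
    # E(w) = (1/2) sum_n { y(x_n, w) - t_n }^2
    return 0.5 * np.sum((y(x, w) - t) ** 2)

def grad_E(w):
    # dE/dw_i = sum_n { y(x_n, w) - t_n } x_n^i  -- the per-component form
    r = y(x, w) - t
    return np.array([np.sum(r * x**i) for i in range(M + 1)])

w = rng.standard_normal(M + 1)
eps = 1e-6
# Central finite-difference estimate of each gradient component
num = np.array([
    (E(w + eps * np.eye(M + 1)[i]) - E(w - eps * np.eye(M + 1)[i])) / (2 * eps)
    for i in range(M + 1)
])
print(np.max(np.abs(num - grad_E(w))))  # should be tiny (finite-difference noise)
```

The two gradients agree to numerical precision, confirming the $x_n^i$ form.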

Thank you in advance.


Notice that $w$ here is a vector, and hence the derivative $\frac{\partial y}{\partial \pmb{w}}$ is going to be a vector as well, with one component per $w_i$.

For example, let $w \in \mathbb{R}^2$, that is $M=1$.

$$\frac{\partial}{\partial w_0}\left( w_0+w_1x\right)=1=x^0 \ne x^0+x^1$$

$$\frac{\partial}{\partial w_1}\left( w_0+w_1x\right)=x=x^1 \ne x^0+x^1$$
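A quick numerical illustration of this $M = 1$ case (the input $x$ and weights below are arbitrary assumptions): the finite-difference gradient of $y$ with respect to $(w_0, w_1)$ comes out as the vector $(x^0, x^1)$, not the scalar $x^0 + x^1$.

```python
import numpy as np

# M = 1: y(x, w) = w0 + w1*x. The gradient w.r.t. w should be the
# vector (x^0, x^1), not the scalar x^0 + x^1.
x = 0.7                      # arbitrary input (an assumption)
w = np.array([2.0, -3.0])    # arbitrary weights (an assumption)

def y(w):
    return w[0] + w[1] * x

eps = 1e-6
e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
grad = np.array([
    (y(w + eps * e0) - y(w - eps * e0)) / (2 * eps),  # d y / d w0
    (y(w + eps * e1) - y(w - eps * e1)) / (2 * eps),  # d y / d w1
])
print(grad)  # approximately [1.0, 0.7], i.e. (x^0, x^1)
```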

Edit:

Let $A=\begin{bmatrix}1 & x_1^1 & \ldots & x_1^M\\\vdots & \vdots &\ddots & \vdots\\ 1 & x_N^1 & \ldots & x_N^M \end{bmatrix}$, $w= \begin{bmatrix}w_0 \\ \vdots\\ w_M \end{bmatrix}$, and $t=\begin{bmatrix} t_1 \\ \vdots\\ t_N \end{bmatrix}$.

$$E(w)=\frac12 \|Aw-t\|^2$$

$$\nabla E(w) = A^T(Aw-t)=0$$

$$A^TAw = A^Tt$$
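The normal equations above can be solved directly. A minimal NumPy sketch on made-up data (the data set and degree are assumptions), cross-checked against NumPy's least-squares solver:

```python
import numpy as np

# Hypothetical toy data (an assumption for illustration).
rng = np.random.default_rng(1)
N, M = 20, 3
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)

# A is the Vandermonde design matrix from the answer:
# columns x^0, x^1, ..., x^M, one row per data point.
A = np.vander(x, M + 1, increasing=True)

# Solve the normal equations A^T A w = A^T t.
w = np.linalg.solve(A.T @ A, A.T @ t)

# Cross-check against the standard least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(A, t, rcond=None)
print(np.allclose(w, w_lstsq))  # expect True
```

Note that solving $A^T A w = A^T t$ directly squares the condition number of $A$; `np.linalg.lstsq` (QR/SVD-based) is the numerically safer choice for high-degree polynomials.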