Consider the following formula:
$$E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{y(x_n,\mathbf{w})-t_n\}^2$$
where $\mathbf{w}$ is a vector of weights; $x_n$ and $t_n$ come from two vectors of length $N$; and $y$ is a polynomial:
$$y(x,\mathbf{w}) = \sum_{j=0}^M w_jx^j$$
My task is to find a system of equations whose solution gives the weights $\mathbf{w} = \{w_i\}$ that minimize $E$. I reckoned I should differentiate and set the derivative to 0:
$$ \frac{dE}{dw} = \sum_{n=1}^{N}\{y(x_n,\mathbf{w})-t_n\}\times\frac{dy}{dw}$$
$$ \frac{dE}{dw} = \sum_{n=1}^{N}\{y(x_n,\mathbf{w})-t_n\}\times\sum_{i=0}^{M}x_n^j$$
$$\sum_{n=1}^{N}\{\sum_{j=0}^{M}w_jx_n^j-t_n\}\times\sum_{i=0}^{M}x_n^j = 0$$
The solution says to do what I did, except differentiate "with respect to $w_i$". It offers the following expression:
$$\sum_{n=1}^{N}(\sum_{j=0}^{M}w_jx_n^j-t_n) x_n^i = 0$$
I take it that each $i$ yields another equation, which is how this approach leads to a system of equations. There are two things I don't understand:
Why is there not a summation over the $x_n^i$ values at the end? I thought differentiating $y$ would remove the weights but retain the summation.
The inner summation uses $j$, though the outer factor uses $i$. Why are they not the same symbol? I know that if they were both $j$ we would be left with just one equation, but I don't understand how they can be different.
You're minimizing a function $E(\mathbf{w})$, where $\mathbf{w}=(w_0,\dots,w_M)$ is a vector with $M+1$ components. You can't just treat $\mathbf{w}$ as a single variable $w$ and differentiate with respect to it: it's a vector. Instead, you need to take the partial derivative with respect to each $w_i$ and set it equal to 0 (or equivalently, you're looking for solutions of $\nabla E(\mathbf{w})=0$, where the gradient is with respect to $w_0,\dots,w_M$). When $0\leq i\leq M$, you have
$$\frac{\partial y(x,\mathbf{w})}{\partial w_i}=x^{i},$$
and if $i>M$, then the derivative is just 0, since no such weight appears in $y$. So by the chain rule, for each $i=0,\dots,M$,
$$0=\frac{\partial E(\mathbf{w})}{\partial w_i}=\sum_{n=1}^N(y(x_n,\mathbf{w})-t_n)x_n^{i}.$$
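To see why no summation survives in the last factor, it may help to write out the derivative of $y$ term by term. Here $\delta_{ij}$ is the Kronecker delta (1 if $i=j$, 0 otherwise), which is notation I'm introducing and not something from your text:
$$\frac{\partial y(x,\mathbf{w})}{\partial w_i}=\frac{\partial}{\partial w_i}\sum_{j=0}^{M}w_jx^j=\sum_{j=0}^{M}\delta_{ij}\,x^j=x^i.$$
Only the $j=i$ term depends on $w_i$, so every other term differentiates to 0 and the sum collapses to a single power. This is also why the two indices differ: $j$ is a dummy summation variable inside $y$, whereas $i$ is fixed and labels which weight you are differentiating with respect to.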
Now plug in the definition of $y$.
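If you want to check this numerically, here is a minimal sketch. It assumes the system is rearranged as $\sum_{j=0}^{M}A_{ij}w_j=T_i$ with $A_{ij}=\sum_n x_n^{i+j}$ and $T_i=\sum_n t_nx_n^i$, which is what you get by plugging in $y$ and swapping the order of summation. The names `normal_equations`, `solve`, and `gradient` and the sample data are mine, not from your text.

```python
from fractions import Fraction


def normal_equations(xs, ts, M):
    """Build A[i][j] = sum_n x_n^(i+j) and T[i] = sum_n t_n * x_n^i."""
    A = [[sum(x ** (i + j) for x in xs) for j in range(M + 1)]
         for i in range(M + 1)]
    T = [sum(t * x ** i for x, t in zip(xs, ts)) for i in range(M + 1)]
    return A, T


def solve(A, T):
    """Solve A w = T by Gauss-Jordan elimination with exact fractions.

    Assumes A is invertible; a singular A makes the pivot search fail.
    """
    n = len(T)
    rows = [[Fraction(v) for v in A[i]] + [Fraction(T[i])] for i in range(n)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so the pivot entry becomes 1.
        rows[col] = [v / rows[col][col] for v in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


def gradient(w, xs, ts):
    """Return dE/dw_i = sum_n (y(x_n, w) - t_n) * x_n^i for each i."""
    def y(x):
        return sum(wj * x ** j for j, wj in enumerate(w))
    return [sum((y(x) - t) * x ** i for x, t in zip(xs, ts))
            for i in range(len(w))]


# Made-up sample data: fit a straight line (M = 1) to four points.
xs = [0, 1, 2, 3]
ts = [1, 3, 2, 5]
M = 1

A, T = normal_equations(xs, ts, M)
w = solve(A, T)
grad = gradient(w, xs, ts)
print("weights:", w)
print("gradient at the solution:", grad)
```

For this sample data both weights come out as $11/10$, and every component of the gradient is exactly 0, one equation per value of $i$.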