If $C = \frac{1}{2}\sum_{j}(y_j - a_{j}^{L}) ^2$ then why is: $\frac{\partial C}{\partial a_{j}^{L}} = (a_{j}^{L} - y_j)$?


This is about the following phrase in the book at http://neuralnetworksanddeeplearning.com/chap2.html:

"which obviously is easily computable."

The cost is $C = \frac{1}{2}\sum_{j}(y_j - a_{j}^{L}) ^2$.

So why is $\frac{\partial C}{\partial a_{j}^{L}} = (a_{j}^{L} - y_j)$?

I thought it would be: $\frac{\partial C}{\partial a_{j}^{L}} = (y_j - a_{j}^{L} )$

The book's answer has the opposite sign, doesn't it?

Is the swapped order of $a_{j}^{L}$ and $y_j$ a typo, or is it intentional?

Best answer

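The book is correct. Only the $j$-th term of the sum depends on $a_{j}^{L}$, so $\frac{\partial C}{\partial a_{j}^{L}} = \frac{\partial}{\partial a_{j}^{L}} \frac{1}{2}(y_j - a_{j}^{L})^2$. When you apply the chain rule, the outer derivative gives $(y_j - a_{j}^{L})$, and you must also multiply by the derivative of the inner expression $y_j - a_{j}^{L}$ with respect to $a_{j}^{L}$, which is $-1$. So the derivative is $-(y_j - a_{j}^{L}) = a_{j}^{L} - y_j$.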
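A quick way to convince yourself of the sign is a finite-difference check: nudge one output activation up and down by a small amount and see how the cost changes. The sketch below assumes the activations and targets are plain Python lists of floats; the function names and the sample values are hypothetical, chosen for illustration, and are not taken from the book's code.

```python
# Minimal sketch: numerically check dC/da_j = a_j - y_j for the quadratic cost.
# Function names and sample values are illustrative, not from the book.

def quadratic_cost(a, y):
    """C = 1/2 * sum_j (y_j - a_j)^2 over the output activations a."""
    return 0.5 * sum((yj - aj) ** 2 for yj, aj in zip(y, a))


def analytic_grad(a, y):
    """The book's formula: dC/da_j = a_j - y_j."""
    return [aj - yj for aj, yj in zip(a, y)]


def numerical_grad(a, y, eps=1e-6):
    """Central finite-difference estimate of dC/da_j for each j."""
    grad = []
    for j in range(len(a)):
        a_plus = list(a)
        a_minus = list(a)
        a_plus[j] += eps
        a_minus[j] -= eps
        grad.append((quadratic_cost(a_plus, y) - quadratic_cost(a_minus, y)) / (2 * eps))
    return grad


y = [1.0, 0.0, 0.5]  # targets (hypothetical values)
a = [0.8, 0.3, 0.5]  # output activations (hypothetical values)

exact = analytic_grad(a, y)
approx = numerical_grad(a, y)
print([round(g, 6) for g in exact])  # [-0.2, 0.3, 0.0]
# The finite-difference estimate agrees with a - y up to rounding error
print(all(abs(e - n) < 1e-6 for e, n in zip(exact, approx)))  # True
```

For the first output, $a = 0.8$ is below the target $y = 1.0$, so nudging $a$ upward lowers the cost. The derivative must therefore be negative, which matches $a - y = -0.2$ and not $y - a = +0.2$.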