I have been trying to solve this problem from Bishop's *Pattern Recognition and Machine Learning*, chapter 5, for the past few hours, but I am stuck on how to show the identity below. I know I have to take a partial derivative with respect to $a_g$, but I don't know how to proceed.
Show that the derivative of the error function $$E(\mathbf{w}) = -\sum_{g=1}^{G} \Bigl( t_g \ln y(a_g) + (1 - t_g) \ln\bigl(1 - y(a_g)\bigr) \Bigr)$$ with respect to $a_g$, for an output unit with a logistic sigmoid activation function $y(a_g) = \sigma(a_g)$, satisfies
$$\frac{\partial E}{\partial a_g} = y(a_g) - t_g, \qquad \text{given that} \quad \frac{\partial \sigma(a_g)}{\partial a_g} = \sigma(a_g)\bigl(1 - \sigma(a_g)\bigr).$$
First, differentiate the equation for $E$ with respect to $a_k$ (treating $t_g$ as a constant): $$\frac{\partial E}{\partial a_k} = -\frac{\partial}{\partial a_k} \sum_g \Bigl( t_g \ln y(a_g) + (1 - t_g) \ln\bigl(1 - y(a_g)\bigr) \Bigr)$$
$$ = -\sum_g \left[t_g \frac{y'(a_g)\delta_{gk}}{y(a_g)} + (1-t_g)\frac{-y'(a_g)\delta_{gk}}{1-y(a_g)}\right] $$
where we have used the chain rule and $\frac{\partial a_g}{\partial a_k} = \delta_{gk}$, with $\delta$ the Kronecker delta. The delta kills every term except $g = k$, so the sum collapses to a single term:
$$ \frac{\partial E}{\partial a_k} = - t_k\frac{y'(a_k)}{y(a_k)} + (1-t_k)\frac{y'(a_k)}{1-y(a_k)} $$
From here on I will drop the $k$ subscript to reduce clutter. Now we can plug in the property you gave for the sigmoid:
$$y'=y(1-y)$$
$$\implies \frac{\partial E}{\partial a_k} = -t \frac{y'}{y} + (1-t)\frac{y'}{1-y}$$ $$ = -t \frac{y(1-y)}{y}+(1-t)\frac{y(1-y)}{1-y}$$ $$ = -t(1-y) + (1-t)y$$ $$ = -t + ty + y - ty = y - t$$
Finally, reintroducing the subscripts, we have
$$\frac{\partial E}{\partial a_k} = y(a_k) - t_k$$
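As a sanity check on the algebra, you can compare this analytic gradient against central finite differences of $E$. This is just a verification sketch I wrote (not part of Bishop's text); the array size, random seed, and tolerance are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5
a = rng.normal(size=K)                          # pre-activations a_k
t = rng.integers(0, 2, size=K).astype(float)    # binary targets t_k

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def E(a):
    """Cross-entropy error from the question, summed over outputs."""
    y = sigmoid(a)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

# Analytic gradient from the derivation: dE/da_k = y(a_k) - t_k
grad_analytic = sigmoid(a) - t

# Central finite differences, one coordinate at a time
eps = 1e-6
grad_fd = np.array([
    (E(a + eps * np.eye(K)[k]) - E(a - eps * np.eye(K)[k])) / (2 * eps)
    for k in range(K)
])

# The two gradients should agree to roughly O(eps^2)
print(np.max(np.abs(grad_analytic - grad_fd)))
```

The printed maximum discrepancy should be tiny (on the order of the finite-difference truncation error), confirming the identity numerically.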