Given a softmax function: $y_i = \frac{e^{z_i}}{\sum\limits_j e^{z_j}}$
With partial derivative: $\frac{\partial y_i}{\partial z_i} = y_i (1 - y_i)$
And a cross entropy function: $C = -\sum\limits_j t_j \log y_j$
Solve for $\frac{\partial C}{\partial z_i}$
My work so far:
\begin{align*}
\frac{\partial C}{\partial y_j} &= -\frac{t_j}{y_j} \\
y_j &= e^{z_j - z_i} \cdot y_i \\
\frac{\partial y_j}{\partial z_i} &= e^{z_j - z_i} y_i(1-y_i) - e^{z_j - z_i}y_i \\
&= -y_i^2 e^{z_j - z_i} \\
\frac{\partial C}{\partial z_i} &= \sum\limits_j \frac{\partial C}{\partial y_j} \frac{\partial y_j}{\partial z_i} \\
&= \sum\limits_j \frac{t_j y_i^2 e^{z_j-z_i}}{y_j} \\
&= \sum\limits_j t_j y_i \\
&= y_i \sum\limits_j t_j
\end{align*}
The lecture slides give an answer of $y_i - t_i$. Why don't my derivation steps lead to the same answer?
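For what it's worth, a quick finite-difference check (a NumPy sketch; the logits and one-hot target below are arbitrary values I picked) agrees with the slides' $y_i - t_i$, so the mistake must be somewhere in my steps:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())            # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(z, t):
    return -np.sum(t * np.log(softmax(z)))

z = np.array([1.0, -0.5, 0.3, 2.0])    # arbitrary logits
t = np.array([0.0, 1.0, 0.0, 0.0])     # one-hot target, so sum_j t_j = 1

# Central finite differences for dC/dz_i
eps = 1e-6
grad_fd = np.array([
    (cross_entropy(z + eps * np.eye(4)[i], t) -
     cross_entropy(z - eps * np.eye(4)[i], t)) / (2 * eps)
    for i in range(4)
])

print(np.allclose(grad_fd, softmax(z) - t, atol=1e-6))  # True: matches y - t
```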
Are the $z_j$ functions of each other? In what follows I assume they are not, i.e. $\frac{\partial z_j}{\partial z_i}=0$ for $j \neq i$ (while $\frac{\partial z_i}{\partial z_i}=1$), and derive a different answer from the one you gave. What about the $y_j$? Are the $t_j$, or the other components $y_i$, functions of $y_j$?
If not, we have $\frac{\partial C}{\partial y_j}=-\frac{t_j}{y_j}$.
Also $y_j=e^{z_j-z_i}y_i$, so for $j\neq i$ (where $z_j$ does not depend on $z_i$) the product rule gives
$$\frac{\partial y_j}{\partial z_i}=e^{z_j-z_i}y_i(1-y_i)-e^{z_j-z_i}y_i=-y_i^2e^{z_j-z_i}=-y_iy_j.$$
For $j=i$, however, the factor $e^{z_j-z_i}=e^0$ is constant, so that term contributes $\frac{\partial y_i}{\partial z_i}=y_i(1-y_i)$, not $-y_i^2$. Splitting the chain-rule sum accordingly,
$$\frac{\partial C}{\partial z_i} = \sum_j \frac{\partial C}{\partial y_j} \frac{\partial y_j}{\partial z_i}=-\frac{t_i}{y_i}\,y_i(1-y_i)+\sum_{j\neq i}\left(-\frac{t_j}{y_j}\right)\left(-y_iy_j\right)=-t_i(1-y_i)+y_i\sum_{j\neq i}t_j=y_i\sum_j t_j-t_i,$$
which reduces to $y_i-t_i$ because the targets sum to one. Your derivation applies the $j\neq i$ formula to the $j=i$ term as well, and that is exactly what loses the $-t_i$.
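A numerical check of the full softmax Jacobian (a NumPy sketch; the logits are arbitrary values of my choosing) confirms that $\frac{\partial y_j}{\partial z_i}=y_j(\delta_{ij}-y_i)$: the off-diagonal entries are $-y_iy_j$, but the diagonal entries are $y_i(1-y_i)$ rather than $-y_i^2$, which is precisely the $j=i$ case that has to be treated separately:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())    # subtract max for numerical stability
    return e / e.sum()

z = np.array([0.5, -1.0, 2.0])  # arbitrary logits
y = softmax(z)

# Finite-difference Jacobian: J[j, i] approximates dy_j/dz_i
eps = 1e-6
J = np.zeros((3, 3))
for i in range(3):
    dz = np.zeros(3)
    dz[i] = eps
    J[:, i] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

# Analytic form y_j (delta_ij - y_i) as a matrix: diag(y) - y y^T
J_analytic = np.diag(y) - np.outer(y, y)
print(np.allclose(J, J_analytic, atol=1e-6))      # True: full Jacobian matches
print(np.allclose(np.diag(J), -y**2, atol=1e-4))  # False: diagonal is y_i(1-y_i), not -y_i^2
```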