Considering:
$z=Wx$
$y=\operatorname{softmax}(z)$
$L_{CE}(t,y)=-t^{T}\log y = -\sum_{k=1}^{K} t_k \log y_k$
Where the target $t$ is a one-hot encoding of the true class.
I am trying to figure out a way to compute $\frac{\partial y_k}{\partial z_{k^\prime}}$ for any $k,k^\prime = 1,...,K$ where $k$ and $k^\prime$ may or may not be the same.
I know that the $k$-th component of $y$ is $y_k = \operatorname{softmax}(z_1,\dots,z_K)_k = \frac{\exp(z_k)}{\sum_{j=1}^{K} \exp(z_j)}$, but I am not quite sure how to proceed.
Additionally, what would be $\frac{\partial L_{CE}(t,y(x;W))}{\partial w_k}$?
If I understand your problem correctly, we have two cases:
If $k=k'$, then \begin{equation*} \frac{\partial y_k}{\partial z_{k'}} = \frac{\partial}{\partial z_{k'}} \frac{e^{z_{k'}}}{\sum_{j=1}^K e^{z_j}} = \frac{e^{z_{k'}}}{\sum_{j=1}^K e^{z_j}} - \frac{e^{z_{k'}}}{\left(\sum_{j=1}^K e^{z_j}\right)^2}e^{z_{k'}} = y_k - y_k^2 = y_k(1-y_k). \end{equation*} If $k\ne k'$, then \begin{equation*} \frac{\partial y_k}{\partial z_{k'}} = \frac{\partial}{\partial z_{k'}} \frac{e^{z_{k}}}{\sum_{j=1}^K e^{z_j}} = -\frac{e^{z_k}}{\left( \sum_{j=1}^K e^{z_j} \right)^2}e^{z_{k'}} = -y_k y_{k'}. \end{equation*} Both cases can be written compactly as $\frac{\partial y_k}{\partial z_{k'}} = y_k(\delta_{kk'} - y_{k'})$, where $\delta_{kk'}$ is the Kronecker delta. For your second question, the chain rule together with $L_{CE}(t,y) = -\sum_{k} t_k \log y_k$ gives \begin{equation*} \frac{\partial L_{CE}}{\partial z_{k'}} = -\sum_{k=1}^K \frac{t_k}{y_k}\, y_k(\delta_{kk'} - y_{k'}) = y_{k'} - t_{k'}, \end{equation*} using $\sum_k t_k = 1$. Since $z = Wx$, the gradient with respect to the $k$-th row $w_k$ of $W$ is $(y_k - t_k)\,x^T$.
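As a sanity check, the two cases above can be verified with finite differences. This is a minimal NumPy sketch (the helper names `softmax` and `cross_entropy`, and the test values of `z` and `t`, are my own, not from the question):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by max for numerical stability
    return e / e.sum()

def cross_entropy(t, y):
    # L_CE(t, y) = -sum_k t_k log y_k
    return -np.sum(t * np.log(y))

z = np.array([0.5, -1.0, 2.0, 0.3])
K = len(z)
y = softmax(z)

# Analytic Jacobian: dy_k/dz_k' = y_k (delta_{kk'} - y_k')
J_analytic = np.diag(y) - np.outer(y, y)

# Central-difference approximation, one column per z_k'
eps = 1e-6
J_numeric = np.zeros((K, K))
for kp in range(K):
    dz = np.zeros(K)
    dz[kp] = eps
    J_numeric[:, kp] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

print(np.allclose(J_analytic, J_numeric, atol=1e-8))  # True

# Chain rule: dL/dz = y - t for one-hot t
t = np.zeros(K)
t[2] = 1.0
g_numeric = np.array([
    (cross_entropy(t, softmax(z + eps * np.eye(K)[k]))
     - cross_entropy(t, softmax(z - eps * np.eye(K)[k]))) / (2 * eps)
    for k in range(K)
])
print(np.allclose(g_numeric, y - t, atol=1e-8))  # True
```

Central differences have $O(\varepsilon^2)$ truncation error, so with $\varepsilon = 10^{-6}$ both checks agree with the analytic formulas to well within the tolerance.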