Derivative of cross entropy proof


Given the following definition of the loss function $L_{CE}$:


[Image: definition of the loss function $L_{CE}$]


Here is my attempt at the gradient with respect to $W$, which is wrong. Where did I make the mistake?

[Image: attempted derivation of the gradient with respect to $W$]