I am teaching myself artificial intelligence from scratch, without libraries,
and I have a decent handle on most of it.
UPDATE-EDIT
However, I am lost on the next mathematical step after differentiating the softmax activation function.
As an example, to hopefully clarify:
let's call the softmax derivative dSM. If that is the name of the function and i is the index of the output value,
then it would be dSM_i.
With the case where the index i equals k (which I will define as the ground-truth vector index) on the diagonal,
the matrix would look like:
(dSM_i * (1 - dSM_i))   (-dSM_i * dSM_k)        (-dSM_i * dSM_k)
(-dSM_i * dSM_k)        (dSM_i * (1 - dSM_i))   (-dSM_i * dSM_k)
(-dSM_i * dSM_k)        (-dSM_i * dSM_k)        (dSM_i * (1 - dSM_i))
but I don't know what to do from there.
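For reference, here is a minimal sketch of how a matrix like this could be built in plain Python. It assumes the entries are formed from the softmax outputs (called s here, an assumed name), with row i and column j holding the derivative of output i with respect to input j. The function names and the example inputs are hypothetical and only for illustration.

```python
import math

def softmax(z):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(s):
    # J[i][j] = d s_i / d z_j
    #         = s_i * (1 - s_i)   when i == j (diagonal)
    #         = -s_i * s_j        when i != j (off-diagonal)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical pre-activation values for a 3-class output layer.
s = softmax([1.0, 2.0, 3.0])
J = softmax_jacobian(s)
for row in J:
    print(row)
```

Each row of J sums to roughly zero, because the softmax outputs always sum to 1.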
How do I go from there to the equation
derivative of the sum of the loss w.r.t. the activation
multiplied by
derivative of the activation w.r.t. the input
multiplied by
derivative of the input w.r.t. the weight?
Each row of the Jacobian matrix has 3 values, when all I need is 1.
Could someone please help? Thanks. Everything I have found so far only explains how to get to the point I can already reach.
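One way the 3 values per input can collapse to 1 is sketched below, under some assumptions: the loss is cross-entropy, the target y is one-hot, and the layer is z = W x. The names z, y, x, dL_ds, dL_dz and dL_dW are hypothetical. The idea is that the chain rule sums over all outputs: the gradient for input j is the sum over i of dL/ds_i times J[i][j], so each column of the Jacobian is reduced to a single number.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(s):
    # J[i][j] = d s_i / d z_j = s_i * (delta_ij - s_j)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

def backprop_through_softmax(s, dL_ds):
    # dL/dz_j = sum_i dL/ds_i * ds_i/dz_j  (a vector times the Jacobian)
    J = softmax_jacobian(s)
    n = len(s)
    return [sum(dL_ds[i] * J[i][j] for i in range(n)) for j in range(n)]

# Hypothetical example values.
z = [1.0, 2.0, 3.0]   # inputs to softmax
y = [0.0, 0.0, 1.0]   # one-hot ground truth
x = [0.5, -1.0]       # inputs to the layer, so z = W x

s = softmax(z)

# Cross-entropy L = -sum_i y_i * log(s_i), so dL/ds_i = -y_i / s_i.
dL_ds = [-y[i] / s[i] for i in range(len(s))]

# One value per softmax input, after summing over the Jacobian.
dL_dz = backprop_through_softmax(s, dL_ds)

# Since z_j = sum_m W[j][m] * x[m], dz_j/dW[j][m] = x[m].
dL_dW = [[dL_dz[j] * x[m] for m in range(len(x))] for j in range(len(z))]

print(dL_dz)
print([s[j] - y[j] for j in range(len(s))])  # same values, up to rounding
```

Under these assumptions the sum simplifies to s_j - y_j, which is why many write-ups skip the Jacobian entirely for softmax combined with cross-entropy.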