I am trying to work through the math for the backpropagation algorithm using a cross-entropy cost/loss function and a softmax activation function in the output layer.
The network architecture for 3-class classification is as follows:
The input layer has two input neurons (i1 and i2)
There is a single hidden layer having 3 neurons (h1, h2 and h3)
The output layer has 3 output neurons (o1, o2 and o3). These output scores, or logits, are fed into the softmax function, which then outputs the probabilities (S1, S2 and S3).
I am attaching a hand sketched image for a visual representation of the neural network.
w1 is the weight from hidden neuron h1 to output neuron o1, and w13 is the weight from input neuron i1 to hidden neuron h1.
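For concreteness, here is a minimal forward pass for this 2-3-3 architecture (a sketch only: I am assuming sigmoid activations in the hidden layer and identity logits at the output, and the weight values below are made up for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    # subtract the max logit for numerical stability
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# made-up weights: W_hidden[j][i] connects input i to hidden neuron j
# (a weight like w13 would live here), and W_out[k][j] connects hidden
# neuron j to output neuron k (a weight like w1 would live here)
W_hidden = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
W_out = [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2], [1.3, 1.4, 1.5]]

def forward(i1, i2):
    hidden = [sigmoid(row[0] * i1 + row[1] * i2) for row in W_hidden]   # h1..h3
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W_out]  # net_o1..net_o3
    return softmax(logits)                                               # S1, S2, S3

probs = forward(0.05, 0.10)  # three probabilities summing to 1
```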
The partial derivative equations I have come up with are:
$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial S_1} * \frac{\partial S_1}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial w_1}$
OR:
$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial w_1}$
where $S_1$ refers to the softmax output for the 1st class, $out_{o1}$ refers to the output of the first output neuron, and $net_{o1}$ refers to the net input to output neuron $o_1$.
Are my partial derivative equations correct?
Also, I wish to compute the partial derivative of:
$\frac{\partial L}{\partial w_{13}}$
What should the equations look like?
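As a sanity check on whichever chain rule turns out to be right, I intend to compare the analytic gradient against a central finite-difference estimate of the loss. A generic sketch (here `loss_fn` and `w` are placeholders for a loss evaluated as a function of one scalar weight, not part of the network above):

```python
def numerical_grad(loss_fn, w, eps=1e-5):
    """Central-difference estimate of dL/dw for a scalar weight w."""
    return (loss_fn(w + eps) - loss_fn(w - eps)) / (2 * eps)

# toy example: L(w) = (w - 3)^2 has analytic gradient dL/dw = 2*(w - 3)
g = numerical_grad(lambda w: (w - 3.0) ** 2, w=1.0)
# the analytic gradient at w = 1.0 is -4.0; g should agree to ~1e-6
```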
Thanks!
