Multiclass logistic regression gradient descent update rule for weights using softmax/categorical cross entropy?


I'm trying to find the general update rule (for gradient descent) for multiclass logistic regression for all weights.

Say the logistic regression model has 3072 inputs and 10 classes. That makes our weight matrix 10×3072. Since we're using softmax, I already calculated the derivative of the cross-entropy w.r.t. the inputs to the softmax (the logits vector output by the model).

Derivative of the cross-entropy w.r.t. the inputs to the softmax ($z$), where $p = \mathrm{softmax}(z)$ and $y$ is the one-hot label:

$$\frac{\partial L}{\partial z_j} = p_j - y_j$$
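As a sanity check on that derivative, here is a minimal sketch (function names are my own, not from the question) that compares the analytic gradient $\mathrm{softmax}(z) - y$ against a central finite-difference estimate:

```python
import numpy as np

def softmax(z):
    # Shift by max for numerical stability before exponentiating
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y):
    # Cross-entropy loss of softmax(z) against one-hot target y
    return -np.sum(y * np.log(softmax(z)))

rng = np.random.default_rng(0)
z = rng.normal(size=10)          # logits for 10 classes
y = np.zeros(10)
y[3] = 1.0                       # one-hot target: class 3

analytic = softmax(z) - y        # the claimed gradient dL/dz

# Central finite differences as an independent check
eps = 1e-6
numeric = np.zeros(10)
for j in range(10):
    zp, zm = z.copy(), z.copy()
    zp[j] += eps
    zm[j] -= eps
    numeric[j] = (cross_entropy(zp, y) - cross_entropy(zm, y)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))
```

The two gradients agree to within finite-difference error, which confirms the $p - y$ form.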

My question: are we allowed to use chain-rule logic to find the general update rule for the weights $\theta$ w.r.t. the cost function? Here's what I mean:

$$\frac{\partial L}{\partial \theta_{jk}} = \frac{\partial L}{\partial z_j}\,\frac{\partial z_j}{\partial \theta_{jk}} = (p_j - y_j)\,x_k$$

Is there any way to derive an update rule using "backpropagation"? (I know it's just logistic regression, but the logic is similar.)

If not, how does the algorithm update during gradient descent?
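For concreteness, here is a hedged sketch of what that chain-rule update would look like in code, assuming logits $z = \theta x$ for a single example, so the full gradient is the outer product $(p - y)\,x^\top$ and the descent step is $\theta \leftarrow \theta - \eta\,(p - y)\,x^\top$ (the learning rate `lr` and the random data are illustrative choices, not from the question):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_inputs, n_classes = 3072, 10
theta = rng.normal(scale=0.01, size=(n_classes, n_inputs))  # 10x3072 weights
x = rng.normal(size=n_inputs)    # one input example
y = np.zeros(n_classes)
y[3] = 1.0                       # one-hot target: class 3

lr = 1e-3                        # illustrative learning rate
losses = []
for _ in range(5):
    z = theta @ x                # logits, shape (10,)
    p = softmax(z)
    losses.append(-np.log(p[3])) # cross-entropy for the true class
    grad = np.outer(p - y, x)    # dL/dtheta = (p - y) x^T, shape (10, 3072)
    theta -= lr * grad           # gradient descent step

print(losses)                    # loss should shrink step over step
```

The loss decreases over the iterations, so yes, the chained gradient $(p - y)x^\top$ is exactly the update gradient descent uses here.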

Thanks so much to anyone who helps!
