Optimization of Softmax Regression


I'm trying to learn the mathematics behind softmax regression. Optimization of softmax regression is generally discussed in the context of deep learning, but I'm looking for an explanation in the context of multinomial logistic regression. In (binary) logistic regression we have the following gradient-descent update rule,

$$\theta_j := \theta_j - \alpha\left(\frac{1}{m}\sum_{i=1}^m\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)x_j^{(i)}\right)$$
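As a sanity check on that rule, here is a minimal NumPy sketch of one batch gradient-descent step for logistic regression; the function and variable names (`logistic_update`, `alpha`, etc.) are illustrative, not from any particular library:

```python
import numpy as np

def sigmoid(z):
    """Logistic function, h_theta(x) = sigmoid(theta^T x)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_update(theta, X, y, alpha):
    """One batch gradient-descent step for logistic regression.

    theta: (n,) parameter vector
    X:     (m, n) design matrix, one example per row
    y:     (m,) labels in {0, 1}
    alpha: learning rate
    """
    m = X.shape[0]
    h = sigmoid(X @ theta)          # h_theta(x^(i)) for every example i
    grad = (X.T @ (h - y)) / m      # (1/m) * sum_i (h - y^(i)) * x_j^(i), all j at once
    return theta - alpha * grad     # simultaneous update of every theta_j
```

The matrix product `X.T @ (h - y)` computes the sum over examples for every component $j$ simultaneously, which is exactly the vectorized form of the per-component rule above.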

and for softmax regression the gradient of the cross-entropy cost with respect to the parameter vector of class $k$ is

$$\nabla_{\theta^{(k)}} J\left(\Theta\right)=\frac{1}{m}\sum_{i=1}^m\left(\hat p_k^{(i)}-y_k^{(i)}\right)x^{(i)}$$

In that case, can I just plug this gradient into the update rule, updating each class's parameter vector $\theta^{(k)}$ as stated below?

$$\theta^{(k)} := \theta^{(k)} - \alpha\left(\frac{1}{m}\sum_{i=1}^m\left(\hat p_k^{(i)}-y_k^{(i)}\right)x^{(i)}\right)$$
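For concreteness, a minimal NumPy sketch of that softmax update, applying the gradient step to all $K$ class vectors $\theta^{(k)}$ at once (the names `softmax_update`, `Theta`, `Y` are my own, and `Y` is assumed to be one-hot encoded):

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax, giving the class probabilities p-hat."""
    Z = Z - Z.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(Z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_update(Theta, X, Y, alpha):
    """One batch gradient-descent step for softmax regression.

    Theta: (n, K) matrix whose column k is theta^(k)
    X:     (m, n) design matrix
    Y:     (m, K) one-hot target matrix, Y[i, k] = y_k^(i)
    alpha: learning rate
    """
    m = X.shape[0]
    P = softmax(X @ Theta)          # P[i, k] = p-hat_k^(i)
    grad = (X.T @ (P - Y)) / m      # column k is the gradient w.r.t. theta^(k)
    return Theta - alpha * grad     # simultaneous update of every theta^(k)
```

Structurally this is the same step as in binary logistic regression, with the sigmoid replaced by the softmax and one parameter vector per class; stacking the $\theta^{(k)}$ as columns of `Theta` lets a single matrix product compute every class gradient.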