Optimization of Softmax Regression


I'm trying to learn the mathematics behind softmax regression. Optimization of softmax regression is generally discussed in the context of deep learning, but I'm looking for an explanation in the context of multinomial logistic regression. In (binary) logistic regression we have the following gradient-descent update rule,

$$\theta_j := \theta_j - \alpha\left(\frac{1}{m}\sum_{i=1}^m\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)x_j^{(i)}\right)$$
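As a sanity check on that rule, here is a minimal NumPy sketch of one batch gradient-descent step for logistic regression; the function and variable names (`logistic_update`, `alpha`, etc.) are illustrative, not from any particular library:

```python
import numpy as np

def sigmoid(z):
    """Logistic function, h_theta(x) = sigmoid(theta^T x)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_update(theta, X, y, alpha):
    """One batch gradient-descent step for logistic regression.

    theta: (n,) parameter vector
    X:     (m, n) design matrix, one example per row
    y:     (m,) labels in {0, 1}
    alpha: learning rate
    """
    m = X.shape[0]
    h = sigmoid(X @ theta)          # h_theta(x^(i)) for every example i
    grad = (X.T @ (h - y)) / m      # (1/m) * sum_i (h - y^(i)) * x_j^(i), all j at once
    return theta - alpha * grad     # simultaneous update of every theta_j
```

The matrix product `X.T @ (h - y)` computes the sum over examples for every component $j$ simultaneously, which is exactly the vectorized form of the per-component rule above.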

and for softmax regression the gradient of the cross-entropy cost with respect to the parameter vector of class $k$ is

$$\nabla_{\theta^{(k)}} J\left(\Theta\right)=\frac{1}{m}\sum_{i=1}^m\left(\hat p_k^{(i)}-y_k^{(i)}\right)x^{(i)}$$

In that case, can I just plug this gradient into the update rule, updating each class's parameter vector $\theta^{(k)}$ as stated below?

$$\theta^{(k)} := \theta^{(k)} - \alpha\left(\frac{1}{m}\sum_{i=1}^m\left(\hat p_k^{(i)}-y_k^{(i)}\right)x^{(i)}\right)$$
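For concreteness, a minimal NumPy sketch of that softmax update, applying the gradient step to all $K$ class vectors $\theta^{(k)}$ at once (the names `softmax_update`, `Theta`, `Y` are my own, and `Y` is assumed to be one-hot encoded):

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax, giving the class probabilities p-hat."""
    Z = Z - Z.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(Z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_update(Theta, X, Y, alpha):
    """One batch gradient-descent step for softmax regression.

    Theta: (n, K) matrix whose column k is theta^(k)
    X:     (m, n) design matrix
    Y:     (m, K) one-hot target matrix, Y[i, k] = y_k^(i)
    alpha: learning rate
    """
    m = X.shape[0]
    P = softmax(X @ Theta)          # P[i, k] = p-hat_k^(i)
    grad = (X.T @ (P - Y)) / m      # column k is the gradient w.r.t. theta^(k)
    return Theta - alpha * grad     # simultaneous update of every theta^(k)
```

Structurally this is the same step as in binary logistic regression, with the sigmoid replaced by the softmax and one parameter vector per class; stacking the $\theta^{(k)}$ as columns of `Theta` lets a single matrix product compute every class gradient.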