Multi-Class Logistic Regression


Considering:

$z=Wx$

$y=softmax(z)$

$L_{CE}(t,y)=-t^{\mathsf T}\log y = -\sum_{k=1}^{K} t_k \log y_k$

where $t$ is a one-hot encoding of the target.

I am trying to figure out a way to compute $\frac{\partial y_k}{\partial z_{k^\prime}}$ for any $k,k^\prime = 1,...,K$ where $k$ and $k^\prime$ may or may not be the same.

I know that the $k$-th component of $y$ is $y_k = \operatorname{softmax}(z_1,\dots,z_K)_k = \frac{\exp(z_k)}{\sum_{j=1}^{K} \exp(z_j)}$ (writing the sum over $j$ to avoid clashing with $k'$), but I am not quite sure how to proceed.
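As a sanity check on that definition, here is a minimal NumPy sketch of the softmax (the function name and the max-shift are my own choices; shifting by $\max_j z_j$ is the standard trick for numerical stability and does not change the result, since the shift cancels between numerator and denominator):

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating to avoid overflow;
    # exp(z_k - c) / sum_j exp(z_j - c) = exp(z_k) / sum_j exp(z_j).
    e = np.exp(z - np.max(z))
    return e / e.sum()

y = softmax(np.array([1.0, 2.0, 3.0]))
print(y.sum())  # the components of y sum to 1
```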

Additionally, what would be $\frac{\partial L_{CE}(t,y(x;W))}{\partial w_k}$?


Best answer:

If I understand your problem correctly, we have two cases:

If $k=k'$, then \begin{equation*} \frac{\partial y_k}{\partial z_{k'}} = \frac{\partial}{\partial z_{k'}} \frac{e^{z_{k'}}}{\sum_{j=1}^K e^{z_j}} = \frac{e^{z_{k'}}}{\sum_{j=1}^K e^{z_j}} - \frac{e^{z_{k'}}}{\left(\sum_{j=1}^K e^{z_j}\right)^2}e^{z_{k'}} = y_k - y_k^2. \end{equation*} If $k\ne k'$, then \begin{equation*} \frac{\partial y_k}{\partial z_{k'}} = \frac{\partial}{\partial z_{k'}} \frac{e^{z_{k}}}{\sum_{j=1}^K e^{z_j}} = -\frac{e^{z_k}}{\left( \sum_{j=1}^K e^{z_j} \right)^2}e^{z_{k'}} = -y_k y_{k'}. \end{equation*}