softmax cross entropy derivative


I am working with logistic regression for multiclass classification (softmax with cross entropy): $$L(w)=-\sum_{n}\sum_{k}y_{nk}\log\left(\frac{e^{w_{k}^Tx_{n}}}{\sum_{i}e^{w_{i}^Tx_{n}}}\right)$$ Differentiating by the chain rule: $$\frac{\partial L(w)}{\partial w_{j}}=-\sum_{n}\sum_{k}y_{nk}\frac{1}{p_{k}}\frac{\partial p_{k}}{\partial w_{j}},\quad\text{where}\quad p_{k}=\frac{e^{w_{k}^Tx_{n}}}{\sum_{i}e^{w_{i}^Tx_{n}}}$$ When $k = j$, the quotient rule gives $$\frac{\partial p_{j}}{\partial w_{j}}=\frac{e^{w_{j}^Tx_{n}}\cdot \sum_{i}e^{w_{i}^Tx_{n}}-e^{w_{j}^Tx_{n}}\cdot e^{w_{j}^Tx_{n}}}{\left(\sum_{i}e^{w_{i}^Tx_{n}}\right)^2}\,x_{n}=p_{j}(1-p_{j})\,x_{n}$$ so this part of the sum is $$-\sum_{n}y_{nj}\frac{1}{p_{j}}\,p_{j}(1-p_{j})\,x_{n}$$ When $k\neq j$:

$$\frac{\partial p_{k}}{\partial w_{j}}=\frac{0\cdot \sum_{i}e^{w_{i}^Tx_{n}}-e^{w_{k}^Tx_{n}}\cdot e^{w_{j}^Tx_{n}}}{\left(\sum_{i}e^{w_{i}^Tx_{n}}\right)^2}\,x_{n}=-p_{j}p_{k}\,x_{n}$$ so this part of the sum is $$-\sum_{n}\sum_{k\neq j}y_{nk}\frac{1}{p_{k}}\cdot(-p_{j}p_{k})\,x_{n}$$
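For reference, the two cases can be written in one line using the Kronecker delta (a standard softmax-Jacobian identity, stated in terms of the $p_{k}$ defined above):

$$\frac{\partial p_{k}}{\partial w_{j}}=p_{k}\left(\delta_{kj}-p_{j}\right)x_{n},\qquad \delta_{kj}=\begin{cases}1 & k=j\\ 0 & k\neq j\end{cases}$$

Setting $k=j$ recovers $p_{j}(1-p_{j})\,x_{n}$, and $k\neq j$ recovers $-p_{j}p_{k}\,x_{n}$.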
Question 1: Please explain how $\sum_{i}e^{w_{i}^Tx_{n}}$ behaves under differentiation in the cases $i=j$ and $i\neq j$.

After that, combining everything: $$\frac{\partial L(w)}{\partial w_{j}}=-\sum_{n}y_{nj}\frac{1}{p_{j}}\,p_{j}(1-p_{j})\,x_{n}+\sum_{n}\sum_{k\neq j}y_{nk}\frac{1}{p_{k}}\,p_{j}p_{k}\,x_{n}$$ Question 2: How can this be simplified further?
The result should be: $$\sum_{n}(p_{j}-y_{nj})x_{n}$$ Of course, I am not sure whether the indices are right.
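If it helps, the claimed result can be sanity-checked numerically against a finite-difference gradient. This is a small NumPy sketch I wrote for that purpose; the shapes, variable names, and random data are my own illustrative choices, not from the question:

```python
import numpy as np

# Check the claimed gradient dL/dw_j = sum_n (p_nj - y_nj) x_n
# for softmax cross-entropy, via central finite differences.
rng = np.random.default_rng(0)
N, D, K = 5, 3, 4                            # samples, features, classes
X = rng.normal(size=(N, D))                  # rows are x_n
W = rng.normal(size=(K, D))                  # rows are w_k
y = np.eye(K)[rng.integers(0, K, size=N)]    # one-hot labels y_nk

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)     # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss(W):
    p = softmax(X @ W.T)                     # p_nk, shape (N, K)
    return -np.sum(y * np.log(p))

# Analytic gradient from the claimed formula: row j is sum_n (p_nj - y_nj) x_n
p = softmax(X @ W.T)
grad = (p - y).T @ X                         # shape (K, D)

# Finite-difference gradient, one coordinate at a time
eps = 1e-6
num = np.zeros_like(W)
for j in range(K):
    for d in range(D):
        Wp = W.copy(); Wp[j, d] += eps
        Wm = W.copy(); Wm[j, d] -= eps
        num[j, d] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.max(np.abs(grad - num)))            # should be tiny if the formula holds
```

If the printed maximum discrepancy is on the order of floating-point/finite-difference noise, the formula (and its indexing) checks out.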
Can someone help me please?