Let's say that I want to find the stationary points of the cross-entropy loss function when using logistic regression.
The 1-D logistic function is given by: \begin{equation}\label{eq2} \begin{split} \sigma(wx) = \frac{1}{1+\exp{(-wx)}} \end{split} \end{equation}
and the cross-entropy loss is given by:
\begin{equation}\label{eq3} \begin{split} \textbf{L}(wx) = -y \log{(\sigma(wx))} - (1-y) \log{(1-\sigma(wx))} \end{split} \end{equation}
When I simplify, differentiate, and set the derivative equal to 0, I find the following:
\begin{equation}\label{eq7} \begin{split} \frac{d\textbf{L}}{dw} &= (1-y)x - \frac{xe^{-wx} }{1+e^{-wx}} = 0\\ (x-xy)(1+e^{-wx}) &=xe^{-wx} \\ (1-y)(1+e^{-wx}) &= e^{-wx}\\ 1 +e^{-wx} -y- ye^{-wx} &= e^{-wx}\\ 1-y - ye^{-wx}& = 0\\ 1-y & = ye^{-wx}\\ \frac{1-y}{y} &= e^{-wx}\\ w &= - \frac{\log\left(\frac{1-y}{y}\right)}{x} \end{split} \end{equation}
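As a quick numeric sketch of the algebra (using hypothetical values $y = 0.25$, $x = 2.0$ — a soft label strictly between 0 and 1), the closed-form $w$ does zero the derivative:

```python
import math

# hypothetical values: a soft label strictly between 0 and 1
y, x = 0.25, 2.0

# stationary point from the derivation: w = -log((1-y)/y) / x
w = -math.log((1 - y) / y) / x

# dL/dw = (1-y) x - x e^{-wx} / (1 + e^{-wx})
dLdw = (1 - y) * x - x * math.exp(-w * x) / (1 + math.exp(-w * x))
print(abs(dLdw) < 1e-12)  # True
```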
However, this result feels very wrong:
- First, x cannot be equal to 0
- Second, y cannot be equal to 0
- Third, y cannot be equal to 1
To my knowledge, 0 and 1 are the only values that y takes in a cross-entropy loss. I am not sure where I went off track.
I know that the cross-entropy loss combines two loss terms (one for each value of y), but I am not sure whether that plays a role in the steps and, if so, how. Could you please help me?
Thanks in advance
$ \def\l{\lambda}\def\s{\sigma} \def\o{{\tt1}}\def\p{\partial} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\Big(#1\Big)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} $The derivative of the logistic function is well-known $$\eqalign{ z &= wx \\ \s(z) &= \LR{\o+e^{-z}}^{-1} \\ d\s &= \LR{\s-\s^2}dz \;=\; \LR{\s-\s^2}x\,dw \\ }$$ Calculate the differential of the loss function wrt $\s$ then change the independent variable to $w$ before recovering the gradient. $$\eqalign{ \l &= -(\o-y):\log(\o-\s) - y:\log(\s) \\ d\l &= -(\o-y):d\log(\o-\s) - y:d\log(\s) \\ &= -(\o-y):\fracLR{d\LR{\o-\s}}{\o-\s} - y:\fracLR{d\s}{\s} \\ &= \LR{\frac{\o-y}{\o-\s} - \frac{y}{\s}}:d\s \\ &= \fracLR{\s-y}{\s-\s^2}:\BR{\LR{\s-\s^2}x\;dw} \\ &= \BR{\LR{\s-y}x}:dw \\ \grad{\l}{w} &= {\LR{\s-y}x} \\ }$$ This gradient is equal to zero if $\;x=0\;$ or if $$y=\s=\LR{\o+e^{-wx}}^{-1} \qiq xw = \log\fracLR{y}{\o-y}$$
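The gradient $\frac{\partial\lambda}{\partial w} = (\sigma - y)\,x$ can be verified against a central finite difference of the loss at arbitrary (hypothetical) values of $w$, $x$, $y$:

```python
import math

def sigma(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    # cross-entropy loss: -y log(sigma) - (1-y) log(1 - sigma)
    s = sigma(w * x)
    return -y * math.log(s) - (1 - y) * math.log(1 - s)

def grad(w, x, y):
    # closed-form gradient derived above: (sigma - y) * x
    return (sigma(w * x) - y) * x

# central finite-difference check at hypothetical values
w, x, y, h = 0.7, 1.3, 1.0, 1e-6
fd = (loss(w + h, x, y) - loss(w - h, x, y)) / (2 * h)
print(abs(fd - grad(w, x, y)) < 1e-6)  # True
```

With a hard label $y=1$, the gradient $(\sigma - y)x$ is nonzero for every finite $w$ (unless $x=0$), which is exactly why the stationarity equation has no solution in that case.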