If the cost function is $L$,
$$ L = -\frac{1}{m}\left[\, y\,\log(h(x)) + (1-y)\,\log(1-h(x)) \,\right] $$
$$ h(x)=\frac{1}{1+e^{-(w^{T}x+b)}} $$
The first-order partial derivative of $L$ with respect to $w$ is
$$\frac{\partial L}{\partial w} = \frac{1}{m}\,( h(x) - y )\,x $$
Question:
How do I find the second-order partial derivative of $L$ with respect to $w$, that is, $$ \frac{\partial ^{2}L}{\partial w^{2}}$$
so that I can compute the Newton step and update the weights $w$ like this:
$$ w_{new} = w_{old} - \left(\frac{\partial ^{2}L}{\partial w^{2}}\right)^{-1} \left( \frac{\partial L}{\partial w}\right) $$
I'm just trying to figure out how Newton's method works with logistic regression.
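To make the update rule concrete, here is a minimal NumPy sketch of Newton's method for logistic regression. The function name `newton_logistic` and the toy data are illustrative, not from the question; the bias $b$ is folded in as a column of ones in `X`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, iters=20):
    """Fit logistic-regression weights with Newton's method.

    X : (m, n) design matrix (first column of ones plays the role of b).
    y : (m,) labels in {0, 1}.
    """
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ w)               # h(x) for every sample
        grad = X.T @ (h - y) / m         # dL/dw = (1/m) X^T (h - y)
        S = np.diag(h * (1 - h)) / m     # diagonal of sigma'(z) terms
        H = X.T @ S @ X                  # Hessian: (1/m) X^T S X
        w -= np.linalg.solve(H, grad)    # w_new = w_old - H^{-1} grad
    return w

# Toy, non-separable data (Newton diverges on perfectly separable data)
X = np.column_stack([np.ones(6), [-2, -1, -0.5, 0.5, 1, 2.0]])
y = np.array([0, 0, 0, 1, 0, 1.0])
w = newton_logistic(X, y)
print(w)
```

Note that `np.linalg.solve(H, grad)` is used instead of explicitly inverting the Hessian, which is both cheaper and numerically safer.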
Hessian of Loss function (Applying Newton's method in Logistic Regression)
3.1k views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 3 best solutions below.
It is not so clear that you have these concepts down.
You should clarify the inputs and outputs. What you seem to have done is compute the second derivative of a scalar-valued function of one variable, in other words an $$\mathbb R^{1} \to \mathbb R^{1}$$ function. The Jacobian collects all the partial derivatives with respect to all the different input variables. For a function $$\cases{x \in \mathbb R^n\\f(x) \in \mathbb R^m}$$
the Jacobian is an $m\times n$ matrix.
For the Hessian to be a matrix, the function $f(x)$ must map $$\mathbb R^{n} \to \mathbb R^{1}$$
In the more general case
$$\mathbb R^{n} \to \mathbb R^{m}$$
the Hessian is a 3-index tensor.
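The dimension bookkeeping above can be checked numerically. The following sketch uses a forward-difference Jacobian (the helper `jacobian` and the test functions are illustrative): for $f:\mathbb R^3 \to \mathbb R^2$ the Jacobian is $2\times 3$, and the Hessian of a scalar function $g:\mathbb R^3 \to \mathbb R$ — the Jacobian of its gradient — is $3\times 3$.

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    """Forward-difference Jacobian of f: R^n -> R^m, as an (m, n) matrix."""
    fx = np.atleast_1d(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros(x.size)
        e[j] = h
        J[:, j] = (np.atleast_1d(f(x + e)) - fx) / h
    return J

# f: R^3 -> R^2, so its Jacobian is 2 x 3
f = lambda x: np.array([x[0] * x[1], np.sin(x[2])])
print(jacobian(f, np.array([1.0, 2.0, 0.0])).shape)  # (2, 3)

# g: R^3 -> R; the Jacobian of its gradient (the Hessian) is 3 x 3
g = lambda x: x @ x
grad_g = lambda x: jacobian(g, x).ravel()
print(jacobian(grad_g, np.array([1.0, 2.0, 3.0])).shape)  # (3, 3)
```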
For the second derivative, you can do it faster: $$\sigma' (x)= \sigma(x)(1-\sigma(x))=\sigma(x)-\sigma^2(x)$$ $$\sigma'' (x)=\sigma' (x)-2 \sigma (x)\sigma' (x)=\sigma' (x)(1-2\sigma (x))=\sigma(x)(1-\sigma(x))(1-2\sigma (x))$$ which is not what you obtain.
Edit
In order to check the result, let us use the second-order central-difference approximation $$f''(x) \approx \frac{f(x+h) - 2 f(x) + f(x-h)}{h^{2}}$$ at $x=\frac 12$ and $h=\frac 1 {200}$.
This would give $-0.0575566$ while the formula I wrote gives $-0.0575568$; your formula leads to $0.292561$.
At least, we do not agree.
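The numerical check above is easy to reproduce; a short sketch, using the values from the answer:

```python
import math

sigma = lambda x: 1.0 / (1.0 + math.exp(-x))

x, h = 0.5, 1.0 / 200
# closed form: sigma'' = sigma (1 - sigma)(1 - 2 sigma)
exact = sigma(x) * (1 - sigma(x)) * (1 - 2 * sigma(x))
# second-order central difference
approx = (sigma(x + h) - 2 * sigma(x) + sigma(x - h)) / h**2
print(exact, approx)  # approx -0.0575568 and approx -0.0575566
```

Both values agree to about six decimal places, confirming $\sigma'' = \sigma(1-\sigma)(1-2\sigma)$.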
First of all, for Newton's method to apply, $f(x)$ has to map $$\mathbb R^{n} \to \mathbb R^{1}$$ be twice differentiable, and have a positive semi-definite Hessian.
We already know from the question that
$$ \frac{\partial L} { \partial w} = (h_\theta(x) - y)\,x $$ (dropping the constant factor $\frac{1}{m}$) and $$ \sigma(x) = \frac{1}{1+e^{-(w^Tx+b)}}$$ Refer here for a proof of the first derivative of $\sigma(x)$: $$ \sigma^{'}(x) = \sigma(x)(1-\sigma(x)) $$
Now we need to find $ \frac{\partial^2 L}{ \partial w^2} $: $$ \begin{align*} \frac{\partial^2 L}{ \partial w^2} &= \frac{\partial}{\partial w}\left(x\,h_\theta(x) - x\,y\right) \\ \\ &= x^2\, \sigma^{'}(x) \ \ \ \ \ \ \ \ \ \left[ \ \frac{\partial h_\theta(x)}{\partial w} = \sigma^{'}(x)\,x \ \right] \\ \\ &= x^2 \left( \sigma(x) - \sigma(x)^2\right) \\ \\ &\quad\text{or, in vector form,} \\ \\ &= x \left( \sigma(x) - \sigma(x)^2\right) x^{T} \end{align*} $$ I'm hoping that I'm correct!
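The vector form can be sanity-checked numerically: for a single sample with $y=1$, the Hessian should equal $\sigma(1-\sigma)\,x x^T$. A sketch with made-up sample values, comparing against a central-difference Hessian of the single-sample loss:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, -0.5])  # one sample (illustrative values)
w = np.array([0.3, -0.1, 0.7])
b = 0.2

s = sigmoid(w @ x + b)
H = s * (1 - s) * np.outer(x, x)  # x (sigma - sigma^2) x^T

def loss(w):
    """Single-sample cross-entropy loss with y = 1."""
    return -np.log(sigmoid(w @ x + b))

# Central-difference approximation of the Hessian of `loss`
eps = 1e-4
n = x.size
H_num = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        ei = np.zeros(n); ei[i] = eps
        ej = np.zeros(n); ej[j] = eps
        H_num[i, j] = (loss(w + ei + ej) - loss(w + ei - ej)
                       - loss(w - ei + ej) + loss(w - ei - ej)) / (4 * eps**2)

print(np.max(np.abs(H - H_num)))  # small: agreement up to finite-difference error
```

The two matrices agree, and $H$ is symmetric positive semi-definite (it is an outer product scaled by $\sigma(1-\sigma) \ge 0$), which is exactly the condition Newton's method needs.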