How to show that the Fisher Information Matrix can be obtained from the Hessian of the Logistic Loss Function


I am solving a multi-part question where I need to prove several things.

Given,

$$ f(\theta) = \frac{1}{m}\sum_{i=1}^m\log\big(1 + \exp(-y_ix_i^T\theta)\big), \qquad \sigma(s) = \frac{1}{1 + \exp(-s)} $$

Here is what I have already proved:

  1. $$ \nabla f(\theta) = -\frac{1}{m}\sum_{i=1}^m y_i\big(1-\sigma(y_ix_i^T\theta)\big)\,x_i $$
  2. $$ \nabla^2 f(\theta) = \frac{1}{m}\sum_{i=1}^my_i^2x_ix_i^T\sigma(y_ix_i^T\theta)(1-\sigma(y_ix_i^T\theta)) $$
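The two formulas above can be sanity-checked numerically against finite differences. A minimal sketch, assuming the standard logistic-regression setup with labels $y_i \in \{-1, +1\}$; the random data and the helper names (`sigma`, `grad_f`, `hess_f`) are made up for illustration:

```python
import numpy as np

def sigma(s):
    return 1.0 / (1.0 + np.exp(-s))

def f(theta, X, y):
    # logistic loss: (1/m) * sum_i log(1 + exp(-y_i x_i^T theta))
    return np.mean(np.log1p(np.exp(-y * (X @ theta))))

def grad_f(theta, X, y):
    # -(1/m) * sum_i y_i (1 - sigma(y_i x_i^T theta)) x_i
    s = y * (X @ theta)
    return -(y * (1 - sigma(s))) @ X / len(y)

def hess_f(theta, X, y):
    # (1/m) * sum_i y_i^2 sigma(s_i)(1 - sigma(s_i)) x_i x_i^T
    s = y * (X @ theta)
    w = (y ** 2) * sigma(s) * (1 - sigma(s))
    return (X * w[:, None]).T @ X / len(y)

# hypothetical random data
rng = np.random.default_rng(0)
m, d = 50, 3
X = rng.normal(size=(m, d))
y = rng.choice([-1.0, 1.0], size=m)
theta = rng.normal(size=d)

# central finite differences of f and grad_f
eps = 1e-6
g_num = np.array([(f(theta + eps * e, X, y) - f(theta - eps * e, X, y)) / (2 * eps)
                  for e in np.eye(d)])
H_num = np.array([(grad_f(theta + eps * e, X, y) - grad_f(theta - eps * e, X, y)) / (2 * eps)
                  for e in np.eye(d)])

assert np.allclose(g_num, grad_f(theta, X, y), atol=1e-5)
assert np.allclose(H_num, hess_f(theta, X, y), atol=1e-4)
```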

Now, for the third part, it is given that the Fisher Information Matrix is: $$ I(\theta) = E_{x,y}\bigg[\nabla_\theta\log(p_\theta(x,y))\big(\nabla_\theta\log(p_\theta(x,y))\big)^T \,\Big|\,\theta\bigg] $$

I must prove that this Fisher Information Matrix is equivalent to the Hessian of $ f $ (in logistic regression) as $ m \rightarrow +\infty $, assuming the $ (x_i, y_i) $ are drawn i.i.d. from the distribution.

Here is my approach:

Since $ y $ and $ 1-\sigma(yx^T\theta) $ are scalars (only $ x $ is a vector), the outer product of the score with itself gives

$$ I(\theta) = E_{x,y}\Big[yx\big(1-\sigma(yx^T\theta)\big)\Big(yx\big(1-\sigma(yx^T\theta)\big)\Big)^T \,\Big|\,\theta\Big] $$ $$ = E_{x,y}\Big[y^2\big(1-\sigma(yx^T\theta)\big)^2\,xx^T \,\Big|\,\theta\Big] $$

It looks almost similar to the Hessian. How do I convert $ \big(1-\sigma(yx^T\theta)\big)^2 $ into $ \sigma(yx^T\theta)\big(1-\sigma(yx^T\theta)\big) $?

We are given a hint that $ 1 - \sigma(s) = \sigma(-s) $, but I don't see where to use it in this derivation.
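While it doesn't replace the algebra, the claimed equivalence can be illustrated by Monte Carlo: draw many $(x_i, y_i)$ from the model and compare the averaged score outer products against the empirical Hessian weights. A sketch, assuming $y \in \{-1,+1\}$ is sampled from the logistic model $ p_\theta(y\mid x) = \sigma(yx^T\theta) $ and an arbitrary Gaussian marginal for $x$ (that marginal is my own choice for illustration, not given in the problem):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigma(s):
    return 1.0 / (1.0 + np.exp(-s))

d = 3
theta = rng.normal(size=d)
m = 200_000  # large m to mimic the m -> infinity limit

# draw (x_i, y_i) i.i.d. from the model:
# x ~ N(0, I) (assumed marginal), then y = +1 with probability sigma(x^T theta)
X = rng.normal(size=(m, d))
p = sigma(X @ theta)
y = np.where(rng.random(m) < p, 1.0, -1.0)

s = y * (X @ theta)

# score of one sample: grad_theta log p_theta(y|x) = y (1 - sigma(s)) x,
# so the score outer product is y^2 (1 - sigma(s))^2 x x^T
coef = (y * (1 - sigma(s))) ** 2
fisher = (X * coef[:, None]).T @ X / m   # Monte Carlo estimate of I(theta)

# empirical Hessian of f at the same samples (y^2 = 1 here)
w = sigma(s) * (1 - sigma(s))
hessian = (X * w[:, None]).T @ X / m

# for large m the two matrices agree up to Monte Carlo noise
print(np.max(np.abs(fisher - hessian)))
```

The per-entry discrepancy shrinks like $1/\sqrt{m}$, which is exactly the sense in which the Hessian of $f$ converges to $I(\theta)$.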