I'm trying to understand what the Fisher distance actually is.
I am also unsure how the author of the paper got from equation 4 to equation 5 by substituting the activation function as the integration variable.
We have $a(x)=\beta ^Tx+\beta_0$ and $\nabla a = \beta $. Since $x$ is a function of $t$, $a$ can also be seen as a function of $t$: $a(t)=\beta ^Tx(t)+\beta_0 $, and so $\dot{a}(t)=\frac{da}{dt}= \beta^T \dot{x}(t)$.
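As a concrete illustration, suppose the path is the straight line between the two endpoints, written here as $x_A$ and $x_B$ (both the straight-line path and these names are assumptions made for illustration; the paper's path may be parametrised differently). Then

$$x(t)=x_A+t\,(x_B-x_A),\qquad \dot{x}(t)=x_B-x_A,\qquad \dot{a}(t)=\beta^T(x_B-x_A)=a(1)-a(0),$$

so in this case $\dot{a}$ is constant in $t$ and $a(t)$ moves monotonically from $a(0)=\beta^Tx_A+\beta_0$ to $a(1)=\beta^Tx_B+\beta_0$.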
Returning to the integral $(4)$, we have

$$ d(\chi_A, \chi_B)=\Bigg| \int_0^1\sqrt{ \dot{x}(t)^T\beta \beta^T \dot{x}(t)\, P_{c=1}(1-P_{c=1}) }\,dt \Bigg|$$

But

$$P_c(x)=\frac{ c+(1-c)e^{-a(x)}}{1+e^{-a(x)}}$$

So for $c=1$,

$$ P=P_{c=1}=\frac{ 1}{1+e^{-a(x)}} $$

Note that $$ \dot{x}(t)^T\beta = \beta^T \dot{x}(t)=\dot{a}(t)$$ and both are scalars, not vectors. Thus the above integral can be written as

$$ d(\chi_A, \chi_B)=\Bigg| \int_0^1\sqrt{ \dot{a}(t)^2\, P_{c=1}(1-P_{c=1}) }\,dt \Bigg|= \Bigg| \int_0^1\sqrt{ P_{c=1}(1-P_{c=1}) } \;\;\dot{a}(t)\, dt \Bigg|$$

The second equality uses $\sqrt{\dot{a}(t)^2}=|\dot{a}(t)|$ and assumes that $\dot{a}(t)$ does not change sign on $[0,1]$, so that its sign is absorbed by the outer absolute value. Writing $a(t)$ for $a(x(t))$, this gives

$$ d(\chi_A, \chi_B)= \Bigg| \int_0^1 \Big(\frac{ 1}{1+e^{-a(t)}}\Big[ 1-\frac{ 1}{1+e^{-a(t)}}\Big] \Big)^{1/2} \;\;\dot{a}(t)\, dt \Bigg|= \Bigg| \int_0^1 \Big[\frac{ e^{-a(t)} }{(1+e^{-a(t)})^2}\Big]^{1/2} \;\;\dot{a}(t)\, dt \Bigg| $$

and so

$$d(\chi_A, \chi_B)= \Bigg| \int_0^1 \frac{ e^{-a(t)/2} }{1+e^{-a(t)}} \;\;\dot{a}(t)\, dt \Bigg| $$

Taking the change of variable $u=a(t)$, so that $du=\dot{a}(t)\,dt$, we get

$$d(\chi_A, \chi_B)= \Bigg| \int_{a(0)}^{a(1)} \frac{ e^{-u/2} }{1+e^{-u}}\, du \Bigg| $$

This last integral is equal to $(5)$. To evaluate it, substitute $v=e^{-u/2}$, so that $e^{-u}=v^2$ and $du=-\frac{2}{v}\,dv$; the integrand becomes $-\frac{2}{1+v^2}$, and since $$\int \frac{dv}{1+v^2}=\arctan(v)$$ we obtain

$$d(\chi_A, \chi_B)= 2\,\Big| \arctan\big(e^{-a(1)/2}\big)-\arctan\big(e^{-a(0)/2}\big) \Big| $$
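If you want to convince yourself that the change of variable is right, here is a rough numerical sketch in Python. It compares a trapezoid-rule evaluation of the $t$-integral with the closed form $2\,|\arctan(e^{-a(1)/2})-\arctan(e^{-a(0)/2})|$ obtained from the $u$-integral. It assumes a straight-line path between the two points, which may not be the paper's choice; the function names and the numbers for $\beta$, $\beta_0$ and the endpoints are made up for illustration.

```python
import math


def activation(beta, beta0, x):
    """a(x) = beta^T x + beta0 for plain Python lists."""
    return sum(b * xi for b, xi in zip(beta, x)) + beta0


def fisher_distance_numeric(beta, beta0, x_a, x_b, n=20000):
    """Trapezoid-rule estimate of the t-integral on [0, 1].

    Assumption: x(t) = x_a + t * (x_b - x_a), a straight-line path.
    """
    direction = [xb - xa for xa, xb in zip(x_a, x_b)]
    # a'(t) = beta^T x'(t); constant along a straight line
    a_dot = sum(b * d for b, d in zip(beta, direction))

    def integrand(t):
        x_t = [xa + t * d for xa, d in zip(x_a, direction)]
        p = 1.0 / (1.0 + math.exp(-activation(beta, beta0, x_t)))
        return math.sqrt(a_dot ** 2 * p * (1.0 - p))

    h = 1.0 / n
    total = 0.5 * (integrand(0.0) + integrand(1.0))
    total += sum(integrand(i * h) for i in range(1, n))
    return abs(total * h)


def fisher_distance_closed(beta, beta0, x_a, x_b):
    """Closed form of the u-integral from a(0) to a(1)."""
    a0 = activation(beta, beta0, x_a)
    a1 = activation(beta, beta0, x_b)
    return 2.0 * abs(math.atan(math.exp(-a1 / 2.0)) - math.atan(math.exp(-a0 / 2.0)))


# Hypothetical example values, chosen only to exercise the formulas.
beta, beta0 = [1.5, -0.7], 0.3
x_a, x_b = [0.2, 1.0], [2.0, -1.0]

numeric = fisher_distance_numeric(beta, beta0, x_a, x_b)
closed = fisher_distance_closed(beta, beta0, x_a, x_b)
print(numeric, closed)
```

The two printed values should agree to several decimal places. Note that the closed form depends on the endpoints only through $a(0)$ and $a(1)$, which is exactly what the substitution $u=a(t)$ says.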
Hope this helps!