What's wrong with these derivatives?

86 Views Asked by At

I was testing my analytical solution to the following term w.r.t the parameter $k$ and $\mu$.

$$ \mathcal{L} = \color{green}{- \eta_n\log\left(1-P(t_n)\right)} $$

$P(t_n)$ is the cdf of the Inverse Gaussian distribution

$$ P(t) =Pr(X< t)= \Phi\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right)+e^{\frac{2k}{\mu}}\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) $$ Where $\Phi$ is the standard normal distribution cdf.

However, by comparing my solution with an approximated gradient given by scipy.optimize.approx_fprime I realised that for the same point $[k_0,\mu_0]$ the derivatives are wrong, specifically for $\mu$ it seems that I forgot a $\times 2$ factor, while for $k$ both the sign and the value are different. (I give more details at the end of the question)

Here are my notes:

(Note that I computed everything in C with mpfr, so there shouldn't be any numerical error.

$$ \frac{\partial}{\partial P(t_n)} \left(-\eta_n\log\left(1-P(t_n)\right)\right) = \\ \frac{\partial \left(-\eta_n\log\left(1-P(t_n)\right)\right)}{\partial \left(1-P(t_n)\right)} \cdot \frac{\partial \left(1-P(t_n)\right)}{\partial P(t_n)} =\\ -\frac{\eta_n}{1-P(t_n)}\cdot(-1)=\frac{\eta_n}{1-P(t_n)} $$

$k$

I rewrite here the definition of $P(t)$ to make the notes more readable:

$$ P(t) =Pr(X< t)= \Phi\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right)+e^{\frac{2k}{\mu}}\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) $$

$$ \frac{\partial}{\partial k}P(t)=\\ \Phi^{'}\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right) \color{blue}{\frac{\partial}{\partial k}\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right)} + \\ \color{green}{\frac{\partial}{\partial k}\left(\frac{2k}{\mu}\right)}e^{\frac{2k}{\mu}}\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) + \\ \Phi^{'}\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) \color{purple}{\frac{\partial}{\partial k}\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right)} e^{\frac{2k}{\mu}} \\= \\ \Phi^{'}\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right) \color{blue}{\left(\frac{1}{2t\sqrt{\frac{k}{t}}}\left(\frac{t}{\mu}-1\right)\right)} +\\ \color{green}{\frac{2}{\mu}}e^{\frac{2k}{\mu}}\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) + \\ \Phi^{'}\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) \color{purple}{ \left(-\frac{1}{2t \sqrt{\frac{k}{t}}}\left(\frac{t}{\mu}+1\right)\right)} e^{\frac{2k}{\mu}} $$

Which can be rewritten as $$ \frac{\partial}{\partial k}P(t)= \\ \Phi^{'}\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right) \color{blue}{\left(\frac{1}{2t\sqrt{\frac{k}{t}}}\left(\frac{t}{\mu}-1\right)\right)} +\\ \color{green}{\frac{2}{\mu}}e^{\frac{2k}{\mu}+\log\left(\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right)\right)} + \\ \color{purple}{ \left(-\frac{1}{2t \sqrt{\frac{k}{t}}}\left(\frac{t}{\mu}+1\right)\right)} e^{\frac{2k}{\mu} + \log\left( \Phi^{'}\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right)\right)} $$

We prefer this formulation since computing exp(2k/mu) alone would be impracticable since the values for $k$ may be greater than $10^3$, moreover there exists valid approximations or exact formulations for the logcdf and logpdf of the normal distribution.

$$ \frac{\partial}{\partial \mu}P(t)=\\ \Phi^{'}\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right) \color{blue}{\frac{\partial}{\partial \mu}\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right)} + \\ \color{green}{\frac{\partial}{\partial \mu}\left(\frac{2k}{\mu}\right)}e^{\frac{2k}{\mu}}\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) + \\ \Phi^{'}\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) \color{purple}{\frac{\partial}{\partial \mu}\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right)} e^{\frac{2k}{\mu}} \\ = \\ \Phi^{'}\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right) \color{blue}{\left(-\sqrt{\frac{k}{t}}\cdot \frac{t}{\mu^2}\right)} +\\ \color{green}{\left(-\frac{2k}{\mu^2}\right)}e^{\frac{2k}{\mu}}\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) +\\ \Phi^{'}\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) \color{purple}{\left(\sqrt{\frac{k}{t}}\cdot\frac{t}{\mu^2}\right)} e^{\frac{2k}{\mu}} $$

Which can be rewritten as

$$ \frac{\partial}{\partial \mu}P(t)= \\ \Phi^{'}\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right) \color{blue}{\left(-\sqrt{\frac{k}{t}}\cdot \frac{t}{\mu^2}\right)} +\\ \color{green}{\left(-\frac{2k}{\mu^2}\right)}e^{\frac{2k}{\mu}+\log\left(\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right)\right)} +\\ \color{purple}{\left(\sqrt{\frac{k}{t}}\frac{t}{\mu^2}\right)} e^{\frac{2k}{\mu}+\log\left(\Phi^{'}\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right)\right)} $$

Then, we can add to the gradient the following terms

$$ \frac{\partial}{\partial k} \left(-\eta_n\log\left(1-P(t_n)\right)\right) = \frac{\eta_n}{1-P(t_n)}\cdot\frac{\partial P(t_n)}{\partial k}\\ \frac{\partial}{\partial \mu} \left(-\eta_n\log\left(1-P(t_n)\right)\right) = \frac{\eta_n}{1-P(t_n)}\cdot\frac{\partial P(t_n)}{\partial \mu} $$

Actually in order to estimate $\mu$ I use an AR model of order $p$ (the AR coefficients are represented by the $\theta$ vector), so I have

$$ \mu=\theta_0\cdot1+\sum_{j=1}^{p}\theta_j\cdot x_j $$

And, by defining, $x_t = [1,x_1,...,x_p]$, the derivatives w.r.t. $theta$ are:

$$ \frac{\partial}{\partial \theta_j} \left(-\eta_n\log\left(1-P(t_n)\right)\right) = \frac{\eta_n}{1-P(t_n)}\cdot\frac{\partial P(t_n)}{\partial \mu}\cdot\frac{\partial \mu}{\partial \theta_j}\\ \frac{\eta_n}{1-P(t_n)}\cdot\frac{\partial P(t_n)}{\partial \mu}\cdot x_j $$

with $x_0 = 1$.

For the starting point

k = 1500.0
theta = [0.85, 0.4, -0.45, 0.1]

Given xt = [1., 0.93, 0.953, 0.953] we have:

(Pdb) mpfr_gradient
array([ 2.20329987e-03, -7.46972244e+00, -6.94684187e+00, -7.11864549e+00,
       -7.11864549e+00])
(Pdb) approximated_gradient
array([-6.56127930e-05, -3.66268166e+00, -3.40628900e+00, -3.49053237e+00,
       -3.49052827e+00])

The approximated gradient was computed with scipy.optimize.approx_fprime with a step size of 1e-10

1

There are 1 best solutions below

0
On BEST ANSWER

Ok I solved it, the derivatives are right. My approximation for the logcdf of the normal distribution wasn't just accurate enough.