How to avoid overflows and numerical issues when computing derivatives?

56 Views Asked by At

I'm trying to compute the derivative of this quantity w.r.t to $\mu$ and $k$:

$$ \mathcal{L}(t) = - \eta_n\log\left(1-P(t)\right) $$

Where $\eta_n$ is just a weighting factor and P(t) is the cdf of the Inverse Gaussian distribution:

$$ P(t) = \Phi\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right)+e^{\frac{2k}{\mu}}\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) $$

$\Phi$ is the cdf of the Standard Normal distribution.

$$ \Phi(x) = \frac{1}{2}\left(1 + \text{erf}\left(\frac{x}{\sqrt{2}}\right)\right)\\ \Phi'(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2} $$

Starting from:

$$ \frac{\partial}{\partial k} \left(-\eta_n\log\left(1-P(t_n)\right)\right) = \frac{\partial \left(-\eta_n\log\left(1-P(t_n)\right)\right)}{\partial \left(1-P(t_n)\right)} \cdot \frac{\partial \left(1-P(t_n)\right)}{\partial P(t_n)} \cdot \frac{\partial P(t_n)}{\partial k} = \frac{\eta_n}{1-P(t_n)}\frac{\partial P(t_n)}{\partial k} $$

$$ \frac{\partial}{\partial \mu} \left(-\eta_n\log\left(1-P(t_n)\right)\right) = \frac{\partial \left(-\eta_n\log\left(1-P(t_n)\right)\right)}{\partial \left(1-P(t_n)\right)} \cdot \frac{\partial \left(1-P(t_n)\right)}{\partial P(t_n)} \cdot \frac{\partial P(t_n)}{\partial \mu} = \frac{\eta_n}{1-P(t_n)}\frac{\partial P(t_n)}{\partial \mu} $$

And given that I have to work with values of $k\sim1500$ and $0.5<\mu<1.5$ and the derivatives w.r.t. $\mu$ and $k$ I obtained are:

$$ \frac{\partial}{\partial k}P(t)=\\ \Phi^{'}\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right) \color{blue}{\frac{\partial}{\partial k}\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right)} + \\ \frac{2}{\mu}e^{\frac{2k}{\mu}}\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) + \\ \Phi^{'}\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) \color{purple}{\frac{\partial}{\partial k}\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right)} e^{\frac{2k}{\mu}} \\= \\ \Phi^{'}\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right) \color{blue}{\left(\frac{1}{2t\sqrt{\frac{k}{t}}}\left(\frac{t}{\mu}-1\right)\right)} +\\ \frac{2}{\mu}e^{\frac{2k}{\mu}}\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) + \\ \Phi^{'}\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) \color{purple}{ \left(-\frac{1}{2t \sqrt{\frac{k}{t}}}\left(\frac{t}{\mu}+1\right)\right)} e^{\frac{2k}{\mu}} $$ and $$ \frac{\partial}{\partial \mu}P(t)=\\ \Phi^{'}\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right) \color{blue}{\frac{\partial}{\partial \mu}\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right)} +\\ \frac{2}{\mu}e^{\frac{2k}{\mu}}\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) +\\ \Phi^{'}\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) \color{purple}{\frac{\partial}{\partial \mu}\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right)} e^{\frac{2k}{\mu}} \\ = \\ \Phi^{'}\left(\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}-1\right)\right) \color{blue}{\left(-\sqrt{\frac{k}{t}}\cdot \frac{t}{\mu^2}\right)} +\\ \frac{2}{\mu}e^{\frac{2k}{\mu}}\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) +\\ \Phi^{'}\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) \color{purple}{\left(\sqrt{\frac{k}{t}}\frac{t}{\mu^2}\right)} e^{\frac{2k}{\mu}} $$

How is it possible to actually compute this values (e.g. by coding in in Python) ?

e.g. when I have to compute $\frac{2}{\mu}e^{\frac{2k}{\mu}}\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right) $ I encountner overflow problems since I'm exponentiating $e^{\sim 3000}$, but at the same time $\Phi\left(-\sqrt{\frac{k}{t}}\left(\frac{t}{\mu}+1\right)\right)\sim 0$.

How can I solve this kind of problems? I should also compute the second order derivatives and I'm afraid that this is not the right path since I'd encounter a ton of numerical issues.

Would it be a good idea to approximate the derivatives by using SymPy?

I'll leave some plots to better illustrate the problem (Note that I bound the maximum exponent of $e$ to $700$ just for visual purposes) :

from scipy.stats import norm

_MAX_EXPONENT = 700
def _exp(n):
    return np.exp(min(n,_MAX_EXPONENT))

def igcdf(t: np.array, mu: float, lamb:float):
    return norm.cdf(np.sqrt(lamb/t) * (t/mu -1)) + _exp(2 * lamb / mu) * norm.cdf(-np.sqrt(lamb / t) * (t / mu + 1))

def igpdf(t,mu,lamb):
        arg = lamb / (2 * np.pi * t ** 3)
        return np.sqrt(arg) * np.exp((-lamb * (t - mu) ** 2) / (2 * mu ** 2 * t))
    
def dk(t: np.array,mu: float, k: float):
    etan = 1
    tmp1 = etan/(1 - igcdf(t,mu,k))
    tmp2 = norm.pdf(np.sqrt(k/t)*(t/mu-1)) * (1/(2*t*np.sqrt(k/t))*(t/mu-1))
    tmp3 = (2/mu)*_exp(2*k/mu)*norm.cdf(-np.sqrt(k/t)*(t/mu + 1))
    tmp4 = norm.pdf(-np.sqrt(k/t)*(t/mu+1)) * (-1/(2*t*np.sqrt(k/t))*(t/mu+1)) * _exp(2*k/mu)
    return tmp1 * (tmp2 + tmp3 + tmp4)

def dmu(t: np.array,mu: float, k: float):
    etan = 1
    tmp1 = etan/(1 - igcdf(t,mu,k))
    tmp2 = norm.pdf(np.sqrt(k/t)*(t/mu-1)) * (-np.sqrt(k/t)*t/(mu**2))
    tmp3 = (2/mu)*_exp(2*k/mu)*norm.cdf(-np.sqrt(k/t)*(t/mu + 1))
    tmp4 = norm.pdf(-np.sqrt(k/t)*(t/mu+1)) * (np.sqrt(k/t)*t/(mu**2)) * _exp(2*k/mu)
    
    return tmp1 * (tmp2 + tmp3 + tmp4)


def negloglik(t: np.array, mu: float, k: float):
    etan = 1
    return -etan*np.log(1e-8 + 1 - igcdf(t,mu,k))

enter image description here Thanks

enter image description here

enter image description here