Solving Normal Probability Distribution (PDF) and Cumulative Probability Distribution (CDF) for given X and comparing to Excel's NORM.DIST() function

375 Views Asked by At

I have been trying to understand and implement the Excel function NORM.DIST(x, mean, standard_deviation, cumulative) in another programming language. There exist libraries in R (“stats”) and Python ("scipy.stats") that are not handy for me.

There is a documentation for that Excel function here.

Based on that documentation and other sources I have seen that the equation for the normal probability/density function is:

$$f(x, \mu, \sigma) = \frac{1}{\sqrt{(2\pi)}\sigma}e^{-(\frac{(x-\mu)^{2}}{2\sigma^{2}})}$$

I am solving for given $x=1, \mu=0, \sigma=1$

In Excel is NORM.DIST(1,0,1,FALSE) which gives a result of $0.241970725$

By using an online calculator Symbolab I am solving this equation and it gives me the same result check here.

Now when I try to solve for Cumulative Probability/Density Distribution, based on the Excel documentation I have to solve for the integral from negative infinity to x of the given formula, that takes form of the equation:

$$f(x, \mu, \sigma) = \int_{-\infty}^{x}\frac{1}{\sqrt{(2\pi)}\sigma}e^{-(\frac{(x-\mu)^{2}}{2\sigma^{2}})}dt$$

Using Symbolab when I try to solve this function diverges (check here), but in Excel NORM.DIST(1,0,1,TRUE) it gives a result of $0.841344746$

Can someone try to solve this integral, if I am doing something wrong and if there is any step that I am missing trying to solve the cumulative density function (CDF).

1

There are 1 best solutions below

2
On

They are different!

In fact, $f(x_0)$ (PDF) returns the weight of point $x_0$ in comparison to other points and has no direct meaning unless you integrate it on some interval containing it: $$P(x_0-\epsilon \leqslant x \leqslant x_0+\epsilon) = \int_{x_0-\epsilon}^{x_0+\epsilon} f(x) dx \approx 2\epsilon f(x_0)$$ That means probability that $X$ occurs near $x_0$ is like a linear function on length of interval by derivative equals to $f(x_0)$.

But $F(x_0)$ (CDF) has self meaning of probability: $$P(x \leqslant x_0) = F(x_0)$$ That means probability of $X$ occurs before $x_0$.

Now for your comment and reason of diversity. When you integrate $f$ to get $F$, you should change variable of $f(x)$ to $f(t)$ and at end it replaced by $x$ in upper boundary to reach $F(x)$: $$F(x) = \int_{-\infty}^x f(t) dt \overset{\text{For Normal Dist.}}{=======} \int_{-\infty}^x \frac{1}{\sqrt{2\pi}.\sigma}exp(-\frac{({\color{red}t}-\mu)^2}{2\sigma^2}) dt$$ That you wrongly write $x$ instead of ${\color{red}t}$ And then replaced it by $1$ before getting integration. In fact, you integrate of a constant in infinite domain that diverges obviously.