When determining a CDF from a continuous PDF, why are the limits of the integral written from negative infinity to x?


Let $f_{X}(x)$ be a continuous probability density function, $F_{X}(x)$ the cumulative distribution function, and $R_{X}$ the range of values that the random variable $X$ can take on.

In all texts that I have come across, the general equation for determining the CDF from the PDF is as follows:

$$F_{X}(x) = \int_{-\infty}^{x} f_{X}(x) \,dx$$

When the range of a PDF is all of $\mathbb{R}$, I understand why the above equation makes sense. However, there are many distributions whose range is only a subset of the real numbers, such as the exponential and uniform distributions. If we let $X\sim \operatorname{Exponential}(\lambda)$, then keeping the lower limit of the integral at $-\infty$ would result in an incorrect CDF; instead, the lower limit of the integral would need to be $0$.

$$1-e^{-\lambda x} \neq \int_{-\infty}^{x} \lambda e^{-\lambda x} \,dx$$
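To see concretely why the inequality above holds when the formula $\lambda e^{-\lambda t}$ is taken to be valid on all of $\mathbb{R}$ (an assumption made only for this illustration, since it ignores the support), truncate the lower limit at $-M$ with $M>0$ and $\lambda>0$, then let $M\to\infty$:

$$\int_{-M}^{x} \lambda e^{-\lambda t}\,dt = \Big[-e^{-\lambda t}\Big]_{-M}^{x} = e^{\lambda M} - e^{-\lambda x} \xrightarrow[M\to\infty]{} \infty, \qquad\text{whereas}\qquad \int_{0}^{x} \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}.$$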

My question is: why is the first equation not commonly written as follows? This would hold for all distributions, including those where the random variable takes values in only a subset of the real numbers. (This is my own made-up notation, and I am a lowly student of mathematics, so forgive me if it is poorly written.)

$$F_{X}(x) = \int_{\min(R_{X})}^{x} f_{X}(x) \,dx$$

And for that matter, would it not make more sense to write the following instead of the more traditional notation?

$$\int_{\min(R_{X})}^{\max(R_{X})} f_{X}(x) \,dx = 1 \text{ instead of writing } \int_{-\infty}^{\infty} f_{X}(x) \,dx = 1$$


BEST ANSWER

The definition of the cumulative distribution function $F(x)$ of any continuous random variable with probability density function $f(x)$ is indeed:

$$F(x)=\int_{-\infty}^x f(s)\operatorname d s\qquad = \mathsf P(X\leqslant x)$$

However, as others have stated, the missing information concerns the support of the probability density function. A pdf is often a piecewise function: it is greater than zero on some interval (the support) and equal to zero elsewhere.

Specifically, for an exponentially distributed random variable with rate parameter $\lambda$, the pdf is actually $f(s)=\lambda \exp(-\lambda s)~\big[s\in(0;\infty)\big]$, or equivalently $$f(s) =\begin{cases} \lambda \mathsf e^{-\lambda s} &:& 0< s \\ 0 & :& s\leqslant 0\end{cases}$$
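A minimal Python sketch of the piecewise density above, with the support written into the function itself rather than left for the reader to remember. The name `exp_pdf` and the parameter name `lam` are illustrative choices, not taken from any library:

```python
import math

def exp_pdf(s: float, lam: float) -> float:
    """Density of an Exponential(lam) random variable.

    The support (0, inf) is part of the definition: the function returns
    lam * exp(-lam * s) for s > 0 and 0 for s <= 0.
    """
    if lam <= 0:
        raise ValueError("rate parameter lam must be positive")
    if s > 0:
        return lam * math.exp(-lam * s)
    return 0.0  # outside the support

print(exp_pdf(-1.0, 2.0))  # outside the support, so zero
print(exp_pdf(0.5, 2.0))   # equals 2 * e^{-1}
```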

Hence the CDF is also a piecewise function: $$\begin{align}F(x) ~&=~ \int_{-\infty}^x f(s) \operatorname d s \\[2ex] &=~ \begin{cases} \int_{-\infty}^0 0 \operatorname d s+ \int_0^x \lambda\exp(-\lambda s)\operatorname d s &:& 0< x\\\int_{-\infty}^x 0 \operatorname d s &:& x\leqslant 0 \end{cases}\\[2ex]&=~\begin{cases} 1-\exp(-\lambda x) &:& 0<x \\ 0 &:& x\leqslant 0 \end{cases}\end{align} $$
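The piecewise CDF can be checked numerically. The sketch below assumes that a finite lower limit of $-50$ is an adequate stand-in for $-\infty$ (harmless here, because the density is identically zero below $0$) and that a plain midpoint rule with step $10^{-3}$ is accurate enough. It integrates the full piecewise density from that lower limit and compares the result with the closed form. The helper names `exp_pdf`, `exp_cdf` and `cdf_by_integration` are hypothetical:

```python
import math

def exp_pdf(s, lam):
    # Exponential(lam) density: lam*exp(-lam*s) on (0, inf), zero elsewhere.
    return lam * math.exp(-lam * s) if s > 0 else 0.0

def exp_cdf(x, lam):
    # Closed-form piecewise CDF: 1 - exp(-lam*x) for x > 0, else 0.
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0

def cdf_by_integration(pdf, x, lower=-50.0, h=1e-3):
    # Midpoint-rule approximation of the integral of pdf over [lower, x].
    # The finite `lower` stands in for -infinity (assumption: the pdf is
    # zero below `lower`, which holds for the exponential density).
    if x <= lower:
        return 0.0
    n = max(1, round((x - lower) / h))
    step = (x - lower) / n
    return step * sum(pdf(lower + (k + 0.5) * step) for k in range(n))

lam = 2.0
for x in (-1.0, 0.5, 1.0, 3.0):
    numeric = cdf_by_integration(lambda s: exp_pdf(s, lam), x)
    print(f"x={x:5.1f}  integral={numeric:.6f}  closed form={exp_cdf(x, lam):.6f}")
```

If the density were coded without the `else 0.0` branch, the same integration would instead return an astronomically large value (roughly $e^{100}$ for `lower=-50`) that keeps growing as `lower` is pushed further left, which is exactly the divergence behind the inequality in the question.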


Remark:

Many authors lazily assume that the reader already knows the support of well-known families of distributions, or they mention it much earlier in the text and expect it to be recalled some time later. This really isn't good practice when addressing students. (Tip: it is also not good practice for students in an exam.)

It is very important to include this information somewhere when presenting any expression for a probability density function: either implicitly in the text immediately preceding the expression, or preferably in the expression itself.

As well as ensuring that the reader knows where the support lies, keeping it in mind when integrating helps avoid common errors. So many, many common errors.

ANSWER

It is understood for some distributions, such as the exponential distribution, that the PDF is zero outside the interval of interest. With this understanding we write

$$ \int_{-\infty}^x \lambda e^{-\lambda t} \ dt = \int_{0}^x \lambda e^{-\lambda t} \ dt $$

for the integral. Also notice that I have used a different variable in the integrand to prevent ambiguity.

We can even use an indicator function: $\chi_S (x) = 1$ if $x \in S$, or $0$ otherwise. The above can then be written

$$ \int_{-\infty}^x \lambda e^{-\lambda t} \ \chi_{R_X}(t) \ dt = \int_{0}^x \lambda e^{-\lambda t} \ dt $$

using your notation $R_X$ from above. The Iverson bracket can also be used in place of the indicator: $[P(x)] = 1$ if the statement $P(x)$ is true for the given $x$, and $0$ otherwise. Again, the above can be written as

$$ \int_{-\infty}^x \lambda e^{-\lambda t} \ [t \in R_X] \ dt = \int_{0}^x \lambda e^{-\lambda t} \ dt. $$
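Both notations translate directly into code. In the sketch below (all names hypothetical), `chi` builds the indicator $\chi_S$ of an open interval $S=(a,b)$, and the Iverson bracket $[t>0]$ (equivalently $t\in R_X$ for the exponential) is written as the Python boolean `(t > 0)`, which behaves as $1$ or $0$ in arithmetic. Each integrand is integrated with a midpoint rule from a finite lower limit standing in for $-\infty$ and compared with the integral that starts at $0$. One design caveat: multiplying by an indicator still evaluates $\lambda e^{-\lambda t}$ at negative $t$, which makes `math.exp` overflow if the lower limit is pushed far enough left; the lower limit of $-50$ is an assumption chosen to stay well within range.

```python
import math

def chi(a, b):
    # Indicator function of the open interval (a, b): 1 inside, 0 outside.
    return lambda t: 1.0 if a < t < b else 0.0

def midpoint(g, lower, upper, h=1e-3):
    # Midpoint-rule approximation of the integral of g over [lower, upper].
    n = max(1, round((upper - lower) / h))
    step = (upper - lower) / n
    return step * sum(g(lower + (k + 0.5) * step) for k in range(n))

lam, x = 2.0, 1.0
chi_R = chi(0.0, math.inf)  # R_X = (0, inf) for the exponential distribution

# Unrestricted formula times the indicator, integrated from "-infinity" (-50).
with_indicator = midpoint(lambda t: lam * math.exp(-lam * t) * chi_R(t), -50.0, x)
# Same integrand with the Iverson bracket written as a Python bool.
with_iverson = midpoint(lambda t: lam * math.exp(-lam * t) * (t > 0), -50.0, x)
# Integral starting at the lower end of the support.
from_zero = midpoint(lambda t: lam * math.exp(-lam * t), 0.0, x)

print(with_indicator, with_iverson, from_zero, 1 - math.exp(-lam * x))
```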

TL;DR: We really go out to $-\infty$ to be safe, but it's understood in context that the PDF is zero outside of whatever interval we care about.