Intuitive idea behind the probability density function


As an application of Calculus, I am currently teaching some material about continuous random variables. My main example is the height $X$ of a French male chosen randomly in the French population.

To explain the probability density function (pdf), I explain that, contrary to discrete variables, knowing $p(X=x)$ is not really interesting (who cares about the probability of being exactly 1.783424567 meters tall?); what is of interest is $p(a\leqslant X\leqslant b)$, the probability of being in some interval. So $p(X=x)$ is not an interesting one-variable function, whereas $P(a\leqslant X\leqslant b)=g(a,b)$ is a meaningful two-variable function for probability and statistics.

But instead of studying $g$, we prefer to associate with $X$ a pdf, i.e. a function $f$ such that $p(a\leqslant X\leqslant b)=\int_a^b f(x)\,dx$.
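To make this concrete for calculus students, here is a small numerical sketch. The Normal model and the parameters $\mu=1.75$, $\sigma=0.07$ are illustrative assumptions (not real data about French heights); the point is that $P(a\leqslant X\leqslant b)$ is just a Riemann sum of the pdf, which a one-variable calculus student can compute directly.

```python
from math import erf, exp, pi, sqrt

# Hypothetical model: height X ~ Normal(MU, SIGMA). These parameters are
# illustrative assumptions, not real data about French heights.
MU, SIGMA = 1.75, 0.07

def f(x):
    """pdf of a Normal(MU, SIGMA) random variable."""
    return exp(-((x - MU) / SIGMA) ** 2 / 2) / (SIGMA * sqrt(2 * pi))

def prob(a, b, n=10_000):
    """P(a <= X <= b) as a midpoint Riemann sum of the pdf over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

def prob_exact(a, b):
    """The same probability via the closed-form normal CDF (using erf)."""
    Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))
    return Phi((b - MU) / SIGMA) - Phi((a - MU) / SIGMA)

# The Riemann sum and the closed form agree to many decimal places,
# and the total area under the pdf is 1:
print(prob(1.70, 1.80), prob_exact(1.70, 1.80))
print(prob(MU - 10 * SIGMA, MU + 10 * SIGMA))  # ~1
```

One function $f$ of a single variable thus encodes the whole two-variable function $g(a,b)$.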

But what is the reason for introducing the pdf? Beyond saying something like "this is more convenient, and this is the genius idea of modern probability theory", I have no argument. Could we do probability without pdfs? What would be a better way to introduce the pdf?

I should clarify that this is aimed at one-variable calculus students, with little knowledge of probability (finite random variables only) and, of course, no idea of what the Lebesgue integral is (or how it generalizes both sums $\Sigma$ and Riemann integrals).


There are 2 answers below.

BEST ANSWER

As you have pointed out, in continuous probability situations one obtains interesting probability values only for "reasonable" subsets of the event space $\Omega$, say for intervals, circles, rectangles, as the case may be. Now the number of such subsets is huge, so that it is impossible to create a list of all probability values as in the discrete case. And at the same time such a list would be highly redundant, as it would have to satisfy, e.g., $P(A\cup B)+P(A\cap B)=P(A)+P(B)$ for all $A$ and $B$. In your example we automatically have $g(a,c)=g(a,b)+g(b,c)$ when $a<b<c$.

Introducing PDFs is a means of eliminating this redundancy without losing any information. It so happens that in geometrical situations the probability for sets $A\subset\Omega$ of small diameter is roughly proportional to the length (area, volume, solid angle, etc.) of $A$: $$P(A)\doteq f\cdot{\rm length}(A)\qquad({\rm length}(A)\ll1)\ ,\tag{1}$$ but the proportionality factor $f$ depends on the exact spot $x$ where $A$ is located. The function $x\mapsto f(x)$ created in this way is called the PDF of the random point in question. This means that we should replace the first "Ansatz" $(1)$ by the more refined $$P(A)\doteq f(x)\>{\rm length}(A)\qquad(A\subset B_\epsilon(x),\ \epsilon\ll1)\ .\tag{2}$$ When $A\subset \Omega$ is "large" then $(2)$ immediately leads to $$P(A)\doteq \sum_{k=1}^N f(x_k)\ {\rm length}(A_k)\doteq\int\nolimits_A f(x)\ {\rm d}x\ .$$ Here $A=\bigcup_{k=1}^N A_k$ is a partition of $A$ into tiny subsets (intervals) $A_k$ to which $(2)$ can be applied.
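A numerical sketch of this answer's two points, using an Exponential(1) density $f(x)=e^{-x}$ for $x\geq 0$ (my choice, purely for illustration): the function $g(a,b)=\int_a^b f$ is automatically additive, so no redundant list of probabilities is needed, and for a tiny interval around $x$ the probability is approximately $f(x)$ times the interval's length, as in Ansatz $(2)$.

```python
from math import exp

# Illustrative choice of pdf (my assumption, not from the answer):
# f(x) = e^{-x} for x >= 0, i.e. an Exponential(1) density.
def f(x):
    return exp(-x) if x >= 0 else 0.0

def g(a, b, n=20_000):
    """g(a, b) = P(a <= X <= b), computed as a midpoint Riemann sum of f."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

# The redundancy mentioned above: additivity holds automatically,
# g(a, c) = g(a, b) + g(b, c) for a < b < c.
print(abs(g(0, 2) - (g(0, 1) + g(1, 2))))  # ~0

# Ansatz (2): for a tiny interval A around x, P(A) ~ f(x) * length(A).
x, eps = 0.5, 1e-4
print(g(x, x + eps) / (f(x) * eps))  # ~1
```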

ANSWER

Suggestion: apart from height (which you used as an example), one of the most standard continuous random variables $X$ is "the time until something occurs": for example, the time until a newly installed lamp or laptop fails for the first time, the time until the next heavy rainfall, etc. Now, although the question $P(X=4)$ does not seem so silly (where $4$ denotes, say, months), the question $P(X=4.1354...)$ certainly does. Then (as they start to get confused) you can draw the pdf on the board and ask them what they understand when they look at it (OK, probably not much), and explain that the higher the curve, the more probable it is that something occurs in that region. For example, you can draw a curve like the following [figure: a right-skewed density curve] (a gamma density, suitable for describing time; if you choose "height", draw a more symmetric curve that looks like the normal distribution), and you can say the following:

  1. The higher the curve $f(x)$, the more probable it is that the laptop fails in that period.
  2. The curve (pdf) can only be positive or zero, $f(x)\ge0$. Here, the first failure cannot occur at a negative time, so although $f(x)$ is defined for every $x \in \mathbb R$, it equals zero for negative values of $x$.
  3. The whole area under the curve equals $1$, which corresponds to $100\%$ (everything that can happen lies under the curve).
  4. Since everything under the curve corresponds to $100\%$, it is logical to say that the probability that something occurs between two points $a$ and $b$ is the area between these two points (see the figure).
  5. Areas under curves are computed as the integral of the curve (no need to say more about the Lebesgue or Riemann integral; that is simply the way to do it).
  6. Summing up: probability equals area, and area equals the integral of the curve $f(x)$, so $$P(a\le X\le b)=\int_{a}^{b}f(x)\,dx$$ Of course $P(a<X<b)$ yields the same result, so it does not matter whether the endpoints are included, since this makes no difference to the integral.
  7. As a special case of the above, if you choose $a=b$, then $$P(a\le X\le b)=P(a\le X \le a )=P(X=a)=\int_{a}^{a}f(x)\,dx=0,$$ which agrees with the intuition that there is no point in asking about the probability that something occurs at one exact instant (second, millisecond, etc.) in the future, but only in a time interval of positive length.
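Points 3, 6 and 7 above can be checked numerically. The density below is a hypothetical Gamma(shape $2$, scale $1$) failure-time density, $f(x)=xe^{-x}$ for $x\geq0$, chosen only because its shape matches the skewed curve described above.

```python
from math import exp

# Hypothetical "time to first failure" pdf: Gamma(shape=2, scale=1),
# f(x) = x * e^{-x} for x >= 0 and zero for x < 0 (an assumption,
# chosen only to match the right-skewed curve described above).
def f(x):
    return x * exp(-x) if x >= 0 else 0.0

def prob(a, b, n=10_000):
    """P(a <= X <= b) as a midpoint Riemann sum of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

# Point 3: the total area under the curve is 1 (i.e. 100%).
print(prob(0, 50))  # ~1.0

# Point 7: as the interval [a, a + eps] shrinks, the probability -> 0,
# so P(X = a) itself is zero.
for eps in (1.0, 0.1, 0.01, 0.001):
    print(eps, prob(4, 4 + eps))
```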