General questions on terminology for probability and statistics.


As someone self-studying college-level mathematics, I am currently reading an introductory book on probability and statistics. However, I am a bit perplexed by the following terms, as they appear to be used interchangeably from my point of view.

Could someone help me clarify my misconceptions, and also inform me if there's anything I have missed? This would be much appreciated.

Terminology alongside my current understanding of them

  1. probability mass function - function that returns the chance of a random variable taking on some value (for discrete only)

  2. probability density function - function that returns the chance of a random variable taking on some value (for continuous only)

  3. probability distribution - a general term that describes both #1 and #2

  4. cumulative distribution function - a general term that describes both #1 and #2 aggregated over some range

Best answer

A probability mass function is simply a function that assigns to each outcome in a discrete sample space a probability between $0$ and $1$. For example, consider flipping an unfair coin.

Say it has a $60\%$ chance of landing on heads and a $40\%$ chance of landing on tails. Let's call our sample space $\Omega = \{H, T\}$, where $H$ denotes heads and $T$ denotes tails.

Our probability mass function is then a function of the form $f:\Omega \rightarrow \mathbb{R}$, which can be written as:

$$ f(x) = \begin{cases} 0.6 & x = H \\ 0.4 & x = T \end{cases}$$

Notice that $f(H) + f(T) = 1$.

In fact, in general we must have: $$\sum_{x\in \Omega} f(x) = 1$$

This corresponds with the fact that something must happen.
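As a quick illustration (a Python sketch; the dictionary representation of the PMF is just one convenient choice), we can encode the unfair coin above and check that the probabilities sum to $1$:

```python
# PMF of the unfair coin: maps each outcome in the sample space to its probability
pmf = {"H": 0.6, "T": 0.4}

# Every value is a probability in [0, 1] ...
assert all(0 <= p <= 1 for p in pmf.values())

# ... and the total probability over the whole sample space is 1
total = sum(pmf.values())
print(total)  # 1.0
```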

As for the probability density function, this is a similar concept, except that here the sample space is continuous. We again have a function of the form $f:\Omega \rightarrow \mathbb{R}$, but now $\Omega$ is continuous; for our purposes we can take $\Omega = \mathbb{R}$.

In this case, imagine that $f$ is the standard normal density, $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ for all $x \in \mathbb{R}$.

Here we have an important distinction from the probability mass function. A particular output $f(a)$ does NOT mean that $a$ has probability $f(a)$ of occurring. Rather, it is a relative measure: it tells you how likely values near $a$ are compared to values near other points. The probability of any particular value $a$ occurring is in fact $0$. This may seem strange, but it is a consequence of measure theory: a single point has zero length, so it carries zero probability. Instead, probabilities come from integrating the density over an interval, $P(a \le X \le b) = \int_a^b f(x)\,dx$.
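To make this concrete, here is a small Python sketch (standard library only, with the standard normal density and a midpoint Riemann sum as assumed ingredients) showing that the density at a point is positive while the probability of a shrinking interval around that point goes to $0$:

```python
import math

def f(x):
    """Standard normal density."""
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

def prob(a, b, n=10_000):
    """Approximate P(a <= X <= b) with a midpoint Riemann sum of f."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(f(0))                      # about 0.3989 -- a density, NOT a probability
for eps in (1.0, 0.1, 0.01):
    print(eps, prob(-eps, eps))  # shrinks toward 0 as the interval shrinks
```

In the limit, `prob(a, a)` is exactly $0$, matching the claim that any single value has zero probability.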

A probability distribution is just a blanket term covering both of the first two concepts.

The cumulative distribution function is essentially a function of the form $F: \Omega \rightarrow \mathbb{R}$ where, for any $x \in \Omega$, $F(x)$ is the probability that the random variable takes a value less than or equal to $x$.

For example, imagine we are rolling a fair six-sided die. If $F$ is the cumulative distribution function, then $F(4)$ is the probability that a $4$ or anything less will be rolled. So if $f(k)$ denotes the probability of rolling the number $k$, then $F(4) = f(1) + f(2) + f(3) + f(4) = \frac{4}{6} = \frac{2}{3}$.
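The die example can be sketched in Python (using `fractions.Fraction` so the arithmetic stays exact):

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6
f = {k: Fraction(1, 6) for k in range(1, 7)}

def F(x):
    """CDF: probability of rolling a value <= x."""
    return sum(p for k, p in f.items() if k <= x)

print(F(4))  # 2/3
print(F(6))  # 1 -- the whole sample space
```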

This is particularly useful when talking about probability density functions, for the reason alluded to earlier: under a density, the probability of any particular outcome is $0$. Imagine that the function we mentioned before, $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, is the density of the number $x$ produced by a continuous random number generator.

In this case, if we have a cumulative distribution function called $F$, then $F(x)$ denotes the probability that the number chosen is less than or equal to $x$. Unlike the density evaluated at a single point, this gives us genuinely non-zero probabilities.

The function $F(x)$ would be written as: $$ F(x) = \int_{-\infty}^{x} f(t)\,dt$$

where $t$ is just a dummy variable of integration.
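For the standard normal this integral has no elementary closed form, but it can be approximated numerically, and Python's `math.erf` provides the same value through the error function. A sketch comparing the two approaches (the lower limit $-10$ standing in for $-\infty$ is an assumption that works here because the density is negligible below it):

```python
import math

def f(t):
    """Standard normal density (the integrand)."""
    return math.exp(-t**2 / 2) / math.sqrt(2 * math.pi)

def F_numeric(x, lower=-10.0, n=100_000):
    """Approximate F(x) = integral of f from -infinity to x by a midpoint sum."""
    h = (x - lower) / n
    return sum(f(lower + (i + 0.5) * h) for i in range(n)) * h

def F_exact(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

print(F_numeric(0.0), F_exact(0.0))  # both about 0.5, by symmetry of the density
```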