Probability distribution vs. probability mass function (PMF): what is the difference between the terms?


Consider the discrete case. The PMF gives the probability assigned to each value of a random variable. For example, let X ~ Poisson(2). If I plot these probabilities (below), I can say that I am showing the PMF of X. On the other hand, I am also showing the distribution of X: for example, I can say whether the distribution is symmetric or not. So what is the difference between the terms "probability distribution" and "PMF" (in the discrete case)? Below I also quote the definitions from Wikipedia, but they do not help either.

Many thanks!

[Figure: plot of the probabilities P(X = k) for X ~ Poisson(2)]

A probability mass function (pmf) is a function that gives the probability that a discrete random variable is exactly equal to some value.

A probability distribution is a mathematical function that provides the probabilities of occurrence of different possible outcomes in an experiment.

4

There are 4 best solutions below

2
On

I'm not aware of an agreed-upon definition/meaning for probability distribution.

On the other hand, probability mass functions and probability density functions have agreed-upon definitions and are used to describe probability distributions.

A probability density function is the generalization of probability mass functions to random variables which are not strictly discrete. In the case of a discrete random variable, the main difference is that the probability density function should integrate to one, while the probability mass function should add to one.

Suppose $X$ is a discrete random variable taking values $S=\{x_1,x_2,\ldots\} \subset \mathbb{R}$.

The probability mass function is a function $p : S\to [0,1]$ where $$ p(x) = \mathbb{P}(X=x) $$
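As a quick numerical check of this definition, here is a minimal Python sketch using the Poisson(2) example from the question. The helper name `poisson_pmf` and the truncation at 50 terms are choices made for illustration only; the true support is infinite, and the sum is cut off where the remaining tail is negligible.

```python
import math

def poisson_pmf(k: int, lam: float = 2.0) -> float:
    """Return p(k) = P(X = k) for X ~ Poisson(lam); zero outside the support."""
    if k < 0:
        return 0.0
    return math.exp(-lam) * lam**k / math.factorial(k)

# Evaluate the PMF on the first 50 support points (assumed cutoff;
# the probability mass beyond k = 49 is negligible for lam = 2).
probs = [poisson_pmf(k) for k in range(50)]

# A PMF takes values in [0, 1] and its values add up to one.
total = sum(probs)
print(f"sum of p(k) for k = 0..49: {total:.12f}")
```

Each entry of `probs` is one value $p(x_i)$, and `total` is the sum that should equal one, up to floating-point error and the truncated tail.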

On the other hand, the density function (of any RV) can be thought of as, $$ f(x)dx = \mathbb{P}(X\in[x,x+dx]) $$ In integral form you could write this as, $$ \int_{x}^{x+dx} f(z)dz = \mathbb{P}(X\in [x,x+dx]) $$

That is, the density times the width of a small interval gives the probability that $X$ lies in the small interval $[x,x+dx]$.

If the random variable is discrete, then the probability that $X$ is in this interval is the same as the probability $X=x$ for small enough $dx$. So you have $f(x)dx = \mathbb{P}(X=x)$ (or in integral form, $\lim_{dx\to 0}\int_{x}^{x+dx} f(z)dz = \mathbb{P}(X=x)$).

In particular, if $p(x)$ is the pmf for a discrete random variable $X$, then we can write the density function as: $$ f(x) = \sum_{i:p(x_i)\neq 0} p(x_i) \delta(x-x_i) $$ where $\delta(x)$ is the delta distribution; i.e. $\int_a^b g(x)\delta(x-c)\,dx = g(c)$ for any continuous function $g$ whenever $c\in(a,b)$.
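To see that this density really carries the same information as the pmf, here is a short worked computation. It assumes, to avoid endpoint issues, that neither $a$ nor $b$ is one of the values $x_i$, and it uses only the fact that $\int_a^b \delta(x-x_i)\,dx$ equals $1$ when $a<x_i<b$ and $0$ when $x_i$ lies outside $[a,b]$: $$ \int_a^b f(x)\,dx = \sum_{i:p(x_i)\neq 0} p(x_i)\int_a^b \delta(x-x_i)\,dx = \sum_{i:\,a<x_i<b} p(x_i) = \mathbb{P}(a<X<b) $$

Letting $a\to-\infty$ and $b\to\infty$ gives $\int_{-\infty}^{\infty} f(x)\,dx = \sum_i p(x_i) = 1$, so the density integrates to one exactly when the pmf adds to one.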

10
On

The word "distribution" gets thrown around loosely sometimes, which can cause confusion.

The distribution of a random variable $X$ is the function that takes a set $S \subset \mathbb R$ as input and returns the number $P(X \in S)$ as output. (Technically I should assume that $S$ is a "nice" subset of $\mathbb R$ in some sense, but let's not worry about that.) I think the Wikipedia article would be clearer if it just gave us this definition up front.

The probability mass function (PMF) of a random variable $X$ is the function that takes a number $x \in \mathbb R$ as input and returns the number $P(X=x)$ as output. If $X$ is a discrete random variable, then the PMF of $X$ is a convenient way to specify the distribution of $X$.

Here is one way to describe the relationship between the distribution of $X$ and the PMF of $X$, in the case where $X$ is a discrete random variable. Suppose that the possible values of $X$ are $x_1,x_2,\ldots$. If $f$ is the distribution of $X$, then $$ f(S) = \sum_{i : x_i \in S} P(X = x_i) $$ for any set $S \subset \mathbb R$.
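The formula above can be sketched in a few lines of Python, again with the Poisson(2) example from the question. This is only an illustration under some assumptions: the names `pmf` and `distribution` are made up here, the set $S$ is represented by a membership predicate, and the infinite support is truncated at `max_k` (the neglected tail is negligible for this example).

```python
import math

def pmf(k: int, lam: float = 2.0) -> float:
    """PMF: takes a number k and returns P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def distribution(in_S, lam: float = 2.0, max_k: int = 100) -> float:
    """Distribution: takes a set S (as a predicate) and returns P(X in S).

    Implements f(S) = sum of P(X = x_i) over the support points x_i in S,
    with the support 0, 1, 2, ... truncated at max_k (assumed cutoff).
    """
    return sum(pmf(k, lam) for k in range(max_k) if in_S(k))

# The same function f evaluated on three different sets S.
p_even = distribution(lambda x: x % 2 == 0)          # S = even integers
p_interval = distribution(lambda x: 0.5 <= x <= 3.5)  # S = [0.5, 3.5]
p_all = distribution(lambda x: True)                  # S = all of R

print(p_even, p_interval, p_all)
```

The point of the sketch is the difference in input type: `pmf` takes a single number, while `distribution` takes a whole set and is computed from `pmf` by summing over the support points that fall in the set.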

0
On

I provide a simple explanation of this here: Difference between "probability density function" and "probability distribution function"?. In short, a probability mass function is a discrete probability distribution function, where discrete is often implied.

0
On

Probability Mass Function.

I would say the pmf of a discrete random variable is a graph, a table, or a formula that specifies the probability associated with each possible value the random variable can take.

It is a function that gives the probability that a discrete random variable is exactly equal to some value.