Are marginal densities always greater than the corresponding joint density?


I.e., if $\mathbb P\left(x,y\right)$ is a joint density function and $\mathbb P\left(y\right)$ is the corresponding marginal density, is it always true that:

$\mathbb P\left(x,y\right)\leq \mathbb P\left(y\right)$ ?

Using the rule for probability of intersections:

$\mathbb P\left(\mathbb X = x \cap \mathbb Y= y\right)= \mathbb P\left(\mathbb X=x\right) + \mathbb P\left(\mathbb Y=y\right) - \mathbb P\left(\mathbb X=x \cup \mathbb Y=y\right)$

(for random variables $\mathbb X,\mathbb Y$ taking values in some sets $\mathbb S\left(\mathbb X\right),\mathbb S\left(\mathbb Y\right)$)

Seems to imply this, but I find it surprising.

Also, it seems to contradict an inequality from information theory:

$\mathbb H\left(\mathbb X,\mathbb Y\right) \geq \mathbb H\left(\mathbb Y\right)$


BEST ANSWER

Consider first the discrete case. Here the joint "density" $f_{X,Y}$ (which I shall call the probability mass function following the terminology of Wikipedia) is defined by: $$f_{X,Y}(x,y) = P(X=x,\ Y=y)$$ where $P$ is the probability measure. (This is a "density" with respect to summation.) Since we clearly have $\{X=x,\ Y=y\} \subseteq \{Y=y\}$, and since $P$ is increasing (greater events with respect to inclusion have greater probabilities with respect to the order on $[0,1]$), we see that: $$f_{X,Y}(x,y) = P(X=x,\ Y=y) \le P(Y=y) = f_Y(y)$$ so this is the inequality you requested, in the discrete case.
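This is easy to check numerically; here is a minimal sketch in Python (the joint table is an arbitrary illustrative example, not from the question):

```python
# Verify f_{X,Y}(x,y) <= f_Y(y) for a discrete joint pmf.
# The joint table below is an arbitrary illustrative example.
joint = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

# Marginal pmf of Y: "sum out" x from the joint pmf.
marginal_y = {}
for (x, y), p in joint.items():
    marginal_y[y] = marginal_y.get(y, 0.0) + p

# Every joint probability is bounded by the corresponding marginal.
assert all(p <= marginal_y[y] for (x, y), p in joint.items())
```

The assertion holds for any joint pmf, since each marginal value is a sum of non-negative terms that includes the joint value itself.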

Considering how the marginal probability mass can be found from the joint one by "summing out" one of the variables, this can also be understood as the fact that in a (possibly infinite) sum of terms from $[0,1]$, any single term is less than or equal to the whole sum.


Now, for the absolutely continuous case, we must consider densities with respect to integration, which are different from the "probability mass functions" of the discrete setup. Their values are not probabilities in themselves.

As an example, let $X,Y$ be random variables such that: $$f_{X,Y}(x,y) = \begin{cases} 3 & \text{if $0\le x\le \frac 16$ and $0\le y\le 2$} \\ 0 & \text{otherwise} \\ \end{cases}$$ This simply means that the vector $(X,Y)$ is uniformly distributed in a specific tall slim rectangle. Note that one side of the rectangle is less than $1$, and the other one greater than $1$. The area of the rectangle is $\frac 13$, of course.

(Also note that densities of this kind take values in $\left[ 0,\infty \right)$, not in $[0,1]$.)

With this example, the marginal densities would be: $$f_X(x) = \begin{cases} 6 & \text{if $0\le x\le \frac 16$} \\ 0 & \text{otherwise} \\ \end{cases}$$ and: $$f_Y(y) = \begin{cases} \frac 12 & \text{if $0\le y\le 2$} \\ 0 & \text{otherwise} \\ \end{cases}$$ so this shows that the answer to your question is generally no in the continuous case.
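A numerical sanity check of this counterexample (the values are taken from the rectangle above):

```python
# (X, Y) uniform on the rectangle [0, 1/6] x [0, 2] with density 3,
# as in the counterexample above.
def f_joint(x, y):
    return 3.0 if 0 <= x <= 1/6 and 0 <= y <= 2 else 0.0

def f_Y(y):
    # Marginal of Y: integrate the joint density over x, here 3 * (1/6).
    return 0.5 if 0 <= y <= 2 else 0.0

# Midpoint Riemann sum: the joint density integrates to 1 over the rectangle.
n = 1000
dx, dy = (1/6) / n, 2 / n
total = sum(f_joint((i + 0.5) * dx, (j + 0.5) * dy) * dx * dy
            for i in range(n) for j in range(n))
assert abs(total - 1.0) < 1e-6

# At a point inside the rectangle the joint density EXCEEDS the marginal:
assert f_joint(0.1, 1.0) == 3.0 > f_Y(1.0) == 0.5
```

So the claimed inequality fails pointwise, even though both densities integrate to $1$.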

ANOTHER ANSWER

Use the notation $f_{XY}$ for the joint density function of $X$ and $Y$, and $f_X$ for the density function of $X$. Then $$ f_X(x) = \int_{-\infty}^\infty f_{XY}(x,y) \,dy. $$ As Jeppe Stig Nielsen's answer shows very clearly, you cannot in general conclude from this that $f_X(x) \ge f_{XY}(x,y)$, because, for example, $f_{XY}(x,y)$ might be very large over a very small range of $y$.

Your rule for the probability of intersections shows that for random variables $X,Y$ and sets $\mathbb{X}, \mathbb{Y}$: $$\begin{align} P(X\in\mathbb{X} \cap Y\in\mathbb{Y}) &= P(X\in\mathbb{X}) + P(Y\in\mathbb{Y}) - P(X\in\mathbb{X} \cup Y\in\mathbb{Y})\\ &= P(Y\in\mathbb{Y}) + [P(X\in\mathbb{X}) - P(X\in\mathbb{X} \cup Y\in\mathbb{Y})] \\ &\le P(Y\in\mathbb{Y}), \end{align}$$ because $\{X\in\mathbb{X}\} \subseteq \{X\in\mathbb{X}\} \cup \{Y\in\mathbb{Y}\}$, so the bracketed term is at most $0$.

There is a fundamental difference between a probability density and a probability, as illustrated by the fact that the inequality holds for the latter but not the former. If you want to recover the inequality using probability densities, you need to convert them into probabilities first. E.g., take $\mathbb{S}(\mathbb{X}) = \{ x: x_1\le x\le x_2\}$ and $\mathbb{S}(\mathbb{Y}) = \{ y: y_1\le y\le y_2\}$. Then

$$ P(x_1\le X\le x_2 \cap y_1\le Y\le y_2) = \int_{x_1}^{x_2}\int_{y_1}^{y_2} f_{XY}(x,y) \,dy\,dx \le \int_{x_1}^{x_2}\int_{-\infty}^\infty f_{XY}(x,y) \,dy\,dx = P(x_1\le X\le x_2). $$ As to the inequality from information theory: entropy measures the uncertainty associated with the random variables, and the uncertainty about the pair $(X,Y)$ can only be greater than or equal to the uncertainty about $Y$ alone. Far from contradicting the probability inequality, it follows from it: since $P(X=x,\ Y=y) \le P(Y=y)$, we have $-\log P(X=x,\ Y=y) \ge -\log P(Y=y)$, and taking expectations with respect to the joint distribution gives $H(X,Y) \ge H(Y)$.
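A quick numerical check of $H(X,Y) \ge H(Y)$ in the discrete case (the joint table is an arbitrary illustrative example):

```python
from math import log2

# Illustrative joint pmf (arbitrary values chosen for the example).
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

# Marginal pmf of Y, obtained by summing out x.
marginal_y = {}
for (x, y), p in joint.items():
    marginal_y[y] = marginal_y.get(y, 0.0) + p

# Joint entropy dominates the marginal entropy.
assert entropy(joint) >= entropy(marginal_y)
```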