Marginalisation when one variable is discrete and the other is continuous

If we have two discrete random variables $X$ and $Y$, $$p_X(x) = \sum_y p_{XY}(x,y) = \sum_y p_{X|Y}(x|y)p_Y(y) = \sum_y \mathbb{P}(X = x|Y=y) \mathbb P(Y=y)$$

Similarly, if we have two continuous random variables $X$ and $Y$, $$p_X(x) = \int_y p_{XY}(x,y) dy = \int_y p_{X|Y}(x|y)p_Y(y) dy $$

However, I am slightly confused about what happens when one is continuous and one is discrete. For example, if $X$ is continuous and $Y$ is discrete, I want to say: $$p_X(x) = \sum_yp_X(x,Y=y) = \sum_y p_X(x|Y=y)p_Y(y) = \sum_y p_X(x|Y=y)\mathbb P(Y=y) $$ This looks intuitively right, but I am also a bit stuck on the notation.
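For concreteness, the proposed mixed-case formula can be sanity-checked numerically. The setup below (a two-component Gaussian mixture) is an assumed example, not part of the question: if $Y \in \{0, 3\}$ is discrete and $X \mid Y = y \sim \mathcal N(y, 1)$, then $\sum_y p_X(x \mid Y=y)\,\mathbb P(Y=y)$ should be a genuine density, i.e. integrate to $1$.

```python
import math

# Assumed example (not from the question): Y takes values in {0, 3} with
# P(Y=0) = 0.4 and P(Y=3) = 0.6, and X | Y=y ~ Normal(y, 1).
P_Y = {0.0: 0.4, 3.0: 0.6}

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def p_X(x):
    # Marginalise out the discrete variable by summing, as in the proposed formula.
    return sum(normal_pdf(x, mu=y) * p for y, p in P_Y.items())

# Riemann-sum check that p_X integrates to 1 over a wide grid [-10, 13].
dx = 0.001
total = sum(p_X(-10 + i * dx) * dx for i in range(int(23 / dx)))
print(round(total, 4))  # 1.0
```

The sum over $y$ mixes the conditional densities with weights $\mathbb P(Y=y)$, and the result behaves exactly like an ordinary density of $X$.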

I'm not sure if a joint distribution exists for $X$ and $Y$ in the case that one is continuous and one is discrete, so I'm not sure it would be correct in the above to say: $$p_X(x) = \sum_yp_{XY}(x,Y=y)$$

However, $p_X(x) = \sum_yp_X(x,Y=y)$ doesn't look quite right to me either.

I am similarly unsure about what happens if we want to obtain $p_Y(y)$ (if $X$ is continuous and $Y$ is discrete as above). I know we would have to integrate over $x$, but I am struggling to come up with an expression for the marginalisation with notation that makes sense.

I would appreciate some clarity on this.

Accepted answer:

If $X$ is a real-valued continuous random variable and $Y$ is discrete with values in some countable set $E$, then there exists a map $p_{XY}:\mathbb R\times E\to\mathbb R$ such that $$ \forall A\textrm{ measurable }\subset\mathbb R,\quad\forall y\in E,\quad\mathbb P(X\in A,Y=y)=\int_Ap_{XY}(x,y)\,dx. $$

In that case, the marginal distribution of $X$ has density $p_X$ defined by $$ \forall x\in\mathbb R,\quad p_X(x)=\sum_{y\in E}p_{XY}(x,y), $$ and the marginal distribution of $Y$ is given by $$ \forall y\in E,\quad\mathbb P(Y=y)=\int_{\mathbb R}p_{XY}(x,y)\,dx. $$

Second answer:

The joint distribution of two random variables $X$ and $Y$ always exists. Given the question, however, I guess OP is asking whether the joint distribution of $(X, Y)$ admits a certain type of "density". To this end, I will give a quick review of densities.


1. Let me begin by clarifying the notion of distribution, PMF, and PDF.

The distribution of a random variable $X$ refers to the complete set of information needed to answer any probability-related question about $X$. This information can be encoded in many different ways; below is a non-exhaustive list of such encodings:

  • Cumulative Distribution Function. The CDF, or simply the distribution function, of $X$ is the function $F_X$ defined by $ F_X(x) = \mathbf{P}(X \leq x) $. It can be proved that $F_X$ alone is enough (albeit not necessarily straightforward to use) to compute any probability of the form $\mathbf{P}(X \in A)$.
  • Moment Generating Function. The MGF of $X$ is the function $M_X$ defined by $M_X(s) = \mathbf{E}[e^{sX}]$. If $M_X(s) < \infty$ for all sufficiently small $|s|$, then $M_X$ uniquely determines the distribution of $X$.

  • Probability Measure. The pushforward measure $\mu_X$ on $\mathbb{R}$, defined by $\mu_X(A) = \mathbf{P}(X \in A)$ for every Borel subset $A$ of $\mathbb{R}$, is a probability measure. By definition, $\mu_X$ encodes every probability related to $X$. For this reason, $\mu_X$ is simply called the distribution of $X$ in mathematics.

  • Probability Mass Function. Let $c$ denote the counting measure on $\mathbb{R}$, i.e., $c(A) = \sum_{x \in A} 1 = |A|$. If the distribution $\mu_X$ of $X$ admits a density $p_X : \mathbb{R} \to [0, \infty)$ with respect to $c$ in the sense that $$ \mathbf{P}(X \in A) = \int_A p_X(x) \, \mathrm{d}c(x) = \sum_{x \in A} p_X(x) $$ for any Borel subset $A$ of $\mathbb{R}$, then $p_X$ is called the PMF of $X$. Surely, if $X$ admits a PMF (in which case we say $X$ has a discrete distribution), it completely characterizes the distribution of $X$. To make the notation more transparent, I will borrow the derivative notation to write the PMF $p_X$ as: $$ p_X(x) = \frac{\mathrm{d}\mu_X(x)}{\mathrm{d}c(x)} $$ (Disclaimer. This notation is not standard.)

  • Probability Density Function. If the distribution $\mu_X$ of $X$ admits a density $p_X : \mathbb{R} \to [0, \infty)$ with respect to the Lebesgue measure on $\mathbb{R}$, in the sense that $$ \mathbf{P}(X \in A) = \int_A p_X(x) \, \mathrm{d}x $$ for all Borel subsets $A$ of $\mathbb{R}$, then $p_X$ is called a PDF of $X$. Like before, if $X$ admits a PDF (in which case we say $X$ has a continuous distribution), then the PDF characterizes the distribution of $X$. Again, I will write the PDF $p_X$ as: $$ p_X(x) = \frac{\mathrm{d}\mu_X(x)}{\mathrm{d}x} $$
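To make the unifying idea concrete: both recipes follow the same pattern $\mathbf{P}(X \in A) = \int_A p_X \, \mathrm{d}\mu$, with $\mu$ either the counting measure (a sum) or the Lebesgue measure (an integral). The Poisson and exponential choices below are assumed examples of my own, not from the text.

```python
import math

# Discrete: X ~ Poisson(2); its density w.r.t. the counting measure is the PMF.
def pmf_poisson(k, lam=2.0):
    return math.exp(-lam) * lam ** k / math.factorial(k)

prob_discrete = sum(pmf_poisson(k) for k in range(3))   # P(X <= 2) via a sum

# Continuous: X ~ Exponential(1); its density w.r.t. Lebesgue measure is the PDF.
def pdf_exp(x):
    return math.exp(-x) if x >= 0 else 0.0

dx = 0.0001
prob_continuous = sum(pdf_exp(i * dx) * dx for i in range(int(2 / dx)))  # P(X <= 2) via an integral

print(round(prob_discrete, 4))    # 0.6767
print(round(prob_continuous, 4))  # 0.8647
```

Only the base measure changes between the two cases; the "integrate the density over $A$" recipe is identical.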

2. The last two examples reveal that both PMF and PDF can be described in a unified framework. Once we specify the measure with respect to which a density is computed, then both PMF and PDF can be treated using the same language. The same is true for multivariate distributions, allowing us to systematically define, manipulate, and in particular, marginalize joint densities.

To demonstrate this, we turn to the bivariate case. Let $X$ and $Y$ be random variables. Then the joint distribution of $(X, Y)$ refers to the probability measure $\mu_{X,Y}(A) = \mathbf{P}((X, Y) \in A)$. As in the univariate case, it may happen that $\mu_{X,Y}$ is characterized by a density with respect to a measure on $\mathbb{R}^2$:

  • If both $X$ and $Y$ have discrete distributions, then it turns out that $(X, Y)$ also has a discrete distribution. That is, $\mu_{X,Y}$ admits the density $$ p_{X,Y}(x,y) = \frac{\mathrm{d}^2\mu_{X,Y}(x,y)}{\mathrm{d}c(x)\mathrm{d}c(y)} $$ with respect to the counting measure on $\mathbb{R}^2$ so that $$ \mathbf{P}((X, Y) \in A) = \iint_{A} p_{X,Y}(x,y) \, \mathrm{d}c(x)\mathrm{d}c(y) = \sum_{(x,y) \in A} p_{X,Y}(x,y) $$ for any Borel subset $A$ of $\mathbb{R}^2$. Marginalizing $p_{X,Y}$ is straightforward: $$ p_X(x) = \frac{\mathrm{d}\mu_X(x)}{\mathrm{d}c(x)} = \int_{\mathbb{R}} \frac{\mathrm{d}^2\mu_{X,Y}(x,y)}{\mathrm{d}c(x)\mathrm{d}c(y)} \, \mathrm{d}c(y) = \sum_{y} p_{X,Y}(x, y) $$ and likewise for $p_Y(y)$.

  • Now suppose $X$ and $Y$ have continuous distributions. Contrary to the discrete case, it is not always true that $(X, Y)$ has a (jointly) continuous distribution. In other words, $\mu_{X,Y}$ may fail to have a density. For example, if $X$ and $Y$ are related by $Y = aX + b$, then $\mu_{X,Y}$ is supported on the line $y = ax + b$ and hence does not admit a 2-dimensional density.

    Now suppose $(X, Y)$ has a jointly continuous distribution, so that it admits a density $$p_{X,Y}(x,y) = \frac{\mathrm{d}^2\mu_{X,Y}(x,y)}{\mathrm{d}x\mathrm{d}y} $$ with respect to the Lebesgue measure on $\mathbb{R}^2$. Then marginalization can be done in a straightforward way: $$ p_{X}(x) = \frac{\mathrm{d}\mu_X(x)}{\mathrm{d}x} = \int_{\mathbb{R}} \frac{\mathrm{d}^2\mu_{X,Y}(x,y)}{\mathrm{d}x\mathrm{d}y} \, \mathrm{d}y = \int_{\mathbb{R}} p_{X,Y}(x, y) \, \mathrm{d}y. $$

  • Finally we turn to OP's question. Suppose $X$ has a continuous distribution and $Y$ has a discrete distribution. Then we have the following theorem:

    Theorem. Let $X$ and $Y$ be as above. Then $(X, Y)$ admits a density $$ p_{X,Y}(x, y) = \frac{\mathrm{d}^2\mu_{X,Y}(x, y)}{\mathrm{d}x\mathrm{d}c(y)} $$ with respect to the (maximal) product measure of $\mathrm{d}x$ and $\mathrm{d}c(y)$, such that $$ \mathbf{P}(X \in A, Y \in B) = \int_{A\times B} \frac{\mathrm{d}^2\mu_{X,Y}(x, y)}{\mathrm{d}x\mathrm{d}c(y)} \, \mathrm{d}x\mathrm{d}c(y) = \sum_{y\in B} \int_A p_{X,Y} (x, y) \, \mathrm{d}x $$ for all Borel subsets $A$ and $B$ of $\mathbb{R}$.$^{1)}$

    As a consequence, marginalizing on $X$ and $Y$ can be done as $$ p_{X}(x) = \frac{\mathrm{d}\mu_X(x)}{\mathrm{d}x} = \int_{\mathbb{R}} \frac{\mathrm{d}^2\mu_{X,Y}(x, y)}{\mathrm{d}x\mathrm{d}c(y)} \, \mathrm{d}c(y) = \sum_{y} p_{X,Y} (x, y) $$ and $$ p_{Y}(y) = \frac{\mathrm{d}\mu_Y(y)}{\mathrm{d}c(y)} = \int_{\mathbb{R}} \frac{\mathrm{d}^2\mu_{X,Y}(x, y)}{\mathrm{d}x\mathrm{d}c(y)} \, \mathrm{d}x = \int_{\mathbb{R}} p_{X,Y} (x, y) \, \mathrm{d}x, $$ respectively.
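The product-measure formula $\mathbf{P}(X \in A, Y \in B) = \sum_{y \in B} \int_A p_{X,Y}(x, y)\,\mathrm{d}x$ can be checked numerically. The Bernoulli/uniform model below is an assumed illustration of my own, not part of the answer: $\mathbf{P}(Y=1) = 0.3$, $X \mid Y=0 \sim \mathrm{Unif}(0,1)$, and $X \mid Y=1 \sim \mathrm{Unif}(0,2)$.

```python
# Assumed illustration: Y ~ Bernoulli with P(Y=1) = 0.3,
# X | Y=0 ~ Uniform(0, 1) and X | Y=1 ~ Uniform(0, 2).
P_Y = {0: 0.7, 1: 0.3}

def p_XY(x, y):
    # Density w.r.t. the product of Lebesgue measure in x and counting measure in y.
    width = 1.0 if y == 0 else 2.0
    return P_Y[y] / width if 0.0 <= x <= width else 0.0

def prob(a, b, B, n=20000):
    # P(X in [a, b], Y in B): sum over y in B of a Riemann-sum integral over [a, b].
    dx = (b - a) / n
    return sum(p_XY(a + i * dx, y) * dx for y in B for i in range(n))

print(round(prob(0.0, 0.5, [1]), 4))     # P(X in [0, 0.5], Y = 1) = 0.3 * 0.25 = 0.075
print(round(prob(0.0, 0.5, [0, 1]), 4))  # marginal over Y: 0.7 * 0.5 + 0.075 = 0.425
```

Taking $B$ to be the whole range of $Y$ performs exactly the marginalisation $\sum_y$ in the display above, while fixing $B = \{y\}$ and $A = \mathbb{R}$ would recover $\mathbf{P}(Y = y)$.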


1) A technical remark: We can choose $p_{X,Y}(x, y)$ so that it is supported on a set of the form $\mathbb{R}\times E$ for some countable subset $E$ of $\mathbb{R}$. Consequently, all the complication from non-$\sigma$-finiteness of the counting measure $\mathrm{d}c(y)$ can be safely ignored.