Joint density of a random variable and its function


In reading Section $4.2$ of A.C. Davison's text Statistical Models, I noticed that when discussing the joint distribution of the data $Y$ and the sufficient statistic $S = s(Y)$, the author uses the notation $f_{Y, S}(y, s; \theta)$ for the joint density of $(Y, S)$ and writes

But as $S$ is a function $s(Y)$ of $Y$, the joint density of $S$ and $Y$ is zero except where $S = s(Y)$, and $f_{Y, S}(y, s; \theta)$ is just $f_Y(y; \theta)$.

In other words, he claimed the existence of the joint density of $(Y, S)$ and that it equals (I will drop the parameter $\theta$ from now on) $$f_{Y, S}(y, s) = \begin{cases} f_Y(y) & \text{if } s = s(y), \\ 0 & \text{otherwise}. \end{cases} \tag{1}$$

Intuitively, this makes sense. However, it is hard for me to verify in a mathematically rigorous way (for ease of discussion, assume both $Y$ and $s(Y)$ are univariate). In particular, setting aside the existence of a "density" and temporarily accepting that $(1)$ is correct, I am unable to verify that the $f_{Y, S}$ given by $(1)$ satisfies $$\int_{\mathbb{R}^2} f_{Y, S}(y, s)\, dy\, ds = 1,$$ as it must if $f_{Y, S}$ were a density. Letting $C$ denote the set $\{(y, s): s = s(y)\}$ in $\mathbb{R}^2$, my calculation goes as follows: $$\int_{\mathbb{R}^2} f_{Y, S}(y, s)\, dy\, ds = \int_C f_Y(y)\, dy\, ds. \tag{2}$$ But how can one justify that the Lebesgue integral on the right-hand side of $(2)$ equals unity, given that $\int_{\mathbb{R}} f_Y(y)\, dy = 1$?

To make my question more general: when we consider the joint distribution of a random variable $Y$ and its function $s(Y)$, and $Y$ has density $f$, is it always possible to assert the existence of a joint density $g$ of $(Y, s(Y))$ such that for any Borel set $A \subset \mathbb{R}^2$, $$P[(Y, s(Y)) \in A] = \int_A g(y, s)\, dy\, ds\,?$$ If so, does it take the form of $(1)$? You may impose any regularity conditions on the function $s(\cdot)$ if necessary.
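For reference, and without presuming any joint density exists, the joint law of $(Y, s(Y))$ can always be written as the image measure of the law of $Y$ under the map $y \mapsto (y, s(y))$: $$P[(Y, s(Y)) \in A] = \int_{\mathbb{R}} \mathbf{1}_A\bigl(y, s(y)\bigr)\, f_Y(y)\, dy \quad \text{for Borel } A \subset \mathbb{R}^2.$$ This is just the standard change-of-variables identity and involves only a one-dimensional integral; whether this measure admits a density with respect to two-dimensional Lebesgue measure is exactly what is in question.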

Best answer:

I want to thank jesterII's answer for prompting me to seek a mathematically rigorous proof of the factorization theorem, one resting on measure-theoretic probability. The reference I consulted is the subsection on sufficient subfields in Section $34$ of Probability and Measure by Patrick Billingsley.

The suspect notion of a "joint density" in my question is circumvented in this advanced probability framework by attacking the conditional probability (in its rigorous sense) directly. Along the way there is a technical step: constructing a probability measure with density $f(y) = \sum_n c_n f_Y(y; \theta_n)$ that is equivalent to the original family $\{f_Y(y; \theta): \theta \in \Theta\}$. This is the most delicate step in the proof of the factorization theorem, yet it is typically glossed over in more statistically flavored texts.
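For concreteness, here is my sketch of that construction (it is essentially the Halmos–Savage lemma; see Billingsley for the details): one chooses a countable subfamily $\{\theta_n\} \subset \Theta$ and weights $c_n > 0$ with $\sum_n c_n = 1$, and sets $$f(y) = \sum_n c_n f_Y(y; \theta_n), \qquad \lambda(B) = \int_B f(y)\, dy.$$ Each positive weight automatically makes $\lambda$ dominate $f_Y(\cdot\,; \theta_n)$; the delicate part is selecting $\{\theta_n\}$ so that, conversely, $\lambda(B) = 0$ forces $\int_B f_Y(y; \theta)\, dy = 0$ for every $\theta \in \Theta$, not just for the chosen $\theta_n$, which is what makes $\lambda$ equivalent to the whole family.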

In summary, the "joint density" mentioned in my question is probably a premature notion, and it is actually not needed in the proof of the main result. Understandably, the term "(joint) density" tends to be used rather loosely, and more often than it should be, in the statistical literature. I would also count this as a successful application of advanced probability theory to statistics: without the $\sigma$-field version of conditional probability and other measure-theoretic techniques, it seems hopeless to get around the plight raised in the question.

Another answer:

I agree there is something strange about Davison's definition of the joint density $f_{Y,S}$, and I doubt that such a joint density formally exists as a Radon-Nikodym derivative with respect to Lebesgue measure. Below is a somewhat informal argument.

Let $\mu$ denote the joint distribution of $(Y, s(Y))$. Observe that $\mu(\left\{(y, s(y)) \mid y \in \mathbb{R} \right\}) = 1$. However, the Lebesgue measure of this graph in $\mathbb{R}^2$ is $0$ (each vertical slice is a single point, so Fubini gives area zero), so $\mu$ is not absolutely continuous with respect to the Lebesgue measure. It follows that there is no joint density $f_{Y,S}$ of $\mu$ with respect to the Lebesgue measure on $\mathbb{R}^2$.
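The singularity is easy to see numerically. Here is a small, hypothetical Python sketch (the choice $Y \sim \mathrm{Uniform}(0,1)$ and $s(y) = y^2$ is mine, purely for illustration): the band $\{(y,s) : |s - y^2| < \varepsilon\}$ carries probability $1$ under the law of $(Y, s(Y))$ for every $\varepsilon > 0$, while its Lebesgue area shrinks to $0$, which no density could reproduce.

```python
import random

random.seed(0)
n = 100_000

# Samples of (Y, S) with S = s(Y) = Y**2 (a hypothetical choice of s).
samples = [(y, y * y) for y in (random.random() for _ in range(n))]

masses, areas = [], []
for eps in (0.1, 0.01, 0.001):
    # Probability of the band {|s - y^2| < eps} under the law of (Y, s(Y)):
    # it is exactly 1, since every sample satisfies s = y^2.
    mass = sum(abs(s - y * y) < eps for y, s in samples) / n

    # Monte Carlo estimate of the band's Lebesgue area inside [0, 1]^2,
    # using independent uniform points on the unit square.
    pts = [(random.random(), random.random()) for _ in range(n)]
    area = sum(abs(s - y * y) < eps for y, s in pts) / n

    masses.append(mass)
    areas.append(area)
    print(f"eps={eps}: P(band) = {mass:.3f}, area(band) ~ {area:.4f}")

# If a joint density g existed, P(band) would equal the integral of g over
# the band, which tends to 0 with the band's area (dominated convergence) --
# contradicting P(band) = 1 for every eps. Hence no density exists.
```

As $\varepsilon$ shrinks, the estimated area collapses toward $0$ while the probability stays pinned at $1$; this is exactly the failure of absolute continuity described above.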

Below is a nice academic paper that looks into exactly this type of question and discusses paradoxes (The Substitution Fallacy) that arise in elementary probability and statistics textbooks.

Proschan, Michael A., and Brett Presnell. "Expect the unexpected from conditional expectation." The American Statistician 52.3 (1998): 248-252. https://www.tandfonline.com/doi/abs/10.1080/00031305.1998.10480576