While reading Section $4.2$ of A.C. Davison's text Statistical Models, where the author discusses the joint distribution of the data $Y$ and the sufficient statistic $S = s(Y)$, I noticed that he denotes the joint density of $(Y, S)$ by $f_{Y, S}(y, s; \theta)$ and writes:
> But as $S$ is a function $s(Y)$ of $Y$, the joint density of $S$ and $Y$ is zero except where $S = s(Y)$, and $f_{Y, S}(y, s; \theta)$ is just $f_Y(y; \theta)$.
In other words, he claims that the joint density of $(Y, S)$ exists and equals (I will drop the parameter $\theta$ from now on) $$f_{Y, S}(y, s) = \begin{cases} f_Y(y) & \text{if } s = s(y), \\ 0 & \text{otherwise}. \end{cases} \tag{1}$$
Intuitively, this makes sense. However, I find it hard to verify in a mathematically rigorous way (for ease of discussion, assume both $Y$ and $s(Y)$ are univariate). In particular, setting aside the question of whether such a "density" exists and temporarily accepting that $(1)$ is correct, I am unable to verify that the $f_{Y, S}$ given by $(1)$ satisfies $$\int_{\mathbb{R}^2} f_{Y, S}(y, s)\, dy\, ds = 1,$$ as it must if $f_{Y, S}$ is a density. Letting $C$ denote the set $\{(y, s): s = s(y)\}$ in $\mathbb{R}^2$, my calculation goes as follows: $$\int_{\mathbb{R}^2} f_{Y, S}(y, s)\, dy\, ds = \int_C f_Y(y)\, dy\, ds. \tag{2}$$ But how can the Lebesgue integral on the right-hand side of $(2)$ be justified to equal unity, given that $\int_{\mathbb{R}} f_Y(y)\, dy = 1$?
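For what it's worth, here is where the obstruction seems to lie (a sketch, assuming $s$ is Borel measurable): the graph $C$ is a two-dimensional Lebesgue null set, since by Tonelli's theorem $$\lambda^2(C) = \int_{\mathbb{R}^2} \mathbf{1}_C(y, s)\, dy\, ds = \int_{\mathbb{R}} \left( \int_{\mathbb{R}} \mathbf{1}_{\{s(y)\}}(s)\, ds \right) dy = \int_{\mathbb{R}} 0\, dy = 0.$$ So the integrand on the right-hand side of $(2)$ vanishes almost everywhere with respect to planar Lebesgue measure, which already suggests that $(1)$ cannot be a genuine planar density.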
To put my question more generally: when we consider the joint distribution of a random variable $Y$ and a function $s(Y)$ of it, where $Y$ has density $f$, is it always possible to assert the existence of a joint density $g$ of $(Y, s(Y))$ such that for every Borel set $A \subset \mathbb{R}^2$, $$P[(Y, s(Y)) \in A] = \int_A g(y, s)\, dy\, ds\,?$$ If so, does it take the form $(1)$? You may impose any regularity conditions on the function $s(\cdot)$ that you find necessary.
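To see the concentration phenomenon numerically, here is a small sketch of my own (the choices $Y \sim N(0,1)$ and $s(y) = y^2$ are illustrative, not from the text): the pair $(Y, s(Y))$ lands in an $\varepsilon$-tube around the parabola with probability $1$ for every $\varepsilon > 0$, even though the tube's area shrinks to $0$, which a genuine planar density $g$ could not sustain.

```python
import numpy as np

# Illustrative example: Y ~ N(0, 1) and s(y) = y**2, so (Y, s(Y))
# lies exactly on the parabola {(y, y^2)}, a set of two-dimensional
# Lebesgue measure zero.
rng = np.random.default_rng(0)
y = rng.standard_normal(100_000)
s = y ** 2

# The probability of the eps-tube around the curve stays at 1 no
# matter how small eps gets, while the tube's area tends to 0.
for eps in (1e-1, 1e-3, 1e-6):
    print(eps, np.mean(np.abs(s - y ** 2) < eps))  # always 1.0
```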
I want to thank jesterII, whose answer sent me back to seek a mathematically rigorous proof of the factorization theorem resting on measure-theoretic probability. The reference I consulted is the "Sufficient Subfields" subsection of Section $34$ in Probability and Measure by Patrick Billingsley.
Under the advanced probability framework, the suspicious notion of a "joint density" in the question is circumvented by attacking the conditional probability (in its rigorous sense) directly. In the process, there is a technical step of constructing a probability measure with density $f(y) = \sum_n c_n f_Y(y; \theta_n)$ that is equivalent to the original family $\{f_Y(y; \theta): \theta \in \Theta\}$. This is the most delicate step in the proof of the factorization theorem, yet it is typically glossed over by more statistically flavored texts.
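As a rough numerical companion to that step, here is a sketch of mine (the normal location family, the means $\theta_n = n$, and the weights $c_n = 2^{-(n+1)}$ are my own illustrative choices, not Billingsley's): a countable convex mixture of densities from the family is again a probability density, and it is positive wherever any member of the chosen subfamily is.

```python
import numpy as np

# Sketch of the countable-mixture step: f(y) = sum_n c_n f_Y(y; theta_n).
# Illustrative choices (assumptions, not from the reference):
# normal location family, theta_n = n, weights c_n = 2**-(n+1).

def normal_pdf(y, mean):
    return np.exp(-0.5 * (y - mean) ** 2) / np.sqrt(2.0 * np.pi)

def mixture_pdf(y, n_terms=20):
    weights = 0.5 ** np.arange(1, n_terms + 1)  # sum to 1 - 2**-20
    means = np.arange(n_terms)                  # theta_n = n
    return sum(c * normal_pdf(y, m) for c, m in zip(weights, means))

# Riemann-sum check that the (truncated) mixture integrates to ~1;
# truncating the geometric weights costs about 2**-20.
grid = np.linspace(-10.0, 30.0, 400_001)
total = mixture_pdf(grid).sum() * (grid[1] - grid[0])
print(round(total, 4))  # approximately 1.0
```

The point of the construction is that a single dominating measure with such a density lets one define conditional expectations that work simultaneously for every $\theta$ in the family.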
In summary, the "joint density" mentioned in my question is probably a premature notion and is in fact not needed in the proof of the main result. It is understandable that the term "(joint) density" tends to be used rather loosely, and more liberally than it should be, in the statistical literature. I would also regard this as an example of a successful application of advanced probability theory to statistics --- without the $\sigma$-field version of conditional probability and other measure-theoretic techniques, it seems hopeless to get around the difficulty raised in the question.