In introductory courses on probability one is often introduced to problems of the form:
Suppose $X \mid Y = y \sim \text{Geom}(y)$, where $Z \sim \text{Geom}(p)$ denotes the Geometric distribution with $P(Z = k) = (1-p)^{k-1}p$ for $k = 1, 2, \dots$. Calculate the distribution of $X$.
If $Y$ is also discrete-valued, one could take $X \mid Y = y \sim \text{Geom}(y)$ to mean that
$$P[X = k \mid Y = y] = \frac{P[X = k, Y = y]}{P[Y = y]} = (1-y)^{k-1}y$$
and to get $P[X = k]$ one would sum this expression over all $y$, weighting each term by $P[Y = y]$. This is in effect an application of the law of total probability, i.e. of the fact that a probability measure is $\sigma$-additive.
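To make the discrete case concrete, here is a small sketch in Python. The two-point distribution for $Y$ (uniform on $\{0.2, 0.5\}$) is a hypothetical choice of mine, picked only so the sum is easy to carry out:

```python
# Sketch of the discrete case, with a hypothetical two-point
# distribution for Y: P(Y = 0.2) = P(Y = 0.5) = 1/2.

def geom_pmf(k, p):
    """P(Z = k) = (1 - p)**(k - 1) * p for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

p_Y = {0.2: 0.5, 0.5: 0.5}  # hypothetical distribution of Y

def p_X(k):
    # Law of total probability: P(X = k) = sum_y P(X = k | Y = y) P(Y = y)
    return sum(geom_pmf(k, y) * p_Y[y] for y in p_Y)

print(p_X(1))                               # 0.5 * 0.2 + 0.5 * 0.5 ≈ 0.35
print(sum(p_X(k) for k in range(1, 2000)))  # close to 1, as it should be
```

The truncated sum over $k$ is only a numerical check that the resulting marginal pmf of $X$ sums to $1$.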
One is told that, in a similar vein, when $Y$ has a continuous distribution one should integrate over $y$ with respect to the density of $Y$.
I would like to motivate why this is so.
In the case when $Y$ is continuous, I suppose the framework is that we assume there exists a probability kernel $\kappa(y, \{k\}) = (1-y)^{k-1}y$, and the claim would then be that
$$P[X = k] = \int \kappa(y, \{k\}) \, f_Y(y) \, dy$$
if a density exists, and otherwise
$$P[X = k] = \int \kappa(y, \{k\}) \, P_Y(dy).$$
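As a numerical sanity check of the second formula, here is a Monte Carlo sketch under a hypothetical choice of mine, $Y \sim \text{Uniform}(0,1)$; in that case $\int_0^1 (1-y)^{k-1} y \, dy = \frac{1}{k(k+1)}$:

```python
import random

random.seed(0)

# Monte Carlo check of the mixture formula, under the hypothetical
# choice Y ~ Uniform(0, 1); then the kernel integral is
# ∫_0^1 (1 - y)**(k - 1) * y dy = 1 / (k * (k + 1)).

def sample_X():
    y = random.random()          # draw Y ~ Uniform(0, 1)
    k = 1                        # then X | Y = y ~ Geom(y):
    while random.random() >= y:  # count trials until the first success
        k += 1
    return k

n = 100_000
counts = {}
for _ in range(n):
    k = sample_X()
    counts[k] = counts.get(k, 0) + 1

for k in range(1, 5):
    # empirical frequency vs. the kernel integral 1 / (k * (k + 1))
    print(k, counts.get(k, 0) / n, 1 / (k * (k + 1)))
```

The empirical frequencies of $X = k$ agree with the integral $\int \kappa(y, \{k\}) \, P_Y(dy)$ up to Monte Carlo error.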
What motivates this?
Thanks in advance!
I'm not sure what you're looking for, but the main statement you're asking about is essentially the tower property of conditional expectation, i.e. that for any random variables $Z$ and $W$: $$ E[Z] = E[E[Z|W]]. $$

This is a more general version of what you describe above: "to get $P[X=k]$ [from $P(X=k|Y=y)$] one would sum this expression over all $y$ and multiply each term with $P[Y=y]$". This is the same as saying that to get $P(X=k)$, you need to average $P(X=k|Y=y)$ over all possible values of $y$ (where the average is weighted according to the distribution of $Y$). In a similar way, to get $E[Z]$ from $E[Z|W]$, you need to average over all values of $W$. You can also think of $E[Z|W]$ as conditioning on $W$, and $E[E[Z|W]]$ as unconditioning, leaving $E[Z]$.
In the setting you're asking about, we can take $Z = 1_{\{X=k\}}$ (since then $E[Z] = P(X=k)$), and $W = Y$. The above property is then $$ P(X=k) = E[1_{\{X=k\}}] = E[E[1_{\{X=k\}}|Y]] = E[P(X=k|Y)]. $$ Note, $P(X=k|Y)$ is a function of both $k$ and $Y$; you can write it as $\kappa(Y,k) = P(X=k|Y)$ similarly to what you did above. And then $$ P(X=k) = E[\kappa(Y,k)] = \int \kappa(y,k) \, P_Y(dy) $$ by the usual way of computing the expectation of a function of $Y$ (this is called the law of the unconscious statistician in undergrad probability, or the change-of-variable formula in graduate probability).
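To see $P(X=k) = E[\kappa(Y,k)]$ numerically: under the (hypothetical, chosen for convenience) assumption $Y \sim \text{Uniform}(0,1)$, averaging the kernel over draws of $Y$ recovers the exact value $\frac{1}{k(k+1)}$:

```python
import random

random.seed(1)

def kappa(y, k):
    # the kernel from the question: P(X = k | Y = y) = (1 - y)**(k - 1) * y
    return (1 - y) ** (k - 1) * y

# Hypothetical choice again: Y ~ Uniform(0, 1), so P_Y(dy) is Lebesgue
# measure on (0, 1) and E[kappa(Y, k)] = ∫_0^1 kappa(y, k) dy = 1/(k(k+1)).
n = 100_000
k = 3
estimate = sum(kappa(random.random(), k) for _ in range(n)) / n
print(estimate, 1 / (k * (k + 1)))  # the estimate should be close to 1/12
```

This is the law of the unconscious statistician in action: no samples of $X$ are ever drawn, only samples of $Y$ pushed through the function $y \mapsto \kappa(y, k)$.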
Does this help?