Meaning of $P(Y|X=x)$


Suppose that $X$ and $Y$ are two random variables on $(\Omega, \mathcal H, P)$ with values in $(\mathbb R,\mathcal B_{\mathbb R})$. I want to understand what the expression $P(Y|X=x)$ means formally, where $x\in\mathbb R$.


If $\mathcal F$ is a $\sigma$-subalgebra of $\mathcal H$ and $k_{Y,\mathcal F}$ is a stochastic kernel from $(\Omega, \mathcal F)$ to $(\mathbb R,\mathcal B_{\mathbb R})$, namely

$$k_{Y,\mathcal F}:\Omega\times\mathcal B_{\mathbb R}\longrightarrow [0,1]$$

then we say that $k_{Y,\mathcal F}$ is a regular conditional distribution of $Y$, given $\mathcal F$ if:

$$k_{Y,\mathcal F}(\omega,B)=P(\{Y\in B\}\,|\,\mathcal F)(\omega)$$

for every $B\in\mathcal B_{\mathbb R}$, and for $P$-almost all $\omega\in\Omega$. Remember that by definition $P(\{Y\in B\}\,|\,\mathcal F)$ is the random variable $E(\chi_{\{Y\in B\}}\,|\, \mathcal F)$ (the probability of an event given $\mathcal F$ is defined through the conditional expectation).
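
Spelled out, and assuming the standard definition of conditional expectation, the almost-sure identity above is the same as requiring that, for each fixed $B\in\mathcal B_{\mathbb R}$, the map $\omega\mapsto k_{Y,\mathcal F}(\omega,B)$ is $\mathcal F$-measurable (this is already part of being a kernel) and that

$$\int_A k_{Y,\mathcal F}(\omega,B)\,P(d\omega)=P(\{Y\in B\}\cap A)\qquad\text{for every } A\in\mathcal F.$$

This is only a restatement, not an extra condition; it is the form that is usually easiest to check in examples.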


Question: Now let's return to the notation $P(Y\,|\, X=x)$. I don't understand how this "object" is related to the stochastic kernel $k_{Y,\,\sigma(X)}$ (where $\sigma(X)$ is the $\sigma$-algebra generated by $X$).

Once we understand $P(Y\,|\, X=x)$, we can finally define the number $P(Y=y\,|\, X=x)$ even when $P(X=x)=0$; is this right?

The above question comes from the theory of Markov chains. In fact, the transition probabilities are numbers such as $P(X_{n+1}=j\,|\, X_n=i)$, and the first observation is: "What ensures that $P(X_n=i)\neq 0$? I need a more general notion of conditional probability".

Reference: My main reference is Klenke's book, but I don't understand his explanation (page 181).

Many thanks in advance.

On BEST ANSWER

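The notation $P(Y\mid X=x)$ most likely denotes the probability measure $\mu_x$, where the family $(\mu_x)_{x\in\mathbb R}$ is a regular conditional distribution of $Y$ given $X$. Informally, $\mu_x(B)=P(Y\in B\mid X=x)$ for every Borel set $B$ and every $x$. Formally, for each Borel set $B$ the function $G_B:x\mapsto\mu_x(B)$ is measurable and satisfies $P(Y\in B\mid X)=G_B(X)$ almost surely. In terms of the kernel in the question, this says $k_{Y,\sigma(X)}(\omega,B)=\mu_{X(\omega)}(B)$ for $P$-almost all $\omega$.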
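
As a sanity check, here is a short sketch of how this reduces to the elementary definition; the symbol $P_X$ (the distribution of $X$) and the set $C$ are notation introduced here for illustration. The almost-sure identity $P(Y\in B\mid X)=G_B(X)$ is equivalent to

$$\int_C \mu_x(B)\,P_X(dx)=P(Y\in B,\ X\in C)\qquad\text{for every } C\in\mathcal B_{\mathbb R}.$$

Taking $C=\{x\}$ with $P(X=x)>0$ gives $\mu_x(B)\,P(X=x)=P(Y\in B,\ X=x)$, hence

$$\mu_x(B)=\frac{P(Y\in B,\ X=x)}{P(X=x)}.$$

When $P(X=x)=0$, the same choice of $C$ only yields $0=0$, so this condition does not pin down $\mu_x$ at that single point: each $G_B$ is determined only up to a $P_X$-null set, and the value $P(Y=y\mid X=x)=\mu_x(\{y\})$ then depends on the chosen version.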