Conditional probability definition doesn't make sense (to me) when dealing with uncountable (real-valued) RVs. Please help a confused ML developer.

Just trying to get my math understanding in order.

So my book has a definition for a conditional probability: P(Y|X) = P(Y, X)/P(X).

I understand that X and Y are sets of vectors [Y, X], and that the conditional probability is the probability of the intersection of the two sets, renormalized so that the whole set X has probability 1.

My problem is that I don't understand how this definition is consistent with the fact that the probability of any particular point in a continuous space is 0 (unless we defined that point to have nonzero probability, but let's assume we didn't).

Suppose X is a vector-valued continuous random variable in R^n and Y is in {0, 1}. Then, by the definition, P(Y|X=x) is undefined at any particular point x, because P(Y, X=x) is 0 and P(X=x) is 0. I understand that I am probably supposed to replace P with f in the continuous case, and then the formula works, but that's not what was given to me in the definition.

When I study the math behind ML algorithms, books always use this P(Y|X=x) and it never makes sense. Is there another, better definition of conditional probability for the continuous case?

Just trying to make sense of P(Y|X) in a slightly more rigorous way.

If you answer with the measure-theoretic definition of conditional probability, I will appreciate it, but unfortunately I will not understand it :)

Thank you!

Best answer

In general, given a probability space $(\Omega,\mathcal{F},P)$ and two events $A,B\in\mathcal{F}$ such that $P(B)>0$, we define: $$ P(A|B)=\frac{P(A\cap B)}{P(B)} $$ If we are talking about random variables, then we instead use the following definition (the definition of the conditional density function):

Let $X,Y$ be two random variables and let $y\in\mathbb{R}$ be such that $f_{Y}(y)>0$ (where $f_Y$ is the marginal density of $Y$). Then we define the conditional density of $X$ given $Y=y$ as: $$ f_{X|Y}(x|Y=y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} $$ Note that if $X$ is a discrete random variable, then this function is the density of a discrete random variable; similarly, if $X$ is an absolutely continuous random variable, then this function is the density of an absolutely continuous random variable.
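
One way to see why a ratio of densities is a natural definition is to condition on a small interval around $y$ instead of on the single point, and then let the interval shrink. The following is only a heuristic sketch: it assumes that $f_{X,Y}$ and $f_Y$ are continuous near $y$, that $f_Y(y)>0$, and that the limit may be taken inside the integral. For small $\varepsilon>0$ the event $\{y\leq Y\leq y+\varepsilon\}$ has positive probability, so the elementary definition applies:

$$ P(X\leq x \mid y\leq Y\leq y+\varepsilon) = \frac{\int_{-\infty}^{x}\int_{y}^{y+\varepsilon} f_{X,Y}(u,v)\,dv\,du}{\int_{y}^{y+\varepsilon} f_{Y}(v)\,dv} \approx \frac{\varepsilon\int_{-\infty}^{x} f_{X,Y}(u,y)\,du}{\varepsilon\, f_{Y}(y)} $$

The factor $\varepsilon$ cancels, so under these assumptions

$$ \lim_{\varepsilon\to 0} P(X\leq x \mid y\leq Y\leq y+\varepsilon) = \int_{-\infty}^{x} \frac{f_{X,Y}(u,y)}{f_{Y}(y)}\,du = \int_{-\infty}^{x} f_{X|Y}(u|Y=y)\,du $$

In words: the "0/0" disappears because both probabilities shrink at the same rate, and what survives is exactly the ratio of densities.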

So I think your question was: "What if $Y$ is continuous?" In that case the density $f_Y(y)$ is not equal to $P(Y=y)$, as it would be in the discrete case: $P(Y=y)=0$ for every $y$, while $f_Y(y)$ can be positive, so the ratio of densities above is still well defined. A function $f:\mathbb{R}\to\mathbb{R}^+$ is a density function of an absolutely continuous random variable $Y$ if it is integrable and $P(Y\leq t) = \int_{-\infty}^{t}f(y)\,dy$ for all $t\in\mathbb{R}$.
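
To connect this with the setting in the question (a binary label $Y\in\{0,1\}$ and a continuous feature $X$, so the roles of $X$ and $Y$ are swapped relative to the notation above), the usual reading of $P(Y=1|X=x)$ is the density form of Bayes' rule: $$ P(Y=1|X=x) = \frac{f_{X|Y}(x|Y=1)\,P(Y=1)}{f_{X|Y}(x|Y=0)\,P(Y=0)+f_{X|Y}(x|Y=1)\,P(Y=1)} $$ The sketch below is a numerical check, not a proof. The model is entirely hypothetical (one-dimensional $X$, Gaussian class-conditional densities, prior $P(Y=1)=0.3$), and all function names are made up for illustration. It compares the density formula with the elementary definition applied to the event $\{x-\varepsilon\leq X\leq x+\varepsilon\}$, which has positive probability.

```python
import math

# Hypothetical model, chosen only for illustration:
#   Y in {0, 1} with P(Y=1) = 0.3
#   X | Y=0 ~ Normal(0, 1),  X | Y=1 ~ Normal(2, 1)
P_Y1 = 0.3
MU = {0: 0.0, 1: 2.0}
SIGMA = 1.0


def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def posterior_density_form(x):
    """P(Y=1 | X=x) computed with densities (Bayes' rule with f instead of P)."""
    num = normal_pdf(x, MU[1], SIGMA) * P_Y1
    den = num + normal_pdf(x, MU[0], SIGMA) * (1.0 - P_Y1)
    return num / den


def posterior_interval_form(x, eps):
    """P(Y=1 | x-eps <= X <= x+eps) from the elementary definition
    P(A|B) = P(A and B) / P(B); here P(B) > 0, so nothing is undefined."""

    def interval_mass(label):
        # P(x-eps <= X <= x+eps | Y=label)
        return (normal_cdf(x + eps, MU[label], SIGMA)
                - normal_cdf(x - eps, MU[label], SIGMA))

    num = interval_mass(1) * P_Y1
    den = num + interval_mass(0) * (1.0 - P_Y1)
    return num / den


if __name__ == "__main__":
    x = 1.2
    print("density form:", posterior_density_form(x))
    for eps in (1.0, 0.1, 0.01, 0.001):
        print("eps =", eps, "interval form:", posterior_interval_form(x, eps))
```

As `eps` shrinks, the interval-based value should approach the density-based value, which is the sense in which ML books write $P(Y|X=x)$ even though $P(X=x)=0$.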