Just trying to get my math understanding in order.
So my book has a definition for a conditional probability: P(Y|X) = P(Y, X)/P(X).
I understand that X and Y are sets of vectors [Y, X], and that the conditional probability is the probability of the intersection of the two sets, renormalized so that the whole set X has probability 1.
My problem is that I don't understand how this definition is consistent with the fact that the probability of any particular point in a continuous space is 0 (unless we explicitly give that point nonzero probability, but let's assume we didn't).
Suppose X is a vector-valued continuous random variable in R^n and Y takes values in {0, 1}. Then, by this definition, P(Y|X=x) is undefined at every particular point x, because both P(Y, X=x) and P(X=x) are 0. I understand that I am probably supposed to replace P with f in the continuous case, and then the formula works, but that's not what was given to me in the definition.
When I study the math behind ML algorithms, books always use this P(Y|X=x), and it never makes sense to me. Is there another, better definition of conditional probability for the continuous case?
Just trying to make sense of P(Y|X) in a slightly more rigorous way.
If you answer with the measure-theoretic definition of conditional probability, I will appreciate it, but unfortunately I will not understand it :)
Thank you!
In general, given a probability space $(\Omega,\mathcal{F},P)$ and two events $A,B\in\mathcal{F}$ with $P(B)>0$, we define: $$ P(A|B)=\frac{P(A\cap B)}{P(B)} $$ If we are talking about random variables, then we have instead the following definition (def. of the conditional density function):
let $X,Y$ be two random variables with joint density $f_{X,Y}$, and let $y\in\mathbb{R}$ be such that $f_{Y}(y)>0$ (where $f_Y$ is the marginal density of $Y$). Then we define the conditional density of $X$ given $Y=y$ as: $$ f_{X|Y}(x|Y=y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} $$ Note that if $X$ is a discrete random variable, then this function is a density of a discrete random variable; similarly, if $X$ is an absolutely continuous random variable, then this function is a density of an absolutely continuous random variable.
So I think your question was: "What if $Y$ is continuous?" Well, if so, the density function $f_Y(y)$ is not equal to $P(Y=y)$ as in the discrete case. A function $f:\mathbb{R}\to\mathbb{R}^+$ is a density function of an absolutely continuous random variable $Y$ if it is integrable and $P(Y\leq t) = \int_{-\infty}^{t}f(y)dy$, $\forall t\in\mathbb{R}$. In particular, $f_Y(y)>0$ can hold even though $P(Y=y)=0$, so the ratio of densities above is well defined where the ratio of point probabilities would be $0/0$.
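To connect this with the notation in your question ($X$ continuous, $Y\in\{0,1\}$), here is a small numerical sketch. It compares the elementary conditional probability $P(Y=1 \mid X\in(x-\varepsilon,x+\varepsilon))$, which is well defined because the window has positive probability, with the density formula $$ P(Y=1|X=x)=\frac{f_{X|Y}(x|1)\,P(Y=1)}{f_X(x)} $$ which is what the window probability tends to as $\varepsilon\to 0$ at points where the densities are continuous and $f_X(x)>0$. Everything specific here is an assumption made up for illustration: the Gaussian class-conditional densities, the prior $P(Y=1)=0.3$, and the helper names `posterior_y1` and `window_estimate`.

```python
import math
import random

# Assumed toy model (not from the question):
#   Y ~ Bernoulli(0.3),  X | Y=0 ~ N(0, 1),  X | Y=1 ~ N(2, 1)
P_Y1 = 0.3
MEAN = {0: 0.0, 1: 2.0}


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_y1(x):
    """P(Y=1 | X=x) computed from densities: f(x|1) P(Y=1) / f_X(x)."""
    num = normal_pdf(x, MEAN[1]) * P_Y1
    den = num + normal_pdf(x, MEAN[0]) * (1.0 - P_Y1)  # marginal f_X(x)
    return num / den


def window_estimate(x0, eps, n=200_000, seed=0):
    """Monte Carlo estimate of P(Y=1 | X in (x0-eps, x0+eps)).

    This conditions on an event of positive probability, so the
    elementary definition P(A|B) = P(A and B) / P(B) applies directly.
    """
    rng = random.Random(seed)
    in_window = 0
    in_window_and_y1 = 0
    for _ in range(n):
        y = 1 if rng.random() < P_Y1 else 0
        x = rng.gauss(MEAN[y], 1.0)
        if abs(x - x0) < eps:
            in_window += 1
            in_window_and_y1 += y
    return in_window_and_y1 / in_window


if __name__ == "__main__":
    x0 = 1.0
    print("density formula :", posterior_y1(x0))
    for eps in (0.5, 0.2, 0.1):
        print(f"window eps={eps}:", window_estimate(x0, eps))
```

In this toy model the two class-conditional densities are equal at $x=1$, so the density formula returns the prior, 0.3, there. The window estimates should land close to that value, up to sampling noise and the bias from using a window of finite width.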