If the only problem with the traditional definition is conditioning on events of probability zero (such as $\{X=x\}$ for a continuous random variable $X$), then this problem should already be solved by taking the limit over arbitrarily small sets around $x$. So why do we still need another solution (conditioning on $\sigma$-algebras instead)?
Are there cases other than continuous random variables in which $P(X=x)=0$? For continuous random variables the problem is handled as explained above, and in the end the density function is used instead of $P(X=x)$.
Are there any other problems that could arise if the conditional expectation is not defined at some points or on some sets?
EDIT
More clarification:
- By the general definition I mean conditioning on a $\sigma$-algebra instead of on a particular event, and defining $E(X|\mathscr{G})$ as a random variable satisfying (i) $E(X|\mathscr{G})$ is $\mathscr{G}$-measurable and integrable, (ii) $\int_{G}E(X|\mathscr{G})\,dP=\int_{G}X\,dP$ for all $G\in \mathscr{G}$.
- The traditional way of calculating the conditional expectation when conditioning on a continuous random variable, $E[Y|X=x]= \frac{\int_{\mathbb{R}}yf_{X,Y}(x,y)\,dy}{\int_{\mathbb{R}}f_{X,Y}(x,y)\,dy}$, is obtained by taking the limit over arbitrarily small regions around $x$, and this formula is proven to agree with the general definition of conditional expectation and to satisfy the two properties. So if the approach of taking limits is problematic, how does it agree with the more rigorous definition?
Your solution, "taking the limit of an arbitrarily small set around $x$", is not a solution. The problem is that there are "small sets" that are still skewed enough to prevent a limit from existing.
For example, suppose $X$ and $Y$ are independent random variables, uniformly distributed in $[0,1]$. What should $E(Y\mid X=\frac12)$ be?
We can certainly find arbitrarily small sets around the line $X=\frac12$ on which the expected value of $Y$ is $\frac12$ -- but there are also sets such as $$ Q_\varepsilon = \{(x,y) : |x-\tfrac12| < (1+y)\varepsilon \} $$ and $$ R_\varepsilon = \{(x,y) : |x-\tfrac12| < (2-y)\varepsilon \} $$ where the expected value of $Y$ is something different from that.
Note that $Q_\varepsilon$ and $R_\varepsilon$ are "arbitrarily small" in the sense that every open subset of $[0,1]^2$ that contains the line $X=\frac12$ will contain $Q_\varepsilon$ and $R_\varepsilon$ for sufficiently small $\varepsilon$.
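To make "something different" concrete, here is a worked computation of the two expected values. It assumes $\varepsilon \le \frac14$, so that both sets lie inside the unit square. For fixed $y$, the horizontal slice of $Q_\varepsilon$ has length $2(1+y)\varepsilon$ and the slice of $R_\varepsilon$ has length $2(2-y)\varepsilon$, so
$$ E(Y \mid Q_\varepsilon) = \frac{\int_0^1 y \cdot 2(1+y)\varepsilon \, dy}{\int_0^1 2(1+y)\varepsilon \, dy} = \frac{\frac12+\frac13}{\frac32} = \frac59 $$
and
$$ E(Y \mid R_\varepsilon) = \frac{\int_0^1 y \cdot 2(2-y)\varepsilon \, dy}{\int_0^1 2(2-y)\varepsilon \, dy} = \frac{1-\frac13}{\frac32} = \frac49. $$
Neither value depends on $\varepsilon$, so shrinking along the $Q_\varepsilon$ gives the limit $\frac59$, shrinking along the $R_\varepsilon$ gives $\frac49$, and shrinking along the symmetric strips $|x-\frac12|<\varepsilon$ gives $\frac12$.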
In this particular case we can try to sneak around the problem by noting that the sample space is a metric space and then attempt to use the metric to define a "canonical" surrounding of size $\varepsilon$ of the conditioning event. But this does not generalize to more abstract situations, and would also mean that the conditional expectation would depend on which metric we use, even among ones that are topologically equivalent. Moving the whole thing to a homeomorphic (or even diffeomorphic) sample space, transferring the probability measure, might well change the conditional probabilities.
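The dependence on the choice of coordinates can be checked numerically. The following is only a Monte Carlo sketch. The map $\varphi(x,y) = \left(\frac{x-1/2}{1+y},\, y\right)$ is a hypothetical diffeomorphism picked for this illustration, as are the helper name `strip_mean`, the sample size, and the seed. Under $\varphi$ the line $X=\frac12$ becomes $u=0$, and the "canonical" strip $|u|<\varepsilon$ in the new coordinates pulls back to exactly $Q_\varepsilon$.

```python
import random


def strip_mean(in_strip, n=200_000, seed=0):
    """Average of Y over uniform samples of [0,1]^2 that land in the strip."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if in_strip(x, y):
            total += y
            count += 1
    return total / count


eps = 0.05  # illustrative strip half-width (assumption)

# Strip of half-width eps around x = 1/2, measured in the original coordinates.
original = strip_mean(lambda x, y: abs(x - 0.5) < eps)

# Same-size strip measured in the coordinates u = (x - 1/2) / (1 + y), v = y.
# Pulled back to the original square, this is the set Q_eps.
transformed = strip_mean(lambda x, y: abs((x - 0.5) / (1 + y)) < eps)

print(f"strip in original coordinates:    E[Y | strip] ~ {original:.3f}")
print(f"strip in transformed coordinates: E[Y | strip] ~ {transformed:.3f}")
```

The first estimate should come out near $\frac12$ and the second near $\frac59$, even though both strips are "the $\varepsilon$-neighbourhood of the conditioning event" in their respective coordinate systems.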