I'm trying to use the formalism of measure theory to define a learning task: a classifier $f: X \rightarrow Y$ tries to approximate the joint probability distribution of $X$ and $Y$.
I don't really know how to define this joint probability distribution, but have the following idea:
Let $(\mathcal{X}, \mathcal{A}, \mu_x)$ and $(\mathcal{Y}, \mathcal{B}, \mu_y)$ be two probability spaces. We define on $\mathcal{X}$ a random variable $X:\mathcal{X} \rightarrow \mathbb{R}$ and on $\mathcal{Y}$ a random variable $Y:\mathcal{Y} \rightarrow \mathbb{R}$. The joint distribution could then be defined as a probability measure $\mu$ on the product space $\Omega = \mathcal{X} \times \mathcal{Y}$ equipped with the product $\sigma$-algebra $\mathcal{A} \otimes \mathcal{B}$. My classifier would accordingly be a function $f: \mathcal{X} \rightarrow \mathcal{Y}$.
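To make my idea concrete (this is my own attempt at the notation, possibly off): with a probability measure $\mu$ on the product space, the joint distribution of the pair $(X, Y)$ would be its pushforward to $\mathbb{R}^2$,
$$
\mu_{(X,Y)}(B) := \mu\big((X,Y)^{-1}(B)\big) \qquad \text{for Borel sets } B \subseteq \mathbb{R}^2,
$$
where $(X,Y)\colon \Omega \rightarrow \mathbb{R}^2$, $(x,y) \mapsto (X(x), Y(y))$.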
Does that sound correct, or did I completely misunderstand the whole thing?
More correct would be to say that $Z=(X,Y)$ is a random variable from the product space $(\Omega,\mathcal A\otimes\mathcal B,\mu_x\otimes \mu_y)$ to $\mathbb R^2$. And it is not $f$ itself that approximates the joint probability distribution; rather, the graph of $f$, $$ \textrm{graph}(f):=\{(x,f(x))\colon x\in \mathbb R\}, $$ tries to approximate the support of $Z$ (here $f$ is viewed as a map $\mathbb R\to\mathbb R$, via the random variables $X$ and $Y$).
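A small numerical sketch of this last point (my own toy setup, not from the question): draw samples of $Z=(X,Y)$ whose support concentrates near a curve, fit a simple predictor $f$, and check that $\mathrm{graph}(f)$ stays close to that curve. The 1-nearest-neighbour rule here is just a stand-in for any classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples of Z = (X, Y): the support concentrates near the curve y = sin(x),
# since the noise has small standard deviation.
x = rng.uniform(-3, 3, size=2000)
y = np.sin(x) + rng.normal(scale=0.05, size=x.shape)

def fit_1nn(xs, ys):
    """Return a 1-nearest-neighbour predictor f: R -> R (a toy stand-in for a classifier)."""
    def f(x0):
        return ys[np.argmin(np.abs(xs - x0))]
    return f

f = fit_1nn(x, y)

# graph(f) = {(t, f(t))}: measure how far it strays from the curve
# that the support of Z hugs.
grid = np.linspace(-2.5, 2.5, 50)
graph_y = np.array([f(t) for t in grid])
max_dev = np.max(np.abs(graph_y - np.sin(grid)))
print(max_dev)  # small deviation: the graph of f tracks the support of Z
```

With enough samples and little noise, `max_dev` stays small, illustrating that the learned graph approximates the support of $Z$ rather than the joint distribution itself.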