How do I formulate randomized PAC learning of distributions in measure theory terms (i.e. how to set up conditional probabilities)?

78 Views Asked by At

I am struggling with setting up the measure-theoretic formulation of the randomized PAC learning of distributions problem.

The PAC learning of distributions problem:

Consider a "domain of examples" -- the measurable space $(\mathbb R, \mathcal M)$ where $\mathcal M$ are the Borel sets induced by $\mathbb R$'s usual metric.

A (one-sample) learner is a measurable function $\mathcal L: \mathbb R \to \Delta(\mathbb R)$, mapping samples to distributions: $\Delta(\mathbb R)$ denotes the set of all probability distributions over the measurable space $(\mathbb R, \mathcal M)$, and we equip it with the total variation metric. By taking $\mathcal N$ to be the Borel sets induced by this metric, we get a measurable space $(\Delta(\mathbb R), \mathcal N)$.

Given a set of distributions $\mathcal Q \subseteq \Delta(\mathbb R)$, we wish $\mathcal L$ to satisfy the following property: for every $q \in \mathcal Q$, if we draw one sample $x \sim \mathcal q$, with high probability the output of the learner $\mathcal L (x)$ is close to the underlying distribution $q$. That is,

$$ \underset{x \sim q}{\mathbb P}\{ \text{TV}(\mathcal L(x), q) \leq \alpha \} \geq 1-\beta.$$

We can write this in measure theory notation by the following:

$$q \circ \mathcal L^{-1}\left(\overline{B}_\text{TV}(q, \alpha)\right) \geq 1- \beta,$$

where $q \circ \mathcal L^{-1}$ denotes the pushforward measure of $q$ with map $\mathcal L$.

My question: how do I set this up for randomized learners?

A (one sample) randomized learner is a measurable function $\mathcal R : \mathbb R \to \Delta(\Delta(\mathbb R))$ -- it maps the sample to a distribution over distributions. We want $\mathcal R$ to satisfy the following propery: for every $q \in \mathcal Q$, if we draw one sample $x \sim \mathcal q$, with high probability over the sample, and the internal randomness of the learner, the (random) output of the learner $\mathcal R (x)$ is close to the underlying distribution $q$. That is,

$$ \underset{p \sim \mathcal R(x)}{\underset{x \sim q}{\mathbb P}}\{ \text{TV}(p, q) \leq \alpha \} \geq 1-\beta.$$

My question is, in the same way as above, how do I write this condition in the language of measure theory? I believe this boils down to "how do I express marginalizing in measure theory", but I am not able to formulate the question properly.

1

There are 1 best solutions below

0
On

The only obstacle to "marginalizing" by taking the Lebesgue integral $$\int_{\mathbb{R}}\mathcal R(x)\left(\overline{B}_\text{TV}(q,\alpha)\right) dq(x)=:\underset{p \sim \mathcal R(x)}{\underset{x \sim q}{\mathbb P}}\{ \text{TV}(p, q) \leq \alpha \}$$ is checking that $f:\mathbb{R} \to [0,1]$ by $f(x) = \mathcal R(x)(B)$ is a measurable function for any fixed $B\in\mathcal{N}$.

It suffices to check that $f^{-1}(O)\in \mathcal M$ for all open subsets $O$ of $\mathbb{R}$. Let $$A =\{d\in \Delta(\mathbb{R}) : d(B) \in O\}\subseteq\Delta(\Delta(\mathbb{R}))$$ so that $$f^{-1}(O) = \mathcal R^{-1}(A).$$ Suppose $d \in A$. Since $O$ is open, there exists some $\delta>0$ such that $(d(B)-\delta, d(B)+\delta)\in O$. If $d'\in B_\text{TV}(d,\delta)$, by definition of total variation distance $|d'(B)-d(B)| < \delta$ and therefore $d'(B)\in O$, so $B_\text{TV}(d,\delta)\subseteq A$. We conclude $A$ is open in the topology induced by the total variation distance, and therefore by measurability of $\mathcal R$ we have $\mathcal R^{-1}(A) \in \mathcal M$.