Law of large numbers holding uniformly with respect to a distribution


Let $X$ and $\varepsilon$ be independent random vectors, $\mathcal{X} = \text{supp}(X)$, and $Y = f(X) + \varepsilon$ for some function $f$. For any $x \in \mathcal{X}$, let $y^i = y^i(\omega)$, $i \in \{1,\cdots,n\}$, be independent samples of $Y \mid X = x$ defined on a measurable space $(\Omega,\mathcal{F})$ equipped with a probability measure $\mathbb{P}_x$. Suppose that for each $x \in \mathcal{X}$ there exists an $\mathcal{F}$-measurable set $S(x)$ with $\mathbb{P}_x\{S(x)\} = 0$ such that for every $\omega\in \Omega \backslash S(x)$, $$\lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} g(y^i(\omega)) = \mathbb{E}[g(Y) \mid X = x]$$ for some function $g$.

Question: Can we find a measurable set $T$ such that $\mathbb{P}_x\{T\} = 0$, $\forall x \in \mathcal{X}$, and for any $\omega\in \Omega \backslash T$, $$\lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} g(y^i(\omega)) = \mathbb{E}[g(Y) \mid X = x]? \tag{1}$$ If this is not true in general, are there mild assumptions on the functions $f$ and $g$ and/or the distribution of $\varepsilon$ under which it holds?

Note: $\mathcal{X} \subset \mathbb{R}^m$ is the support of $X$, $f: \mathcal{X} \to \mathbb{R}^d$, and $g: \mathcal{Y} \to \mathbb{R}$, where $\mathcal{Y}$ is the support of $Y$.

Context: This question is based on Assumption (A6) on page 11 of this paper. I am not well-versed in measure theory, so please forgive any incorrect use of notation.

Thoughts: My rough interpretation is that $S(x)$ is the probability-zero set of sample paths of $Y \mid X = x$ over which the LLN-type equality fails. In general this set can depend on $x \in \mathcal{X}$, and the question is whether there exists a single set $T$, independent of $x$, with $T \supset S(x)$ for a.e. $x \in \mathcal{X}$, that is also of probability zero and outside of which the equality holds.

Clearly, this holds when $f \equiv 0$ (i.e., $Y$ is independent of $X$), since the set $S$ does not depend on $x$ in that case. When $f$ is not identically zero, it seems the sets $S(x)$ cannot take uncountably many genuinely different values, because the conditional distributions of $Y \mid X = x_1$ and $Y \mid X = x_2$ differ only by a translation when $x_1 \neq x_2$.

Plausible argument for the case when $f$ is continuous and $g$ is Lipschitz continuous: Let $\bar{\mathcal{X}} = \mathcal{X} \cap \mathbb{Q}^m$ be the intersection of the support of $X$ with the $m$-dimensional rational vectors. Then $T = \cup_{x \in \bar{\mathcal{X}}} S(x)$ satisfies (1), and as a countable union of null sets it still has $\mathbb{P}_x\{T\} = 0$. I believe this works because if the limit fails for $\omega$ at some $x \in \mathcal{X} \backslash \bar{\mathcal{X}}$, then we can pick $\bar{x} \in \bar{\mathcal{X}}$ arbitrarily close to $x$ (since $\mathbb{Q}^m$ is dense in $\mathbb{R}^m$), and the continuity of $f$ together with the Lipschitz continuity of $g$ should force the limit to fail at $\bar{x}$ as well, i.e., $\omega \in S(\bar{x}) \subset T$.
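The transfer step in this argument can be made quantitative. Here is a sketch, assuming $g$ is $L$-Lipschitz, where $S_n(x) := \frac{1}{n}\sum_{i=1}^n g(y^i(\omega))$ with $y^i = f(x)+\epsilon_i$ is notation introduced here (not from the post):

```latex
% Transfer of convergence from a rational point \bar{x} to a nearby x,
% assuming g is L-Lipschitz; S_n(x) := (1/n) \sum_i g(f(x)+\epsilon_i).
\begin{align*}
\bigl|S_n(x) - \mathbb{E}[g(Y)\mid X=x]\bigr|
  &\le \bigl|S_n(x) - S_n(\bar{x})\bigr|
     + \bigl|S_n(\bar{x}) - \mathbb{E}[g(Y)\mid X=\bar{x}]\bigr| \\
  &\quad + \bigl|\mathbb{E}[g(Y)\mid X=\bar{x}] - \mathbb{E}[g(Y)\mid X=x]\bigr| \\
  &\le 2L\,\|f(x) - f(\bar{x})\|
     + \bigl|S_n(\bar{x}) - \mathbb{E}[g(Y)\mid X=\bar{x}]\bigr|.
\end{align*}
```

For $\omega \notin T$ the last term vanishes as $n \to \infty$, and continuity of $f$ makes $\|f(x)-f(\bar{x})\|$ arbitrarily small by choosing $\bar{x}$ close to $x$; since the left-hand side does not depend on $\bar{x}$, it must tend to $0$.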

Best answer:

The answer is "yes", assuming $g$ is continuous.

There are two extreme kinds of $\epsilon$. One kind is discrete $\epsilon$ (only atoms), and another is smooth $\epsilon$ (no atoms). I will deal with each, and let you handle mixtures.

Let's first do the discrete case. The limit condition is $\lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^n g(f(x)+\epsilon_i) = \sum_{\epsilon'} p(\epsilon')g(f(x)+\epsilon')$, where $(\epsilon_i)_i$ are the sampled $\epsilon$'s and $p(\epsilon')$ is the probability that $\epsilon$ equals a specific value $\epsilon'$. We may let $T$ be the set of all $\omega$, equivalently the set of all $(\epsilon_i)_i$, such that $\frac{1}{n}\#\{1 \le i \le n : \epsilon_i = \epsilon'\} \not\to p(\epsilon')$ for some $\epsilon'$. It's easy to see that $P_x[T] = 0$ for each $x$: there is no dependence on $x$, since $\epsilon$ is independent of $X$, so by the LLN (and a countable union over the atoms $\epsilon'$), we have $\frac{1}{n}\#\{1 \le i \le n : \epsilon_i = \epsilon'\} \to p(\epsilon')$ for every $\epsilon'$ with probability $1$.
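A quick numerical sketch of the discrete case: one draw of the $\epsilon$ sequence is reused for every $x$, and because the empirical frequencies of the atoms converge to $p$, the LLN average converges to the conditional expectation for all $x$ simultaneously. The particular $f$, $g$, atoms, and probabilities below are illustrative choices, not from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

support = np.array([-1.0, 0.0, 2.0])   # atoms of epsilon (illustrative)
p = np.array([0.2, 0.5, 0.3])          # their probabilities
f = lambda x: np.sin(x)                # any f; it only shifts the sample
g = lambda y: y**2                     # a continuous g

n = 200_000
eps = rng.choice(support, size=n, p=p)  # ONE sample path, shared by all x

for x in [0.0, 0.7, 3.14]:
    avg = g(f(x) + eps).mean()              # (1/n) sum_i g(f(x) + eps_i)
    exact = (p * g(f(x) + support)).sum()   # sum_{eps'} p(eps') g(f(x) + eps')
    print(f"x={x:5.2f}  avg={avg:.4f}  exact={exact:.4f}")
```

Note that `eps` is drawn once, outside the loop over `x`: this is exactly the claim that a single exceptional set $T$ works for every $x$ at once.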

Now let's do the continuous case. Suppose $\epsilon$ is supported in $[0,1]$, and $\mu$ is the distribution of $\epsilon$. We want $\lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^n g(f(x)+\epsilon_i) = \int_0^1 g(f(x)+\epsilon')\,d\mu(\epsilon')$. We let $T$ be the set of all $(\epsilon_i)_i$ for which there is some subinterval $(a,b) \subseteq (0,1)$ with rational endpoints such that $\frac{1}{n}\#\{1 \le i \le n : \epsilon_i \in (a,b)\} \not\to \mu((a,b))$ (restricting to rational endpoints keeps $T$ a countable union of null sets). Once again, $P_x[T]$ is independent of $x$, and it is $0$ by the LLN. To see that $\lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^n g(f(x)+\epsilon_i) = \int_0^1 g(f(x)+\epsilon')\,d\mu(\epsilon')$ for each $(\epsilon_i)_i$ not in $T$, we use continuity of $g$ (this is an easy analysis argument; let me know if you want me to sketch it out).
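The continuous case can be sketched numerically as well. Here $\epsilon \sim \mathrm{Uniform}(0,1)$, so $\mu$ is Lebesgue measure on $(0,1)$; again one shared $\epsilon$ sample path serves every $x$, and the integral is approximated by a midpoint Riemann sum. The functions $f$ and $g$ and the interval $(a,b)$ are illustrative choices, not from the answer.

```python
import numpy as np

rng = np.random.default_rng(1)

f = lambda x: x / (1.0 + x**2)          # a continuous f (illustrative)
g = lambda y: np.exp(-y) * np.cos(y)    # a continuous g

n = 200_000
eps = rng.random(n)                     # ONE sample path, shared by all x

# Empirical frequency of an interval vs. mu((a,b)) = b - a
a, b = 0.25, 0.7
freq = np.mean((eps > a) & (eps < b))
print(f"|freq - (b - a)| = {abs(freq - (b - a)):.4f}")

# Midpoint Riemann sum approximating int_0^1 g(f(x) + e') de'
grid = (np.arange(100_000) + 0.5) / 100_000

for x in [0.0, 1.0, 5.0]:
    avg = g(f(x) + eps).mean()          # (1/n) sum_i g(f(x) + eps_i)
    exact = g(f(x) + grid).mean()       # numerical value of the integral
    print(f"x={x:4.1f}  avg={avg:.4f}  integral={exact:.4f}")
```

The first printed quantity is the Glivenko-Cantelli-style check defining $T$; the loop then confirms that, off $T$, the same sample path gives the right limit for several different $x$.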