The measure-theoretical definition of a bootstrap sample


I’m currently learning the bootstrap method, and I have two questions about the definition of a bootstrap sample.


Let $ (\Omega,\mathscr{S},\mathsf{P}) $ be a probability space. Let $ X_{1},\ldots,X_{n} $ be i.i.d. random variables on $ (\Omega,\mathscr{S},\mathsf{P}) $, with their common c.d.f. denoted by $ F $. Let $ \hat{F} $ denote the empirical c.d.f. of $ X_{1},\ldots,X_{n} $, i.e., $$ \forall x \in \mathbf{R}: \qquad \hat{F}(x) = \frac{1}{n} \sum_{i = 1}^{n} \chi_{(- \infty,x]} \circ X_{i}. $$ Clearly, $ \hat{F}(x) $ is a random variable on $ (\Omega,\mathscr{S},\mathsf{P}) $ for each $ x \in \mathbf{R} $, and for each $ \omega \in \Omega $, the function $$ \left\{ \begin{matrix} \mathbf{R} & \to & [0,1] \\ x & \mapsto & \left[ \hat{F}(x) \right] \! (\omega)\end{matrix} \right\} $$ is the c.d.f. of some discrete random variable.
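For concreteness, here is a minimal numerical sketch of $ \left[ \hat{F}(\cdot) \right] \! (\omega) $ for a single realization $ \omega $, i.e. one observed sample (Python with NumPy; the data and names are purely illustrative):

```python
import numpy as np

def ecdf(sample):
    """Return the map x -> F_hat(x) for one realization of X_1, ..., X_n."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    # F_hat(x) = (1/n) * #{i : X_i <= x}, the average of the indicators chi_(-inf, x](X_i)
    return lambda x: np.count_nonzero(sample <= x) / n

# One fixed omega, i.e. one observed sample of size n = 5 (hypothetical values)
x_obs = [2.3, 0.7, 1.1, 2.3, 3.9]
F_hat = ecdf(x_obs)
print(F_hat(2.0))  # 0.4 -- two of the five observations are <= 2.0
print(F_hat(2.3))  # 0.8
```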

Question 1: What does it mean to say that $ (X_{1}^{*},\ldots,X_{n}^{*}) $ is a bootstrap sample drawn from $ \hat{F} $? As mentioned, $ \hat{F}(x) $ is not a number but a random variable for each $ x \in \mathbf{R} $. I require an answer to this question strictly in terms of measure theory.

Question 2: What probability space are $ X_{1}^{*},\ldots,X_{n}^{*} $ defined on? Is it still $ (\Omega,\mathscr{S},\mathsf{P}) $?

Thanks!


There are 3 answers below.

BEST ANSWER

I’ve managed to answer my questions. In what follows, we fix $ n \in \mathbf{N} $ and denote $ \mathbf{N}_{\leq n} $ by $ [n] $.


Let $ \mathcal{R} $ denote the set of random variables defined on the probability space $ ([n]^{n},\mathcal{P}([n]^{n}),\mathsf{c}) $, where $ \mathsf{c} $ denotes the probability measure on $ ([n]^{n},\mathcal{P}([n]^{n})) $ having a mass of $ \dfrac{1}{n^{n}} $ at every element of $ [n]^{n} $. Then $ X_{1}^{\ast},\ldots,X_{n}^{\ast} $ are $ \mathcal{R} $-valued functions on $ \Omega $ such that for any $ i \in [n] $ and $ \omega \in \Omega $, the following conditions hold:

  • $ {X_{i}^{\ast}}(\omega): [n]^{n} \to \{ {X_{k}}(\omega) \}_{k \in [n]} $.
  • $ [{X_{i}^{\ast}}(\omega)](\mathbf{a}) = {X_{\mathbf{a}(i)}}(\omega) $ for each $ \mathbf{a} \in [n]^{n} $.

One can easily verify that for each $ \omega \in \Omega $, the c.d.f. of $ {X_{i}^{\ast}}(\omega) $ for any $ i \in [n] $ is precisely $ \left[ \hat{F}(\cdot) \right] \! (\omega) $.
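A small numerical sketch of this construction (Python with NumPy; the observed values are hypothetical): for one fixed $ \omega $ we enumerate all of $ [n]^{n} $ and check that the c.d.f. of $ {X_{i}^{\ast}}(\omega) $ under $ \mathsf{c} $ agrees with the empirical c.d.f.

```python
import itertools
import numpy as np

# Fixed omega: one observed sample X_1(omega), ..., X_n(omega) (hypothetical values)
x = np.array([2.3, 0.7, 1.1])
n = len(x)

# The space ([n]^n, P([n]^n), c): all index tuples a, each with mass 1 / n^n
tuples = list(itertools.product(range(n), repeat=n))  # 0-based indices here

i = 0  # examine X_1^*(omega); every coordinate behaves the same way
values = np.array([x[a[i]] for a in tuples])  # [X_i^*(omega)](a) = X_{a(i)}(omega)

# c.d.f. of X_i^*(omega) under c versus the empirical c.d.f. [F_hat(.)](omega)
for t in [0.5, 1.0, 2.0, 3.0]:
    cdf_star = np.count_nonzero(values <= t) / len(tuples)
    ecdf = np.count_nonzero(x <= t) / n
    print(t, cdf_star, ecdf)  # the two numbers agree for every t
```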


If $ \displaystyle \bar{X} \stackrel{\text{df}}{=} \frac{1}{n} \sum_{i = 1}^{n} X_{i} $ and $ \displaystyle \bar{X}^{\ast} \stackrel{\text{df}}{=} \frac{1}{n} \sum_{i = 1}^{n} X_{i}^{\ast} $, then $ \bar{X}^{\ast} - \bar{X} $ is to be interpreted as an $ \mathcal{R} $-valued function on $ \Omega $, i.e., $$ \forall \omega \in \Omega: \qquad \left( \bar{X}^{\ast} - \bar{X} \right) \! (\omega) = \frac{1}{n} \sum_{i = 1}^{n} {X_{i}^{\ast}}(\omega) - \underbrace{\frac{1}{n} \sum_{i = 1}^{n} {X_{i}}(\omega)}_{(\star)}, $$ where the term $ (\star) $ is viewed as a constant random variable on $ ([n]^{n},\mathcal{P}([n]^{n}),\mathsf{c}) $.
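Continuing the same sketch, $ \bar{X}^{\ast} - \bar{X} $ evaluated at a fixed $ \omega $ is a random variable on $ ([n]^{n},\mathcal{P}([n]^{n}),\mathsf{c}) $ whose exact distribution can be tabulated by brute force (again NumPy, hypothetical data):

```python
import itertools
import numpy as np

# Fixed omega: the same kind of hypothetical observed sample as before
x = np.array([2.3, 0.7, 1.1])
n = len(x)
x_bar = x.mean()  # the term (star): a constant random variable on ([n]^n, P([n]^n), c)

# For each index tuple a, (X_bar^* - X_bar)(omega) takes the value
# (1/n) * sum_i X_{a(i)}(omega) - X_bar(omega)
values = [x[list(a)].mean() - x_bar for a in itertools.product(range(n), repeat=n)]

# Exact distribution under c: each tuple carries mass 1 / n^n
vals, counts = np.unique(np.round(values, 12), return_counts=True)
for v, c in zip(vals, counts):
    print(f"value {v:+.4f}  mass {c}/{n**n}")
```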

ANSWER

Question 1: By a bootstrap sample we mean a sample obtained by drawing $n$ times with replacement from $X_1,\ldots, X_n$. For every $k\in \{1,2,\ldots, n\}$,

\begin{equation} X^{*}_{k} : \left(\begin{array}{cccc} X_1 & X_2 & \cdots & X_n\\ \frac{1}{n} & \frac{1}{n} & \cdots & \frac{1}{n}\end{array}\right), \end{equation}

i.e., $X_k^*$ takes each of the values $X_1,\ldots,X_n$ with probability $\frac{1}{n}$.

Note that the values $X_i$ and $X_j$ may coincide for $i\neq j$ (for example, when the original random variables are discrete).

Question 2: We should note that $X:\Omega\to \mathbb{R}$ while $X^{*}:\Omega_1\to \{x_1,\ldots, x_n\}$, because:

for each $\omega \in \Omega$ and every $i\in \{1,\ldots, n\}$ we get $X_i(\omega)=x_i$, and then: \begin{equation} X^{*}_{i} : \left(\begin{array}{cccc} x_1 & x_2 & \cdots & x_n\\ \frac{1}{n} & \frac{1}{n} & \cdots & \frac{1}{n}\end{array}\right). \end{equation} Hence $\Omega$ is here to produce the values of the $X_i$; more precisely, it "helps" the $X_i^*$ to become genuine random variables by fixing their possible values. We then need a new set $\Omega_1$ from which to draw the outcomes of the $X_i^*$, and accordingly the $\sigma$-algebra and the probability measure are new as well.

Some notes: to recap what the bootstrap is really doing, we first obtain a sample $X_1,\ldots, X_n$ from the original population, and then we sample from this sample, which is called resampling.
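A minimal simulation sketch of this two-step picture (Python with NumPy; the distribution and sample size are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: the original sample X_1, ..., X_n, drawn here from a standard normal F
n = 10
x = rng.normal(size=n)

# Step 2: resampling -- draw n indices uniformly with replacement,
# so each X_k^* takes each observed value with probability 1/n
idx = rng.integers(low=0, high=n, size=n)
x_star = x[idx]

print(x_star)  # a bootstrap sample: values taken from x, possibly with repeats
```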

Hope it helps.

ANSWER

Here is a complementary (but equivalent) construction of the bootstrap sample:

Let $(\Omega_1,\Sigma_1,P_1)$ be a probability space and consider the $n$-fold product space $(\Omega^S,\Sigma^S,P^S)$ with coordinate maps $X_1,\ldots,X_n$. In other words, $X_1,\ldots,X_n$ are the canonical choice of i.i.d. random variables with law $P_1$, and $(\Omega^S,\Sigma^S,P^S)$ is the space of our original sample.

Next, consider the probability space $(\Omega_2,\Sigma_2,P_2)$, where $\Omega_2=\{1,\ldots,n\}$ is the $n$-point set, $\Sigma_2$ is the power set $\sigma$-algebra and $P_2$ is the uniform measure. Let $(\Omega^B,\Sigma^B,P^B)$ denote the $n$-fold product of this space with coordinate maps $\tau_1,\ldots,\tau_n$. Thus, $\tau_1,\ldots,\tau_n$ are i.i.d. uniformly distributed on $\{1,\ldots,n\}$.

Finally, let $(\Omega,\Sigma,P)$ denote the product of $(\Omega^S,\Sigma^S,P^S)$ and $(\Omega^B,\Sigma^B,P^B)$. The variables $X_1,\ldots,X_n$ can be viewed as variables on $(\Omega,\Sigma,P)$ by putting $X_j(\omega):=X_j(\omega^S)$ for $\omega=(\omega^S,\omega^B)$, and similarly for $\tau_1,\ldots,\tau_n$.

Definition: The bootstrap sample $X^*_1,\ldots,X^*_n$ of $X_1,\ldots,X_n$ is defined by $X^*_j:=X_{\tau_j}$.

That is, $X^*_j(\omega)=X_{\tau_j(\omega)}(\omega)=X_{\tau_j(\omega^B)}(\omega^S)$, or equivalently $X^*_j=\sum_{k=1}^n X_k1_{(\tau_j=k)}$.
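A minimal simulation of this construction (Python with NumPy; $P_1$ is taken to be the standard normal law purely for illustration), making the two independent sources of randomness $\omega^S$ and $\omega^B$ explicit:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# omega^S in Omega^S: one draw of the coordinate maps X_1, ..., X_n (i.i.d. with law P_1)
X = rng.normal(size=n)

# omega^B in Omega^B: one draw of tau_1, ..., tau_n (i.i.d. uniform on {1, ..., n})
tau = rng.integers(low=1, high=n + 1, size=n)  # 1-based, as in the text

# The bootstrap sample: X*_j = X_{tau_j}, i.e. X*_j(omega) = X_{tau_j(omega^B)}(omega^S)
X_star = X[tau - 1]  # shift to 0-based indexing

print(tau)
print(X_star)
```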

Since $P(\tau_j=k)=1/n$ and $X_1,\ldots,X_n,\tau_1,\ldots,\tau_n$ are independent, we see that almost surely $$ E(1_A(X^*_j)\mid X_1,\ldots,X_n)=\frac{1}{n}\sum_{k=1}^n E(1_A(X_k)\mid X_1,\ldots,X_n) = \frac{1}{n}\sum_{k=1}^n 1_A(X_k) = \frac{1}{n}\sum_{k=1}^n \delta_{X_k}(A). $$ This ensures, as expected, that the empirical measure $P_n(A,\omega):=\frac{1}{n}\sum_{j=1}^n \delta_{X_j(\omega)}(A)$ is a conditional distribution for $X_j^*$ given $X_1,\ldots,X_n$. Similarly, we have almost surely \begin{align} &E(1_{A_1}(X^*_1)\cdots 1_{A_n}(X^*_n)\mid X_1,\ldots,X_n) \\ &=\frac{1}{n^n}\sum_{k_1=1}^n\cdots \sum_{k_n=1}^n 1_{A_1}(X_{k_1})\cdots 1_{A_n}(X_{k_n})\\ &=E(1_{A_1}(X^*_1)\mid X_1,\ldots,X_n)\cdots E(1_{A_n}(X^*_n)\mid X_1,\ldots,X_n), \end{align} which ensures that $X_1^*,\ldots,X_n^*$ are conditionally independent given $X_1,\ldots,X_n$.
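The first identity can also be checked numerically: hold one realization of $X_1,\ldots,X_n$ fixed, redraw the $\tau$'s many times, and compare the relative frequencies of the values of $X^*_1$ with their masses $1/n$ under the empirical measure $P_n(\cdot,\omega)$ (a Monte Carlo sketch; the parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, B = 4, 200_000

# One fixed realization of X_1, ..., X_n (we condition on the original sample)
X = rng.normal(size=n)

# Redraw tau_1 (hence X*_1) B times while the sample stays fixed
tau1 = rng.integers(low=0, high=n, size=B)
X1_star = X[tau1]

# Relative frequency of each observed value versus its mass 1/n under P_n(., omega)
for k in range(n):
    freq = np.mean(X1_star == X[k])
    print(f"X_{k + 1} = {X[k]:+.3f}: frequency {freq:.4f}  (P_n mass {1 / n:.4f})")
```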
