What can be said about the distribution of estimators obtained by using bootstrapping?


A common technique for estimating the uncertainty—for example the variance—in an estimate $\alpha$ (this could be the mean, for example) produced by some estimator applied to a small dataset with $n$ examples is bootstrapping, as follows: sample $n$ examples with replacement from the dataset and estimate $\alpha$ on the bootstrapped dataset (the $n$ samples); repeat this $N$ times to get estimates $\alpha_1$ to $\alpha_N$; then estimate the uncertainty among those estimates. For example, in the case of estimating the variance, you could calculate the unbiased sample variance (by dividing by $N-1$ instead of by $N$ when calculating the variance).
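The procedure described above can be sketched in a few lines. The dataset and the choice of the sample mean as the statistic $\alpha$ are made up purely for illustration:

```python
import random
import statistics

# Hypothetical small dataset; here alpha is the sample mean.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
n = len(data)
N = 2000  # number of bootstrap resamples

random.seed(0)
estimates = []
for _ in range(N):
    # Sample n points with replacement and recompute alpha on the resample.
    resample = random.choices(data, k=n)
    estimates.append(statistics.mean(resample))

# Unbiased sample variance of the N bootstrap estimates (divides by N-1).
bootstrap_var = statistics.variance(estimates)
print(bootstrap_var)
```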

However, it is not clear to me exactly what the information you get by doing this kind of bootstrapping tells you. For example, is there some intuitive and easy-to-understand value that the uncertainty estimate approaches as $N$ goes to infinity? Will the $N$ estimates that bootstrapping gives you be approximately distributed according to some easy-to-understand distribution, for example the distribution$^*$ whose probability density function (as a function of $\alpha$) is the normalized likelihood (normalized so that its integral is 1) of the dataset given the specific value of $\alpha$, assuming that the dataset can be modeled as being distributed according to some probability distribution parameterized by $\alpha$? Can any of this be proven? (I hope what I wrote here made sense.)

$^*$Does this distribution have a name?


Fundamentally, (nonparametric) bootstrapping relies on the closeness of the empirical CDF for a sample of size $n$ (i.e., $F_n$) to the true CDF $F$. In general, as $n\to \infty$, $F_n \xrightarrow{a.s.} F$. This comes from the Glivenko-Cantelli theorem and the related DKW inequality.
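This convergence is easy to see numerically. The sketch below computes the Kolmogorov distance $\sup_x |F_n(x) - F(x)|$ for $\mathrm{Uniform}(0,1)$ samples of two sizes; the uniform choice and the sample sizes are arbitrary, chosen only to make the true CDF ($F(x) = x$) trivial to evaluate:

```python
import random

random.seed(1)

def ecdf_sup_distance(n):
    """Kolmogorov distance sup_x |F_n(x) - F(x)| for n Uniform(0,1) draws.

    For sorted samples u_(1) <= ... <= u_(n) and F(x) = x, the supremum is
    attained at the jump points of F_n: the ECDF jumps from i/n to (i+1)/n
    at u_(i+1), so we check both sides of every jump.
    """
    u = sorted(random.random() for _ in range(n))
    return max(
        max((i + 1) / n - u[i], u[i] - i / n)
        for i in range(n)
    )

small_n_dist = ecdf_sup_distance(50)
large_n_dist = ecdf_sup_distance(50_000)
print(small_n_dist, large_n_dist)
```

As the DKW inequality predicts, the distance for the larger sample is much smaller.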

More precisely, the estimator can be interpreted as a functional (i.e., a map from distribution functions to $\mathbb{R}$), where we are relying on the closeness of $\alpha(F_n)$ to $\alpha(F)$, so that sampling from $F_n$ will give good estimates of $\alpha(F)$.

So the type of information you are getting is an estimate of an estimate (a second-order estimate, if you like). First you estimate $F$ by $F_n$, then you sample from $F_n$ to estimate $\alpha$. If all goes well, you end up with a reasonably good approximation of the variability of $\alpha$ computed from a sample of size $n$ from $F$.
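To make the "estimate of an estimate" idea concrete, here is a sketch of the plug-in principle with the variance as the functional $\alpha$; the data are made up, and the point is only that $\alpha(F_n)$ is obtained by applying the same functional to the empirical distribution (which, for the variance, divides by $n$ rather than $n-1$):

```python
import statistics

# Hypothetical data; the functional alpha(F) here is the variance of F.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
n = len(data)

# Plug-in estimate alpha(F_n): the variance of the empirical distribution,
# which places mass 1/n on each observed point (hence division by n).
mean = sum(data) / n
plug_in_var = sum((x - mean) ** 2 for x in data) / n

# The familiar unbiased estimator differs only by the factor n/(n-1).
unbiased_var = statistics.variance(data)
assert abs(plug_in_var * n / (n - 1) - unbiased_var) < 1e-12
print(plug_in_var, unbiased_var)
```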

NOTE: There are technical issues to rule out, but the above is the gist of what you are doing and getting with the bootstrap.

EDIT: In response to comments

The OP has asked

"Is there a way to prove that this variability is close to the variability [variability of the sampling distribution under the true distribution] of $\alpha$ calculated on the bootstrapped datasets?"

In general, the answer is no -- no such proof exists. Appealing to bootstrapping is analogous to appealing to the CLT for sample means -- we expect it to work in most cases, but we cannot prove our sample mean will be close to the population mean. We can prove some nice convergence results with (usually mild) conditions that we expect to hold in practice, but as with all asymptotic results, their use in the "real world" of finite $n$ is more as a guide than as a solid proof that you will be right.

There is an extended discussion of this on CrossValidated, with contributions from some of the "heavy hitters" on that site (most of whom are practicing research statisticians). The reason I am bringing up Glivenko-Cantelli and DKW is that they do address your question as to why we think bootstrapping is a good idea, and they also provide intuition as to "how well" it works. In particular, see the comments to the accepted answer, especially:

https://stats.stackexchange.com/questions/26088/explaining-to-laypeople-why-bootstrapping-works#comment47941_26093

As to my use of $\alpha(F_n)$: this is standard notation for a functional of the data. StasK's answer to that question is this one: https://stats.stackexchange.com/a/28080/233429 -- I will not replicate that answer here. I highly recommend reading the first 5 pages of this paper by Efron and Tibshirani (i.e., THE authorities on bootstrapping), which uses the same formulations and arguments as I gave above.

The emergence of the bootstrap has led to a huge amount of research (both theoretical and applied) into where and how it works and where it doesn't. A large reason for this is that the plug-in principle on which the bootstrap is based relies on asymptotic results as well as on some of those (in)famous "regularity conditions". It is not a general-purpose silver bullet for all estimation/inference problems.

Michael Chernick's answer sums it up nicely:

There is no reason to be puzzled about the bootstrap anymore. It is important to keep in mind that the bootstrap depends on the bootstrap principle: "Sampling with replacement behaves on the original sample the way the original sample behaves on a population." There are examples where this principle fails. It is important to know that the bootstrap is not the answer to every statistical problem.