Why does the bootstrap method help?

The general procedure in bootstrapping always looks similar to this:

We have a given sample $x_1,\dots,x_n$. Then we pick some elements of the sample at random, putting each one back into the sample after it is picked (i.e. we draw with replacement). This creates some new samples...

But why does this actually help? I can't see the point, because such methods can't create any new information, and I guess our original sample doesn't become any more accurate.

So why are we doing it?

There is 1 best solution below

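Bootstrapping is a general technique in which one repeatedly draws subsamples (resamples) from a sample, classically with replacement, and uses them for some statistical task. You're right that no new information is created. It's more of an algorithmic technique for performing some statistical tasks: the sample stands in for the unknown population, so resampling from it mimics drawing fresh samples.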

A pretty common use in machine learning (e.g. in mini-batch $k$-means, or mini-batch stochastic gradient descent for deep neural networks) is to train the learner iteratively on random subsamples. This is done for computational efficiency: it is much cheaper to take many small steps in parameter space using subsamples than to use the full dataset at every step, and under suitable conditions the iterates still converge, in a probabilistic sense, to a good solution. It's also the basis of the "ensemble" learning approach called bagging (bootstrap aggregating), and a method for handling class imbalance (e.g. by resampling the under-represented class with replacement, which duplicates some of its samples).
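To make the class-imbalance use concrete, here is a minimal sketch of random oversampling with NumPy. The function name `oversample_minority` and the toy data are my own illustrative choices, not a standard API; real projects often use a dedicated library for this.

```python
import numpy as np

def oversample_minority(X, y, seed=0):
    """Balance classes by resampling smaller classes with replacement.

    Hypothetical helper for illustration: every class is topped up to the
    size of the largest class by duplicating randomly chosen members.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for c, count in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        parts.append(idx)  # keep every original sample of this class
        if count < target:
            # Draw the missing samples with replacement from the same class.
            parts.append(rng.choice(idx, size=target - count, replace=True))
    keep = np.concatenate(parts)
    return X[keep], y[keep]

# Toy data (an assumption): 95 samples of class 0, 5 samples of class 1.
X = np.arange(100, dtype=float).reshape(-1, 1)
y = np.array([0] * 95 + [1] * 5)

X_bal, y_bal = oversample_minority(X, y)
balanced_counts = np.bincount(y_bal)
print(balanced_counts)
```

Note that the duplicated rows carry no new information, exactly as the question suspects; they only change how much weight the learner gives to the rare class.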

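In statistics, one use is to estimate the distributions of sample statistics. Suppose we have a sample $S=\{x_i\}_{i=1}^n$ with bootstrap subsamples $s_1,\dots,s_B$. One could easily compute the sample mean $\bar{x}$ of $S$. But what is the distribution of $\bar{x}$ as a random variable? To answer this, one could compute the sample mean $\bar{x}_j$ of each $s_j$: the spread of the values $\bar{x}_1,\dots,\bar{x}_B$ approximates the sampling distribution of $\bar{x}$, which in turn gives standard errors and confidence intervals.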
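As a rough sketch of this statistical use, the snippet below bootstraps the mean of a synthetic sample and compares the bootstrap standard error with the textbook formula $s/\sqrt{n}$. The function name `bootstrap_means`, the exponential toy data, and the choice of 2000 resamples are assumptions made purely for illustration.

```python
import numpy as np

def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Return the means of n_resamples bootstrap resamples of `sample`."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Each row holds n indices drawn uniformly with replacement: one resample.
    idx = rng.integers(0, n, size=(n_resamples, n))
    return sample[idx].mean(axis=1)

# Synthetic, skewed data standing in for an observed sample (an assumption).
data_rng = np.random.default_rng(42)
sample = data_rng.exponential(scale=2.0, size=200)

means = bootstrap_means(sample)

# Spread of the resampled means estimates the standard error of the mean.
boot_se = means.std(ddof=1)
formula_se = sample.std(ddof=1) / np.sqrt(sample.size)

# Percentile interval: a simple bootstrap confidence interval for the mean.
ci_low, ci_high = np.percentile(means, [2.5, 97.5])

print(f"sample mean        : {sample.mean():.3f}")
print(f"bootstrap SE       : {boot_se:.3f}")
print(f"formula SE s/sqrt n: {formula_se:.3f}")
print(f"95% percentile CI  : ({ci_low:.3f}, {ci_high:.3f})")
```

For the mean, the two standard errors should agree closely, so the bootstrap buys little here. The same code works unchanged for statistics with no convenient formula (a median or a trimmed mean, say) if `.mean(axis=1)` is swapped for the statistic of interest, and that is where the method earns its keep.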

There are two good threads related to this on the stats SE: this one and this one.

Also, it is useful in semi-supervised learning, where we want to do statistical learning on an unlabeled dataset with only a tiny labeled seed set. For instance, in natural language processing, see Waegel's survey here. This is a little different from the methods mentioned above: in that setting, "bootstrapping" refers to growing the labeled set iteratively from the seed rather than to resampling.