I am currently learning about robust methods for comparing means and have read about the bootstrap t-test and its implementation in R. However, I found that this test tends to give results similar to those of the classical t-test, even when the assumptions of the t-test are violated (e.g., skewed distributions) and those of the bootstrap t-test are not.
My question is: Are there any (classical?) examples that illustrate cases where a t-test would be inappropriate and a robust bootstrap t-test should be applied instead?
I confess there are three things I don't understand about your question: (a) There are many styles of bootstrap CI based on the idea of a t test, and I have no idea which one you are using. (b) Usually, bootstrap procedures are used to make confidence intervals, not to do tests. (c) I don't know what you mean by a 'classical' case where a t test would be inappropriate. However, by making reasonable guesses, I hope I can give an answer close enough to your question to be helpful.
A problematic dataset. One case in which most practicing statisticians would feel uncomfortable using a t confidence interval is a sample of modest size (I'll use $n = 25$) that has obvious far outliers. I use R to generate such a dataset:
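The original code block is not reproduced here; below is a minimal sketch of how such a dataset might be generated and summarized. The seed, the Normal(100, 15) base sample, and the exact placement of the two outliers are assumptions for illustration, so the numbers it produces will not match the quoted interval exactly.

```r
## Minimal sketch (assumed parameters): 23 normal observations plus two
## planted far outliers at 160 and 190, giving a sample of size n = 25.
## The seed and the Normal(100, 15) model are illustrative, not the original ones.
set.seed(1234)
x <- round(rnorm(23, mean = 100, sd = 15))
x <- c(x, 160, 190)            # two 'simulated' far outliers
summary(x)
boxplot(x, horizontal = TRUE)  # the outliers plot far beyond the upper fence
t.test(x)$conf.int             # 95% t CI; the original run gave (91.75, 113.28)
```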
The 95% confidence interval from the `t.test` procedure in R is $(91.75, 113.28).$ It is not unusual for normal samples to show occasional boxplot outliers, but the two 'simulated' outliers at 160 and 190 are extreme and cast doubt on whether the data are truly a random sample from a normal distribution.

A nonparametric bootstrap. Suppose we knew the distribution of $D = \bar X - \mu.$ Then we could use that distribution to find bounds $L$ and $U$ with $$P(L < D < U) = P(L < \bar X - \mu < U) = P(\bar X - U < \mu < \bar X - L) = 0.95,$$ so that a 95% CI for $\mu$ would be of the style $(\bar X - U,\, \bar X - L).$
Not knowing the distribution of $D,$ we enter the 'bootstrap world' to get approximate values $L^*$ and $U^*$ of $L$ and $U,$ respectively. We take a large number $B$ of 're-samples' of size $n$ with replacement from the data `x` and find $\bar X^*$ and thus $D^* = \bar X^* - \bar X$ for each re-sample, where (temporarily) the observed $\bar X$ from the original data is used as a proxy for the unknown population mean $\mu.$ Then, taking quantiles .025 and .975 of the bootstrap distribution of the $D^*$s, we get the desired estimates $L^*$ and $U^*.$

Back in the real world, we obtain the 95% nonparametric bootstrap CI for $\mu:$ $(\bar X - U^*,\, \bar X - L^*),$ in which $\bar X$ now returns to its original role as the average of the original data.
In R, this version of the nonparametric bootstrap goes as follows (we use the suffix `.re` instead of $*$s in the program):
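The original program is not shown here; the following is a minimal sketch of the re-sampling scheme just described, using the data `x` from the sketch above. The seed and the value of $B$ are illustrative, so the interval it returns will differ somewhat from the one quoted next.

```r
## Sketch of the nonparametric bootstrap described above; objects with the
## suffix .re play the role of the starred quantities.  Seed and B are illustrative.
set.seed(2021)
B <- 10^5                      # number of re-samples
n <- length(x)
x.bar <- mean(x)               # observed mean, used as proxy for mu in the bootstrap world
D.re <- replicate(B, mean(sample(x, n, replace = TRUE)) - x.bar)
L.re <- quantile(D.re, 0.025)  # estimate of L
U.re <- quantile(D.re, 0.975)  # estimate of U
c(x.bar - U.re, x.bar - L.re)  # 95% nonparametric bootstrap CI for mu
```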
Thus the bootstrap CI is $(94.0, 121.0).$ [Because this is a computation via simulation, exact results can vary slightly from one bootstrap run to the next. Another run with a different seed gave $(94.1, 121.0).$]
Whether this CI is 'substantially the same' as the t confidence interval obtained earlier is in the eye of the beholder. It is perhaps due to the legendary robustness of t procedures that the difference is not even greater. However, in view of the extreme outliers in the dataset, many practicing statisticians would feel more comfortable using the bootstrap CI.
A histogram of the bootstrap distribution of $D$ is shown below, with vertical red lines at quantiles .025 and .975. Notice that these points are not symmetrically located about $0,$ as they would be if normality were assumed.
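The figure itself is not reproduced here; a sketch of how it could be drawn from the objects in the bootstrap sketch above (those object names are mine, not from the original post):

```r
## Histogram of the bootstrap distribution of D, with red lines at the
## .025 and .975 quantiles, using D.re, L.re, U.re from the sketch above.
hist(D.re, breaks = 40, prob = TRUE,
     main = "Bootstrap distribution of D", xlab = "D")
abline(v = c(L.re, U.re), col = "red", lwd = 2)   # quantiles .025 and .975
```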
Notes: (a) The use of $D$ instead of a "$T$ statistic" (of unknown distribution) does not make a substantial difference in the result for the bootstrap CI, but it is easier to program and some people feel it is easier to understand the rationale for it. (b) I have shown seeds for generating the original data and for the bootstrap procedure. So you can replicate the same dataset if you want to try the version of the bootstrap in your text or class notes. (c) The bootstrap CI will generally tend to be somewhat longer than the t confidence interval: The assumption of normality provides information to the latter procedure that is not assumed in the former.
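For comparison with note (a), here is a sketch of the studentized ('bootstrap-t') variant, in which a $T^*$ statistic is re-sampled instead of $D^*.$ The seed, the object names, and the value of $B$ are again illustrative assumptions, not taken from the original post.

```r
## Sketch of the bootstrap-t variant: re-sample T* = (x-bar* - x-bar)/(s*/sqrt(n))
## and invert its quantiles.  Seed, B, and object names are illustrative.
set.seed(2022)
B <- 10^5
n <- length(x); x.bar <- mean(x); s <- sd(x)
T.re <- replicate(B, {
  x.re <- sample(x, n, replace = TRUE)
  (mean(x.re) - x.bar) / (sd(x.re) / sqrt(n))
})
q <- quantile(T.re, c(0.025, 0.975))
c(x.bar - q[2] * s / sqrt(n), x.bar - q[1] * s / sqrt(n))  # bootstrap-t 95% CI
```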