Robustness of estimators

Question

70 Views Asked by Bumbble Comm At 22 Feb 2026 - 7:29

I have a question regarding the robustness of estimators. I have 4 estimators and I have been asked to consider which estimator is most robust to mis-specification. What is mis-specification? What are the things that I have to consider here?

I know that an estimator is robust if it doesn't rely on all of the data and it doesn't change much when we add very large or very small values. Is this correct?

**Bumbble Comm** · Accepted Answer

Roughly speaking, 'robustness' means that an estimate or procedure still gives useful answers even if the data are not sampled from exactly the intended distribution. The most common case is probably an estimate that is exactly correct for strictly normal data and is still not far wrong when the normal data are 'contaminated' by some unanticipated values, often 'outliers'. Terminology differs slightly from author to author and application to application, so I won't try to formally define 'mis-specification' or to quantify exactly what it means to 'change much'. Check your text for the precise definitions.

Here is an example: Suppose I have 1000 observations from $\mathsf{Norm}(\mu = 100, \sigma=15).$ Then the sample mean $A$, the 5% trimmed mean $T$, and the median $H$ are all unbiased estimators of $\mu.$ That is, $E(A) = E(T) = E(H) = \mu.$ However, if the data really are from $\mathsf{Norm}(\mu = 100, \sigma=15),$ then $Var(A) < Var(T) < Var(H),$ so estimator $A$ is preferred.

set.seed(324)
n = 1000;  mu = 100;  sg = 15        # sample size, population mean and SD
m = 10^5;  a = t = h = numeric(m)    # m simulated samples; storage for the estimates
for (i in 1:m) {
  x = rnorm(n, mu, sg)
  a[i] = mean(x)                     # sample mean
  t[i] = mean(x, trim = 0.05)        # 5% trimmed mean
  h[i] = median(x)                   # sample median
}
mean(a);  mean(t);  mean(h)
## 99.9984    # approximate expected values, all very near 100
## 99.9986
## 99.99728
sd(a);  sd(t);  sd(h)
## 0.4763558    # approximate SDs: smallest for mean, largest for median
## 0.4826503
## 0.595646
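
As a rough check on the simulated SDs, the standard results for normal data can be worked out by hand. The SD of the sample mean is exact; the SD of the median uses the large-sample approximation $Var(H) \approx \pi\sigma^2/(2n),$ which is an assumption brought in here and not part of the simulation itself:

$$SD(A) = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{1000}} \approx 0.474, \qquad SD(H) \approx \sqrt{\frac{\pi}{2}}\,\frac{\sigma}{\sqrt{n}} \approx 1.2533 \times 0.474 \approx 0.594.$$

Both are within about 0.002 of the simulated values 0.476 and 0.596, which is consistent with the ordering $Var(A) < Var(H)$ for uncontaminated normal data.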

However, if 2% of the observations (20 in 1000) are high outliers, then the trimmed mean and the median are more robust estimators. For simplicity I model the contamination as the values 501 through 520 in each sample. (In reality the contamination would fluctuate.)

set.seed(325)
n = 1000;  mu = 100;  sg = 15
m = 10^5;  a = t = h = numeric(m)
for (i in 1:m) {
  x = c(rnorm(n-20, mu, sg), 501:520)   # 980 normal values plus 20 fixed outliers
  a[i] = mean(x)                        # sample mean
  t[i] = mean(x, trim = 0.05)           # 5% trimmed mean
  h[i] = median(x)                      # sample median
}
mean(a);  mean(t);  mean(h)
## 108.2092
## 100.5807
## 100.3821
sd(a);  sd(t);  sd(h)
## 0.4699302
## 0.4845178
## 0.6002898

The simulation suggests that the trimmed mean is a suitable robust estimator in this situation: $E(A) \approx 108.2 > \mu = 100,$ but $E(T) \approx 100.6$ is much closer to the population mean, and $E(H) \approx 100.4$ is even a little closer. However, $SD(T) \approx 0.48$ is not much larger than $SD(A) \approx 0.47,$ whereas $SD(H) \approx 0.60$ is noticeably larger.
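
The size of the bias in the sample mean can be checked by hand, since each contaminated sample consists of 980 normal values with mean 100 plus the fixed values 501 through 520:

$$E(A) = \frac{980 \cdot 100 + \sum_{k=501}^{520} k}{1000} = \frac{98000 + 10210}{1000} = 108.21,$$

which agrees with the simulated 108.2092. By contrast, a 5% trimmed mean of 1000 values drops the 50 largest and the 50 smallest, so all 20 outliers are discarded. The small remaining upward bias in $T$ arises because only 30 of the normal values are trimmed from the top while 50 are trimmed from the bottom.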

Different authors have various criteria for choosing among robust estimators, but in this very simple example it seems reasonable to use the trimmed mean.
