I have a question regarding the robustness of estimators. I have 4 estimators and I have been asked to consider which estimator is most robust to mis-specification. What is mis-specification? What are the things that I have to consider here?
I know that an estimator is robust if it doesn't rely on all of the data and it doesn't change much when we add very large or very small values. Is this correct?
Roughly speaking, 'robustness' means that an estimate or procedure still gives useful answers even if the data are not sampled from exactly the intended distribution. The most common case may be that an estimate that is exactly correct for strictly normal data is not far wrong if the normal data are 'contaminated' by some unanticipated values, often 'outliers'. Terminology may differ slightly from author to author and application to application, so I won't try to formally define 'mis-specification' or to quantify exactly what it means to 'change much'. Look at your text to clarify those things precisely.
Here is an example: Suppose I have 1000 observations from $\mathsf{Norm}(\mu = 100, \sigma=15).$ Then the sample mean $A$, the 5% trimmed mean $T$, and the median $H$ are all unbiased estimates of $\mu.$ That is, $E(A) = E(T) = E(H) = \mu.$ However, if the data are honestly from $\mathsf{Norm}(\mu = 100, \sigma=15),$ then $Var(A) < Var(T) < Var(H),$ so that estimator $A$ is preferred.
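As a rough check on the two ends of that ordering, the standard results for normal data (quoted here, not derived; the one for the median is a large-sample approximation) are

$$Var(A) = \frac{\sigma^2}{n}, \qquad Var(H) \approx \frac{\pi}{2}\cdot\frac{\sigma^2}{n}.$$

With $n = 1000$ and $\sigma = 15$ this gives $SD(A) = 15/\sqrt{1000} \approx 0.47$ and $SD(H) \approx \sqrt{\pi/2}\,(0.47) \approx 0.59,$ with the trimmed mean falling in between.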
However, if 2% of the observations (20 in 1000) are high outliers, then the trimmed mean and the median are more robust estimators. For simplicity I model the contamination as the values 501 through 520 for each sample. (In reality the contamination would fluctuate.)
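A simulation of this setup might be sketched in Python as follows. This is only an illustration, not the code behind the figures quoted in this answer. It assumes that each sample is 980 normal observations plus the 20 fixed values, and that "5% trimmed" means dropping 5% of the observations from each end (the convention of R's `mean(x, trim = 0.05)`). The names `trimmed_mean` and `simulate`, the number of replications, and the seed are arbitrary choices made for the sketch.

```python
import random
import statistics


def trimmed_mean(values, trim=0.05):
    """Mean after dropping floor(trim * n) observations from EACH end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    return statistics.fmean(ordered[k:len(ordered) - k])


def simulate(reps=2000, n=1000, n_outliers=20, mu=100.0, sigma=15.0, seed=1):
    """Return {estimator: (average, SD)} over `reps` contaminated samples."""
    rng = random.Random(seed)
    # Assumption: the contamination is the same fixed values in every sample.
    outliers = [float(v) for v in range(501, 501 + n_outliers)]
    results = {"mean": [], "trimmed": [], "median": []}
    for _ in range(reps):
        clean = [rng.gauss(mu, sigma) for _ in range(n - n_outliers)]
        sample = clean + outliers
        results["mean"].append(statistics.fmean(sample))
        results["trimmed"].append(trimmed_mean(sample))
        results["median"].append(statistics.median(sample))
    # Summarise each estimator by its simulated average and standard deviation.
    return {
        name: (statistics.fmean(vals), statistics.stdev(vals))
        for name, vals in results.items()
    }


if __name__ == "__main__":
    for name, (avg, sd) in simulate().items():
        print(f"{name:8s} average = {avg:7.2f}   SD = {sd:.3f}")
```

Calling `simulate(n_outliers=0)` runs the same comparison on uncontaminated normal data, which is a convenient way to see how much each estimator moves once the outliers are added.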
The simulation suggests that the trimmed mean is a suitable robust estimator in this situation: $E(A) \approx 108.2 > \mu = 100,$ but $E(T) \approx 100.5$ is much closer to the population mean, and $E(H)$ is even a little closer. However, $SD(T) \approx 0.48$ is not much larger than $SD(A),$ whereas $SD(H)$ is very much larger.
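The value for $E(A)$ can also be checked by hand, assuming each sample consists of 980 normal observations plus the 20 fixed values 501 through 520:

$$E(A) = \frac{980(100) + \sum_{k=501}^{520} k}{1000} = \frac{98000 + 10210}{1000} = 108.21.$$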
Different authors have various criteria for choosing among robust estimators, but in this very simple example it seems reasonable to use the trimmed mean.