Asymptotics of M-estimators


I'm struggling immensely with the following exercise:

Consider a probability space $(\Omega,\mathcal{A},\mathbb{P})$ and a set $\Theta\subseteq\mathbb{R}^k$. Let $\Psi_n:\Theta\to\mathbb{R}^k$ be a sequence of random continuous functions and let $\Psi:\Theta\to\mathbb{R}^k$ be a deterministic continuous function.

a) Assume

  1. $\sup_{\theta\in\Theta}||\Psi_n(\theta)-\Psi(\theta)||\to 0$ in probability
  2. $\Psi$ has the unique root $\theta_0$
  3. $\Theta$ is compact and $\theta_0\in\text{Int }\Theta$

Show that $\hat{\theta}_n=\text{argmin}_{\theta\in\Theta}||\Psi_n(\theta)||$ converges to $\theta_0$ in probability.

b) Assume now additionally that $\Psi_n$ is differentiable with derivative $D\Psi_n$, and that the following hold:

  1. $\sqrt{n}\Psi_n(\hat{\theta}_n)\to 0$ in probability and $\sqrt{n}\Psi_n(\theta_0)\overset{D}\to N(0,I(\theta_0))$.
  2. There is an invertible matrix $V(\theta_0)$ such that $D\Psi_n(\theta^*_n)\overset{\mathbb{P}}\to V(\theta_0)$ for every sequence $\theta^*_n\overset{\mathbb{P}}\to\theta_0$.

Show that $\sqrt{n}(\hat{\theta}_n-\theta_0)\overset{D}{\to}N(0,V(\theta_0)^{-1}I(\theta_0)[V(\theta_0)^{-1}]^T)$.

If I'm being honest I don't even know where to start.

Edit: I'd also be very interested in recommendations for textbooks dealing with asymptotic statistics.
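
Edit 2: For intuition I also ran a small simulation of a toy case — the model, parameter value, and grid below are entirely my own choices, not part of the exercise. Take $X_i\sim N(\theta_0,1)$ and $\Psi_n(\theta)=\frac{1}{n}\sum_i X_i-\theta$, so that $\Psi(\theta)=\theta_0-\theta$ has the unique root $\theta_0$ and the argmin of $||\Psi_n||$ over a grid is essentially the sample mean:

```python
import random

random.seed(0)
theta0 = 2.0  # true root of Psi (my own choice for the toy model)

def psi_n(theta, xs):
    # Psi_n(theta) = (1/n) * sum_i X_i - theta, a random function of theta;
    # its deterministic uniform limit is Psi(theta) = theta0 - theta.
    return sum(xs) / len(xs) - theta

# Theta = [0, 4], discretized with step 0.001; compact, theta0 in its interior.
grid = [i / 1000 for i in range(4001)]

def theta_hat(xs):
    # argmin over the grid of ||Psi_n(theta)||
    return min(grid, key=lambda t: abs(psi_n(t, xs)))

for n in (100, 10_000):
    xs = [random.gauss(theta0, 1.0) for _ in range(n)]
    est = theta_hat(xs)
    print(n, abs(est - theta0))  # deviation shrinks as n grows
```

The deviation shrinks roughly like $1/\sqrt{n}$, which is exactly the rate that part b) quantifies.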


Here is an argument for the first question; I'll try to work on the second part later. Also, for textbook suggestions, I think that van der Vaart's Asymptotic Statistics is the gold standard:

https://www.amazon.com/Asymptotic-Statistics-Statistical-Probabilistic-Mathematics/dp/0521784506/ref=sr_1_1?ie=UTF8&qid=1509999605&sr=8-1&keywords=asymptotic+statistics

A.

Throughout, the two key facts are that $\Psi(\theta_0) = 0$ (assumption 2) and that $\hat{\theta}_n$ minimizes $||\Psi_n(\theta)||$ over $\Theta$.

Step 1: $||\Psi_n(\hat{\theta}_n)|| \overset{P}{\rightarrow} 0$. Since $\hat{\theta}_n$ is the minimizer and $\Psi(\theta_0) = 0$,

$||\Psi_n(\hat{\theta}_n)|| \leq ||\Psi_n(\theta_0)|| = ||\Psi_n(\theta_0) - \Psi(\theta_0)|| \leq \sup_{\theta \in \Theta} ||\Psi_n(\theta) - \Psi(\theta)||,$

and the right-hand side tends to $0$ in probability by assumption 1.

Step 2: $||\Psi(\hat{\theta}_n)|| \overset{P}{\rightarrow} 0$. By the triangle inequality,

$||\Psi(\hat{\theta}_n)|| \leq ||\Psi(\hat{\theta}_n) - \Psi_n(\hat{\theta}_n)|| + ||\Psi_n(\hat{\theta}_n)|| \leq \sup_{\theta \in \Theta} ||\Psi_n(\theta) - \Psi(\theta)|| + ||\Psi_n(\hat{\theta}_n)||,$

and both terms on the right tend to $0$ in probability, by assumption 1 and Step 1 respectively.

Step 3: $\hat{\theta}_n \overset{P}{\rightarrow} \theta_0$. Fix $\epsilon > 0$ and set $K_\epsilon = \{\theta \in \Theta : ||\theta - \theta_0|| \geq \epsilon\}$. As a closed subset of the compact set $\Theta$ (assumption 3), $K_\epsilon$ is compact, and since $\theta_0$ is the unique root of $\Psi$ (assumption 2), the continuous function $\theta \mapsto ||\Psi(\theta)||$ has no zero on $K_\epsilon$. A continuous function attains its infimum on a compact set, so

$\delta := \inf_{\theta \in K_\epsilon} ||\Psi(\theta)|| > 0.$

Consequently $||\hat{\theta}_n - \theta_0|| \geq \epsilon$ implies $||\Psi(\hat{\theta}_n)|| \geq \delta$, and therefore

$P\left[||\hat{\theta}_n - \theta_0|| \geq \epsilon\right] \leq P\left[||\Psi(\hat{\theta}_n)|| \geq \delta\right] \rightarrow 0$

by Step 2. Since $\epsilon > 0$ was arbitrary, this is exactly $\hat{\theta}_n \overset{P}{\rightarrow} \theta_0$.
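
For part b), the standard route is a mean-value expansion around $\theta_0$. A sketch under the stated assumptions, glossing over the detail that the mean value theorem applies componentwise, so each coordinate of $\Psi_n$ gets its own intermediate point (assumption 2 of part b) covers every such sequence):

```latex
% Mean-value expansion of \Psi_n around \theta_0 (componentwise):
\[
  \Psi_n(\hat{\theta}_n)
    = \Psi_n(\theta_0) + D\Psi_n(\theta_n^*)\,(\hat{\theta}_n - \theta_0),
  \qquad \theta_n^* \text{ between } \hat{\theta}_n \text{ and } \theta_0 .
\]
% Part a) gives \hat{\theta}_n \to \theta_0 in probability, hence also
% \theta_n^* \to \theta_0, and assumption 2 yields
% D\Psi_n(\theta_n^*) \to V(\theta_0) in probability. Since V(\theta_0) is
% invertible, D\Psi_n(\theta_n^*) is invertible with probability tending
% to 1, and multiplying by \sqrt{n} and rearranging gives
\[
  \sqrt{n}\,(\hat{\theta}_n - \theta_0)
    = D\Psi_n(\theta_n^*)^{-1}
      \bigl( \sqrt{n}\,\Psi_n(\hat{\theta}_n) - \sqrt{n}\,\Psi_n(\theta_0) \bigr).
\]
% By assumption 1 the first term in the bracket is o_P(1) and the second
% converges in distribution to N(0, I(\theta_0)); Slutsky's lemma then gives
\[
  \sqrt{n}\,(\hat{\theta}_n - \theta_0)
    \xrightarrow{D}
    N\bigl( 0,\; V(\theta_0)^{-1} I(\theta_0)\, [V(\theta_0)^{-1}]^T \bigr).
\]
```

Making the "with probability tending to 1" step rigorous (working on the event where the inverse exists) is the main bookkeeping left to do.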