Referencing the site below:
http://gregorygundersen.com/blog/2019/11/28/asymptotic-normality-mle/
We have the asymptotic normality property of maximum likelihood estimators:
Theorem: Assuming sufficient regularity, we have: $$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{\mathcal{D}} \mathcal{N}(0,\mathcal{I}(\theta_0)^{-1})$$
On the next line, the site claims that this property implies:
Corollary: $$\hat{\theta} \xrightarrow{\mathcal{D}} \mathcal{N}(\theta_0,\mathcal{I}(\theta_0)^{-1})$$
I have two questions:
- Why is the $\sqrt{n}$ factor allowed to be dropped? What is the formal reason for this?
- I just want to make sure: is it correct that the corollary is equivalent to $(\hat{\theta}-\theta_0) \xrightarrow{\mathcal{D}} \mathcal{N}(0,\mathcal{I}(\theta_0)^{-1})$?
You are missing a subscript in the second display: in the corollary, it should be $\mathcal{I}_n(\theta_0)^{-1}$.
The $n$ is important: as mentioned in the blog post you link to, $\mathcal{I}(\theta_0)$ is the Fisher information for a single $X_i$, while $\mathcal{I}_n(\theta_0) = n \mathcal{I}(\theta_0)$ is the Fisher information for the full sample $X=(X_1,\dots,X_n)$.
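So things are indeed consistent: the $\sqrt{n}$ was just "swallowed" into the factor $n$ in the variance. Informally, for large $n$ the theorem says that $\sqrt{n}(\hat{\theta} - \theta_0)$ is approximately $\mathcal{N}(0,\mathcal{I}(\theta_0)^{-1})$. Dividing by $\sqrt{n}$ multiplies the variance by $1/n$, so $\hat{\theta} - \theta_0$ is approximately $\mathcal{N}(0,(n\mathcal{I}(\theta_0))^{-1}) = \mathcal{N}(0,\mathcal{I}_n(\theta_0)^{-1})$. Strictly speaking, a limit in distribution cannot depend on $n$, so the corollary is shorthand for this large-sample approximation rather than a formal convergence statement. In particular, $\hat{\theta} - \theta_0$ itself converges in distribution to the point mass at $0$, not to $\mathcal{N}(0,\mathcal{I}(\theta_0)^{-1})$.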
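As a rough numerical sanity check (a sketch, not something from the linked blog post), assume i.i.d. Bernoulli($p_0$) data. The MLE is then the sample mean and the single-observation Fisher information is $\mathcal{I}(p) = 1/(p(1-p))$. The helper names `fisher_info_single` and `mle_variances`, and the choices $p_0 = 0.3$ and 1000 replications per sample size, are illustrative assumptions.

```python
import math
import random
import statistics


def fisher_info_single(p):
    """Fisher information I(p) = 1 / (p (1 - p)) of a single Bernoulli(p) draw."""
    return 1.0 / (p * (1.0 - p))


def mle_variances(p0, n, reps, rng):
    """Simulate `reps` datasets of size n with true parameter p0 and return the
    empirical variances of p_hat - p0 and of sqrt(n) * (p_hat - p0).
    For Bernoulli data the MLE p_hat is the sample mean."""
    errors = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p0)
        errors.append(successes / n - p0)
    var_raw = statistics.variance(errors)
    var_scaled = statistics.variance([math.sqrt(n) * e for e in errors])
    return var_raw, var_scaled


rng = random.Random(0)  # fixed seed so the run is reproducible
p0, reps = 0.3, 1000
results = {}
for n in (100, 400, 1600):
    var_raw, var_scaled = mle_variances(p0, n, reps, rng)
    results[n] = (var_raw, var_scaled)
    # Raw error variance should track 1/I_n = 1/(n I); rescaled should track 1/I.
    print(f"n={n:5d}  Var(p_hat - p0)={var_raw:.6f} (1/I_n={1 / (n * fisher_info_single(p0)):.6f})"
          f"  Var(sqrt(n)(p_hat - p0))={var_scaled:.4f} (1/I={1 / fisher_info_single(p0):.4f})")
```

The variance of $\hat{p} - p_0$ should stay close to $\mathcal{I}_n(p_0)^{-1}$ and shrink toward 0 as $n$ grows, while the variance of $\sqrt{n}(\hat{p} - p_0)$ should stay near $\mathcal{I}(p_0)^{-1} = 0.21$ for every $n$.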