Does convergence in probability not imply convergence in distribution for Least Squares estimators?


I have a question relating to convergence in probability and distribution for least squares estimators. Frequently, I see in textbooks that $\hat{\beta} \rightarrow^p b$, where $b$ is the population parameter and $\hat{\beta}$ is the least squares estimator of that parameter; this demonstrates that LS estimators are consistent.

I also often see that $\hat{\beta} \rightarrow^d N(b,\frac{1}{N}(X'X)^{-1})$, showing that $\hat{\beta}$ tends in distribution to a normal distribution centered around $b$.

I just wanted to check that my thinking here is correct. As I understand it, convergence in probability always implies convergence in distribution. Hence convergence in probability of the LS estimator to a constant $b$ should imply convergence in distribution to that same constant, i.e. $\hat{\beta} \rightarrow^d b$. Is this consistent with the limit $N(b,\frac{1}{N}(X'X)^{-1})$ above, since the variance tends to zero as $N\rightarrow \infty$, so that the limiting distribution is itself degenerate at a constant? Or is there some other reason that LS estimators converge in distribution to a constant?
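To make the consistency claim concrete, here is a small simulation sketch (my own, not from any textbook) for a hypothetical one-regressor, no-intercept model with an assumed true slope $b = 2$ and Gaussian errors: the spread of the LS estimates shrinks as $n$ grows, illustrating $\hat{\beta} \rightarrow^p b$.

```python
# Simulation sketch (hypothetical model, b = 2, Gaussian errors):
# the spread of the LS estimates shrinks as n grows,
# illustrating convergence in probability of beta_hat to b.
import numpy as np

rng = np.random.default_rng(0)
b, sigma = 2.0, 1.0              # assumed true slope and error sd

def ols_slope(n):
    x = rng.normal(size=n)
    y = b * x + rng.normal(scale=sigma, size=n)
    return (x @ y) / (x @ x)     # LS estimator for a no-intercept model

for n in (100, 10_000, 100_000):
    estimates = np.array([ols_slope(n) for _ in range(200)])
    print(n, round(estimates.std(), 4))  # spread falls roughly like 1/sqrt(n)
```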

Many thanks,

Ben

2 Answers

Best answer:

Many thanks for your response. You were correct: I actually meant that the asymptotic distribution of $\hat{\beta}$ is given by

$\hat{\beta} \rightarrow^d N(b,\sigma^2 (1/n)Q^{-1})$

where $\left[(1/n)(X'X)\right]^{-1}\rightarrow^p Q^{-1}$, as in the following notes: http://www3.grips.ac.jp/~yamanota/Lecture_Note_6_Asymptotic_Properties.pdf

I understand your reasoning. So am I correct in thinking that the above distributional convergence implies that, since the variance goes to zero as $n \rightarrow \infty$, we really have

$\hat{\beta} \rightarrow^d b$?

Is this logic correct?

However, as I understand it, we can magnify the difference between $\hat{\beta}$ and $b$ by multiplying it by $n^{1/2}$, and it turns out that this exactly balances the rate at which the difference shrinks, so that the scaled difference tends to a non-degenerate limiting distribution:

$n^{1/2}(\hat{\beta}-b) \rightarrow^d N(0,\sigma^2 Q^{-1})$
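This balancing act can be seen in a quick simulation sketch (a hypothetical one-regressor setup with $x \sim N(0,1)$, so $Q = E[x^2] = 1$ and the limiting variance is simply $\sigma^2$): the scaled error $\sqrt{n}(\hat{\beta}-b)$ keeps a stable normal shape even though $\hat{\beta}$ itself collapses to $b$.

```python
# Simulation sketch (hypothetical setup: one regressor x ~ N(0,1), so
# Q = E[x^2] = 1): sqrt(n) * (beta_hat - b) keeps a stable
# N(0, sigma^2 Q^{-1}) shape even as beta_hat itself collapses to b.
import numpy as np

rng = np.random.default_rng(1)
b, sigma = 2.0, 1.0              # assumed true slope and error sd

def scaled_error(n):
    x = rng.normal(size=n)
    y = b * x + rng.normal(scale=sigma, size=n)
    beta_hat = (x @ y) / (x @ x)
    return np.sqrt(n) * (beta_hat - b)

z = np.array([scaled_error(5_000) for _ in range(2_000)])
print(round(z.mean(), 3), round(z.var(), 3))  # near 0 and sigma^2 / Q = 1
```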

I have done some research and it is often stated that:

$\hat{\beta} \rightarrow^a N(b,\sigma^2 (1/n) Q^{-1})$

Am I right in thinking that the above holds only as an approximation for moderate $n$, since the variance tends to zero as $n$ gets large? And does the 'a' stand for 'asymptotically distributed' in the above?

Many many thanks (again),

Ben

Answer:

You fell victim to a widely adopted verbal (and even notational) convention that, strictly speaking, is erroneous, but people working in statistics and econometrics are supposed to know what it truly means. The specified model and the OLS estimator are, in vector-matrix notation,
$$ Y = XB + U,\qquad \hat B = (X'X)^{-1}X'Y$$
$$\Rightarrow \hat B = (X'X)^{-1}X'(XB + U) = (X'X)^{-1}X'XB+ (X'X)^{-1}X'U = B + (X'X)^{-1}X'U.$$

To establish weak consistency (i.e. convergence in probability) we examine
$$\operatorname{plim}(\hat B-B) = \operatorname{plim}\left((X'X)^{-1}X'U\right) = \operatorname{plim}\left(\left(\frac 1nX'X\right)^{-1}\left(\frac 1nX'U\right)\right).$$
Given the usual assumptions of the model, the law of large numbers leads us to $\operatorname{plim}(\hat B-B) = 0$, and we write $\hat B \rightarrow^p B$.

Now, if $\hat B$ converges to a constant, how on earth can it also converge to a random variable and have a distribution? It cannot. But some other quantity can. What quantity? The quantity (multiply throughout by $\sqrt n$)
$$\sqrt n\left(\hat B-B\right) = \left(\frac 1nX'X\right)^{-1}\left(\frac {1}{\sqrt n}X'U\right).$$
Using the Central Limit Theorem this time, this quantity converges in distribution, because $X'U$ "goes to infinity at a rate $\sqrt n$", which is "slower" than $n$:
$$Z= \sqrt n\left(\hat B-B\right) \rightarrow^d N\left(0, \sigma^2\left(E[X'X]\right)^{-1}\right).$$

Now
$$Z= \sqrt n\left(\hat B-B\right) \Rightarrow \hat B = \frac{1}{\sqrt n}Z + B.$$
Since $Z$ is asymptotically a normal random variable, and since $\hat B$ is a linear function of this normal random variable, it will also have a normal distribution, with
$$E(\hat B) = E\left(\frac{1}{\sqrt n}Z + B\right) = 0 + B = B,\qquad \operatorname{Var}(\hat B) = \left(\frac{1}{\sqrt n}\right)^2\operatorname{Var}(Z) = \frac 1n\sigma^2\left(E[X'X]\right)^{-1}.$$
Using the approximation $\left(E[X'X]\right)^{-1}\approx \left(\frac 1n X'X\right)^{-1} = n\left(X'X\right)^{-1}$, we obtain $\operatorname{Var}(\hat B) = \sigma^2\left(X'X\right)^{-1}$.
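The final variance result can be checked numerically. Here is a small simulation sketch (my own illustration, assuming Gaussian errors and an arbitrary fixed two-column design with $n = 500$): across repeated samples, the empirical covariance of $\hat B$ comes out close to $\sigma^2(X'X)^{-1}$.

```python
# Sketch (my own illustration, assuming Gaussian errors and a fixed
# design X with n = 500): across repeated samples, the covariance of
# beta_hat is close to sigma^2 * (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 500, 1.0
B = np.array([1.0, -0.5])            # hypothetical true coefficient vector
X = rng.normal(size=(n, 2))          # fixed design matrix
target = sigma**2 * np.linalg.inv(X.T @ X)

def beta_hat():
    y = X @ B + rng.normal(scale=sigma, size=n)
    return np.linalg.solve(X.T @ X, X.T @ y)

draws = np.array([beta_hat() for _ in range(5_000)])
print(np.cov(draws.T))               # empirical covariance, near `target`
```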
But the LS estimator has this distribution, $\hat B \sim N\left(B, \sigma^2\left(X'X\right)^{-1}\right)$, only approximately, and only "for large $n$" - NOT if $n$ truly goes to infinity. Since in practice our samples are never truly infinite, we can use the above distribution to make approximate statistical inference. You may think of it informally as follows: before the LS estimator collapses to the true value, it has this distribution. It is customary to abbreviate all this, verbally or notationally, to "the LS estimator will have the asymptotic distribution / will converge to the distribution... etc.", which confuses newcomers.

(PS: Where did you find the asymptotic distribution of the LS estimator with variance $\frac{1}{N}(X'X)^{-1}$? I am interested.)