I am trying to understand Jun Shao's proof of the asymptotic normality of weighted LSE in his book Mathematical Statistics.
The theorem: Consider the model $X = Z\beta + \varepsilon$ with a full rank $Z$, and $\breve{\beta} = (Z^\tau V^{-1} Z)^{-1} Z^{\tau} V^{-1} X$ and $\hat{\beta}_w = (Z^\tau \hat{V}^{-1} Z)^{-1} Z^{\tau} \hat{V}^{-1} X$ with a consistent $\hat{V}$. Assume the conditions in Theorem 3.12 (attached). Then $$l^\tau(\hat{\beta}_w - \breve{\beta})/a_n \rightarrow_d N(0, 1)$$ where $l \in \mathcal{R}^p$, $l \neq 0$, and $$a^2_n = Var(l^\tau \breve{\beta}) = l^\tau(Z^\tau V^{-1} Z)^{-1}l.$$
Here are some definitions and assumptions involved:
The model is $X = Z\beta + \varepsilon$, where $X = (X_1, \ldots, X_n)^\tau$, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^\tau$, and $Z$ is the $n \times p$ matrix of covariates. $\beta$ is unknown, $Z$ is non-random, and $\varepsilon$ is the random error. $V = Var(\varepsilon)$ is unknown and $\hat{V}$ is an estimator of $V$. $\hat{V}$ is consistent for $V$ if and only if $\|\hat{V}^{-1}V - I_n\|_{max} \rightarrow_p 0$, where $\|A\|_{max} = \max_{i,j} |a_{ij}|$ for a matrix $A = (a_{ij})$. Let $A_n = V^{1/2}\hat{V}^{-1}V^{1/2} - I_n$ and $\|A\| = \sqrt{tr(A^\tau A)}$.
Summary: The proof decomposes $l^\tau(\hat{\beta}_w - \breve{\beta})$ into two parts $\xi_n + \zeta_n$, where $\xi_n = l^\tau(Z^\tau \hat{V}^{-1} Z)^{-1} Z^{\tau} (\hat{V}^{-1} - V^{-1})\varepsilon$ and $\zeta_n = l^\tau[(Z^\tau \hat{V}^{-1} Z)^{-1} - (Z^\tau V^{-1} Z)^{-1}]Z^{\tau}V^{-1}\varepsilon$, and then shows that the statement holds because both $\xi_n$ and $\zeta_n$ are $o_p(a_n)$.
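To convince myself that this decomposition is exact (not just asymptotic), I checked it numerically on synthetic data. This is only a sketch: `V`, `Vhat`, `Z`, `beta`, `l`, and `eps` are all arbitrary stand-ins I generated, not anything from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

Z = rng.standard_normal((n, p))      # full-rank covariate matrix (synthetic)
beta = rng.standard_normal(p)
l = rng.standard_normal(p)

def random_spd(n):
    """A well-conditioned symmetric positive-definite matrix."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

V = random_spd(n)                    # "true" Var(eps)
Vhat = V + 0.1 * random_spd(n)       # hypothetical stand-in for the estimator
eps = rng.standard_normal(n)
X = Z @ beta + eps                   # the model X = Z beta + eps

Vi, Vhi = np.linalg.inv(V), np.linalg.inv(Vhat)
G = np.linalg.inv(Z.T @ Vi @ Z)      # (Z' V^-1 Z)^-1
Ghat = np.linalg.inv(Z.T @ Vhi @ Z)  # (Z' Vhat^-1 Z)^-1

beta_breve = G @ Z.T @ Vi @ X        # weighted LSE with the true V
beta_w = Ghat @ Z.T @ Vhi @ X        # weighted LSE with the estimated V

xi = l @ Ghat @ Z.T @ (Vhi - Vi) @ eps
zeta = l @ (Ghat - G) @ Z.T @ Vi @ eps

# the decomposition l'(beta_w - beta_breve) = xi + zeta holds exactly
print(np.isclose(l @ (beta_w - beta_breve), xi + zeta))  # True
```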
In the derivation of the $\xi_n$ part (highlighted in blue), Jun Shao first shows that $$\xi_n^2 = [l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n V^{-1/2} \varepsilon]^2,$$ which is fine. He then asserts that this term is a covariance and that, by the inequality $[Cov(X_i, X_j)]^2 \leq Var(X_i)Var(X_j)$, $i\neq j$, it is $$\leq \varepsilon V^{-1} \varepsilon^\tau \, l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n^2 V^{-1/2} Z (Z^\tau \hat{V}^{-1} Z)^{-1} l.$$
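For context, the rewriting of $\xi_n$ rests on the identity $\hat{V}^{-1} - V^{-1} = V^{-1/2} A_n V^{-1/2}$ with $A_n = V^{1/2}\hat{V}^{-1}V^{1/2} - I_n$, which I verified numerically as a sanity check (synthetic `V` and `Vhat`, hypothetical stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
M = rng.standard_normal((n, n))
V = M @ M.T + n * np.eye(n)                 # SPD "true" covariance (synthetic)
N = rng.standard_normal((n, n))
Vhat = V + 0.1 * (N @ N.T + n * np.eye(n))  # SPD stand-in for the estimator

# symmetric square root of V via its eigendecomposition
w, U = np.linalg.eigh(V)
V_half = U @ np.diag(np.sqrt(w)) @ U.T       # V^{1/2}
V_half_inv = U @ np.diag(1 / np.sqrt(w)) @ U.T  # V^{-1/2}

Vhi = np.linalg.inv(Vhat)
An = V_half @ Vhi @ V_half - np.eye(n)       # A_n as defined above

lhs = Vhi - np.linalg.inv(V)                 # Vhat^-1 - V^-1
rhs = V_half_inv @ An @ V_half_inv           # V^-1/2 A_n V^-1/2
print(np.allclose(lhs, rhs))                 # True
```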
My question: It is not obvious what the two random variables involved in this covariance are. Could you help me identify them?
Thank you very much in advance.
Screenshots of the involved theorems:
Theorem 3.17
Theorem 3.12

It seems a bit weird to say that $\xi_n$ is a covariance matrix when it contains the random vector $\varepsilon$. I think that a more straightforward way of getting the inequality is to use the scalar product $\langle A, B \rangle = \mbox{Tr}(A^\tau B)$ and the implied Cauchy-Schwarz inequality $$\mbox{Tr}(A^\tau B)^2 \leq \mbox{Tr}(A^\tau A) \mbox{Tr}(B^\tau B).$$ Now, the quantity $l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n V^{-1/2} \varepsilon$ is one-dimensional, hence \begin{align*} l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n V^{-1/2} \varepsilon &= \mbox{Tr}(l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n V^{-1/2} \varepsilon) \end{align*} and we can apply the above Cauchy-Schwarz inequality with the column vectors $A = A_n V^{-1/2} Z (Z^\tau \hat{V}^{-1} Z)^{-1} l$ (using that $A_n$ is symmetric) and $B=V^{-1/2} \varepsilon$, so that the quantity above is $\mbox{Tr}(A^\tau B)$. Moreover, notice that $$\mbox{Tr}(A^\tau A)=l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n^2 V^{-1/2} Z (Z^\tau \hat{V}^{-1} Z)^{-1}l$$ and $$\mbox{Tr}(B^\tau B)=\varepsilon^\tau V^{-1} \varepsilon,$$ where in both cases the trace disappears since we get one-dimensional quantities. Finally, $$(l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n V^{-1/2} \varepsilon )^2 \leq \varepsilon^\tau V^{-1} \varepsilon \, l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n^2 V^{-1/2} Z (Z^\tau \hat{V}^{-1} Z)^{-1} l.$$
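For vectors the trace inner product reduces to the ordinary dot product, so the bound is just the usual Cauchy-Schwarz inequality. A quick numerical check of the final inequality on synthetic data (all matrices below are hypothetical stand-ins I generated; `a` is the column form of the $A$ above):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
Z = rng.standard_normal((n, p))              # synthetic covariates
l = rng.standard_normal(p)
eps = rng.standard_normal(n)

M = rng.standard_normal((n, n))
V = M @ M.T + n * np.eye(n)                  # SPD "true" covariance
N = rng.standard_normal((n, n))
Vhat = V + 0.1 * (N @ N.T + n * np.eye(n))   # SPD stand-in for the estimator

w, U = np.linalg.eigh(V)
V_half = U @ np.diag(np.sqrt(w)) @ U.T       # V^{1/2}
V_half_inv = U @ np.diag(1 / np.sqrt(w)) @ U.T  # V^{-1/2}
Vhi = np.linalg.inv(Vhat)
An = V_half @ Vhi @ V_half - np.eye(n)       # A_n

Ghat = np.linalg.inv(Z.T @ Vhi @ Z)          # (Z' Vhat^-1 Z)^-1
a = An @ V_half_inv @ Z @ Ghat @ l           # column form of A
b = V_half_inv @ eps                         # B = V^{-1/2} eps

lhs = (a @ b) ** 2                           # the squared scalar xi_n^2
rhs = (a @ a) * (b @ b)                      # Tr(A'A) * Tr(B'B)
print(lhs <= rhs)                            # True, by Cauchy-Schwarz
```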
This is slightly different from the inequality you highlighted in blue in the book, which starts with $\varepsilon V^{-1} \varepsilon^\tau$ rather than $\varepsilon^\tau V^{-1} \varepsilon$. The expression in the book does not make sense to me as written, because $\varepsilon$ is of dimension $n\times 1$ and $V=\mbox{Var}(\varepsilon)$ is of dimension $n \times n$, so the product $\varepsilon V^{-1} \varepsilon^\tau$ is not conformable; presumably $\varepsilon^\tau V^{-1} \varepsilon$ is intended.