I am trying to understand Jun Shao's proof of the asymptotic normality of weighted LSE in his book Mathematical Statistics.
The theorem: Consider the model $X = Z\beta + \varepsilon$ with a full rank $Z$, and $\breve{\beta} = (Z^\tau V^{-1} Z)^{-1} Z^{\tau} V^{-1} X$ and $\hat{\beta}_w = (Z^\tau \hat{V}^{-1} Z)^{-1} Z^{\tau} \hat{V}^{-1} X$ with a consistent $\hat{V}$. Assume the conditions in Theorem 3.12 (attached). Then $$l^\tau(\hat{\beta}_w - \breve{\beta})/a_n \rightarrow_d N(0, 1)$$ where $l \in \mathcal{R}^p$, $l \neq 0$, and $$a^2_n = Var(l^\tau \breve{\beta}) = l^\tau(Z^\tau V^{-1} Z)^{-1}l.$$
Here are some definitions and assumptions involved:
The model is $X = Z\beta + \varepsilon$, where $X = (X_1, \ldots, X_n)^\tau$, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^\tau$, and $Z$ is the $n \times p$ matrix of covariates. $\beta$ is unknown, $Z$ is non-random, and $\varepsilon$ is the random error. $V = Var(\varepsilon)$ is unknown and $\hat{V}$ is an estimator of $V$. $\hat{V}$ is consistent for $V$ if and only if $\|\hat{V}^{-1}V - I_n\|_{max} \rightarrow_p 0$, where $\|A\|_{max} = \max_{i,j} |a_{ij}|$ for a matrix $A = (a_{ij})$. Let $A_n = V^{1/2}\hat{V}^{-1}V^{1/2} - I_n$ and $\|A\| = \sqrt{tr(A^\tau A)}$.
Summary: The proof decomposes $l^\tau(\hat{\beta}_w - \breve{\beta})$ into two parts $\xi_n + \zeta_n$, where $\xi_n = l^\tau(Z^\tau \hat{V}^{-1} Z)^{-1} Z^{\tau} (\hat{V}^{-1} - V^{-1})\varepsilon$ and $\zeta_n = l^\tau[(Z^\tau \hat{V}^{-1} Z)^{-1} - (Z^\tau V^{-1} Z)^{-1}]Z^{\tau}V^{-1}\varepsilon$, and then shows that the statement holds because both $\xi_n$ and $\zeta_n$ are $o_p(a_n)$.
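To convince myself that this decomposition is exact (not just asymptotic), I checked it numerically on synthetic data. This is only a sketch: `V`, `Vhat`, `Z`, `beta`, `l`, and `eps` are all arbitrary stand-ins I generated, not anything from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

Z = rng.standard_normal((n, p))      # full-rank covariate matrix (synthetic)
beta = rng.standard_normal(p)
l = rng.standard_normal(p)

def random_spd(n):
    """A well-conditioned symmetric positive-definite matrix."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

V = random_spd(n)                    # "true" Var(eps)
Vhat = V + 0.1 * random_spd(n)       # hypothetical stand-in for the estimator
eps = rng.standard_normal(n)
X = Z @ beta + eps                   # the model X = Z beta + eps

Vi, Vhi = np.linalg.inv(V), np.linalg.inv(Vhat)
G = np.linalg.inv(Z.T @ Vi @ Z)      # (Z' V^-1 Z)^-1
Ghat = np.linalg.inv(Z.T @ Vhi @ Z)  # (Z' Vhat^-1 Z)^-1

beta_breve = G @ Z.T @ Vi @ X        # weighted LSE with the true V
beta_w = Ghat @ Z.T @ Vhi @ X        # weighted LSE with the estimated V

xi = l @ Ghat @ Z.T @ (Vhi - Vi) @ eps
zeta = l @ (Ghat - G) @ Z.T @ Vi @ eps

# the decomposition l'(beta_w - beta_breve) = xi + zeta holds exactly
print(np.isclose(l @ (beta_w - beta_breve), xi + zeta))  # True
```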
In the derivation of the $\xi_n$ part (highlighted in blue), Jun Shao first shows that $$\xi_n^2 = [l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n V^{-1/2} \varepsilon]^2,$$ which is fine. He then asserts that this term is a covariance and that, by the inequality $[Cov(X_i, X_j)]^2 \leq Var(X_i)Var(X_j)$, $i\neq j$, it is $$\leq \varepsilon V^{-1} \varepsilon^\tau \, l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n^2 V^{-1/2} Z (Z^\tau \hat{V}^{-1} Z)^{-1} l.$$
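For context, the rewriting of $\xi_n$ rests on the identity $\hat{V}^{-1} - V^{-1} = V^{-1/2} A_n V^{-1/2}$ with $A_n = V^{1/2}\hat{V}^{-1}V^{1/2} - I_n$, which I verified numerically as a sanity check (synthetic `V` and `Vhat`, hypothetical stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
M = rng.standard_normal((n, n))
V = M @ M.T + n * np.eye(n)                 # SPD "true" covariance (synthetic)
N = rng.standard_normal((n, n))
Vhat = V + 0.1 * (N @ N.T + n * np.eye(n))  # SPD stand-in for the estimator

# symmetric square root of V via its eigendecomposition
w, U = np.linalg.eigh(V)
V_half = U @ np.diag(np.sqrt(w)) @ U.T       # V^{1/2}
V_half_inv = U @ np.diag(1 / np.sqrt(w)) @ U.T  # V^{-1/2}

Vhi = np.linalg.inv(Vhat)
An = V_half @ Vhi @ V_half - np.eye(n)       # A_n as defined above

lhs = Vhi - np.linalg.inv(V)                 # Vhat^-1 - V^-1
rhs = V_half_inv @ An @ V_half_inv           # V^-1/2 A_n V^-1/2
print(np.allclose(lhs, rhs))                 # True
```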
My question: It is not obvious what the two random variables involved in this covariance are. Could you help me identify them?
Thank you very much in advance.
Screenshots of the involved theorems:
Theorem 3.17
Theorem 3.12

It seems a bit weird to say that $\xi_n$ is a covariance matrix when it contains the random vector $\varepsilon$. I think that a more straightforward way of getting the inequality is to use the scalar product $\langle A, B \rangle = \mbox{Tr}(A^\tau B)$ and the implied Cauchy-Schwarz inequality $$\mbox{Tr}(A^\tau B)^2 \leq \mbox{Tr}(A^\tau A) \mbox{Tr}(B^\tau B).$$ Now, the quantity $l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n V^{-1/2} \varepsilon$ is one-dimensional, hence \begin{align*} l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n V^{-1/2} \varepsilon &= \mbox{Tr}(l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n V^{-1/2} \varepsilon) \end{align*} and we can apply the above Cauchy-Schwarz inequality with the column vectors $A = A_n V^{-1/2} Z (Z^\tau \hat{V}^{-1} Z)^{-1} l$ (using that $A_n$ is symmetric) and $B=V^{-1/2} \varepsilon$, so that the quantity above is $\mbox{Tr}(A^\tau B)$. Moreover, notice that $$\mbox{Tr}(A^\tau A)=l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n^2 V^{-1/2} Z (Z^\tau \hat{V}^{-1} Z)^{-1}l$$ and $$\mbox{Tr}(B^\tau B)=\varepsilon^\tau V^{-1} \varepsilon,$$ where in both cases the trace disappears since we get one-dimensional quantities. Finally, $$(l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n V^{-1/2} \varepsilon )^2 \leq \varepsilon^\tau V^{-1} \varepsilon \, l^\tau (Z^\tau \hat{V}^{-1} Z)^{-1} Z^\tau V^{-1/2} A_n^2 V^{-1/2} Z (Z^\tau \hat{V}^{-1} Z)^{-1} l.$$
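For vectors the trace inner product reduces to the ordinary dot product, so the bound is just the usual Cauchy-Schwarz inequality. A quick numerical check of the final inequality on synthetic data (all matrices below are hypothetical stand-ins I generated; `a` is the column form of the $A$ above):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
Z = rng.standard_normal((n, p))              # synthetic covariates
l = rng.standard_normal(p)
eps = rng.standard_normal(n)

M = rng.standard_normal((n, n))
V = M @ M.T + n * np.eye(n)                  # SPD "true" covariance
N = rng.standard_normal((n, n))
Vhat = V + 0.1 * (N @ N.T + n * np.eye(n))   # SPD stand-in for the estimator

w, U = np.linalg.eigh(V)
V_half = U @ np.diag(np.sqrt(w)) @ U.T       # V^{1/2}
V_half_inv = U @ np.diag(1 / np.sqrt(w)) @ U.T  # V^{-1/2}
Vhi = np.linalg.inv(Vhat)
An = V_half @ Vhi @ V_half - np.eye(n)       # A_n

Ghat = np.linalg.inv(Z.T @ Vhi @ Z)          # (Z' Vhat^-1 Z)^-1
a = An @ V_half_inv @ Z @ Ghat @ l           # column form of A
b = V_half_inv @ eps                         # B = V^{-1/2} eps

lhs = (a @ b) ** 2                           # the squared scalar xi_n^2
rhs = (a @ a) * (b @ b)                      # Tr(A'A) * Tr(B'B)
print(lhs <= rhs)                            # True, by Cauchy-Schwarz
```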
This is slightly different from the inequality you highlighted in blue in the book, which starts with $\varepsilon V^{-1} \varepsilon^\tau$ rather than $\varepsilon^\tau V^{-1} \varepsilon$. The expression in the book does not make sense to me as written, because $\varepsilon$ is of dimension $n\times 1$ and $V=\mbox{Var}(\varepsilon)$ is of dimension $n \times n$, so the product $\varepsilon V^{-1} \varepsilon^\tau$ is not conformable; presumably $\varepsilon^\tau V^{-1} \varepsilon$ is intended.