Asymptotic distribution of $n^{\frac{1}{2}}(\hat{\gamma}-\gamma_0)$

I am struggling to understand a proof from Browne, M. (1984), "Asymptotically distribution-free methods for the analysis of covariance structures". Let $\boldsymbol{S}$ be the unbiased estimator of a covariance matrix $\boldsymbol{\Sigma_0}$, obtained from $N=n+1$ observations on a vector variate $\boldsymbol{x}$ with finite fourth-order moments, let $\boldsymbol{s}=\text{vecs}(\boldsymbol{S})$ be the vector formed from the non-duplicated elements of $\boldsymbol{S}$, and let $\boldsymbol{\sigma_0}=\text{vecs}(\boldsymbol{\Sigma_0})$. Then $\boldsymbol{\delta_s}=n^{\frac{1}{2}}(\boldsymbol{s}-\boldsymbol{\sigma_0})$ has null mean vector and covariance matrix $\boldsymbol{Y}$, and its asymptotic distribution has null mean and covariance matrix $\bar{\boldsymbol{Y}}$, which is assumed positive definite. $\boldsymbol{\Delta}$ denotes the Jacobian of $\boldsymbol{\sigma}(\boldsymbol{\gamma})$, and $\boldsymbol{U}$ converges in probability to a positive definite matrix $\bar{\boldsymbol{U}}$. We assume:

  1. $F(\boldsymbol{\Sigma_0},\boldsymbol{\Sigma}(\boldsymbol{\gamma}))$ has a unique minimum on $G$ at $\boldsymbol{\gamma}=\boldsymbol{\gamma_0}$.
  2. $\hat{\boldsymbol{\gamma}}$ is a consistent estimator of $\boldsymbol{\gamma_0}$.
  3. $\boldsymbol{\gamma_0}$ is an interior point of the parameter space $G$.
  4. $\boldsymbol{\Delta_0}=\boldsymbol{\Delta}(\boldsymbol{\gamma_0})$ has full column rank.
  5. $\boldsymbol{\Delta}(\boldsymbol{\gamma})$ and $\boldsymbol{\Sigma}(\boldsymbol{\gamma})$ are continuous functions of $\boldsymbol{\gamma}$.
  6. $||\boldsymbol{\Sigma_0}-\boldsymbol{\Sigma}(\boldsymbol{\gamma_0})||$ is $O(n^{-\frac{1}{2}})$.
  7. The parameter set $G$ is closed and bounded.

where $\boldsymbol{\gamma_0}$ is taken to be the value of $\boldsymbol{\gamma}$ which minimises the discrepancy function $F(\boldsymbol{\Sigma_0},\boldsymbol{\Sigma}(\boldsymbol{\gamma}))$.
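
(For concreteness, my reading of the $\text{vecs}$ operator: with $p=2$ variables there are $p^*=p(p+1)/2=3$ non-duplicated elements, so \begin{equation} \text{vecs}\begin{pmatrix} s_{11} & s_{12} \\ s_{12} & s_{22} \end{pmatrix}=(s_{11},s_{12},s_{22})^T, \end{equation} assuming the lower-triangular elements are stacked in order; the exact ordering does not matter for what follows.)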

Proposition 2: Given that $\hat{\boldsymbol{\gamma}}$ is a GLS estimator derived by minimising $F(\boldsymbol{S},\boldsymbol{\Sigma}(\boldsymbol{\gamma})|\boldsymbol{U})=(\boldsymbol{s}-\boldsymbol{\sigma}(\boldsymbol{\gamma}))^T\boldsymbol{U}^{-1}(\boldsymbol{s}-\boldsymbol{\sigma}(\boldsymbol{\gamma})),$ the asymptotic distribution of $\hat{\boldsymbol{\delta}}_{\boldsymbol{\gamma}}=n^{\frac{1}{2}}(\hat{\boldsymbol{\gamma}}-\boldsymbol{\gamma_0})$ is multivariate normal with null mean vector and covariance matrix:

\begin{equation}\label{2.12a} \text{cov}(\hat{\boldsymbol{\delta}}_{\boldsymbol{\gamma}},\hat{\boldsymbol{\delta}}_{\boldsymbol{\gamma}}^T)=\{\boldsymbol{\Delta_0}^T\bar{\boldsymbol{U}}^{-1}\boldsymbol{\Delta_0}\}^{-1}\boldsymbol{\Delta_0}^T\bar{\boldsymbol{U}}^{-1}\bar{\boldsymbol{Y}}\bar{\boldsymbol{U}}^{-1}\boldsymbol{\Delta_0}\{\boldsymbol{\Delta_0}^T\bar{\boldsymbol{U}}^{-1}\boldsymbol{\Delta_0}\}^{-1}. \end{equation}
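
As a sanity check on the shape of this formula (my own sketch, not from the paper), the following snippet evaluates the sandwich covariance for made-up placeholder matrices $\boldsymbol{\Delta_0}$, $\bar{\boldsymbol{U}}$ and $\bar{\boldsymbol{Y}}$; the dimensions and all inputs are arbitrary and only serve to illustrate the expression:

```python
import numpy as np

# Placeholder sizes: p* non-duplicated covariance elements, q free parameters.
p_star, q = 6, 2
rng = np.random.default_rng(0)

Delta0 = rng.standard_normal((p_star, q))          # stand-in Jacobian of sigma(gamma) at gamma_0
A = rng.standard_normal((p_star, p_star))
U_bar = A @ A.T + p_star * np.eye(p_star)          # arbitrary positive definite weight limit U-bar
B = rng.standard_normal((p_star, p_star))
Y_bar = B @ B.T + p_star * np.eye(p_star)          # arbitrary positive definite limit Y-bar of cov(delta_s)

U_inv = np.linalg.inv(U_bar)
bread = np.linalg.inv(Delta0.T @ U_inv @ Delta0)   # {Delta0' U^-1 Delta0}^-1
meat = Delta0.T @ U_inv @ Y_bar @ U_inv @ Delta0   # Delta0' U^-1 Y-bar U^-1 Delta0
cov = bread @ meat @ bread                         # q x q covariance of n^(1/2)(gamma-hat - gamma_0)

print(cov.shape)                 # (2, 2)
print(np.allclose(cov, cov.T))   # True: symmetric, as a covariance matrix must be
```

One thing I do see from the formula: if $\bar{\boldsymbol{U}}=\bar{\boldsymbol{Y}}$, the expression collapses to $\{\boldsymbol{\Delta_0}^T\bar{\boldsymbol{Y}}^{-1}\boldsymbol{\Delta_0}\}^{-1}$, the usual optimally-weighted GLS covariance.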

The proof he gives is as follows:

Proof:

Let $\boldsymbol{g}(\boldsymbol{\gamma}|\boldsymbol{S},\boldsymbol{U})$ be the $q\times 1$ negative gradient of $\frac{1}{2}F(\boldsymbol{S},\boldsymbol{\Sigma}(\boldsymbol{\gamma})|\boldsymbol{U})$: \begin{equation} \boldsymbol{g}(\boldsymbol{\gamma}|\boldsymbol{S},\boldsymbol{U})=\boldsymbol{\Delta}^{T}\boldsymbol{U}^{-1}\{\boldsymbol{s}-\boldsymbol{\sigma}(\boldsymbol{\gamma})\}. \end{equation}
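
To fill in this step for myself (it is not spelled out in the paper): differentiating the quadratic form with the chain rule, and writing $\boldsymbol{\Delta}=\partial\boldsymbol{\sigma}(\boldsymbol{\gamma})/\partial\boldsymbol{\gamma}^T$, \begin{equation} \frac{\partial}{\partial\boldsymbol{\gamma}}\,\frac{1}{2}\{\boldsymbol{s}-\boldsymbol{\sigma}(\boldsymbol{\gamma})\}^T\boldsymbol{U}^{-1}\{\boldsymbol{s}-\boldsymbol{\sigma}(\boldsymbol{\gamma})\}=-\boldsymbol{\Delta}^T\boldsymbol{U}^{-1}\{\boldsymbol{s}-\boldsymbol{\sigma}(\boldsymbol{\gamma})\}, \end{equation} using the symmetry of $\boldsymbol{U}^{-1}$, so the negative gradient is indeed $\boldsymbol{\Delta}^T\boldsymbol{U}^{-1}\{\boldsymbol{s}-\boldsymbol{\sigma}(\boldsymbol{\gamma})\}$.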

$\hat{\boldsymbol{\gamma}}$ minimises $\frac{1}{2}F(\boldsymbol{S},\boldsymbol{\Sigma}(\boldsymbol{\gamma})|\boldsymbol{U})$, so it follows from assumptions 2 and 3 that, with probability arbitrarily close to $1$ for sufficiently large $n$, we have:

\begin{equation} \boldsymbol{g}(\hat{\boldsymbol{\gamma}}|\boldsymbol{S},\boldsymbol{U})=\hat{\boldsymbol{\Delta}}^{T}\boldsymbol{U}^{-1}[\boldsymbol{s}-\boldsymbol{\sigma_0}+\{\boldsymbol{\sigma_0}-\hat{\boldsymbol{\sigma}}_0\}-\{\hat{\boldsymbol{\sigma}}-\hat{\boldsymbol{\sigma}}_0\}]=\boldsymbol{0}, \label{2.15} \end{equation} where $\boldsymbol{\sigma_0}=\text{vecs}(\boldsymbol{\Sigma_0})$, $\hat{\boldsymbol{\sigma}}_0=\text{vecs}(\boldsymbol{\Sigma}(\boldsymbol{\gamma_0}))$, $\hat{\boldsymbol{\sigma}}=\text{vecs}(\boldsymbol{\Sigma}(\hat{\boldsymbol{\gamma}}))$ and $\hat{\boldsymbol{\Delta}}=\boldsymbol{\Delta}(\hat{\boldsymbol{\gamma}})$.
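
(The bracketed expression is just $\boldsymbol{s}-\hat{\boldsymbol{\sigma}}$ written as a telescoping sum: \begin{equation} \{\boldsymbol{s}-\boldsymbol{\sigma_0}\}+\{\boldsymbol{\sigma_0}-\hat{\boldsymbol{\sigma}}_0\}-\{\hat{\boldsymbol{\sigma}}-\hat{\boldsymbol{\sigma}}_0\}=\boldsymbol{s}-\hat{\boldsymbol{\sigma}}, \end{equation} which separates the sampling error $\boldsymbol{s}-\boldsymbol{\sigma_0}$, the model error $\boldsymbol{\sigma_0}-\hat{\boldsymbol{\sigma}}_0$, and the estimation error $\hat{\boldsymbol{\sigma}}-\hat{\boldsymbol{\sigma}}_0$.)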

By assumptions 3 and 5 and the Mean-Value Theorem:

\begin{equation} \hat{\boldsymbol{\sigma}}-\hat{\boldsymbol{\sigma}}_0=\tilde{\boldsymbol{\Delta}}\{\hat{\boldsymbol{\gamma}}-\boldsymbol{\gamma_0}\}, \end{equation} where $\tilde{\boldsymbol{\Delta}}$ is a $p^*\times q$ matrix of derivatives of $\boldsymbol{\sigma}(\boldsymbol{\gamma})$ with respect to $\boldsymbol{\gamma}$, evaluated at points on the line segment joining $\hat{\boldsymbol{\gamma}}$ and $\boldsymbol{\gamma_0}$. Substituting this into the previous equation and using assumptions 2, 4, 5 and 6, we get

\begin{equation} n^{\frac{1}{2}}\{\hat{\boldsymbol{\gamma}}-\boldsymbol{\gamma_0}\} \stackrel{a}{=}n^{\frac{1}{2}}\{\boldsymbol{\Delta_0}^T\bar{\boldsymbol{U}}^{-1}\boldsymbol{\Delta_0}\}^{-1}\boldsymbol{\Delta_0}^T\bar{\boldsymbol{U}}^{-1}\boldsymbol{\delta_s}+n^{\frac{1}{2}}\boldsymbol{g}(\boldsymbol{\gamma_0}|\boldsymbol{\Sigma_0},\bar{\boldsymbol{U}}), \end{equation}

where $\stackrel{a}{=}$ denotes asymptotic equivalence, meaning that the difference between the left-hand side and the right-hand side converges in probability to zero as $n\rightarrow \infty$.

I find this last step a bit confusing. Shouldn't it be:

\begin{equation} n^{\frac{1}{2}}\{\hat{\boldsymbol{\gamma}}-\boldsymbol{\gamma_0}\} \stackrel{a}{=}\{\boldsymbol{\Delta_0}^T\bar{\boldsymbol{U}}^{-1}\boldsymbol{\Delta_0}\}^{-1}\boldsymbol{\Delta_0}^T\bar{\boldsymbol{U}}^{-1}\boldsymbol{\delta_s}+n^{\frac{1}{2}}\{\boldsymbol{\Delta_0}^T\bar{\boldsymbol{U}}^{-1}\boldsymbol{\Delta_0}\}^{-1}\boldsymbol{g}(\boldsymbol{\gamma_0}|\boldsymbol{\Sigma_0},\bar{\boldsymbol{U}}), \end{equation}
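
My reasoning, for what it is worth: solving the first-order condition above for $\hat{\boldsymbol{\gamma}}-\boldsymbol{\gamma_0}$ and multiplying by $n^{\frac{1}{2}}$ gives \begin{equation} n^{\frac{1}{2}}\{\hat{\boldsymbol{\gamma}}-\boldsymbol{\gamma_0}\}=\{\hat{\boldsymbol{\Delta}}^T\boldsymbol{U}^{-1}\tilde{\boldsymbol{\Delta}}\}^{-1}\hat{\boldsymbol{\Delta}}^T\boldsymbol{U}^{-1}\boldsymbol{\delta_s}+n^{\frac{1}{2}}\{\hat{\boldsymbol{\Delta}}^T\boldsymbol{U}^{-1}\tilde{\boldsymbol{\Delta}}\}^{-1}\hat{\boldsymbol{\Delta}}^T\boldsymbol{U}^{-1}\{\boldsymbol{\sigma_0}-\hat{\boldsymbol{\sigma}}_0\}, \end{equation} since $\boldsymbol{\delta_s}=n^{\frac{1}{2}}(\boldsymbol{s}-\boldsymbol{\sigma_0})$ already absorbs one factor of $n^{\frac{1}{2}}$. Replacing $\hat{\boldsymbol{\Delta}}$, $\tilde{\boldsymbol{\Delta}}$ and $\boldsymbol{U}$ by their limits $\boldsymbol{\Delta_0}$ and $\bar{\boldsymbol{U}}$, and noting that $\boldsymbol{g}(\boldsymbol{\gamma_0}|\boldsymbol{\Sigma_0},\bar{\boldsymbol{U}})=\boldsymbol{\Delta_0}^T\bar{\boldsymbol{U}}^{-1}\{\boldsymbol{\sigma_0}-\hat{\boldsymbol{\sigma}}_0\}$, this appears to give exactly my version above, not the one in the paper.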

I would appreciate any hints.