Suppose $X_i \stackrel{iid}{\sim}F_\theta$, where $F_\theta$ is a probability distribution parameterized by a finite-dimensional vector $\theta$. Let $\hat{\theta}_n$ denote the maximum likelihood estimator of $\theta$, i.e. $$ \hat{\theta}_n = \arg \max_\theta \frac{1}{n}\sum_{i=1}^n \log f(X_i;\theta) $$ where $f(x;\theta)$ is the density of $X_i$. Suppose we are interested in determining the asymptotic properties of $Z_{i,n} = g(X_i,\hat{\theta}_n)$, where $g$ is a continuously differentiable function. What is the asymptotic distribution of $\bar Z_n := \frac{1}{n}\sum_{i=1}^n Z_{i,n}$?
My attempt:
Taking a first-order Taylor expansion in the second argument, we obtain
$$ Z_{i,n} = g(X_i,\theta) + \frac{\partial g}{\partial y}(X_i,\theta) (\hat{\theta}_n - \theta) + o_P(||\hat{\theta}_n - \theta||) $$
from which it follows that
$$ \sqrt{n}(\bar Z_n - EZ) = \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n g(X_i,\theta) - EZ\right) + \left(\frac{1}{n}\sum_{i=1}^n\frac{\partial g}{\partial y}(X_i,\theta)\right) \sqrt{n} (\hat{\theta}_n - \theta) + o_P(1), $$ where $EZ := Eg(X_i,\theta)$ and the remainder is $o_P(1)$ because $\sqrt{n}\,o_P(\|\hat{\theta}_n - \theta\|) = o_P(1)$ whenever $\sqrt{n}(\hat{\theta}_n - \theta) = O_P(1)$. I can apply the CLT to get the asymptotic distribution of $\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n g(X_i,\theta) - EZ\right)$, and I can combine the asymptotic normality of the MLE with the LLN (applied to the average of derivatives) to obtain the asymptotic distribution of $\left(\frac{1}{n}\sum_{i=1}^n\frac{\partial g}{\partial y}(X_i,\theta)\right) \sqrt{n} (\hat{\theta}_n - \theta)$. However, I know marginal convergence in distribution does not imply joint convergence in distribution, so I can't immediately use Slutsky or the continuous mapping theorem to determine the distribution of the sum of the two terms. How should I proceed?
Under the hypotheses usually assumed for asymptotic normality of the MLE, this is possible.
Sketch: Denote $u(x,\theta) = D_\theta \log f(x;\theta)$ and $U(X,\theta) = \sum_{k=1}^n u(X_k,\theta)$. Then we know from the proof of asymptotic normality of the MLE that $$ \hat\theta_n -\theta \sim n^{-1}I(\theta)^{-1}U(X,\theta), $$ where $I(\theta) = - \mathsf E_\theta [D_\theta u(X_1,\theta)]$ is the Fisher information matrix. Indeed, since $U(X,\hat\theta_n) = 0$ (the first-order condition for the MLE), $$-U(X,\theta) = U(X,\hat\theta_n) - U(X,\theta) \sim D_\theta U(X,\theta)(\hat \theta_n-\theta), \quad n\to\infty,$$ $\mathsf P_\theta$-a.s. in view of the consistency of $\hat \theta_n$; on the other hand, $$ D_\theta U(X,\theta) = \sum_{k=1}^n D_\theta u(X_k,\theta)\sim - n I(\theta), \quad n\to\infty, $$ $\mathsf P_\theta$-a.s. by the LLN.
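As a quick numerical sanity check of this linearization (my own illustration, not part of the argument): take $X_i \sim \mathrm{Exp}(\theta)$ with density $f(x;\theta) = \theta e^{-\theta x}$, so that $u(x,\theta) = 1/\theta - x$, $I(\theta) = \theta^{-2}$, and $\hat\theta_n = 1/\bar X_n$. Then $\hat\theta_n - \theta$ and $n^{-1}I(\theta)^{-1}U(X,\theta)$ can be compared directly:

```python
import numpy as np

# Exponential model f(x; theta) = theta * exp(-theta * x) (illustrative choice).
rng = np.random.default_rng(42)
theta, n = 2.0, 100_000
x = rng.exponential(scale=1.0 / theta, size=n)

theta_hat = 1.0 / x.mean()              # MLE of theta
score_sum = (1.0 / theta - x).sum()     # U(X, theta) = sum_k u(X_k, theta)
fisher = theta ** -2                    # I(theta)
linearized = score_sum / (n * fisher)   # n^{-1} I(theta)^{-1} U(X, theta)

# The exact estimation error and its linearization should nearly coincide:
print(theta_hat - theta, linearized)
```

The two quantities agree up to a term that is quadratic in $\bar X_n - 1/\theta$, which is why the approximation is accurate for large $n$.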
Hence it is easy to identify the required joint asymptotic behavior. Namely, $$ \left(\sqrt{n}\Big(\frac1n \sum_{k=1}^n g(X_k,\theta)- \mathsf E_\theta g(X_1,\theta)\Big),\ \sqrt{n} (\hat\theta_n-\theta)\right) $$ converges in distribution to a centered normal vector with the covariance matrix $$c_{i,j}(\theta) = \begin{cases} a_{i-1,j-1}(\theta), & i,j\ge 2,\\ \mathsf E_\theta [g(X_1,\theta)v_{j-1}(X_1,\theta)], & i=1,\ j\ge 2,\\ \mathsf E_\theta [g(X_1,\theta)v_{i-1}(X_1,\theta)], & j=1,\ i\ge 2,\\ \mathsf V_\theta\big(g(X_1,\theta)\big), & i=j=1, \end{cases}$$ where $a_{i,j}(\theta)$ are the elements of $I(\theta)^{-1}$ and $v_j$ is the $j$th coordinate of $v(x,\theta) = I(\theta)^{-1}u(x,\theta)$. (Note that $\mathsf E_\theta v(X_1,\theta) = 0$, so in fact $\mathsf E_\theta = \operatorname{cov}_\theta$ in the second and third lines.)
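To spell out the last step of the asker's computation (a sketch under the same hypotheses; the notation $G(\theta)$ is mine): by the LLN, $\frac1n\sum_{k=1}^n D_\theta g(X_k,\theta) \to G(\theta) := \mathsf E_\theta[D_\theta g(X_1,\theta)]$ a.s., so applying Slutsky and the continuous mapping theorem to the joint limit above yields $$ \sqrt{n}\big(\bar Z_n - \mathsf E_\theta g(X_1,\theta)\big) \xrightarrow{d} \mathcal N\Big(0,\ \mathsf V_\theta\big(g(X_1,\theta) + G(\theta)^\top v(X_1,\theta)\big)\Big), $$ i.e. the limit variance is that of the single summand $g(X_1,\theta)$ plus the correction $G(\theta)^\top v(X_1,\theta)$ coming from the estimation of $\theta$.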
Note that instead of $\mathsf E_\theta[Z]$ I've used $\mathsf E_\theta[g(X_1,\theta)]$, for it is infinitely easier to compute. (Just like one uses $\theta$ instead of $\mathsf E_\theta[\hat\theta_n]$ when writing the asymptotic normality of $\hat\theta_n$.)
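A small simulation illustrating the joint result (my own example, not from the posts above): take $X_i \sim N(\theta,1)$, so $\hat\theta_n = \bar X_n$, $u(x,\theta) = x-\theta$, $I(\theta) = 1$, $v = u$, and take $g(x,t) = xt$. Then $\mathsf E_\theta g(X_1,\theta) = \theta^2$, $\mathsf E_\theta[D_\theta g(X_1,\theta)] = \theta$, and the limit variance of $\sqrt{n}(\bar Z_n - \theta^2)$ is $\mathsf V_\theta\big(\theta X_1 + \theta(X_1-\theta)\big) = 4\theta^2$, whereas naively ignoring the estimation of $\theta$ would give only $\theta^2$:

```python
import numpy as np

# N(theta, 1) model with g(x, t) = x * t (illustrative choice).
# Predicted: sqrt(n) * (Z_bar - theta^2) -> N(0, 4 * theta^2),
# vs. theta^2 if the estimation error in theta_hat were (wrongly) ignored.
rng = np.random.default_rng(0)
theta, n, reps = 1.0, 2000, 5000

stats = np.empty(reps)
for r in range(reps):
    x = rng.normal(loc=theta, scale=1.0, size=n)
    theta_hat = x.mean()               # MLE of theta for N(theta, 1)
    z_bar = (x * theta_hat).mean()     # mean of Z_{i,n} = g(X_i, theta_hat)
    stats[r] = np.sqrt(n) * (z_bar - theta**2)

# Empirical variance should be near 4 * theta^2, not theta^2:
print(stats.mean(), stats.var())
```

The factor-of-four gap between the empirical variance and the naive $\theta^2$ is exactly the contribution of the correlation between the two terms in the asker's decomposition.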