Suppose that $X$ is $n\times 1$ and $a$ is a scalar. They are random quantities such that $$ \operatorname{E}\left[\begin{pmatrix}a\\X\end{pmatrix}\right]=\begin{pmatrix}0\\0\end{pmatrix},\quad\operatorname{Var}\left[\begin{pmatrix}a\\X\end{pmatrix}\right]=\begin{pmatrix}\sigma_a^2 & \sigma_{aX} \\ \sigma_{Xa} & \Sigma_X\end{pmatrix}\tag{$*$} $$ where $\sigma_a^2>0$, $\Sigma_X$ is positive definite, and $\sigma_{Xa}$ ($n\times 1$) is the transpose of $\sigma_{aX}$.
Suppose that $(a_t,X_t)$ are i.i.d. $\sim (a,X)$ for $t=1,\ldots,T$. Then, as $T\to\infty$, $$ \sqrt{T}\left(\frac{1}{T}\sum_{t=1}^TX_ta_t-\sigma_{Xa}\right)\overset{L}{\to}N(0,V). $$ Here, $\overset{L}{\to}$ denotes convergence in distribution. It seems that $V$ should be $$ \operatorname{Var}(Xa)=\operatorname{E}(a^2XX')-\sigma_{Xa}\sigma_{aX}, $$ but it does not seem that ($*$) contains enough information to evaluate $\operatorname{E}(a^2XX')$. I am not confident in my understanding, so my question is:
What should $V$ be? What if we add the assumption that $(a,X')'$ is jointly normal?
Yes, $V$ is the variance of $\sqrt{T} \left( \frac{1}{T} \sum_{t=1}^T X_t a_t - \sigma_{Xa}\right)$, which equals $\operatorname{Var}(Xa)$ because the summands $X_ta_t$ are i.i.d.: the variance of the sum of $T$ independent terms is $T\operatorname{Var}(Xa)$, and the $1/\sqrt{T}$ scaling cancels the factor $T$. With only the first two moments in ($*$), you do not have the information to evaluate $\operatorname{E}(a^2XX')$, and hence $V$.
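To see numerically that ($*$) does not pin down $V$, here is a small Monte Carlo sketch (scalar $X$, assumed covariance values for illustration): both constructions below share the same mean and covariance matrix, yet $\operatorname{Var}(Xa)$ differs.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 400_000

# Scalar X (n = 1) for simplicity; covariance entries are assumptions.
cov = np.array([[1.0, 0.5],
                [0.5, 1.0]])
g = rng.multivariate_normal([0.0, 0.0], cov, size=N)
a1, X1 = g[:, 0], g[:, 1]

# Rescale each draw by an independent random radius r with E[r^2] = 1:
# the mean and covariance are unchanged, but fourth moments are not.
r = np.where(rng.random(N) < 0.5, 0.2, np.sqrt(1.96))
a2, X2 = r * a1, r * X1

V1 = np.var(X1 * a1)   # Var(Xa) under joint normality
V2 = np.var(X2 * a2)   # Var(Xa) under the rescaled distribution
print(np.cov(a1, X1))  # both empirical covariances are close to cov
print(np.cov(a2, X2))
print(V1, V2)          # clearly different values of V
```

Under joint normality $V_1 \approx 1.25$ here, while the rescaled distribution gives $V_2 \approx 2.63$, even though both satisfy the same ($*$).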
In the normal case, you can use Isserlis's theorem: $$\operatorname{E}(a^2 X X') = \operatorname{E}(a^2) \operatorname{E}(X X') + 2 \operatorname{E}(a X) \operatorname{E}(a X') = \sigma_a^2 \Sigma_X + 2 \sigma_{Xa} \sigma_{aX},$$ so that $$V = \operatorname{E}(a^2XX') - \sigma_{Xa}\sigma_{aX} = \sigma_a^2 \Sigma_X + \sigma_{Xa} \sigma_{aX}.$$
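The Isserlis identity is easy to check by simulation. A minimal sketch with $n=2$ and an assumed joint covariance matrix `S` for $(a, X')'$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed joint covariance of (a, X1, X2), chosen positive definite.
S = np.array([[ 2.0, 0.5, -0.3],
              [ 0.5, 1.0,  0.4],
              [-0.3, 0.4,  1.5]])
sigma_a2 = S[0, 0]
sigma_aX = S[0, 1:]      # covariance of a with X (1-D array)
Sigma_X  = S[1:, 1:]     # covariance of X

draws = rng.multivariate_normal(np.zeros(3), S, size=500_000)
a, X = draws[:, 0], draws[:, 1:]

# Empirical E[a^2 X X'] as an n x n matrix
emp = np.einsum('t,ti,tj->ij', a**2, X, X) / len(a)

# Isserlis: E[a^2 X X'] = sigma_a^2 Sigma_X + 2 sigma_Xa sigma_aX
theory = sigma_a2 * Sigma_X + 2 * np.outer(sigma_aX, sigma_aX)

print(np.max(np.abs(emp - theory)))  # small Monte Carlo error
```

The empirical fourth-moment matrix agrees with $\sigma_a^2 \Sigma_X + 2\sigma_{Xa}\sigma_{aX}$ up to sampling noise.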