Given the following likelihood:
$$\prod_{i=1}^{N} \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left\{-\frac{(y_i - x_i' \beta)^2}{2\sigma^2}\right\}$$
Thanks to the information matrix equality we have two choices to derive the theoretical asymptotic variance - either with the Hessian matrix or with the score vectors. Both should and do yield in this case the same outcome:
$$\begin{bmatrix} \sigma^2 \left(\mathsf E[x_i x_i']\right)^{-1} & 0 \\ 0 & \frac{2}{N} \sigma^4 \end{bmatrix}$$
Notice that we replace the off-diagonal elements with 0, because it is equal to the third moment of the normal distribution.
Now for the estimated asymptotic variance there are different methods, e.g. outer products of gradients (BHHH).
$$ \begin{bmatrix} \frac{1}{\left(\hat{\sigma}^2\right)^2} \sum_{i} \hat{u}_i^2 x_i x_i' & \frac{1}{2\left(\hat{\sigma}^2\right)^3} \sum_{i} \hat{u}_i^3 x_i \\ \frac{1}{2\left(\hat{\sigma}^2\right)^3} \sum_{i} \hat{u}_i^3 x_i' & -\frac{N}{4\left(\hat{\sigma}^2\right)^2} + \frac{1}{4\left(\hat{\sigma}^2\right)^4} \sum_{i} \hat{u}_i^4 \end{bmatrix}^{-1} $$
The theoretical method using the score vectors is basically the same as the BHHH approach for the estimated variance. Both work with the outer products of gradients (The first approach is using the theoretical estimators, wherease BHHH replacing them with the ML estimators). However the outcome is as you can see very different. Why would we not want to replace the off-diagonal element with 0 in the estimated version? Is that not equal to the estimated third moment of the normal distribution and therefore 0 too?