We have the following system
$$Y = A X$$
Using least-squares, an estimate of $X$ is
$$\widehat{X} = (A^H A)^{-1} A^H Y$$
And thus the mean square error, as I compute it (the average squared residual), is:
$$\mbox{MSE} = \frac{1}{m}\,\|Y - A\cdot{}\widehat{X}\|^2$$
But looking at the lscov routine of MATLAB, it seems to define the returned mse parameter as:
$$MSE_{lscov} = \frac{Y^H \cdot{}(Y - A\cdot{}\widehat{X})}{m-n}$$
where $m \times n$ is the size of matrix $A$.
I don't understand the meaning of $MSE_{lscov}$, and how does it relate to what I call the mean square error?
For instance with
$A = \begin{bmatrix} 1.0000 & 0.2000 & 0.1000\\ 1.0000 & 0.5000 & 0.3000\\ 1.0000 & 0.6000 & 0.4000\\ 1.0000 & 0.8000 & 0.9000\\ 1.0000 & 1.0000 & 1.1000\\ 1.0000 & 1.1000 & 1.4000\\ \end{bmatrix}$
and
$Y = \begin{bmatrix} 0.1700\\ 0.2600\\ 0.2800\\ 0.2300\\ 0.2700\\ 0.3400 \end{bmatrix}$
I get:
$MSE = 0.00077279766...$
$MSE_{lscov} = \frac{Y^H \cdot{}(Y - A\cdot{}\widehat{X})}{m-n} = 0.00154559...$
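For reference, both numbers and the relation between them can be reproduced numerically. Here is a minimal sketch in Python/NumPy (standing in for MATLAB), using the $A$ and $Y$ above:

```python
import numpy as np

A = np.array([[1.0, 0.2, 0.1],
              [1.0, 0.5, 0.3],
              [1.0, 0.6, 0.4],
              [1.0, 0.8, 0.9],
              [1.0, 1.0, 1.1],
              [1.0, 1.1, 1.4]])
Y = np.array([0.17, 0.26, 0.28, 0.23, 0.27, 0.34])
m, n = A.shape  # m = 6 observations, n = 3 coefficients

# Least-squares estimate X_hat = (A^H A)^{-1} A^H Y
X_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)
r = Y - A @ X_hat  # residual vector

mse = np.sum(r**2) / m         # average squared residual
mse_lscov = (Y @ r) / (m - n)  # what lscov returns

# Y^H (Y - A X_hat) equals ||Y - A X_hat||^2, because the residual
# is orthogonal to the column space of A (normal equations).
print(mse)              # ~0.000772797
print(mse_lscov)        # ~0.00154559
print(mse_lscov / mse)  # m / (m - n) = 2, exactly
```

So the two quantities differ only in the normalization: $\|r\|^2/m$ versus $\|r\|^2/(m-n)$.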
Let $X$ be the data matrix, $Y$ the dependent variable, and $\beta$ the unknown coefficient vector. Namely, your model is $Y=X\beta+\epsilon$ where $\epsilon \sim \mathcal{N}(0,\sigma^2)$. As such, $\epsilon^2 =(Y-X\beta)^2$, so $\sigma^2 = var(\epsilon) = \mathbb{E}\epsilon^2 = \mathbb{E}(Y-X\beta)^2$. Thus, when you estimate $\beta$ by specifying some model, you can compute its MSE, that is $$ MSE = \mathbb{E}\|Y-X\hat{\beta}\|^2. $$

Hence you have to estimate it with an appropriate statistic, which intuitively would be $$ \frac{1}{n}\sum_{i=1}^n(y_i - \hat{y}_i)^2; $$
however, although this is an asymptotically consistent estimator, it is biased for every finite $n$. So you can view $$ \hat{\sigma}^2 = \frac{1}{n-m}\sum_{i=1}^n(y_i - \hat{y}_i)^2, $$ where $m$ is the number of coefficients in the model, i.e., the column dimension of the data matrix $X$ (note that $n$ and $m$ here are swapped relative to your notation for $A$), as the unbiased modification of that estimator of the real MSE ($\sigma^2$). Intuitively, you lose one degree of freedom for every estimated component of $\beta$.

By doing some simple algebra you get $$ \sum_{i=1}^n(y_i - \hat{y}_i)^2 = \sum_{i=1}^n(y_i - \hat{y}_i)(y_i - \hat{y}_i) = \sum_{i=1}^n(y_i - \hat{y}_i)y_i - \sum_{i=1}^n(y_i - \hat{y}_i)\hat{y}_i, $$ where the second summand is $0$ because $\sum_{i=1}^n(y_i - \hat{y}_i)\hat{y}_i=\sum_{i=1}^ne_i\hat{y}_i=0$. This final equality, i.e., the orthogonality of the residuals $e$ and the fitted values $\hat{y}$, stems from the normal equations that define the OLS estimator. So, back in matrix notation, you get $$ \widehat{MSE}=\frac{1}{n-m}Y'(Y-X\hat{\beta}). $$
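The bias correction is easy to check empirically. Below is a small Monte Carlo sketch in Python/NumPy (an illustration with made-up parameters, not part of lscov itself): averaging the residual sum of squares over many noise realizations, dividing by $n-m$ recovers $\sigma^2$, while dividing by $n$ underestimates it by the factor $(n-m)/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 3      # n observations, m coefficients (answer's notation)
sigma2 = 0.25     # true noise variance (arbitrary choice)
X = rng.standard_normal((n, m))
beta = np.array([1.0, -2.0, 0.5])  # arbitrary true coefficients

naive, unbiased = [], []
for _ in range(2000):
    eps = np.sqrt(sigma2) * rng.standard_normal(n)
    Y = X @ beta + eps
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    rss = np.sum((Y - X @ beta_hat) ** 2)
    naive.append(rss / n)           # biased: mean is sigma2*(n-m)/n
    unbiased.append(rss / (n - m))  # unbiased: mean is sigma2

print(np.mean(naive))     # close to 0.2125 = sigma2 * (n - m) / n
print(np.mean(unbiased))  # close to 0.25 = sigma2
```

This is exactly why lscov divides by $m-n$ (rows minus columns) in its notation: the returned mse is the unbiased estimate of the noise variance, not the average squared residual.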