I might be completely at a loss here. My issue is with the best linear predictor (or, in the sample version, the OLS estimator) for simultaneous equation models.
With a single equation the best linear predictor of a scalar dependent variable, $y$, based on a $k\times 1$ vector of explanatory variables, $x$, is commonly derived by minimizing the mean-square-error loss $\text{E}[(y-x^\prime\beta)^2]$, which yields $$\beta=\text{E}[xx^\prime]^{-1}\text{E}[xy]$$ while the OLS estimator is given as $$\hat{\beta}=\underset{\beta}{\text{argmin }}(y-X\beta)^\prime(y-X\beta)=(X^\prime X)^{-1}X^\prime y$$ where $y$ is $n\times 1$ and $X$ is $n\times k.$
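For concreteness, a small numerical sketch of the single-equation case (simulated data; the sample size and coefficients are purely illustrative) confirming that the closed-form $(X^\prime X)^{-1}X^\prime y$ agrees with a generic least-squares solver:

```python
import numpy as np

# Illustrative simulated data: n = 200 observations, k = 3 regressors.
rng = np.random.default_rng(0)
n, k = 200, 3
X = rng.normal(size=(n, k))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

# Closed-form OLS: (X'X)^{-1} X'y, computed via a linear solve.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The same estimate from numpy's generic least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_hat, beta_lstsq))  # True
```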
So far I have not been able to find the analogous objective functions for the following problems:
Let $y$ be a $k\times 1$ vector-valued dependent variable and let $x$ be an $l\times 1$ vector of covariates. The best linear predictor of $y$ given $x$ (i.e. $\Gamma^\prime x$) is typically displayed as $$\Gamma=\text{E}[xx^\prime]^{-1}\text{E}[xy^\prime].$$ I'd very much appreciate it if anyone could give me a pointer to the objective function on which this result is based. Assuming the objective function is the one below, I have to admit that I fail to see how to show that $\Gamma=\underset{\Gamma}{\text{argmin }}\text{E}[(y-\Gamma^\prime x)^\prime(y-\Gamma^\prime x)].$
Moreover, I have the same issue with the sample version of this problem. That is to say, let $Y$ be an $n\times k$ matrix of dependent variables and let $X$ be an $n\times l$ matrix of covariates. Given a linear conditional expectation function model $$Y=X\Gamma+U\quad\text{where}\quad \text{E}[U|X]=0$$ the OLS estimator of $\Gamma$ is commonly given as $$\hat{\Gamma}=(X^\prime X)^{-1}X^\prime Y.$$ Even though I can see a sort of shortcut derivation, exploiting the fact that with OLS it should hold that $X^\prime\hat{U}=0$, I'd very much be interested in the proper formulation of an objective function (in the sense of $f:\mathbb{R}^{n\times k}\to\mathbb{R}$ such that $\hat{\Gamma}=\underset{\Gamma}{\text{argmin }}f(Y-X\Gamma)$) as well as in the derivation of the estimator.
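For what it's worth, a quick numerical sketch of the sample version (simulated data; all dimensions and values are illustrative) suggesting that the candidate objective $f(Y-X\Gamma)=\operatorname{tr}\big((Y-X\Gamma)^\prime(Y-X\Gamma)\big)$ is minimized at $\hat{\Gamma}=(X^\prime X)^{-1}X^\prime Y$, which also equals stacking the $k$ column-by-column OLS fits:

```python
import numpy as np

# Illustrative data: n = 300 observations, l = 4 covariates, k = 2 outcomes.
rng = np.random.default_rng(1)
n, l, k = 300, 4, 2
X = rng.normal(size=(n, l))
Gamma_true = rng.normal(size=(l, k))
Y = X @ Gamma_true + rng.normal(size=(n, k))

# Matrix OLS: (X'X)^{-1} X'Y.
Gamma_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Equivalent: a separate scalar OLS on each column of Y, stacked.
cols = [np.linalg.solve(X.T @ X, X.T @ Y[:, j]) for j in range(k)]
Gamma_stacked = np.column_stack(cols)

# Candidate objective: tr((Y - XG)'(Y - XG)), i.e. the squared
# Frobenius norm of the residual matrix.
def f(G):
    R = Y - X @ G
    return np.trace(R.T @ R)

# A small random perturbation of Gamma_hat should not lower f.
perturbed = f(Gamma_hat + 1e-3 * rng.normal(size=(l, k)))
print(np.allclose(Gamma_hat, Gamma_stacked), f(Gamma_hat) < perturbed)
```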
Many thanks and best wishes,
Jon
Let $$f(\Gamma) = E [ (y-\Gamma^T x)^T(y-\Gamma^T x) ]= E [ y^T y ]- E[ y^T \Gamma^Tx ] - E[ x^T \Gamma y ] + E [ x^T \Gamma \Gamma^T x ]$$ If we denote $R_{xx} = E[x x^T]$, $R_{xy} = E[x y^T]$, $R_{yx} = E[y x^T]$, $R_{yy} = E[y y^T]$, we can rewrite the above as $$f(\Gamma) = \operatorname{tr}(R_{yy}) - \operatorname{tr}(\Gamma ^TR_{xy})-\operatorname{tr}(\Gamma R_{yx})+\operatorname{tr}(\Gamma\Gamma^TR_{xx})$$ Using some matrix differentials like
\begin{align} \partial \operatorname{tr}(X) &= \operatorname{tr}(\partial X)\\ \frac{\partial}{\partial X}\operatorname{tr}(X^TA)&= A\\ \frac{\partial}{\partial X}\operatorname{tr}(XX^TA)&= A^TX + AX \end{align}
we now have $$\frac{\partial f(\Gamma)}{\partial \Gamma} = - R_{xy}-R_{yx}^T+2R_{xx}\Gamma$$ where the last term follows from the third rule with $A = R_{xx}$, since $R_{xx}$ is symmetric and hence $A^T\Gamma + A\Gamma = 2R_{xx}\Gamma$. Noticing that $R_{yx}^T=R_{xy}$ and setting the derivative to zero, we now have $$\frac{\partial f(\Gamma)}{\partial \Gamma} = - 2R_{xy}+2R_{xx}\Gamma = \pmb{0}$$ or $$\Gamma_* = R_{xx}^{-1}R_{xy}$$ with no symmetry assumption on $\Gamma$ required: since $R_{xx}$ is positive definite, $f$ is strictly convex in $\Gamma$, so this stationary point is the unique minimizer.
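As a sanity check (a sketch replacing the expectations by sample moments on simulated data; all dimensions and values are illustrative), the minimizer $\Gamma_* = R_{xx}^{-1}R_{xy}$ can be verified numerically for a rectangular, hence non-symmetric, $\Gamma$:

```python
import numpy as np

# Illustrative setup: l = 4 covariates, k = 2 outcomes, so Gamma is 4 x 2
# and cannot be symmetric.
rng = np.random.default_rng(2)
n, l, k = 5000, 4, 2
x = rng.normal(size=(n, l))
y = x @ rng.normal(size=(l, k)) + rng.normal(size=(n, k))

Rxx = x.T @ x / n          # sample analogue of E[x x^T]
Rxy = x.T @ y / n          # sample analogue of E[x y^T]
Gamma_star = np.linalg.solve(Rxx, Rxy)

# Sample analogue of f(G) = E[(y - G'x)'(y - G'x)].
def f(G):
    r = y - x @ G
    return np.mean(np.sum(r * r, axis=1))

# Every small random perturbation of Gamma_star increases the objective,
# consistent with Gamma_star being the unique minimizer.
worse = all(f(Gamma_star + 1e-2 * rng.normal(size=(l, k))) > f(Gamma_star)
            for _ in range(20))
print(worse)  # True
```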