Assume $X$ is a random vector in $\mathbb{R}^d$ and $g: \mathbb{R}^d \to \mathbb{R}^D$ is a function.
Then how do we find a matrix $\mathbf{W} = \text{argmin}_{\mathbf{W}} \mathbb{E} [ ||\mathbf{W}^T X - g(X)||^2]$?
Can we prove $$ \mathbf{W} = \lim_{N \to \infty}(\mathbf{X}_N^T\mathbf{X}_N)^{-1}\mathbf{X}_N^Tg(\mathbf{X}_N) $$ where $\mathbf{X}_N = [\mathbf{x}_1 \dots \mathbf{x}_N]^T$ with the $\mathbf{x}_i$ sampled i.i.d. from the distribution of $X$ and $g$ applied row-wise, using, for example, the law of large numbers (or the CLT)?
Or is there any other approach to find (or express) it or to approximate it?
Is there any textbook that gives a rigorous derivation of multivariate regression?
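A quick numerical sanity check of the claimed limit (a sketch, with a hypothetical nonlinear $g$ and standard normal $X$, both chosen only for illustration): two independent large-sample estimates of $\mathbf{W}$ nearly coincide, as the law of large numbers predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 3, 2  # hypothetical dimensions for the experiment

def g(x):
    # hypothetical nonlinear test function R^d -> R^D (rows of x are samples)
    return np.stack([np.sin(x[:, 0]) + x[:, 1] ** 2,
                     np.cos(x[:, 2])], axis=1)

def W_hat(N):
    # sample-based estimate (X_N^T X_N)^{-1} X_N^T g(X_N)
    X = rng.standard_normal((N, d))   # rows are the samples x_i
    return np.linalg.solve(X.T @ X, X.T @ g(X))

# two independent large-sample estimates should be nearly equal,
# with the discrepancy shrinking like 1/sqrt(N)
W1, W2 = W_hat(200_000), W_hat(200_000)
print(np.linalg.norm(W1 - W2))
```

Repeating this for increasing $N$ shows the estimates stabilizing, which is exactly the convergence the question asks about.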
For a vector argument $x$, consider the residual function $$y(x)=W^Tx-g(x)$$ Draw i.i.d. samples $\{x_i\}$ from the distribution and evaluate $$\eqalign{ y_i &= W^Tx_i - g_i \cr Y &= W^TX - G \cr }$$ where $g_i = g(x_i)$ and $(Y,X,G)$ are matrices whose columns are the vectors $(y_i,x_i,g_i)$ respectively.
The total squared error after $N$ samples (the empirical expectation up to a factor of $\tfrac1N$, which does not affect the minimizer) is $$\eqalign{ E &= \|Y\|_F^2 = Y:Y \cr }$$ where the colon denotes the trace/Frobenius product, i.e. $\,\,A:B={\rm tr}(A^TB)$
Calculate the differential and gradient of $E$ $$\eqalign{ dE &= 2Y:dY = 2Y:dW^T\,X = 2XY^T:dW \cr \frac{\partial E}{\partial W} &= 2XY^T = 2X(W^TX-G)^T = 2(XX^TW-XG^T) \cr }$$ Minimize $E$ by setting the gradient to zero and solving (assuming $XX^T$ is invertible, which holds generically once $N\ge d$) $$\eqalign{ XX^TW &= XG^T \cr W &= (XX^T)^{-1}XG^T \cr }$$ Dividing both factors by $N$, the law of large numbers gives $\tfrac1N XX^T \to \mathbb{E}[XX^T]$ and $\tfrac1N XG^T \to \mathbb{E}[X\,g(X)^T]$, so the sample solution converges to the population minimizer $\mathbb{E}[XX^T]^{-1}\mathbb{E}[X\,g(X)^T]$, which is the limit claimed in the question.
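As a concrete check of the final formula (a sketch only; the data matrices here are arbitrary random stand-ins for samples and targets), the closed-form solution $W=(XX^T)^{-1}XG^T$ agrees with a standard least-squares solver applied to the transposed system $X^TW \approx G^T$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, D, N = 4, 3, 500  # hypothetical dimensions, N >> d so XX^T is invertible

X = rng.standard_normal((d, N))   # columns are samples x_i
G = rng.standard_normal((D, N))   # columns are targets g(x_i)

# closed-form minimizer obtained by setting the gradient to zero:
# solves (X X^T) W = X G^T
W = np.linalg.solve(X @ X.T, X @ G.T)

# cross-check: least squares on the transposed system min ||X^T W - G^T||_F
W_lstsq, *_ = np.linalg.lstsq(X.T, G.T, rcond=None)
print(np.allclose(W, W_lstsq))    # True
```

Both routes minimize the same Frobenius-norm objective $\|W^TX-G\|_F^2$, so they must agree whenever $XX^T$ is invertible.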