I wish to find the matrix that minimises the following loss function
$$\mathbf{A}^* = \text{argmin}_\mathbf{A} \int_{\mathbf{x}\in\mathbb{X}} \|\mathbf{A}\mathbf{x}-\mathbf{y}(\mathbf{x})\|^2d\mathbf{x}$$
Is there a closed-form solution or expression?
I tried to find a solution based on the similar but simpler linear least-squares problem
$$\mathbf{A}^* = \text{argmin}_\mathbf{A} \|\mathbf{A}\mathbf{X}-\mathbf{Y}\|_F^2$$
where $\mathbf{X}$ is a matrix whose columns are sampled from the $\mathbb{X}$ space, and $\mathbf{Y}$ is the matrix with the corresponding $\mathbf{y}(\mathbf{x})$ on its columns. Additionally, if $\mathbf{X}$ has linearly independent rows (easy to accomplish with sufficiently random $\mathbf{x}$), we have the closed-form solution $\mathbf{A}^* = \mathbf{Y}\mathbf{X}^T\left(\mathbf{XX}^T\right)^{-1}$
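For reference, the finite-sample closed form is easy to check numerically. A minimal NumPy sketch (the dimensions and the synthetic linear target here are arbitrary choices of mine, used only so the recovered matrix can be compared against a known ground truth):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: n-dimensional inputs, m-dimensional outputs.
n, m, N = 3, 2, 1000
X = rng.standard_normal((n, N))   # columns are samples x
A_true = rng.standard_normal((m, n))
Y = A_true @ X                    # y(x) = A_true x, so the fit should be exact

# Closed-form least-squares solution A* = Y X^T (X X^T)^{-1}
A_star = Y @ X.T @ np.linalg.inv(X @ X.T)

assert np.allclose(A_star, A_true)
```

With `N = 1000` random Gaussian samples the rows of `X` are linearly independent almost surely, so `X @ X.T` is invertible.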
Inspired by this closed-form expression, I tried to compute the optimal $\mathbf{A}$ for the original integral problem analytically as
$$\mathbf{A}^*=\left(\int_{\mathbf{x}\in\mathbb{X}}\mathbf{yx}^Td\mathbf{x}\right)\left(\int_{\mathbf{x}\in\mathbb{X}}\mathbf{xx}^Td\mathbf{x}\right)^{-1}$$
But this expression did not match its numerical counterpart ($\mathbf{A}^* = \mathbf{Y}\mathbf{X}^T\left(\mathbf{XX}^T\right)^{-1}$ with sufficiently many samples in $\mathbf{X}$ and $\mathbf{Y}$).
I wonder if the two solutions did not coincide because of numerical errors or because my argument is fundamentally flawed.
Thanks for the help
Here is a concrete example where the analytical and numerical solutions do coincide:
$\mathbb{X}=\{x\in\mathbb{R}^2:\|x\|_2=1\}$ and $\mathbf{y}=[\mathbf{x}]_+$, where $[\cdot]_+$ denotes the elementwise positive part (ReLU).
Here we have
\begin{align} \left(\int_{\mathbb{X}}\mathbf{xx}^Td\mathbf{x}\right)^{-1}&=\left(\int_0^{2\pi}\begin{bmatrix}\cos(\theta)\\\sin(\theta)\end{bmatrix}\begin{bmatrix}\cos(\theta)&\sin(\theta)\end{bmatrix}d\theta\right)^{-1}\\ &=\left(\int_0^{2\pi}\begin{bmatrix}\cos^2(\theta)&\sin(\theta)\cos(\theta)\\\sin(\theta)\cos(\theta)&\sin^2(\theta)\end{bmatrix}d\theta\right)^{-1}\\ &=\left(\begin{bmatrix}\pi&0\\0&\pi\end{bmatrix}\right)^{-1}=\frac{1}{\pi}\begin{bmatrix}1&0\\0&1\end{bmatrix} \end{align}
And
\begin{align} \int_{\mathbb{X}}\mathbf{y}(\mathbf{x})\mathbf{x}^Td\mathbf{x}&=\int_0^{2\pi}\begin{bmatrix}[\cos(\theta)]_+\\ [\sin(\theta)]_+\end{bmatrix}\begin{bmatrix}\cos(\theta)&\sin(\theta)\end{bmatrix}d\theta\\ &=\frac{\pi}{2}\begin{bmatrix}1&0\\0&1\end{bmatrix} \end{align}
Thus
$$\mathbf{A}^*=\frac{1}{2}\begin{bmatrix}1&0\\0&1\end{bmatrix}$$
which is indeed what one obtains numerically.
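The numerical side of this example can be reproduced with a simple sketch, discretising the circle on a uniform grid in $\theta$ (the grid size is an arbitrary choice of mine):

```python
import numpy as np

# Sample the unit circle densely and check that the discrete least-squares
# solution reproduces the analytic answer A* = (1/2) I.
theta = np.linspace(0.0, 2.0 * np.pi, 10_000, endpoint=False)
X = np.vstack([np.cos(theta), np.sin(theta)])  # columns lie on the circle
Y = np.maximum(X, 0.0)                         # y(x) = [x]_+

A_star = Y @ X.T @ np.linalg.inv(X @ X.T)

assert np.allclose(A_star, 0.5 * np.eye(2), atol=1e-6)
```

On a uniform grid the sums cancel symmetrically, so the discrete solution matches $\frac{1}{2}\mathbf{I}$ essentially to machine precision.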
Your expression for the optimal matrix seems correct to me. To prove it, note that your minimisation problem is equivalent to projecting ${\bf y}$ onto the subspace of linear functions in $L^2(\mathbb X, \mathbb R^n)$.
For $1 \le i,j \le n$, let ${\bf E}_{ij}$ be the matrix with a one in the $(i,j)$ entry and zeros elsewhere. The family $({\bf x} \mapsto {\bf E}_{ij} {\bf x})_{1\le i,j\le n}$ is a basis of the space of linear functions. Since ${\bf x } \mapsto {\bf A}^*{\bf x}$ is the projection of ${\bf y}$, we have: $$\left\langle {\bf y} - {\bf A^* x}, {\bf E}_{ij}{\bf x} \right\rangle = 0\;\; \forall i,j$$
so
$$\int_{\mathbb X} \left({\bf E}_{ij} {\bf x}\right)^T ({\bf y}({\bf x}) - {\bf A^* x})\, \mathrm d {\bf x} = 0\;\; \forall i,j$$
Now $$\left({\bf E}_{ij}{\bf x}\right)_k = {\bf x}_j\;\; \text{if $k=i$ and } \left({\bf E}_{ij}{\bf x}\right)_k = 0 \;\; \text{otherwise,}$$ so the condition above reads
$$\int_{\mathbb X} {\bf y}({\bf x})_{i}{\bf x}_j \,\mathrm d {\bf x} = \sum_{k=1}^n {\bf A}^*_{ik}\int_{\mathbb X} {\bf x}_k{\bf x}_j \,\mathrm d {\bf x}\;\; \forall i,j$$
that is,
$$\left(\int_{\mathbb X} {\bf y}({\bf x}) {\bf x}^T \mathrm d {\bf x}\right)_{ij} = \sum_{k=1}^n{\bf A}^*_{ik}\left(\int_{\mathbb X} {\bf x} {\bf x}^T\mathrm d {\bf x}\right)_{kj} =\left({\bf A}^*\int_{\mathbb X} {\bf x} {\bf x}^T\mathrm d {\bf x}\right)_{ij}\;\; \forall i,j$$
This proves that $${\bf A}^*\int_{\mathbb X} {\bf x} {\bf x}^T\mathrm d {\bf x} = \int_{\mathbb X} {\bf y}({\bf x}) {\bf x}^T \mathrm d {\bf x},$$ and hence, whenever $\int_{\mathbb X} {\bf x} {\bf x}^T\mathrm d {\bf x}$ is invertible, your expression $${\bf A}^* = \left(\int_{\mathbb X} {\bf y}({\bf x}) {\bf x}^T \mathrm d {\bf x}\right)\left(\int_{\mathbb X} {\bf x} {\bf x}^T\mathrm d {\bf x}\right)^{-1}.$$
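As a quick numerical sanity check of this identity, here is a Monte-Carlo sketch. The domain $[0,1]^2$, the sample count, and the nonlinear target $\mathbf{y}(\mathbf{x}) = \mathbf{x}^2$ (elementwise) are illustrative assumptions of mine; the point is only that the matrix built from the two estimated integrals satisfies the same normal equations as the sampled least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte-Carlo estimates of the two integrals over X = [0,1]^2
# for an illustrative nonlinear target y(x) = x**2 (elementwise).
N = 100_000
X = rng.uniform(0.0, 1.0, size=(2, N))  # columns sampled uniformly from [0,1]^2
Y = X**2

M_yx = (Y @ X.T) / N                    # ~ integral of y(x) x^T
M_xx = (X @ X.T) / N                    # ~ integral of x x^T
A_star = M_yx @ np.linalg.inv(M_xx)

# The sampled least-squares fit solves the same normal equations
# (the 1/N factors cancel), so the two must agree up to round-off.
A_lstsq = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T
assert np.allclose(A_star, A_lstsq)
```

This also illustrates why the integral formula and the sampled formula $\mathbf{Y}\mathbf{X}^T(\mathbf{XX}^T)^{-1}$ should converge to each other as the sampling gets dense: they are the same normal equations with sums in place of integrals.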