Given a finite set $S$ of pairs $(a,b)$ with $a\in \mathbf{R}^n$ and $b\in \mathbf{R}^m$, we want to fit a linear map $\mathbf{R}^n \to \mathbf{R}^m$, i.e. a matrix $M \in \mathbf{R}^{m\times n}$, in the least-squares sense: find $\operatorname{argmin}_M \sum_{(a,b) \in S} \lVert Ma-b \rVert_2^2$. How do we do this?
OK, let me try:
Let $f(M)=\sum_{(a,b) \in S} \lVert Ma-b \rVert_2^2=\sum_{(a,b) \in S}(Ma-b)^T(Ma-b)=\sum_{(a,b) \in S} (a^TM^T-b^T)(Ma-b)=\sum_{(a,b)\in S}\left(a^TM^TMa-a^TM^Tb-b^TMa+b^Tb\right)$.
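As a side note, the same objective can be written more compactly by stacking the data. The matrices $A$ and $B$ below are my own notation, not part of the problem statement: let the columns of $A\in\mathbf{R}^{n\times |S|}$ be the vectors $a$, and let the columns of $B\in\mathbf{R}^{m\times |S|}$ be the matching vectors $b$ in the same order. Then the columns of $MA-B$ are exactly the residuals $Ma-b$, so

$$f(M)=\sum_{(a,b)\in S}\lVert Ma-b\rVert_2^2=\lVert MA-B\rVert_F^2=\operatorname{tr}\left((MA-B)^T(MA-B)\right).$$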
Denote column $i$ of $M$ by $M_{,i}$ and row $j$ of $M$ by $M_{j,}$. For a fixed $(a,b)$, we have $f_{a,b}:=a^TM^TMa-a^TM^Tb-b^TMa+b^Tb=\sum_{i,j}a_ia_j\langle M_{,i},M_{,j}\rangle-2\sum_{i,j}a_ib_jM_{j,i}+\sum_ib_i^2$, where the two middle terms combine because $a^TM^Tb=b^TMa$ (a scalar equals its own transpose).
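Now let's compute the gradient of this map. Since $f_{a,b}=\sum_k (M_{k,}a-b_k)^2$, only the $k=j$ term depends on $M_{j,i}$, so $\frac{\partial f_{a,b}}{\partial M_{j,i}}=2a_i(M_{j,}a-b_j)=2a_iM_{j,}a-2a_ib_j$. So the partial derivative when we take all data points into account is $\frac{\partial f}{\partial M_{j,i}}=\sum_{(a,b)\in S}\left(2a_iM_{j,}a-2a_ib_j\right)$, or in matrix form $\nabla f(M)=2\sum_{(a,b)\in S}(Ma-b)a^T$. Setting the gradient to zero gives $M\sum_{(a,b)\in S}aa^T=\sum_{(a,b)\in S}ba^T$, which determines $M$ uniquely when $\sum_{(a,b)\in S}aa^T$ is invertible. But then how do we know whether this is a maximum, a minimum, or a saddle point?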
Then what do I do?
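To sanity-check the derivation numerically, here is a minimal NumPy sketch. Everything in it is an assumption made for illustration: the sizes `n`, `m`, `N`, the random data, and the names `X`, `Y`, `objective`, `gradient`, `M_star`. It compares the formula $2\sum(Ma-b)a^T$ against finite differences, solves $M\sum aa^T=\sum ba^T$ directly, and compares the result with `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 3, 2, 50            # hypothetical sizes: a in R^n, b in R^m, N pairs in S

X = rng.normal(size=(N, n))   # row k holds one data point a (made-up data)
Y = rng.normal(size=(N, m))   # row k holds the matching b


def objective(M):
    """f(M) = sum over all pairs of ||M a - b||_2^2."""
    R = X @ M.T - Y           # row k is (M a - b)^T for pair k
    return float(np.sum(R ** 2))


def gradient(M):
    """Entry (j, i) is the sum over pairs of 2 a_i (M_{j,} a - b_j)."""
    R = X @ M.T - Y
    return 2.0 * R.T @ X      # same as 2 * sum (M a - b) a^T


# Central finite differences at a random point M0.
M0 = rng.normal(size=(m, n))
eps = 1e-6
num_grad = np.zeros_like(M0)
for j in range(m):
    for i in range(n):
        E = np.zeros_like(M0)
        E[j, i] = eps
        num_grad[j, i] = (objective(M0 + E) - objective(M0 - E)) / (2 * eps)
grad_err = np.max(np.abs(num_grad - gradient(M0)))

# Stationary point: M G = C with G = sum a a^T and C = sum b a^T.
G = X.T @ X                   # shape (n, n), symmetric
C = Y.T @ X                   # shape (m, n)
M_star = np.linalg.solve(G, C.T).T   # assumes G is invertible

# Reference solution: least squares for X Z = Y, where Z = M^T.
M_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0].T

# Smallest eigenvalue of G (relevant to the max/min/saddle question).
min_eig = np.linalg.eigvalsh(G).min()
```

In this sketch `grad_err` should be tiny and `M_star` should agree with `M_lstsq`. For the max/min/saddle question, the quantity I would look at is `min_eig`: if I computed it right, the Hessian of $f$ with respect to each row of $M$ is $2\sum_{(a,b)\in S}aa^T$, which is always positive semidefinite, so the stationary point should be a minimum.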