Firstly, I am not a student of mathematics, so although this question might be obvious to many of you, it is not obvious to me.
The idea is this: given a function $f:\mathbb{R}^1\to \mathbb{R}^1$, if I were to fit a linear equation $y=f(x)=mx+c$ to the data set $(x_1, y_1), (x_2,y_2), (x_3,y_3), \dots, (x_n,y_n)$, I would choose $m$ and $c$ to minimize the error $e$, where
$$e(m,c)=\sum_{i=1}^n\, (y_i - m x_i -c)^2$$
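For concreteness, here is a minimal sketch of this one-dimensional fit, assuming NumPy is available. It writes each model value $mx_i+c$ as the product of a row $[x_i, 1]$ with the unknown vector $[m, c]$ and lets `np.linalg.lstsq` minimize $e(m,c)$. The data points and the helper names `fit_line` and `error` are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Return (m, c) minimising e(m, c) = sum_i (y_i - m*x_i - c)^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: row i is [x_i, 1], so A @ [m, c] gives m*x_i + c.
    A = np.column_stack([x, np.ones_like(x)])
    # lstsq returns (solution, residuals, rank, singular values).
    (m, c), *_ = np.linalg.lstsq(A, y, rcond=None)
    return m, c

def error(m, c, x, y):
    """The error e(m, c) from the question."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - m * x - c) ** 2))

# Hypothetical sample data lying roughly on y = 2x + 1.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
m, c = fit_line(x, y)
print(f"m = {m:.4f}, c = {c:.4f}, e = {error(m, c, x, y):.4f}")
```

For this data the fit gives $m = 1.99$ and $c = 1.04$, which match the textbook closed form $m = \sum (x_i-\bar x)(y_i-\bar y)/\sum (x_i-\bar x)^2$, $c = \bar y - m\bar x$.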
Now, if I had a function $f:\mathbb{R}^2\to \mathbb{R}^2$ and wanted to fit $\vec{Y}=[M]\vec{X}+\vec{C}$ to the data set $(\vec{X_1},\vec{Y_1}), (\vec{X_2}, \vec{Y_2}), (\vec{X_3},\vec{Y_3}), \dots, (\vec{X_n},\vec{Y_n})$, does the problem boil down to minimizing $e([M],\vec{C})$, where
$$e([M],\vec{C})=\sum_{i=1}^{n}\,(\vec{Y_i}-[M]\vec{X_i}-\vec{C})\cdot(\vec{Y_i}-[M]\vec{X_i}-\vec{C})$$
Am I going in the right direction? Could you point me to some existing literature on this? Note: $\vec{X}$, $\vec{Y}$ and $\vec{C}$ are column vectors of dimension $(2,1)$, while $[M]$ is a matrix of dimension $(2,2)$.
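As a rough sketch of the two-dimensional version, assuming NumPy: append a constant 1 to each $\vec{X_i}$ so that $[M]$ and $\vec{C}$ become a single $3\times 2$ unknown, then call `np.linalg.lstsq` with a two-column right-hand side. The true $[M]$, $\vec{C}$, the sample points, and the helper names `fit_affine` and `affine_error` are hypothetical; the data is noise-free so the recovered parameters can be checked against the ones used to generate it.

```python
import numpy as np

def fit_affine(X, Y):
    """Return (M, C) minimising sum_i |Y_i - M @ X_i - C|^2.

    X, Y: arrays of shape (n, 2); row i holds the components of X_i, Y_i.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    # Augment each X_i with a trailing 1 so C is estimated together with M:
    # row i of A is [x_i1, x_i2, 1].
    A = np.hstack([X, np.ones((n, 1))])
    # With a 2-D right-hand side, lstsq fits each column of Y separately;
    # the summed squared residuals over both columns equal e(M, C).
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)  # shape (3, 2)
    M = W[:2].T  # (2, 2), so that Y_i is approximated by M @ X_i + C
    C = W[2]     # (2,)
    return M, C

def affine_error(M, C, X, Y):
    """The error e(M, C) from the question, with X_i, Y_i stored as rows."""
    R = np.asarray(Y, dtype=float) - np.asarray(X, dtype=float) @ M.T - C
    return float(np.sum(R * R))

# Hypothetical parameters and sample points, used to generate exact data.
M_true = np.array([[1.0, 2.0], [-1.0, 0.5]])
C_true = np.array([3.0, -2.0])
X = np.array([[0, 0], [1, 0], [0, 1], [2, 3], [-1, 4], [3, -2]], dtype=float)
Y = X @ M_true.T + C_true

M, C = fit_affine(X, Y)
print(M)
print(C)
```

Because each row of $[M]$ and each component of $\vec{C}$ only affects one component of the residual, minimizing this $e([M],\vec{C})$ amounts to two independent one-output least-squares fits, one per component of $\vec{Y}$.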