Find a linear function $f(x,y) = ax + by + c$ which minimizes the total square error
$E(a,b,c) = \sum_{i=1}^{n} (f(x_{i},y_{i})-z_{i})^{2},$
where $(x_{i},y_{i})$ are $n$ given distinct points and $z_1, \dots, z_n$ are $n$ given real numbers.
Using BranchedOut's suggestion and writing it out explicitly:
\begin{align} E(a,b,c) &= \sum_{i=1}^{n} (f(x_{i},y_{i})-z_{i})^{2} = \sum_{i=1}^{n} (ax_{i}+by_{i}+c-z_{i})^{2}, \end{align}
so
\begin{align} \dfrac{\partial E(a,b,c)}{\partial a} &= \sum_{i=1}^{n} 2x_i(ax_{i}+by_{i}+c-z_{i})\\ &= \sum_{i=1}^{n} \left(2ax_i^2+2bx_iy_{i}+2cx_i-2x_iz_{i}\right)\\ &= 2a\sum_{i=1}^{n} x_i^2+2b\sum_{i=1}^{n}x_iy_{i}+2c\sum_{i=1}^{n}x_i-2\sum_{i=1}^{n}x_iz_{i}. \end{align}
Defining $r(u) =\sum_{i=1}^{n}u_i$, $s(u) =\sum_{i=1}^{n}u_i^2$, and $t(u, v) =\sum_{i=1}^{n}u_i v_i$, where $u$ and $v$ are any of $x, y, z$, this becomes $\dfrac{\partial E(a,b,c)}{\partial a} = 2as(x)+2bt(x, y)+2cr(x)-2t(x, z)$. For an extreme value of $E(a, b, c)$, this requires $as(x)+bt(x, y)+cr(x)-t(x, z) = 0$.
This is the first of the normal equations for $a, b$, and $c$.
Doing the same thing for $\dfrac{\partial E(a,b,c)}{\partial b}$ and $\dfrac{\partial E(a,b,c)}{\partial c}$ gives the other two equations.
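Written out in the same $r$, $s$, $t$ notation (after dividing by $2$), they are
$$at(x, y)+bs(y)+cr(y)-t(y, z) = 0,$$
$$ar(x)+br(y)+cn-r(z) = 0.$$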
Solving the three equations gives the values of $a, b$, and $c$.
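For a quick numerical check, the three normal equations can be assembled and solved directly, e.g. with NumPy. This is only a sketch; the arrays `x`, `y`, `z` below are made-up sample data, not part of the question.

```python
import numpy as np

# Made-up sample data (replace with the given points and measurements).
x = np.array([0.0, 1.0, 2.0, 3.0, 1.5])
y = np.array([0.0, 1.0, 0.5, 2.0, 1.0])
z = np.array([1.0, 3.1, 3.9, 7.2, 4.0])
n = len(x)

# The sums r(u), s(u), t(u, v) from the answer.
r = lambda u: u.sum()           # r(u) = sum of u_i
s = lambda u: (u * u).sum()     # s(u) = sum of u_i^2
t = lambda u, v: (u * v).sum()  # t(u, v) = sum of u_i * v_i

# Coefficient matrix and right-hand side of the three normal equations.
A = np.array([[s(x),    t(x, y), r(x)],
              [t(x, y), s(y),    r(y)],
              [r(x),    r(y),    n   ]])
rhs = np.array([t(x, z), t(y, z), r(z)])

a, b, c = np.linalg.solve(A, rhs)
print(a, b, c)
```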
Let $\textbf{x} = (x,y,1)^T$ and let $X = (\textbf{x}_1, \cdots, \textbf{x}_n)$ be the data matrix (each column corresponds to one sampling point). Let also $\textbf{z} = (z_1, \cdots, z_n)^T$ be the measurements. You're trying to find a vector $\textbf{v} = (a,b,c)^T$ which minimizes $g(\textbf{v}) = \frac{1}{2}\|X^T\textbf{v} - \textbf{z}\|_2^2$. Expanding the norm gives \begin{align} g(\textbf{v}) = \frac{1}{2}\langle X^T\textbf{v}, X^T\textbf{v}\rangle - \langle X^T\textbf{v},\textbf{z}\rangle + \frac{1}{2} \|\textbf{z}\|_2^2. \end{align}
Setting the gradient to zero yields $\nabla g(\textbf{v}) = XX^T\textbf{v}-X\textbf{z} = 0 \Leftrightarrow \textbf{v} = (XX^T)^{-1}X\textbf{z}$ (these are called the normal equations). Note that the inverse is well defined provided $X$ has full row rank, i.e. the points $(x_i, y_i)$ are not all collinear; distinctness alone is not quite enough.
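As an illustration, here is a minimal NumPy sketch of this matrix form; the sample data is made up, and the variable names are just for this example.

```python
import numpy as np

# Made-up sample data.
x = np.array([0.0, 1.0, 2.0, 3.0, 1.5])
y = np.array([0.0, 1.0, 0.5, 2.0, 1.0])
z = np.array([1.0, 3.1, 3.9, 7.2, 4.0])

# Stack each point as a column (x_i, y_i, 1)^T to form the 3 x n data matrix X.
X = np.vstack([x, y, np.ones_like(x)])

# Solve the normal equations X X^T v = X z for v = (a, b, c)^T.
v = np.linalg.solve(X @ X.T, X @ z)
a, b, c = v
print(a, b, c)

# In practice np.linalg.lstsq(X.T, z) is often preferred, since it also
# handles the rank-deficient case (e.g. collinear points) gracefully.
```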
You might want to read a bit more about linear regression and/or normal equations.