If you have a linear function $y=Ax+B$, you can uniquely identify it by two points on that line.
If you have a quadratic function $y=Ax^2+Bx+C$, you can uniquely identify it with three points.
The pattern continues for higher-degree functions as well as lower ones (a 0th-degree constant function is uniquely identified by a single point).
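For instance, the quadratic case can be checked numerically: three distinct points give three independent linear equations in the coefficients $(A, B, C)$, so the system has exactly one solution. A small NumPy sketch (the sample points are made up for illustration):

```python
import numpy as np

# Three points determine a quadratic y = A x^2 + B x + C:
# each point gives one linear equation in the unknowns (A, B, C).
xs = np.array([-1.0, 0.0, 2.0])      # any three distinct x values
ys = np.array([4.0, 1.0, 7.0])       # the corresponding y values

# Vandermonde-style matrix: one row [x^2, x, 1] per point.
V = np.vander(xs, 3)
A, B, C = np.linalg.solve(V, ys)     # unique because the rows are independent

# The recovered quadratic reproduces the data.
assert np.allclose(A * xs**2 + B * xs + C, ys)
```

The uniqueness shows up in the fact that `np.linalg.solve` succeeds: the Vandermonde matrix of distinct points is invertible.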
This extends to higher dimensions as well.
If you have a bilinear surface $z=Axy+Bx+Cy+D$, you can uniquely identify it by four points.
Taking surfaces, volumes, or higher-dimensional objects and extending them to higher degrees, the pattern continues as far as I can tell.
Why is this? While I see the pattern, I can't understand why it's true.
There is considerable nuance in this question, because there is no mathematical requirement to match the number of data points to the number of fit parameters.
Start with the polynomial approximation through order $d$ of $f(x,y)$: $$ \begin{align} f(x,y) &= a_{0,0} + a_{1,0}x + a_{0,1}y + a_{2,0}x^{2} + a_{1,1}xy + a_{0,2}y^{2} + \dots + a_{0,d}y^{d} \\ &= a_{0,0} + \sum_{k=1}^{d} \sum_{j=0}^{k} a_{k-j,j}\,x^{k-j} y^{j} \end{align} $$ The number of terms is $n = \frac{1}{2}(d+1)(d+2)$.
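The term count is quick to verify: there are $k+1$ monomials of total degree $k$, and summing $k+1$ for $k = 0, \dots, d$ gives the triangular count $\frac{1}{2}(d+1)(d+2)$. A one-liner check:

```python
# Count the monomials x^(k-j) y^j with total degree k <= d and confirm
# the count matches n = (d+1)(d+2)/2 from the text.
def n_terms(d):
    return sum(k + 1 for k in range(d + 1))   # k+1 monomials of degree k

for d in range(8):
    assert n_terms(d) == (d + 1) * (d + 2) // 2
```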
Now assume a series of $m$ independent measurements $\left\{ x_{i}, y_{i}, f(x_{i}, y_{i}) \right\}_{i=1}^{m}$. Use the method of least squares to find the amplitudes $a$.
The linear system is $$ \begin{align} \mathbf{A} a &= f \\ \left[ \begin{array}{ccccc} 1 & x_{1} & y_{1} & x_{1}^{2} & x_{1}y_{1} & y_{1}^{2} & \dots & y_{1}^{d} \\ 1 & x_{2} & y_{2} & x_{2}^{2} & x_{2}y_{2} & y_{2}^{2} & \dots & y_{2}^{d} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \dots & \vdots\\ 1 & x_{m} & y_{m} & x_{m}^{2} & x_{m}y_{m} & y_{m}^{2} & \dots & y_{m}^{d} \end{array} \right] % \left[ \begin{array}{c} a_{0,0} \\ a_{1,0} \\ a_{0,1} \\ a_{2,0} \\ a_{1,1} \\ a_{0,2} \\ \vdots \\ a_{0,d} \end{array} \right] &= \left[ \begin{array}{c} f(x_{1},y_{1}) \\ f(x_{2},y_{2}) \\ \vdots \\ f(x_{m},y_{m}) \end{array} \right] \end{align} $$
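The system above can be assembled and solved directly. A sketch in NumPy, with a hypothetical helper `design_matrix` that builds the rows $[1, x_i, y_i, x_i^2, x_i y_i, y_i^2, \dots]$ in the monomial order used here (the test data is synthetic, chosen noiseless so the fit recovers the coefficients exactly):

```python
import numpy as np

# Build the design matrix for degree d in the monomial order used above:
# 1, x, y, x^2, xy, y^2, ..., y^d  (one column per a_{k-j,j}, k = 0..d, j = 0..k).
def design_matrix(x, y, d):
    cols = [x**(k - j) * y**j for k in range(d + 1) for j in range(k + 1)]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
d, m = 2, 12                             # degree 2 has n = 6 coefficients
x, y = rng.normal(size=m), rng.normal(size=m)
a_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0, -1.0])
f = design_matrix(x, y, d) @ a_true      # noiseless "measurements"

# Overdetermined (m > n): lstsq returns the least squares minimizer.
a_fit, *_ = np.linalg.lstsq(design_matrix(x, y, d), f, rcond=None)
assert np.allclose(a_fit, a_true)
```

With noiseless data and more points than coefficients, the least squares solution coincides with the generating amplitudes.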
Mathematically the linear system falls into one of three classifications. Overdetermined: $m>n$, $\mathbf{A}$ is tall, more rows than columns. Underdetermined: $m<n$, $\mathbf{A}$ is wide, more columns than rows. Square: $m=n$; when also $\rho=n$ the matrix is full rank and invertible. Provided the data vector $f$ has no component in the null space $\mathcal{N}(\mathbf{A}^{*})$, there is an exact solution in each case: $$ a = \mathbf{A}^{\dagger} f + \left(\mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A} \right)z, \qquad z \in \mathbb{C}^{n} $$ The matrix operator $\left(\mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A} \right)$ is $\mathbf{P}_{\mathcal{N}(\mathbf{A})}$, a projector onto $\mathcal{N}(\mathbf{A})$, the null space of $\mathbf{A}$.
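The general-solution formula is easy to exercise numerically. A sketch with a made-up rank-1 matrix and a consistent data vector: every choice of $z$ yields a valid solution, because $(\mathbf{I} - \mathbf{A}^{\dagger}\mathbf{A})z$ lands in the null space of $\mathbf{A}$:

```python
import numpy as np

# General solution a = A⁺ f + (I − A⁺A) z for a rank-deficient system.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])              # rank 1: second row is twice the first
f = np.array([1.0, 2.0])                # consistent: f lies in the range of A

A_pinv = np.linalg.pinv(A)
P_null = np.eye(2) - A_pinv @ A         # projector onto N(A)

# Every choice of z gives a valid solution of A a = f.
for z in (np.zeros(2), np.array([3.0, -1.0])):
    a = A_pinv @ f + P_null @ z
    assert np.allclose(A @ a, f)

# P is an orthogonal projector: P² = P = Pᵀ.
assert np.allclose(P_null @ P_null, P_null)
assert np.allclose(P_null, P_null.T)
```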
The singular value decomposition is $$ \mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^{*}, $$ where the domain matrices $\mathbf{U}\in\mathbb{C}^{m\times m}$ and $\mathbf{V}\in\mathbb{C}^{n\times n}$ are unitary. That is, $$ \mathbf{U} \mathbf{U}^{*} = \mathbf{U}^{*}\mathbf{U} = \mathbf{I}_{m}, \qquad \mathbf{V} \mathbf{V}^{*} = \mathbf{V}^{*} \mathbf{V} = \mathbf{I}_{n}. $$ The matrix $\mathbf{\Sigma}$ contains the ordered singular values $\sigma_{k}\in\mathbb{R}$, where $k$ runs from 1 to the matrix rank $\rho$: $$ \sigma_{1} \ge \sigma_{2} \ge \dots \ge \sigma_{\rho} > 0. $$ The diagonal matrix $$ \mathbf{S} = \left[ \begin{array}{cccc} \sigma_{1} & 0 & \dots & 0 \\ 0 & \sigma_{2} & \dots & 0 \\ \vdots & \vdots & \ddots & 0 \\ 0 & 0 & 0 & \sigma_{\rho} \end{array} \right] $$ is the primary block in the matrix $$ \mathbf{\Sigma} = \left[ \begin{array}{cc} \mathbf{S} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array} \right] . $$ The pseudoinverse matrix is constructed via $$ \mathbf{A}^{\dagger} = \mathbf{V} \mathbf{\Sigma}^{\dagger} \mathbf{U}^{*} $$ where $$ \mathbf{\Sigma}^{\dagger} = \left[ \begin{array}{cc} \mathbf{S}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array} \right]. $$
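This construction can be reproduced numerically. A sketch for a real full-column-rank tall matrix (so $\rho = n$ and the $\mathbf{S}$ block is all of the nonzero part); in the real case the conjugate transpose $*$ reduces to the ordinary transpose:

```python
import numpy as np

# Construct A⁺ = V Σ⁺ U* from the SVD and compare with np.linalg.pinv.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))            # tall: m = 5 rows, n = 3 columns

U, s, Vh = np.linalg.svd(A)            # s holds the singular values of S
Sigma_pinv = np.zeros((3, 5))          # Σ⁺ has the transposed block shape
Sigma_pinv[:3, :3] = np.diag(1.0 / s)  # invert S (assumes full rank ρ = 3)

A_pinv = Vh.T @ Sigma_pinv @ U.T       # real case: * is just the transpose
assert np.allclose(A_pinv, np.linalg.pinv(A))
```

For a rank-deficient matrix only the first $\rho$ singular values would be inverted; the remaining zeros stay zero, which is exactly what `np.linalg.pinv` does via its `rcond` threshold.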
**Example: a pencil-and-paper exercise**
**Part I: full rank.** The linear system has $m=n=\rho=2$: $$ \begin{align} \mathbf{A} x &= b\\ \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] &= \left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]. \end{align} $$ The matrix inverse exists and the solution is $$ \begin{align} x &= \mathbf{A}^{-1} b, \\ \left[ \begin{array}{c} x \\ y \end{array} \right] &= \left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]. \end{align} $$
**Part II: underdetermined (rank deficient).** The linear system has $m=n=2$ with $\rho=1$: $$ \begin{align} \mathbf{A} x &= b\\ \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] &= \left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]. \end{align} $$ The singular value decomposition of $\mathbf{A}$ is painless to compute: $$ \mathbf{A} = \mathbf{U} \Sigma \mathbf{V}^{*} = \left[ \begin{array}{cc} \color{blue}1 & \color{red}0 \\ \color{blue}0 & \color{red}1 \end{array} \right] \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \end{array} \right] \left[ \begin{array}{cc} \color{blue}1 & \color{blue}0 \\ \color{red}0 & \color{red}1 \end{array} \right] $$ Blue vectors lie in a range space, red vectors in a null space.
The pseudoinverse follows immediately: $$ \mathbf{A}^{\dagger} = \mathbf{V} \Sigma^{\dagger} \mathbf{U}^{*} = \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \end{array} \right] $$ The projector matrix is $$ \mathbf{P}_{\mathcal{N}(\mathbf{A})} = \mathbf{I}_{2} - \mathbf{A}^{\dagger}\mathbf{A} = \left[ \begin{array}{cc} 0 & 0 \\ 0 & 1 \end{array} \right]. $$ The least squares minimizers are $$ \begin{align} x_{LS} & = \mathbf{A}^{\dagger} b + \left( \mathbf{I}_{2} - \mathbf{A}^{\dagger}\mathbf{A} \right) z \\ % \left[ \begin{array}{c} x \\ y \end{array} \right] & = \left[ \begin{array}{c} b_{1} \\ 0 \end{array} \right] + \alpha \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] \end{align} $$ where the arbitrary parameter $\alpha \in \mathbb{R}$ is the second component of $z$.
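The hand computation above can be checked in a few lines (the data vector $b = (5, 7)$ is an arbitrary stand-in for $(b_1, b_2)$):

```python
import numpy as np

# Verify the Part II numbers: A = [[1,0],[0,0]] with data b = (b1, b2).
A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
A_pinv = np.linalg.pinv(A)
assert np.allclose(A_pinv, A)                        # here A⁺ happens to equal A

P_null = np.eye(2) - A_pinv @ A
assert np.allclose(P_null, [[0.0, 0.0], [0.0, 1.0]])

# The least squares family is (b1, 0) + α (0, 1); α = 0 is the minimum-norm pick.
b = np.array([5.0, 7.0])
x_min = A_pinv @ b
assert np.allclose(x_min, [5.0, 0.0])
```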
The range space of $\mathbf{A}$ is the $x$-axis; the null space is the $y$-axis. The data lives in the plane, but we can only see along the $x$-axis. The affine space of the underdetermined solution contains the full rank solution with the proper selection of the parameter $\alpha$, that is, $\alpha = b_{2}$.