Just elaborating slightly on the case where there are two independent variables. Suppose $f:\Bbb{R}^2\to\Bbb{R}$ is a given (differentiable) function. The idea is that if you have a rectangle $A=[x_0,x_0+\Delta x]\times [y_0,y_0+\Delta y]$ in the domain, then the portion of the graph of $f$ lying over $A$, which I will denote by $f(A)$ with a slight abuse of notation, is a deformed/curved rectangle: a curved rectangular surface sitting above our original rectangle $A$.
Now the question is: how can we approximate the area of $f(A)$? We approximate the surface $f(A)$ by its tangent plane at the point lying over $(x_0,y_0)$. This gives us the corresponding vectors
\begin{align}
\xi_1=\Delta x\cdot\left(1,0,\frac{\partial f}{\partial x}(x_0,y_0)\right) \quad \text{and}\quad
\xi_2=\Delta y\cdot\left(0,1,\frac{\partial f}{\partial y}(x_0,y_0)\right)
\end{align}
which are tangent to the surface $f(A)$ (in other words, we are taking the tangent vectors $(\Delta x,0)$ and $(0,\Delta y)$ to the rectangle $A$ and looking at the corresponding tangent vectors $\xi_1,\xi_2$ on $f(A)$). Now, if the rectangle $A$ is small enough, then the plane spanned by these two vectors ought to approximate $f(A)$ well. So, we can approximate the area of $f(A)$ by the area of the parallelogram spanned by the two vectors $\xi_1,\xi_2$. But if you recall, the area of a parallelogram is the norm (length) of the cross product of its two spanning vectors:
\begin{align}
\text{area } f(A) &\approx \text{area spanned by $\xi_1,\xi_2$}\\
&=|\xi_1\times \xi_2|\\
&=\sqrt{1+\left(\frac{\partial f}{\partial x}(x_0,y_0)\right)^2+
\left(\frac{\partial f}{\partial y}(x_0,y_0)\right)^2}\cdot |\Delta x\Delta y|
\end{align}
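In case the last equality is not obvious, here is the cross product written out. I am abbreviating $f_x=\frac{\partial f}{\partial x}(x_0,y_0)$ and $f_y=\frac{\partial f}{\partial y}(x_0,y_0)$; this shorthand is mine and is only used for this computation:
\begin{align}
\xi_1\times \xi_2 &= \Delta x\,\Delta y\cdot (1,0,f_x)\times (0,1,f_y)\\
&= \Delta x\,\Delta y\cdot \left(0\cdot f_y - f_x\cdot 1,\; f_x\cdot 0 - 1\cdot f_y,\; 1\cdot 1 - 0\cdot 0\right)\\
&= \Delta x\,\Delta y\cdot (-f_x,\,-f_y,\,1),
\end{align}
and taking the length of this vector gives $|\xi_1\times\xi_2|=\sqrt{1+f_x^2+f_y^2}\cdot|\Delta x\,\Delta y|$.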
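Summing these approximations over a partition of the domain into small rectangles and passing to the limit is what leads to the integral $\iint \sqrt{1+\left(\frac{\partial f}{\partial x}\right)^2+\left(\frac{\partial f}{\partial y}\right)^2}\,dx\,dy$ for the surface area. In higher dimensions, one would of course need an analogous formula for the $k$-dimensional volume of the parallelepiped spanned by $k$ vectors (this is ultimately related to the Gram matrix I mentioned in the comments). Anyway, hopefully this gives you a better intuition for why the surface area of a graph is given by this formula.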
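As a quick sanity check on the Gram matrix remark, here is a sketch of how that formula reproduces the two-variable result. The statement I am assuming is the standard one: the $k$-dimensional volume of the parallelepiped spanned by $v_1,\dots,v_k\in\Bbb{R}^n$ is $\sqrt{\det G}$, where $G_{ij}=v_i\cdot v_j$. With $k=2$, $v_1=\xi_1$, $v_2=\xi_2$, and the shorthand $f_x=\frac{\partial f}{\partial x}(x_0,y_0)$, $f_y=\frac{\partial f}{\partial y}(x_0,y_0)$ (my notation), we get
\begin{align}
\det G &= \det\begin{pmatrix} \xi_1\cdot\xi_1 & \xi_1\cdot\xi_2\\ \xi_2\cdot\xi_1 & \xi_2\cdot\xi_2\end{pmatrix}
= \det\begin{pmatrix} (\Delta x)^2(1+f_x^2) & \Delta x\,\Delta y\, f_xf_y\\ \Delta x\,\Delta y\, f_xf_y & (\Delta y)^2(1+f_y^2)\end{pmatrix}\\
&= (\Delta x\,\Delta y)^2\left[(1+f_x^2)(1+f_y^2)-f_x^2f_y^2\right]\\
&= (\Delta x\,\Delta y)^2\left(1+f_x^2+f_y^2\right),
\end{align}
so $\sqrt{\det G}=\sqrt{1+f_x^2+f_y^2}\cdot|\Delta x\,\Delta y|$, which agrees with the cross product computation.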