The finite discrete approximation of the 2D Laplacian at a point $(x,y)$ is given by
$$\tag{1}\Delta f(x,y) \approx \frac{f(x-h,y) + f(x+h,y) + f(x,y-h) + f(x,y+h) - 4f(x,y)}{h^2}$$
As I understand from this thread, in 3D, the Laplacian at a point is proportional to the difference between the average value over a small sphere of our scalar function and the value at the very center of the sphere, i.e, $$\Delta f(p)\propto [\textrm{average of}~ f ~\textrm{over a sphere centered at p} - f(p)].$$
And so my intuition, and please correct me if I'm wrong, says in 2D, the Laplacian at a point $p$ should be also proportional to the difference between the average of $f$ over a circle centered at $p$ and $f(p)$, i.e.,
$$\Delta f(p)\propto [\textrm{average of}~ f ~\textrm{over a circle centered at p} - f(p)].$$
But then how does Eq. $(1)$ fit within this context? The approximation is obviously over a square centered at $(x,y)$, but I can't really see the "average over square - value at center of square" idea. What is the intuition behind Eq. $(1)$?
Taylor expansion, pretty much. You find that the $f_x$ terms cancel out because of the coefficients of $f(x+h,y)$ and $f(x-h,y)$ being equal. Same deal for the $f_y$ terms. There are no mixed partial terms appearing because you stay on axis-aligned lines through $(x,y)$. This leaves only the unmixed second partials to show up to within second order, and again they show up "equally", so the second order Taylor expansion of $f(x+h,y)+f(x-h,y)+f(x,y+h)+f(x,y-h)-4f(x,y)$ about $h=0$ is proportional to $\Delta f(x,y) h^2$ (with a constant that doesn't depend on $p,f$ or $h$).