Suppose a function $f:\mathbb{R}^n\rightarrow\mathbb{R}$, constrained to the surface $g(x)=c$ for $x\in\mathbb{R}^n$, has a local maximum at $P$. I'm having trouble seeing how this implies that the gradients of $f$ and $g$ at $P$ are parallel.
My intuitive understanding is that there exist $n-1$ mutually orthogonal directions in $\mathbb{R}^n$ which are perpendicular to $\nabla g$, and thus either $g$ is locally constant in all of these directions or it has a local extremum in some of them.
If $g$ is locally constant in all of these directions, then $f$ must have a derivative of zero in all of these directions, and thus we conclude that the gradients of $f$ and $g$ are parallel. But what if $g$ has a local extremum in some of these directions? Then there is no need for the derivative of $f$ to be zero in those directions, and thus $\nabla f$ need not be perpendicular to those directions.
Is this something that doesn't happen for functions with continuous partial derivatives?
It all boils down to one thing being tangent to another.
Let's take a really simple example. Let's try to find the extrema of the function $\mathrm{f} : \mathbb{R}^2 \to \mathbb{R}$, given by $\mathrm{f}(x,y) = x+y$, subject to the constraint $\mathrm{g}(x,y)=0$ where $\mathrm{g}(x,y)=x^2 + 4y^2 - 1$.
The level-sets $\mathrm{f}^{-1}(v) = \{ (x,y) \in \mathbb{R}^2 : \mathrm{f}(x,y)=v\}$ are given by the lines $x+y=v$. The constraint $x^2+4y^2=1$ means that we're interested in the ellipse shown below. I've also included the lines $x+y=v$ where $v=0, \pm 0.5, \pm 1, \pm 1.5$. Keep reading below the picture!
The minimum (resp. maximum) of $\mathrm{f}$ on the ellipse is attained on the line $x+y=v$ with the smallest (resp. largest) possible value of $v$ that still meets the ellipse, i.e. still satisfies the constraint. These are exactly the tangent lines! See below.
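If you'd like to check this numerically, here is a minimal sketch in Python. It assumes the parametrisation $x=\cos t$, $y=\tfrac12\sin t$ of the ellipse, and the helper names (`point_on_ellipse`, `cross`, and so on) are made up for this illustration. It samples the ellipse, picks the points where $x+y$ is largest and smallest, and measures how far the two gradients are from being parallel there.

```python
import math

def f(x, y):
    # Objective from the example: f(x, y) = x + y
    return x + y

def grad_f(x, y):
    # Gradient of f is constant
    return (1.0, 1.0)

def grad_g(x, y):
    # Gradient of g(x, y) = x^2 + 4y^2 - 1
    return (2.0 * x, 8.0 * y)

def point_on_ellipse(t):
    # Assumed parametrisation of x^2 + 4y^2 = 1
    return (math.cos(t), 0.5 * math.sin(t))

def cross(u, v):
    # 2D cross product; it is zero exactly when u and v are parallel
    return u[0] * v[1] - u[1] * v[0]

N = 100_000
samples = [point_on_ellipse(2.0 * math.pi * k / N) for k in range(N)]

# Sampled points on the ellipse where f is largest and smallest
p_max = max(samples, key=lambda p: f(*p))
p_min = min(samples, key=lambda p: f(*p))

v_max = f(*p_max)
v_min = f(*p_min)
cross_max = cross(grad_f(*p_max), grad_g(*p_max))
cross_min = cross(grad_f(*p_min), grad_g(*p_min))

print("largest v:", v_max, "at", p_max, "cross product:", cross_max)
print("smallest v:", v_min, "at", p_min, "cross product:", cross_min)
```

The cross products should come out close to zero (up to the sampling resolution), which is the numerical counterpart of the tangency in the picture.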
The level-sets were given by the equations $\mathrm{f}(x,y)=v$ and so the gradient $\nabla\mathrm{f}$ gives a vector perpendicular to the level-sets. The ellipse was given by the equation $\mathrm{g}(x,y)=0$ and so the gradient $\nabla\mathrm{g}$ gives a vector perpendicular to the ellipse. Since both $\nabla\mathrm{f} \neq {\bf 0}$ and $\nabla\mathrm{g} \neq {\bf 0}$ it follows that the ellipse and the lines are tangent if and only if $\nabla\mathrm{f}$ and $\nabla\mathrm{g}$ are parallel.
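To make this concrete for the example, here is a short worked computation. The scalar $\lambda$ is notation introduced here for illustration (it is the usual Lagrange multiplier). Parallel gradients means $\nabla\mathrm{f} = \lambda\,\nabla\mathrm{g}$ for some $\lambda$, where

$$\nabla\mathrm{f} = (1,\,1), \qquad \nabla\mathrm{g} = (2x,\,8y).$$

So we need $1 = 2\lambda x$ and $1 = 8\lambda y$, which gives $x = \dfrac{1}{2\lambda}$ and $y = \dfrac{1}{8\lambda}$. Substituting into the constraint $x^2 + 4y^2 = 1$:

$$\frac{1}{4\lambda^2} + \frac{4}{64\lambda^2} = \frac{5}{16\lambda^2} = 1 \quad\Longrightarrow\quad \lambda = \pm\frac{\sqrt5}{4}.$$

Hence the two candidate points and the corresponding values of $v$ are

$$(x,y) = \pm\left(\frac{2}{\sqrt5},\ \frac{1}{2\sqrt5}\right), \qquad v = x+y = \pm\frac{\sqrt5}{2} \approx \pm 1.118.$$

These are the two tangent lines: $x+y=\sqrt5/2$ gives the maximum and $x+y=-\sqrt5/2$ gives the minimum.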
The same idea holds for higher dimensions and more complicated functions.
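Here is a sketch of why this works in $\mathbb{R}^n$, assuming $f$ and $g$ have continuous partial derivatives and $\nabla g(P) \neq {\bf 0}$. The curve $\gamma$, the tangent space $T_P$ and the scalar $\lambda$ are notation introduced just for this sketch. Take any differentiable curve $\gamma(t)$ that stays on the surface $g = c$ and passes through $P$ at $t=0$. Differentiating $g(\gamma(t)) = c$ at $t=0$ gives

$$\nabla g(P)\cdot\gamma'(0) = 0,$$

and since $t \mapsto f(\gamma(t))$ has a local maximum at $t=0$,

$$\nabla f(P)\cdot\gamma'(0) = 0.$$

By the implicit function theorem, the velocities $\gamma'(0)$ of such curves fill out the whole $(n-1)$-dimensional space $T_P = \{w : \nabla g(P)\cdot w = 0\}$. So $\nabla f(P)$ is perpendicular to every vector in $T_P$, and the only vectors with that property are the multiples of $\nabla g(P)$:

$$\nabla f(P) = \lambda\,\nabla g(P).$$

Note that the argument only ever moves along the surface, where $g$ is constant. How $g$ behaves along a straight line in a tangent direction (which leaves the surface) is a second-order effect and does not enter the first-derivative condition.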
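When both gradients are nonzero, having $\nabla\mathrm{f}$ parallel to $\nabla\mathrm{g}$ at a point of the constraint set is exactly the statement that the level-set of $\mathrm{f}$ through that point is tangent to the constraint set there.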