While searching on MSE, I couldn't find a complete rigorous proof the method of Lagrange multipliers using the implicit function theorem. I tried to write a complete proof myself below, but am not sure about some details, mainly the reordering of coordinates part. Any advice is very appreciated.
Theorem. Let $f:\mathbb{R}^d\to\mathbb{R}$ and $h:\mathbb{R}^d\to\mathbb{R}^n$ be continuously differentiable functions, $C\in\mathbb{R}^n$, and $M=\{h=C\}$. Assume that $\text{rank } h'(x)=n$ for all $x\in M$. If $f$ attains a constrained local extremum at $a$, subject to the constraint $h(a)=C$, then there exists $\lambda_1,\dots,\lambda_n\in\mathbb{R}$ such that
$$\nabla f(a)=\lambda_1\nabla h_1(a)+\dots+\lambda_n\nabla h_n(a), \quad \quad (1)$$
where $\nabla f(a)=[\frac{\partial f}{\partial x_1} (a),\dots,\frac{\partial f}{\partial x_d} (a)]^\top$ and $\nabla h_i(a)=[\frac{\partial h_i}{\partial x_1} (a),\dots,\frac{\partial h_i}{\partial x_d} (a)]^\top$ for $i=1,\dots,n$.
Proof. Suppose $f$ attains a contrained local extremum at $a$. First note that $h'(a)=[\nabla h_1(a),\dots, \nabla h_n(a)]^\top$ is $n\times d$, and so $\text{rank } h'(x)=n$ implies that $n\leq d$ and that $h'(a)$ has $n$ linearly independent columns. If $n=d$, then $h'(a)$ is invertible, and we can write $f'(a)=\Lambda h'(a)$ with $\Lambda:=f'(a)[h'(a)]^{-1}:=[\lambda_1,\dots,\lambda_n]$. Transposing gives $(1)$. Hence assume WLOG that $n<d$.
Next we claim that we can assume WLOG that the first $n$ columns of $h'(a)$ are linearly independent. Otherwise there exists a permutation $\pi:\{1,\dots,d\}\to \{1,\dots,d\} $ that reorders the columns of $h'(a)$ so that it holds. Define $f^*:\mathbb{R}^d\to\mathbb{R}$ and $h^*:\mathbb{R}^d\to\mathbb{R}^n$ by
$$f^*(x_1,\dots,x_d)=f(x_{\pi^{-1}(1)},\dots,x_{\pi^{-1}(d)}), \quad h^*(x_1,\dots,x_d)=h(x_{\pi^{-1}(1)},\dots,x_{\pi^{-1}(d)})$$
Then we have
$$\frac{\partial f^*}{\partial x_j}(x_1,\dots,x_d)=\frac{\partial f}{\partial x_{\pi(j)}}(x_{\pi^{-1}(1)},\dots,x_{\pi^{-1}(d)}) \quad 1\leq j\leq d,$$
$$\frac{\partial h_i^*}{\partial x_j}(x_1,\dots,x_d)=\frac{\partial h_i}{\partial x_{\pi(j)}}(x_{\pi^{-1}(1)},\dots,x_{\pi^{-1}(d)}) \quad 1\leq i\leq n, \quad 1\leq j\leq d$$
and so the continuous differentiability of $f,h$ implies the continuous differentiability of $f^*,h^*$. Let $a^*=(a_{\pi(1)},\dots,a_{\pi(n)})$. Then
$$\frac{\partial f^*}{\partial x_j}(a^*)=\frac{\partial f}{\partial x_{\pi(j)}}(a) \quad 1\leq j\leq d,$$ $$\frac{\partial h_i^*}{\partial x_j}(a^*)=\frac{\partial h_i}{\partial x_{\pi(j)}}(a) \quad 1\leq i\leq n, \quad 1\leq j\leq d$$
It follows that the columns of $f^{*'}(a^*)$ and $h^{*'}(a^*)$ are those of $f'(a)$ and $h'(a)$ permuted according to $\pi$. Moreover we see that $f^*$ attains a constrained local extrumum at $a^*$, subject to the constraint $h^*(a^*)=C$. Since condition $(1)$ for $f,h$ at $a$ is equivalent to condition $(1)$ for $f^*,h^*$ at $a^*$, we have reduced the problem to the case where the first $n$ columns of $h'(a)$ are linearly independent.
As a last reduction, we may assume WLOG that $C=0$, for otherwise we can replace $h$ by $h-C$.
Let $m=d-n$ and denote points in $\mathbb{R}^d=\mathbb{R}^{n+m}$ by $(x,y)$ with $x\in\mathbb{R}^n$ and $y\in\mathbb{R}^m$. Then $h$ satisfies the conditions of the implicit function theorem at the point $a=(a_x,a_y)$, as found in Rudin PMA Theorem 9.27 for example. Therefore, there exists open sets $U\subset \mathbb{R}^{n+m}$ and $W\subset \mathbb R^m$ with $(a_x,a_y)\in U$ and $a_y\in W$, and a continuously differentiable function $g:W\to\mathbb R^n$ such that $g(a_y)=a_x$ and $h(g(y),y)=0$ for all $y\in W$. In particular we have
$$\{(g(y),y):y\in W\}\subset M \quad \quad (2)$$
Define the function $F:W\to \mathbb R$ by $F(y)=f(g(y),y)$. This functions is differentiable on $W$, and the chain rule gives $F'(y)=f'(g(y),y)\begin{bmatrix} g'(y)\\ I_m \end{bmatrix}$. Since $a_y\in W$ and $g$ is continuous at $a_y$, it follows from $(2)$ that $F$ has an unconstrained local extremum at $a_y$. For if there exists an $r>0$ such that $f|_{M\cap B_r(a)}$ has an unconstrained extremum at $a$, then by choosing $\delta>0$ sufficiently small we obtain $(g(y),y)\in M\cap B_r(a)$ whenever $y\in W\cap B_{\delta}(a_y)$. Using the fact that the derivative must vanish at a local extremum, and partitioning $f'(a)=[f_x'(a) \quad f_y'(a)]$, we get
$$0=F'(a_y)=[f_x'(a) \quad f_y'(a)]\begin{bmatrix} g'(a_y)\\ I_m \end{bmatrix}=f_x'(a)g'(a_y)+ f_y'(a) \quad \quad (3)$$
Next consider the function $H:W\to\mathbb R^{n+m}$ defined by $H(y)=h(g(y),y)$. This functions is differentiable on $W$, and the chain rule gives $H'(y)=h'(g(y),y)\begin{bmatrix} g'(y)\\ I_m \end{bmatrix}$. Moreover we have $H=0$ by definition of $g$, and so in particular
$$0=H'(a_y)=[h_x'(a) \quad h_y'(a)]\begin{bmatrix} g'(a_y)\\ I_m \end{bmatrix}=h_x'(a)g'(a_y)+ h_y'(a) \quad \quad (4)$$
where we partioned $h'(a)=[h_x'(a) \quad h_y'(a)]$. Since the first $n$ columns of $h'(a)$ are linearly independent, $h_x'(a)$ is invertible, and from $(4)$ we get
$$g'(a_y)=-[h_x'(a)]^{-1}h_y'(a) \quad \quad (5)$$
Substituting $(5)$ in $(3)$ and solving for $f_y'(a) $ we obtain
$$f_y'(a) =\Lambda h_y'(a) \quad \quad (6)$$
where $\Lambda:=f_x'(a)[h_x'(a)]^{-1}:=[\lambda_1,\dots,\lambda_n]$. But we also have
$$f'_x(a)=f'_x(a)[h_x'(a)]^{-1}h_x'(a)=\Lambda h_x'(a) \quad \quad (7)$$
so combining $(6)$ and $(7)$ gives
$$f'(a)=[f'_x(a) \quad f'_y(a)]=\Lambda [h_x'(a)\quad h_y'(a)]=\Lambda h'(a)$$
Transposing gives $(1)$, as desired.