I was reading the following theorem in Apostol's Mathematical Analysis (page 357):
Assume that one of the partial derivatives $D_1f,\dots,D_nf$ exists at $\mathbf c$ and that the remaining $n-1$ partial derivatives exist in some $n$-ball $B(\mathbf c)$ and are continuous at $\mathbf c$. Then $f$ is differentiable at $\mathbf c$.
The proof offered shows that $f(\mathbf{c}+ \mathbf{h}) - f(\mathbf{c}) = \nabla f(\mathbf{c}) \cdot \mathbf{h} + o(\lVert \mathbf h \rVert)$ as $\mathbf h \to \mathbf 0$. To do so, it takes $\mathbf h$ small enough that $\mathbf c + \mathbf h$ lies inside the $n$-ball where the $n-1$ partial derivatives exist, and then writes the left-hand side as
$$f(\mathbf{c}+ \mathbf{h}) - f(\mathbf{c}) = \sum_{k=1}^{n} \bigl( f(\mathbf{c}+ \mathbf{h}_k) - f(\mathbf{c}+ \mathbf{h}_{k-1}) \bigr)$$
where $\mathbf h_k$ is the vector $\left( h_1, h_2, \dots , h_k, 0, \dots,0\right)$, so that $\mathbf h_0 = \mathbf 0$ and $\mathbf h_n = \mathbf h$.
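To make the telescoping concrete, in the case $n=2$ this decomposition reads
$$f(c_1+h_1,\,c_2+h_2)-f(c_1,c_2) = \bigl(f(c_1+h_1,\,c_2)-f(c_1,c_2)\bigr) + \bigl(f(c_1+h_1,\,c_2+h_2)-f(c_1+h_1,\,c_2)\bigr),$$
so each difference changes only one coordinate at a time.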
Without loss of generality we can assume that it is the first partial derivative that is only assumed to exist at $\mathbf c$, while the following $n-1$ are continuous at $\mathbf c$.
The proof continues like this:
The first term in the sum is $f(\mathbf{c}+ \mathbf{h}_1) - f(\mathbf{c})$; since the two points differ only in their first component and $D_1f(\mathbf c)$ exists, we can write
$$f(\mathbf{c}+ \mathbf{h}_1) - f(\mathbf{c}) = D_1f(\mathbf c)h_1 + E_1(\lVert \mathbf h \rVert)h_1$$ where $E_1(\lVert \mathbf h \rVert) \to 0$ as $\mathbf h \to \mathbf 0$.
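Spelling out where $E_1$ comes from: since $\mathbf h_1 = h_1\mathbf e_1$, one can simply define
$$E_1(\lVert \mathbf h \rVert) = \frac{f(\mathbf c + h_1\mathbf e_1) - f(\mathbf c)}{h_1} - D_1f(\mathbf c)$$
for $h_1 \neq 0$ (and $E_1 = 0$ when $h_1 = 0$); the existence of $D_1f(\mathbf c)$ is precisely the statement that this quantity tends to $0$ as $h_1 \to 0$. Note that, despite the notation, this error term depends only on $h_1$.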
All the following terms are written in the same form, but with a different justification. Because the remaining $n-1$ partial derivatives are continuous at $\mathbf c$, he uses the Mean-Value Theorem to show that for $\mathbf h$ small enough the sum becomes
$$\sum_{k=1}^{n} \bigl( D_kf(\mathbf{c})h_k + E_k(\lVert \mathbf h \rVert)h_k \bigr)$$ where each $E_k(\lVert \mathbf h \rVert) \to 0$ as $\mathbf h \to \mathbf 0$.
This concludes the proof.
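For reference, the Mean-Value Theorem step runs as follows: for $k \geq 2$ the points $\mathbf c+\mathbf h_{k-1}$ and $\mathbf c+\mathbf h_k$ differ only in the $k$-th coordinate, and $D_kf$ exists on the segment joining them (it exists throughout $B(\mathbf c)$), so the one-variable MVT yields some $\theta_k \in (0,1)$ with
$$f(\mathbf c+\mathbf h_k)-f(\mathbf c+\mathbf h_{k-1}) = D_kf(\mathbf c+\mathbf h_{k-1}+\theta_k h_k\mathbf e_k)\,h_k,$$
and one takes $E_k(\lVert \mathbf h \rVert) = D_kf(\mathbf c+\mathbf h_{k-1}+\theta_k h_k\mathbf e_k)-D_kf(\mathbf c)$, which tends to $0$ by continuity of $D_kf$ at $\mathbf c$.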
Question 1: Why do we only need the $n-1$ partial derivatives to be continuous at $\mathbf c$, rather than in a neighborhood of $\mathbf c$? I don't think we can apply the Mean-Value Theorem between the points $\mathbf c + \mathbf h$ and $\mathbf c$ if the derivatives aren't continuous between those two points. Does continuity at a point imply continuity in a neighborhood of the point for derivatives? It isn't the case for ordinary functions. Or can we apply the Mean-Value Theorem even when the derivative is continuous at just one point?
Question 2: Why do we use the Mean-Value Theorem for the last $n-1$ derivatives when we didn't need it for the first one? If the first equality holds, why can't we do the same with all the following derivatives? That way we wouldn't even need the continuity of the $n-1$ derivatives to prove differentiability (which I know is not the case).
I am aware that Apostol's proof is correct, I just don't understand those two things. Any help would be greatly appreciated.
No, continuity of the derivative at a point does not imply continuity on a neighbourhood. Indeed, it suffices to extend a one-dimensional counterexample $f\colon\mathbb{R}\to\mathbb{R}$ to $F\colon\mathbb{R}^n\to\mathbb{R}$ via $F(x_1,\dots,x_n)=f(x_1)+\dots+f(x_n)$.
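(One standard way to build such an $f$, in case a concrete example helps: take $g(x)=x^2\sin(1/x)$ for $x\neq 0$, $g(0)=0$, whose derivative exists everywhere but is discontinuous exactly at $0$, and set
$$f(x)=\sum_{n=1}^{\infty}4^{-n}\,g\!\left(x-\tfrac{1}{n}\right).$$
Since $g'$ is bounded on bounded sets, the differentiated series converges uniformly there, so $f'(x)=\sum_{n\geq 1}4^{-n}g'(x-1/n)$; this makes $f'$ continuous at $0$ but discontinuous at every $1/n$, hence on no neighbourhood of $0$.)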
The proof only needs that the derivatives $$D_2f(\mathbf{c}+\mathbf{h}_1+\theta_2 h_2\mathbf{e}_2),D_3f(\mathbf{c}+\mathbf{h}_2+\theta_3 h_3\mathbf{e}_3),\dots,D_nf(\mathbf{c}+\mathbf{h}_{n-1}+\theta_n h_n\mathbf{e}_n)$$ be close to (i.e., $o(1)$ away from) their counterparts at $\mathbf{c}$ as $\mathbf{h}\to \mathbf 0$, which is guaranteed by continuity of $D_2f,D_3f,\dots,D_nf$ at $\mathbf{c}$.
Recall that the (one-dimensional) MVT does not need continuity of the derivative, only differentiability on the open interval (and continuity of the function on the closed interval).
Now suppose you try to use the same expansion for $D_2f,\dots,D_nf$ as for $D_1f$, i.e., $$ \begin{align*} f(\mathbf{c}+\mathbf{h}_2)-f(\mathbf{c}+\mathbf{h}_1)&=D_2f(\mathbf{c}+\mathbf{h}_1)h_2+E_2h_2\\ f(\mathbf{c}+\mathbf{h}_3)-f(\mathbf{c}+\mathbf{h}_2)&=D_3f(\mathbf{c}+\mathbf{h}_2)h_3+E_3h_3\\ &\vdots\\ f(\mathbf{c}+\mathbf{h})-f(\mathbf{c}+\mathbf{h}_{n-1})&=D_nf(\mathbf{c}+\mathbf{h}_{n-1})h_n+E_nh_n \end{align*} $$ where $E_k\to 0$ as $h_k\to 0$. There is a slightly nasty detail hidden in the $E_k$: those terms have an implicit dependence on $\mathbf{h}_{k-1}$, and we would need to bound them uniformly over all choices of $\mathbf{h}_{k-1}$. (For every $\varepsilon>0$ and every $\mathbf{h}_{k-1}$, we know there is a $\delta(\mathbf{h}_{k-1})>0$ such that $\lvert h_k\rvert<\delta(\mathbf{h}_{k-1})$ implies $\lvert E_k\rvert<\varepsilon$, but we don't know there is a single $\delta$ that works for all $\mathbf{h}_{k-1}$ sufficiently small.) The problem does not appear for $D_1f$ because we always take the derivative at $\mathbf{c}$ itself, so a uniform bound can be found for $E_1$. The use of the MVT gives an equality (so nothing to bound there), and then continuity of $D_kf$ at $\mathbf{c}$ yields the uniform bound.
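Concretely, after applying the MVT the error term is
$$E_k = D_kf(\mathbf{c}+\mathbf{h}_{k-1}+\theta_k h_k\mathbf{e}_k)-D_kf(\mathbf{c}),$$
and the evaluation point always lies within distance $\lVert\mathbf{h}\rVert$ of $\mathbf{c}$, since
$$\lVert \mathbf{h}_{k-1}+\theta_k h_k\mathbf{e}_k\rVert \le \lVert\mathbf{h}\rVert.$$
So given $\varepsilon>0$, continuity of $D_kf$ at $\mathbf{c}$ supplies a single $\delta>0$, depending only on $\varepsilon$, such that $\lVert\mathbf{h}\rVert<\delta$ forces $\lvert E_k\rvert<\varepsilon$ for every $\mathbf{h}_{k-1}$ and $\theta_k$ at once — exactly the uniform bound the naive approach lacks.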