I am struggling to understand the details of a proof in Wald's General Relativity, pp. 15-16.
Claim Let $M$ be an $n$-dimensional manifold. Take some point $p\in M$ and let $V_p$ be the tangent space to $M$ at $p$. Then $\dim V_p=n$.
Proof Wald sets out to prove the claim by constructing a basis of $V_p$ and showing that it has $n$ elements.
Consider the chart $\Phi:O\to U$, where $O$ is a subset of $M$ ($O\subset M$) and $U$ is a subset of $\mathbb{R}^n$ ($U\subset\mathbb{R}^n$).
Consider the map $f\in F$, where $F$ is the collection of $C^\infty$ maps $M\to\mathbb{R}$.
By definition, $f\circ\Phi^{-1}:U\to\mathbb{R}$ (which is the function that takes a point in $U$ back to $O$ and then to $\mathbb{R}$) is also $C^\infty$.
We then define $X_i:F\to\mathbb{R}$, $X_i(f)=\left.\frac{\partial}{\partial x^i}\left(f\circ\Phi^{-1}\right)\right|_{\Phi(p)}$ with $(x^i)$ being the Cartesian coordinates in $\mathbb{R}^n$, for $i=1,\dots,n$.
Then,
$X_1,\dots,X_n$ are tangent vectors, and it is easily seen that they are linearly independent.
1) Are they linearly independent because they are directional derivatives taken along different directions which are mutually perpendicular, since they are simply the Cartesian directions?
Further on, Wald uses the result from calculus that $$F(x)=F(a)+\sum_{i=1}^n(x^i-a^i)H_i(x)\quad\text{where}\quad H_i(a)=\left.\frac{\partial F}{\partial x^i}\right|_{x=a}$$ for $F:\mathbb{R}^n\to\mathbb{R}$, $a,x\in\mathbb{R}^n$.
He then applies this to the present proof by substituting $F=f\circ\Phi^{-1}$ and $a=\Phi(p)$ to get, for any point $q\in O$, (Equation 1) $$f(q)=f(p)+\sum_{i=1}^n\left(x^i\circ\Phi(q)-x^i\circ\Phi(p)\right)H_i(\Phi(q))$$
2) How does this result follow from the above substitution? If I substitute for $F$ and $a$ I get something different.
3) After this substitution, what is $H_i(\Phi(q))$? I would say $H_i(\Phi(q))=\left.\frac{\partial(f\circ\Phi^{-1})}{\partial x^i}\right|_{\Phi(p)}$, but later in the proof Wald says that $H_i\circ\Phi(p)=X_i(f)$, so clearly I am doing something wrong.
4) What is $x^i\circ \Phi(q)$ in Equation 1? What does the composition of $\Phi(q)$ with a Cartesian coordinate $x^i$ mean?
As is common, Wald starts by defining the tangent vectors at $p\in M$ as derivations at $p$, that is, linear maps from the space of smooth functions, $C^\infty M\to\mathbb{R}$, satisfying a pointwise Leibniz rule. To reiterate, choosing a chart $\Phi$ around $p$, we get coordinate functions $x^1,\dots,x^n$ (an important point is that coordinate functions are just smooth functions on the chart domain in the manifold) and corresponding partial derivatives at $p$, $X_1,\dots,X_n=\left(\frac{\partial}{\partial x^1}\right)_p,\dots,\left(\frac{\partial}{\partial x^n}\right)_p$. Of course, one must establish that these are indeed derivations at $p$, then show that they form a basis.
1) Linear independence of the $X_i$ can be established as follows: if, for each $i$, we can find a function $f$ such that $X_i(f)\neq 0$ and $X_j(f)=0$ when $j\neq i$, then we are done, since any linear combination of the $X_j$ with $j\neq i$ will still vanish when acting on $f$ and thus cannot equal $X_i$. We have such functions, namely the coordinate functions themselves:
$$ X_i(x^j)=\begin{cases} 1 & i=j \\ 0 & i\neq j \end{cases} $$

$\approx$2) A somewhat different, but potentially more straightforward argument is based on the fact that smooth functions can be Taylor expanded [used in John Lee's Introduction to Smooth Manifolds, Chap. 3]. Choose coordinates so that $\Phi(p)=\vec{0}$, let $f\in C^\infty M$, and let $F=f\circ\Phi^{-1}:\mathbb{R}^n\to\mathbb{R}$ be its coordinate representation. We can use Taylor's Theorem to write (with summation over repeated indices implied)
$$ F(x^1,\dots,x^n)=F(\vec{0})+\left(\frac{\partial F}{\partial x^i}\right)_{\vec{0}}x^i+x^ix^j\int_0^1(1-t)\frac{\partial^2 F}{\partial x^i\partial x^j}(tx^1,\dots,tx^n)\,dt $$

Going back to the manifold,
$$ f(q)=f(p)+\left(\frac{\partial}{\partial x^i}(f\circ\Phi^{-1})\right)_{\vec{0}}x^i(q)+x^i(q)x^j(q)\int_0^1(1-t)\frac{\partial^2 (f\circ\Phi^{-1})}{\partial x^i\partial x^j}\left(tx^1(q),\dots,tx^n(q)\right)dt $$

From here, we can show that an arbitrary derivation $v_p$ at $p$ can be written as $v_p=v_p(x^i)X_i$, by acting with both sides on an arbitrary function written in the form above. (The last term vanishes since, after applying the Leibniz rule, every piece still contains at least one coordinate function evaluated at $p$, where $x^i(p)=0$.)
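
The final step above can be written out explicitly. What follows is only a sketch under the same assumptions ($\Phi(p)=\vec{0}$, summation over repeated indices). The symbol $g_{ij}$ is shorthand introduced here, not taken from Wald or Lee, for the smooth function of $q$ given by the integral in the last term, so that $f=f(p)+X_i(f)\,x^i+x^ix^jg_{ij}$ near $p$. The computation uses only linearity, the Leibniz rule, and the fact that a derivation annihilates constants ($v_p(1)=v_p(1\cdot 1)=2v_p(1)$, hence $v_p(1)=0$):

$$
\begin{aligned}
v_p(f) &= v_p\big(f(p)\big)+X_i(f)\,v_p(x^i)+v_p\big(x^i\,x^jg_{ij}\big)\\
&= 0+X_i(f)\,v_p(x^i)+x^i(p)\,v_p\big(x^jg_{ij}\big)+\big(x^jg_{ij}\big)(p)\,v_p(x^i)\\
&= v_p(x^i)\,X_i(f),
\end{aligned}
$$

where the last two terms of the second line vanish because $x^i(p)=0$ and $x^j(p)=0$. Since $f$ was arbitrary, $v_p=v_p(x^i)X_i$, so the $X_i$ span $V_p$; together with linear independence this gives $\dim V_p=n$. Strictly speaking, the expansion holds only in a neighbourhood of $p$ inside the chart domain, so this sketch also assumes that $v_p(f)$ depends only on the values of $f$ near $p$.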