I'm approaching differential geometry from a physicist's perspective in the hope of understanding GR more thoroughly.
I've been told that, intuitively, the tangent space $T_{p}M$ at a point $p$ on a manifold $M$ is "the best linear approximation to the manifold $M$ at that point". What is meant by this?
Is it meant in the sense that the tangent vectors at that point provide the best linear approximation of functions on the manifold at that point? Does this extend for a sufficiently small neighbourhood around a given point?
In the context of GR, is this a mathematical implementation of the equivalence principle, in the sense that $T_{p}M$ is flat and so the laws of physics on $T_{p}M$ are those of special relativity (SR)? Are the laws of physics on $M$ therefore those of SR in a sufficiently small neighbourhood of a given point?
This statement is meaningful if your manifold is embedded in a higher-dimensional Euclidean space $\mathbb{R}^{n}$. It is very much like the tangent line to a $1D$ curve. For example, see the illustration on Wikipedia of a tangent plane to a sphere.
Let's look at manifolds embedded in $\mathbb{R}^{3}$. If, for example, you can write your manifold as $z=f\left(x,y\right)$ around $p$ (writing $p$ also for the $\left(x,y\right)$ coordinates of the point), and you expand this function into its Taylor series
$$z=f\left(x,y\right)=f\left(p\right)+\nabla f\left(p\right)\cdot\left(\left(x,y\right)-p\right)+\dots$$
then
$$z=f\left(p\right)+\nabla f\left(p\right)\cdot\left(\left(x,y\right)-p\right)$$
is the tangent plane. $T_{p}M$ is this plane regarded as a vector space, with $p$ as the origin.
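To make "best linear approximation" concrete, here is a minimal numerical sketch. The choice of surface (the upper unit hemisphere), the base point, and the helper names `f`, `grad_f`, `tangent_plane` and `plane_error` are assumptions made for illustration only. It compares the surface with its first-order Taylor approximation and prints the error divided by $h^{2}$, where $h$ is the distance from $p$ in the $\left(x,y\right)$ plane. If the plane really is the best linear fit, that ratio should stay roughly constant as $h$ shrinks.

```python
import numpy as np

def f(x, y):
    """Upper unit hemisphere written as a graph z = f(x, y) (illustrative choice)."""
    return np.sqrt(1.0 - x**2 - y**2)

def grad_f(x, y):
    """Analytic gradient of f at (x, y)."""
    z = f(x, y)
    return np.array([-x / z, -y / z])

def tangent_plane(p, q):
    """First-order Taylor approximation of f around p, evaluated at q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return f(*p) + grad_f(*p) @ (q - p)

# Base point (its (x, y) coordinates) and a direction to move in.
p = np.array([0.3, 0.4])
direction = np.array([1.0, 0.0])

def plane_error(h):
    """Vertical gap between the surface and the tangent plane at distance h from p."""
    q = p + h * direction
    return abs(f(*q) - tangent_plane(p, q))

for h in (1e-1, 1e-2, 1e-3):
    # The last column should settle down to a constant: the error is O(h^2).
    print(h, plane_error(h), plane_error(h) / h**2)
```

Any other plane through the same point would leave an error that shrinks only linearly in $h$, which is the sense in which the tangent plane is "best".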
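Yes. It means that by changing the coordinate system at a point, your metric can be brought to the Minkowski metric $\eta_{\mu\nu}$ at that point. Its first derivatives can also be made to vanish there, so deviations from SR only show up at second order in the distance from the point, through the curvature.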
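As a rough illustration of the first part of that statement, the sketch below takes the components of a metric at a single point and finds a linear change of basis that turns them into $\eta_{\mu\nu}$. The matrix `A`, the resulting `g`, and the function name `local_inertial_frame` are hypothetical, chosen only so the example is self-contained. It deals with the value of the metric at one point only; making the first derivatives vanish as well needs a further, nonlinear coordinate change that is not shown here.

```python
import numpy as np

# Minkowski metric with signature (-, +, +, +).
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# Hypothetical metric components at a point p: g = A^T eta A for an
# invertible A (unit upper triangular, so det A = 1). By Sylvester's law
# of inertia, g has the same signature as eta.
A = np.array([
    [1.0, 0.2, 0.0, 0.1],
    [0.0, 1.0, 0.3, 0.0],
    [0.0, 0.0, 1.0, 0.4],
    [0.0, 0.0, 0.0, 1.0],
])
g = A.T @ eta @ A

def local_inertial_frame(g):
    """Return a matrix E whose columns form a basis in which g becomes diag(-1, 1, 1, 1).

    E plays the role of the Jacobian of a linear coordinate change at the point.
    """
    # eigh returns eigenvalues in ascending order, so the negative one comes first.
    eigvals, eigvecs = np.linalg.eigh(g)
    # Rescale each eigenvector (column) by 1 / sqrt(|eigenvalue|).
    return eigvecs / np.sqrt(np.abs(eigvals))

E = local_inertial_frame(g)
g_local = E.T @ g @ E  # metric components in the new basis

print(np.round(g_local, 10))
```

The frame `E` is not unique: composing it with any Lorentz transformation gives another basis in which the metric is $\eta_{\mu\nu}$, which mirrors the freedom to choose a local inertial frame.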