Definition: let $S,S'\subset\mathbb{R}^3$ be surfaces and let $f:S\to S'$ be a differentiable function. $f$ is said to be an isometry if for every $p\in S$ we have $\langle df_p(v), df_p(w)\rangle = \langle v,w\rangle$ whenever $v, w\in T_pS$.
I'm trying to prove that if $f:\mathbb{S}^2\to\mathbb{S}^2$ is an isometry, then $f$ is an orthogonal linear transformation.
What I've done so far is prove that $df_p$ is an orthogonal linear transformation for every $p\in \mathbb{S}^2$. But I really don't know how to conclude that $f$ itself is an orthogonal transformation.
What is the trick?
Since $f : S^2 \to S^2$ is an isometry, it maps geodesics to geodesics. In other words, $f$ maps great circles to great circles on $S^2$. Moreover, being an isometry, $f$ preserves the affine separation between points on geodesics, which here is equal to the angular separation between points on great circles.
Now let's extend $f : S^2 \to S^2$ to a map $\hat f : \mathbb R^3 \to \mathbb R^3$, given by $$ \hat f : \ t \mathbf x \mapsto t f(\mathbf x) \qquad \text{for } t \geq 0, \ \mathbf x \in S^2.$$ Using our observations above, it should be easy to see that $\hat f$ is a linear map. For example:
Given two points $t_1 \mathbf x_1$ and $t_2 \mathbf x_2$ in $\mathbb R^3$ with $t_1, t_2 > 0$ and with $\mathbf x_1, \mathbf x_2 \in S^2$ distinct and non-antipodal, there is a unique great circle $C$ on $S^2$ passing through $\mathbf x_1$ and $\mathbf x_2$. The sum $t_1 \mathbf x_1 + t_2 \mathbf x_2$ can be written as $t\mathbf x$ for some $t \geq 0$ and for some $\mathbf x$ on this great circle $C$. The linearity property $\hat f(t_1 \mathbf x_1) + \hat f(t_2 \mathbf x_2) = \hat f(t \mathbf x)$ then follows from the fact that $f$ maps the great circle $C$ to another great circle $f(C)$, and preserves angular separations on these great circles. The cases where $\mathbf x_1$ and $\mathbf x_2$ coincide or are antipodal are even easier.
Similarly, the property $\hat f(- t\mathbf x) = - \hat f(t \mathbf x)$ follows from the fact that $f$ maps antipodal points to antipodal points. It then follows that $\hat f( \alpha t \mathbf x) = \alpha \hat f(t \mathbf x)$ for all $\alpha$, positive or negative.
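If you prefer an algebraic check of additivity, here is a sketch of an alternative route. It is not the great-circle picture above, but it rests on the same fact. I am assuming that "preserves angular separation" is read as $\langle f(\mathbf x), f(\mathbf y)\rangle = \langle \mathbf x, \mathbf y\rangle$ for all $\mathbf x, \mathbf y \in S^2$, since the angle between two unit vectors is the arccosine of their inner product. Writing $\mathbf u = s\mathbf x$ and $\mathbf v = t\mathbf y$ with $s, t \geq 0$, the definition of $\hat f$ then gives $\langle \hat f(\mathbf u), \hat f(\mathbf v)\rangle = st\langle f(\mathbf x), f(\mathbf y)\rangle = \langle \mathbf u, \mathbf v\rangle$ for all $\mathbf u, \mathbf v \in \mathbb R^3$. Setting $\mathbf w = \mathbf u + \mathbf v$ and expanding the square:

$$\begin{aligned} \lVert \hat f(\mathbf w) - \hat f(\mathbf u) - \hat f(\mathbf v)\rVert^2 &= \lVert \hat f(\mathbf w)\rVert^2 + \lVert \hat f(\mathbf u)\rVert^2 + \lVert \hat f(\mathbf v)\rVert^2 - 2\langle \hat f(\mathbf w), \hat f(\mathbf u)\rangle - 2\langle \hat f(\mathbf w), \hat f(\mathbf v)\rangle + 2\langle \hat f(\mathbf u), \hat f(\mathbf v)\rangle \\ &= \lVert \mathbf w\rVert^2 + \lVert \mathbf u\rVert^2 + \lVert \mathbf v\rVert^2 - 2\langle \mathbf w, \mathbf u\rangle - 2\langle \mathbf w, \mathbf v\rangle + 2\langle \mathbf u, \mathbf v\rangle \\ &= \lVert \mathbf w - \mathbf u - \mathbf v\rVert^2 = 0. \end{aligned}$$

Hence $\hat f(\mathbf u + \mathbf v) = \hat f(\mathbf u) + \hat f(\mathbf v)$. The same expansion applied to $\lVert \hat f(\alpha \mathbf u) - \alpha \hat f(\mathbf u)\rVert^2$ handles scalar multiples.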
Thus $\hat f$ is a linear map on $\mathbb R^3$. By construction, $\hat f$ also preserves separation from the origin. Hence $\hat f$ is represented by an orthogonal matrix. Restricting $\hat f$ to $S^2$, we conclude that $f$ is also represented by an orthogonal matrix.
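As a numerical sanity check of the construction (not a proof), here is a minimal Python sketch. It takes one sample isometry of $S^2$, a rotation about the $z$-axis, builds the radial extension $\hat f(t\mathbf x) = t f(\mathbf x)$, and measures how far $\hat f$ is from being additive and norm-preserving on two arbitrary vectors. The names `rotate_z` and `radial_extension`, the angle, and the test vectors are all my own illustrative choices.

```python
import math

def rotate_z(theta):
    """Return x -> R x for a rotation R about the z-axis (a sample isometry of S^2)."""
    c, s = math.cos(theta), math.sin(theta)
    def f(x):
        return (c * x[0] - s * x[1], s * x[0] + c * x[1], x[2])
    return f

def radial_extension(f):
    """Extend f: S^2 -> S^2 to R^3 via f_hat(t x) = t f(x), with t >= 0 and |x| = 1."""
    def f_hat(v):
        t = math.sqrt(sum(a * a for a in v))   # t = |v|
        if t == 0.0:
            return (0.0, 0.0, 0.0)             # the origin is sent to the origin
        unit = tuple(a / t for a in v)         # x = v / |v| lies on S^2
        return tuple(t * a for a in f(unit))
    return f_hat

def add(u, v):
    """Componentwise sum of two vectors in R^3."""
    return tuple(a + b for a, b in zip(u, v))

ORIGIN = (0.0, 0.0, 0.0)
f_hat = radial_extension(rotate_z(0.7))        # hypothetical sample angle
u, v = (1.0, 2.0, -0.5), (-3.0, 0.25, 4.0)     # arbitrary test vectors

# Distance between f_hat(u + v) and f_hat(u) + f_hat(v); zero for a linear map.
additivity_error = math.dist(f_hat(add(u, v)), add(f_hat(u), f_hat(v)))
# Difference between |f_hat(u)| and |u|; zero if distance from the origin is preserved.
norm_error = abs(math.dist(f_hat(u), ORIGIN) - math.dist(u, ORIGIN))

print(additivity_error, norm_error)
```

Both printed numbers should be at the level of floating-point rounding. Since the sample $f$ is already the restriction of an orthogonal matrix, this only illustrates what the extension does; the argument above is what shows every isometry behaves this way.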
[Edit: Yes, the claim that $f$ maps geodesics to geodesics does follow from your local definition of an isometry. A geodesic $t \mapsto \mathbf x (t) \in S^2$ is a path from $\mathbf x(t_0)$ to $\mathbf x(t_1)$ that minimises the length functional $$ L[\mathbf x(t)] = \int_{t_0}^{t_1} dt \sqrt{\left\langle \frac{d \mathbf x(t)}{dt}, \frac{d \mathbf x(t)}{dt} \right\rangle} $$ locally within the space of paths between $\mathbf x(t_0)$ and $\mathbf x(t_1)$. (Think of this as a calculus of variations statement.)
But $f$ is an isometry, so it preserves infinitesimal lengths, and therefore, one would expect that $f$ preserves lengths of paths. Let's check this: \begin{multline} L[f(\mathbf x(t))] = \int_{t_0}^{t_1}dt \sqrt{\left\langle \frac{d f(\mathbf x(t))}{dt}, \frac{d f(\mathbf x(t))}{dt} \right\rangle}\\ = \int_{t_0}^{t_1}dt \sqrt{\left\langle df_{\mathbf x(t)}\left(\frac{d \mathbf x(t)}{dt}\right), df_{\mathbf x(t)}\left(\frac{d \mathbf x(t)}{dt}\right) \right\rangle} \\ = \int_{t_0}^{t_1} dt \sqrt{\left\langle \frac{d \mathbf x(t)}{dt},\frac{d \mathbf x(t)}{dt} \right\rangle} = L[\mathbf x (t)]\end{multline} Great! So we can conclude that $t \mapsto \mathbf x(t)$ is a geodesic if and only if its image $t \mapsto f(\mathbf x(t))$ is a geodesic. In the context of this problem, the geodesics are precisely the great circles on $S^2$, so $f$ maps great circles to great circles.
Furthermore, the affine length of a geodesic between its endpoints is given by $L[\mathbf x(t)]$. Since $L[f(\mathbf x (t))] = L[\mathbf x(t)]$ when $f$ is an isometry, we conclude that affine lengths are preserved by isometries. In the context of this question, the affine length between points on a great circle is equal to their angular separation in radians. ]
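The identity $L[f(\mathbf x(t))] = L[\mathbf x(t)]$ can also be checked numerically. The sketch below samples a path on $S^2$ (an arc of a latitude circle, so not a geodesic, which is fine because the identity holds for any path), applies a sample isometry (a rotation about the $x$-axis), and compares polyline approximations of the two lengths. The function names, the colatitude, the rotation angle, and the number of samples are all illustrative assumptions of mine.

```python
import math

def latitude_path(colatitude, t0, t1, n):
    """Sample n + 1 points of x(t) = (sin a cos t, sin a sin t, cos a), t0 <= t <= t1."""
    sin_a, cos_a = math.sin(colatitude), math.cos(colatitude)
    points = []
    for k in range(n + 1):
        t = t0 + (t1 - t0) * k / n
        points.append((sin_a * math.cos(t), sin_a * math.sin(t), cos_a))
    return points

def rotate_x(theta):
    """Return p -> R p for a rotation R about the x-axis (a sample isometry of S^2)."""
    c, s = math.cos(theta), math.sin(theta)
    def f(p):
        return (p[0], c * p[1] - s * p[2], s * p[1] + c * p[2])
    return f

def polyline_length(points):
    """Sum of chord lengths between consecutive samples; approximates L[x(t)]."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

f = rotate_x(0.4)                              # hypothetical sample angle
path = latitude_path(1.0, 0.0, 1.0, 1000)      # colatitude 1 rad, t from 0 to 1
image = [f(p) for p in path]                   # the path t -> f(x(t))

length_before = polyline_length(path)
length_after = polyline_length(image)

print(length_before, length_after, math.sin(1.0))
```

The two polyline lengths should agree up to rounding, and both should be close to $\sin(1)$, the exact length of this arc (radius $\sin a$ times the parameter range $t_1 - t_0 = 1$).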