Let $M$ be a complete Riemannian manifold and let $N \subset M$ be a closed submanifold of $M$. Let $p_0 \in M$ where $p_{0} \notin N$ and let $d(p_{0},N)$ be the distance from $p_{0}$ to $N$.
Show that there exists a point $q_{0} \in N$ such that $d(p_0,q_0)=d(p_{0},N)$.
Moreover, show that a minimizing geodesic which joins $p_{0}$ to $q_{0}$ is orthogonal to $N$ at $q_{0}$.
I've done the first part but am at a total loss for the second part (highlighted). This comes from chapter 9 of do Carmo on variations of energy so I assume you have to use a variation somehow.
Here is a solution in full detail. Note that we are not assuming that $N$ is compact, only that $N$ is closed in $M$; this is why step 1 uses an argument with geodesics:
Step 1:
Let $\ell = d(p_0,N)$ and let $q_i$ be a sequence of points in $N$ such that $\ell_i := d(p_0,q_i) \rightarrow \ell$. Since $M$ is complete, for each $i$ there is a minimizing normalized geodesic $\gamma_i : [0,\ell_i] \rightarrow M$ from $p_0$ to $q_i$. Write $\gamma_i(t)=\exp_{p_0}(t\gamma_i'(0))$. Then $\gamma_i'(0) \in S^{n-1} \subset T_{p_{0}}M$ (the unit sphere, with $n = \dim M$) is a sequence of vectors in a compact set and thus has a convergent subsequence, so we may assume $\gamma_i'(0) \rightarrow v \in S^{n-1} \subset T_{p_{0}}M$. Let $\gamma(t)=\exp_{p_{0}}(tv)$. Since $\exp_{p_0}$ is continuous on all of $T_{p_0}M$ and $\ell_i \gamma_i'(0) \rightarrow \ell v$, we have $q_i = \exp_{p_0}(\ell_i \gamma_i'(0)) \rightarrow \exp_{p_0}(\ell v) = \gamma(\ell)$. Because $N$ is closed in $M$ and each $q_i$ lies in $N$, the limit $\gamma(\ell)$ lies in $N$. By continuity of the distance function $d(p_0, \cdot)$, we have $d(p_0,q_i) \rightarrow d(p_0,\gamma(\ell))$. But $d(p_0,q_i) \rightarrow d(p_0,N)$ by assumption, so $d(p_0,\gamma(\ell))=d(p_0,N)$. Thus $q_0 = \gamma(\ell)$ is the desired point.
Step 2:
Assume, for contradiction, that a minimizing normalized geodesic $\gamma: [0,\ell] \rightarrow M$ with $\gamma(0)=p_0$ and $\gamma(\ell)=q_0$ is not orthogonal to $N$ at $q_0$; that is, $\gamma'(\ell)\notin (T_{q_{0}}N)^{\perp} \subset T_{q_{0}}M$. We wish to construct a variation $f(s,t)$ of $\gamma$ with the following properties:
(i) Every curve of $f$ starts at $p_0$ and ends in $N$
(ii) Near $\gamma$ there is a shorter curve in the variation
Let $w$ be the orthogonal projection of $\gamma'(\ell)$ onto $T_{q_{0}}N$; it is nonzero precisely because $\gamma'(\ell)$ is not orthogonal to $T_{q_0}N$. Set $v=-w$, so that $\langle v,\gamma'(\ell) \rangle = -|w|^2 <0$. Choose a curve $\sigma:(-\epsilon, \epsilon) \rightarrow N$ such that $\sigma(0)=q_0$ and $\sigma'(0)=v$.
Step 3:
Here we assume that $q_0$ lies inside a normal neighborhood of $p_0$. This means that there exists an open neighborhood $U\subset T_{p_{0}}M$ of the origin, containing $\ell\gamma'(0)$, such that $\exp_{p_{0}}: U \rightarrow \exp_{p_{0}}U$ is a diffeomorphism. Shrinking $\epsilon$ if necessary, we may assume that $\sigma$ lies entirely in the normal neighborhood $\exp_{p_0}U$. Let $\lambda(s)=\exp^{-1}_{p_{0}}(\sigma(s))$ be the lift of $\sigma$ to $U$ under $\exp_{p_{0}}$. Define
$f(s,t)=\exp_{p_{0}}(\frac{t}{\ell}\lambda(s))$
where $s \in (-\epsilon,\epsilon)$ and $t \in [0,\ell]$.
Observe that $f$ has the following properties:
(a) $f(s,0)=p_0$
(b) $f(s,t)$ is a variation of $\gamma$
(c) $f(s,\ell)=\sigma(s) \in N$
Properties (a) and (c) are immediate from the definition. For (b), note that $\gamma(t)=\exp_{p_{0}}(t\gamma'(0))$, so $q_0=\exp_{p_{0}}(\ell \gamma'(0))$ and therefore $\lambda(0)=\exp^{-1}_{p_{0}}(q_{0})=\ell \gamma'(0)$. Hence we have $f(0,t)=\exp_{p_{0}}(t\gamma'(0))=\gamma(t)$.
Note also that for each fixed $s$, the curve $t \mapsto f(s,t)$ is a geodesic.
We will now show that $E'(0)<0$, where $E$ is the energy function of the variation. Indeed, since $\gamma$ is a smooth geodesic, the integral term in the first variation formula for the energy vanishes and we are left with
$\frac{1}{2}E'(0)=-\langle V(0), \gamma'(0) \rangle +\langle V(\ell), \gamma'(\ell) \rangle$
where $V(t)=\frac{\partial f}{\partial s}(0,t)$ is the variational field. Note that $V(0)=0$ because $f$ is pinched at $p_0$; so the first term of the equation above vanishes. On the other hand, $V(\ell)=\sigma'(0)=v$, so $\frac{1}{2}E'(0)=\langle v, \gamma'(\ell) \rangle<0$ as desired.
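As a sanity check on the sign and the formula, here is the same computation in the simplest flat case. This example is an illustration only and is not part of the argument: assume $M=\mathbb{R}^2$ with the Euclidean metric, $N$ the $x$-axis, $p_0=(0,1)$, and take the non-orthogonal segment from $p_0$ to $q_0=(a,0)$ with $a>0$ (the number $a$ is a hypothetical parameter). Then $\exp_{p_0}(w)=p_0+w$, $\ell=\sqrt{a^2+1}$, $\gamma'(\ell)=\frac{1}{\ell}(a,-1)$, and the tangential part with reversed sign is $v=(-\frac{a}{\ell},0)$. Taking $\sigma(s)=(a-\frac{a}{\ell}s,\,0)$ gives
$f(s,t)=p_0+\frac{t}{\ell}(\sigma(s)-p_0), \qquad \frac{\partial f}{\partial t}(s,t)=\frac{1}{\ell}\left(a\left(1-\frac{s}{\ell}\right),\,-1\right)$
so that
$E(s)=\int_0^{\ell}\left|\frac{\partial f}{\partial t}\right|^2 dt=\frac{1}{\ell}\left(a^2\left(1-\frac{s}{\ell}\right)^2+1\right), \qquad \frac{1}{2}E'(0)=-\frac{a^2}{\ell^2}$
which agrees with
$\langle v,\gamma'(\ell)\rangle=\left(-\frac{a}{\ell}\right)\left(\frac{a}{\ell}\right)+0\cdot\left(-\frac{1}{\ell}\right)=-\frac{a^2}{\ell^2}<0$
In this flat picture the endpoint simply slides along the $x$-axis toward the foot of the perpendicular from $p_0$, and the segment gets shorter.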
Step 4:
We show that, for the variation $f(s,t)$ of the geodesic $\gamma$, having $E'(0)<0$ implies that there is a shorter curve nearby. Let $L(s)$ and $E(s)$ denote the length and energy of the curve $t \mapsto f(s,t)$. Each of these curves is a geodesic and hence has constant speed, so $L(s)^2=\ell E(s)$ for every $s$. Since $E'(0)<0$, we have $E(\tilde{\epsilon})<E(0)$ for all sufficiently small $\tilde{\epsilon}>0$, and therefore
$L(\tilde{\epsilon})^2=\ell E(\tilde{\epsilon})<\ell E(0)=L(0)^2$
hence
$L(\tilde{\epsilon}) < L(0)$, that is, the curve $t \mapsto f(\tilde{\epsilon},t)$ is strictly shorter than $f(0,t)=\gamma(t)$.
This proves the assertion in the case where $q_0$ lies in a normal neighborhood of $p_0$: we have found a curve from $p_0$ to a point $q$ of $N$ of length less than $\ell(\gamma)=d(p_0,q_0) = d(p_0,N)$, contradicting the fact that $\gamma$ minimizes the distance from $p_0$ to $N$.
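For completeness, the relation between length and energy used in step 4 comes from the Cauchy-Schwarz inequality. A short derivation, for an arbitrary piecewise smooth curve $c:[0,\ell]\rightarrow M$ (the name $c$ is introduced here only for illustration):
$L(c)^2=\left(\int_0^{\ell}|c'(t)|\cdot 1\,dt\right)^2\leq\left(\int_0^{\ell}1\,dt\right)\left(\int_0^{\ell}|c'(t)|^2\,dt\right)=\ell E(c)$
with equality if and only if $|c'(t)|$ is constant. Geodesics have constant speed, so equality holds for every curve $t \mapsto f(s,t)$ of the variation, which is what allows a strict decrease of $E$ to be converted into a strict decrease of $L$.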
Step 5:
It remains to show that the assertion still holds when $q_0$ does not lie in a normal neighborhood of $p_0$. In this case, choose a point $p_1=\gamma(t_1)$ with $0<t_1<\ell$, so close to $q_0$ that $q_0$ does lie in a normal neighborhood of $p_1$. Applying steps 3 and 4 to the segment of $\gamma$ from $p_1$ to $q_0$ (with $p_1$ in place of $p_0$) produces a geodesic $c$ from $p_1$ to some point $q_1 \in N$ whose length is less than the length of $\gamma$ from $p_1$ to $q_0$. Then the broken curve that follows $\gamma$ from $p_0$ to $p_1$ and then $c$ from $p_1$ to $q_1$ joins $p_0$ to a point of $N$ and has length less than $\ell(\gamma) = d(p_0,N)$. This contradicts the fact that $\gamma$ minimizes the distance from $p_0$ to $N$.