Here is the problem and its solution

I have three questions:
(1) I don't understand how he establishes that the second condition holds. My main question is this: for a given $\epsilon$ I have an $m$ that tells me how the functions with $n$ greater than $m$ behave, but this tells me nothing about the behaviour of the preceding terms of the sequence. More precisely, I think the condition says that whenever the terms of the sequence after the $n$th term are within $\epsilon$ of the limit function, the first terms of the sequence up to $n$ are ALL within $\delta$ of the limit function. My problem is with the "ALL" part: I don't see how we can guarantee this. Can I use the fact that they are all continuous on a compact set, and therefore bounded, then take the largest number $A$ among their maxima and claim that all the functions are within $A-a$ of the limit, where $a$ is the minimum of the limit function?
(2) I think the main idea in the second part of the proof is to exploit the compactness of $S$ and cover it with a finite number of neighborhoods. A number $N$ is associated with each of these neighborhoods, and we choose the $N$ that guarantees that this $\epsilon$ works on all of $S$. But I don't understand what exactly the $N$ in the proof is.
(3) Why can't we say that condition (2) implies uniform convergence directly? It's obvious that it means that the tail of the sequence becomes arbitrarily close to the limit function and within an $\epsilon$ range of it for all $x$ in $S$.
(1) Read condition (ii) carefully: it essentially says that, for each $k=1,2,\dots$, if $|f_k(x)-f(x)|<\delta$ then $|f_{k+n}(x)-f(x)|<\varepsilon$ for $n>m$. We don't need to check whether the first terms of the sequence are close to $f(x)$ -- instead, we assume this and then prove that everything after the word "implies" holds. In the $(\Rightarrow)$ direction of the proof, condition (ii) follows directly from uniform convergence: given $\varepsilon>0$, there exists $N\in \mathbb N$ such that for all $n\geq N$ and all $x\in S$ we have $$|f_n(x)-f(x)|<\varepsilon.$$ Take $m=N$. If $n>m$, then $k+n>N$ for any $k\in \mathbb N$, and thus $$|f_{k+n}(x)-f(x)|<\varepsilon$$ for any such $k$, meaning that in (ii), $|f_{k+n}(x)-f(x)|<\varepsilon$ is always true regardless of whether $|f_{k}(x)-f(x)|<\delta$ is. Therefore, the implication $$|f_{k}(x)-f(x)|<\delta ~~\implies~~ |f_{k+n}(x)-f(x)|<\varepsilon$$ is true. See e.g. the truth table for $\implies$: even if the inequality with $\delta$ is false, the one with $\varepsilon$ is still true, so the implication holds. This is why we can pick $m=N$; the value of $\delta$ is then irrelevant, so $\delta=\varepsilon$ will do.
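If it helps, here is a small numerical sanity check of that argument. It is only a sketch under my own assumptions: the sequence $g_n(x)=x/n$ on $S=[0,1]$ (limit $0$), the helper `implies`, and the particular values of `eps` and `delta` are not from your book, and a finite grid proves nothing; it just shows the implication in (ii) holding whether or not its hypothesis does.

```python
import math

def implies(p, q):
    # Material implication: false only when p is true and q is false.
    return (not p) or q

def g(n, x):
    # Hypothetical example sequence g_n(x) = x / n on S = [0, 1]; its limit is 0.
    return x / n

eps = 0.05
delta = 0.01                   # arbitrary: the result does not depend on this value
N = math.floor(1 / eps) + 1    # |g_n(x)| <= 1/n < eps for all n >= N and x in [0, 1]
m = N                          # the choice of m made in the (=>) direction

grid = [i / 100 for i in range(101)]

# Check "|g_k(x)| < delta  implies  |g_{k+n}(x)| < eps" for a range of k, n > m, x.
holds = all(
    implies(abs(g(k, x)) < delta, abs(g(k + n, x)) < eps)
    for x in grid
    for k in range(1, 31)
    for n in range(m + 1, m + 31)
)
print(holds)
```

Changing `delta` to any other positive number leaves `holds` unchanged, because the conclusion of the implication is already true for every $n>m$.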
(2) Yes, the idea is to note that $\bigcup_{x\in S} B(x)$ is an open covering of $S$, which by compactness has a finite sub-covering $\bigcup_{k=1}^p B(x_k)$. Then we can write $$S=S\cap\left( \bigcup_{k=1}^p B(x_k) \right)=\bigcup_{k=1}^p (B(x_k)\cap S).$$
The definition of $N$ contains a mysterious term, $k(x_p)$ (the $p$ is a typo in the solution you provided -- read on). The author's $k(x_0)$ notation is not explicitly defined, but it stems from the pointwise convergence of $(f_n)$: given any $\delta>0$ (in particular the $\delta$ supplied by (ii)) and $x_0\in S$ we can pick $k(x_0)\in \mathbb N$ such that for $k\geq k(x_0)$ we have $$|f_k(x_0)-f(x_0)|<\delta,$$ and in particular $|f_{k(x_0)}(x_0)-f(x_0)|<\delta$. Then by continuity (due to condition (i) and the continuity of $f_{k(x_0)}$) there is a neighbourhood $B(x_0)$ of $x_0$ such that $|f_{k(x_0)}(x)-f(x)|<\delta$ for all $x\in B(x_0)\cap S$, and by (ii): $$|f_{k(x_0)+n}(x)-f(x)|<\varepsilon, ~~~~~(\ast)$$ for all $x\in B(x_0)\cap S$ (since we can only guarantee that $|f_{k(x_0)}(x)-f(x)|<\delta$ if $x$ is in this neighbourhood $B(x_0)$) and all $n>m$. However, we already know that $S$ has a finite covering $\bigcup_{k=1}^p (B(x_k)\cap S)$, so any $x\in S$ belongs to some set $B(x_i)\cap S$ and therefore satisfies the inequality $(\ast)$ with $x_0$ replaced by $x_i$. This finally brings us to the definition of $N$: it's the number $$N=\max_{1\leq i \leq p} k(x_i)+m,$$ which is picked conveniently: if $j>N$ and $x\in B(x_i)\cap S$, then $j=k(x_i)+n$ with $n=j-k(x_i)>m$, so $(\ast)$ gives $|f_j(x)-f(x)|<\varepsilon$, regardless of which set $B(x_i)\cap S$ contains $x$. Thus, this choice of $N$ depends only on $\varepsilon$ and not on $x$, and the convergence is uniform.
(3) It is not entirely obvious: the existence of the "tail", as you call it, is guaranteed only if the condition $|f_k(x)-f(x)|<\delta$ holds. On the other hand, the convergence is uniform if, given $\varepsilon>0$, we can pick a natural number $N\in \mathbb N$ independently of $x\in S$ such that $|f_n(x)-f(x)|<\varepsilon$ for all $n\geq N$. How do we choose $N$ knowing only (i) and (ii)? The answer is of course in the proof. ;)
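To get a feel for why an $N$ that is independent of $x$ is a real requirement, here is a rough numerical sketch. The two sequences are my own illustrative choices, not taken from your problem: $h_n(x)=nx(1-x)^n$ and $g_n(x)=x/n$ on $S=[0,1]$, both continuous with pointwise limit $0$. The maximum over a finite grid is only a lower bound for the true supremum, so treat the numbers as suggestive rather than as a proof.

```python
def h(n, x):
    # Hypothetical example: h_n(x) = n x (1 - x)^n on [0, 1]; h_n -> 0 pointwise.
    return n * x * (1 - x) ** n

def g(n, x):
    # Hypothetical example: g_n(x) = x / n on [0, 1]; g_n -> 0 uniformly.
    return x / n

GRID = [i / 10000 for i in range(10001)]

def grid_sup(f, n):
    # Largest value of |f_n(x) - 0| over the grid points (lower bound for the sup).
    return max(abs(f(n, x)) for x in GRID)

for n in (10, 100, 1000):
    # For h the grid maximum does not shrink as n grows; for g it is 1/n.
    print(n, grid_sup(h, n), grid_sup(g, n))
```

For $h_n$ the maximum is attained near $x=1/(n+1)$ and stays close to $1/e$, so no single $N$ works for every $x$ once $\varepsilon<1/e$, even though each fixed $x$ has its own $N$. For $g_n$ the bound $1/n$ holds for all $x$ at once, which is exactly what uniform convergence asks for.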
I did my best to explain in detail; let me know if you'd like me to clarify something.