I was first introduced to Chebyshev polynomials (of the first kind) in the form $T_n(x)=\cos\left(n \arccos(x)\right)$. The usual recurrence relation, $T_{n+1}(x)=2x\,T_n(x)-T_{n-1}(x)$, was then derived using trig identities.
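To make that derivation concrete, here is a quick numerical sanity check (my own sketch, not part of the original derivation) that the trig definition satisfies the standard recurrence $T_0=1$, $T_1=x$, $T_{n+1}(x)=2xT_n(x)-T_{n-1}(x)$, which is what shows $T_n$ really is a polynomial:

```python
import numpy as np

# Check that T_n(x) = cos(n arccos x) obeys the Chebyshev recurrence
# T_0 = 1, T_1 = x, T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x).
x = np.linspace(-1.0, 1.0, 201)
T = [np.ones_like(x), x.copy()]          # T_0, T_1
for n in range(1, 7):
    T.append(2 * x * T[n] - T[n - 1])    # recurrence step gives T_{n+1}

for n, Tn in enumerate(T):
    assert np.allclose(Tn, np.cos(n * np.arccos(x)))
```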
The point of Chebyshev polynomials, in this context, is that their roots constitute the best set of nodes for Lagrange interpolation: they minimize the worst-case size of the error $E(x;f)\triangleq f(x)-P_n(x)$, by minimizing the sup norm of the node polynomial $\prod_{i}(x-x_i)$ that appears in the standard error bound. I am left unsatisfied, however: although I can verify that this works (and the proof is not hard), I have no idea why Chebyshev might have thought to look at these particular polynomials in the first place to solve this particular problem.
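For readers who have not seen the effect, the standard Runge example $f(x)=1/(1+25x^2)$ (my own illustration, not from the question) shows how much the choice of nodes matters: degree-20 interpolation at equispaced nodes blows up near the endpoints, while the same degree at Chebyshev roots converges.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Runge's function: smooth on [-1,1], but hostile to equispaced interpolation.
f = lambda x: 1.0 / (1.0 + 25.0 * x**2)
deg = 20
equi = np.linspace(-1.0, 1.0, deg + 1)                                  # equispaced nodes
cheb = np.cos((2 * np.arange(deg + 1) + 1) * np.pi / (2 * (deg + 1)))   # roots of T_{21}

xs = np.linspace(-1.0, 1.0, 2001)
err = {}
for name, nodes in [("equispaced", equi), ("chebyshev", cheb)]:
    coef = C.chebfit(nodes, f(nodes), deg)   # deg+1 points, degree deg -> interpolant
    err[name] = np.max(np.abs(f(xs) - C.chebval(xs, coef)))

assert err["chebyshev"] < 0.1 < err["equispaced"]   # Runge phenomenon
```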
Why might one be inclined to think that polynomials of the form $\cos\left(n \operatorname{arccos}(x)\right)$ would provide the best set of points for Lagrange interpolation?
Boyd (Chebyshev and Fourier Spectral Methods) argues that Chebyshev polynomials are just Fourier cosine series in disguise: under the substitution $x=\cos\theta$ we have $T_n(x)=\cos(n\theta)$, and truncated Fourier series are far better behaved for approximation than Lagrange interpolation through arbitrarily chosen points.
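To make the "disguise" concrete, here is my own numerical sketch of Boyd's point: under $x=\cos\theta$, a Chebyshev expansion of $f(x)$ is literally the Fourier cosine series of $g(\theta)=f(\cos\theta)$, so an FFT of samples of $g$ yields Chebyshev coefficients directly.

```python
import numpy as np

f = np.exp
N = 16
theta = np.pi * np.arange(2 * N) / N   # equispaced samples on [0, 2*pi)
g = f(np.cos(theta))                   # even and periodic -> pure cosine series

# DCT-I style extraction of cosine coefficients a_0 .. a_N from the FFT.
c = np.fft.rfft(g).real / N
c[0] /= 2
c[N] /= 2

# sum_n c_n cos(n*theta) = sum_n c_n T_n(x) should reproduce f on [-1,1].
x = np.linspace(-1.0, 1.0, 101)
approx = sum(cn * np.cos(n * np.arccos(x)) for n, cn in enumerate(c))
assert np.max(np.abs(approx - f(x))) < 1e-12
```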
Trefethen (Spectral Methods in MATLAB, Ch. 5) has another justification. It is all about the choice of the interpolation nodes: the logarithm of the node polynomial $\prod_j|x-x_j|$ is interpreted as an electrostatic potential generated by charges at the nodes, and its size is studied as a function of how the nodes are distributed. He concludes that "Grid points for polynomial spectral methods should lie approximately in a minimal-energy configuration associated with inverse linear repulsion between points", which leads to the Chebyshev points.
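A crude check of this picture (my own sketch, not Trefethen's computation): with the pairwise potential $-\log|x_i-x_j|$, i.e. inverse-linear repulsion, the Chebyshev extreme points come out with lower total energy than equispaced points, so they sit closer to the minimal-energy configuration.

```python
import numpy as np

def energy(pts):
    """Total logarithmic-repulsion energy -sum_{i<j} log|x_i - x_j|."""
    d = np.abs(pts[:, None] - pts[None, :])
    iu = np.triu_indices(len(pts), k=1)
    return -np.sum(np.log(d[iu]))

n = 20
equi = np.linspace(-1.0, 1.0, n + 1)
cheb = np.cos(np.pi * np.arange(n + 1) / n)   # Chebyshev extreme points

assert energy(cheb) < energy(equi)            # Chebyshev points: lower energy
```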
I like to think that if we approximate a slowly varying function on a bounded interval by polynomials, we might as well choose as a basis the monic polynomials of minimal $L^\infty$ norm on $[-1,1]$, i.e., the (scaled) Chebyshev polynomials $2^{1-n}T_n$.
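This minimax property is easy to probe numerically; here is my own illustration: the monic polynomial $2^{1-n}T_n$ has sup norm exactly $2^{1-n}$ on $[-1,1]$, and random monic competitors of the same degree always come out worse.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
xs = np.linspace(-1.0, 1.0, 20001)

# Monic Chebyshev polynomial 2^{1-n} T_n and its sup norm on [-1,1].
cheb_norm = np.max(np.abs(np.cos(n * np.arccos(xs)))) / 2 ** (n - 1)
assert abs(cheb_norm - 2.0 ** (1 - n)) < 1e-6

# Random monic degree-n polynomials: their sup norm can never beat 2^{1-n}.
for _ in range(100):
    low = rng.normal(size=n)                  # random lower-order coefficients
    monic = np.concatenate([low, [1.0]])      # coefficients of x^0 .. x^n, monic
    norm = np.max(np.abs(np.polynomial.polynomial.polyval(xs, monic)))
    assert norm >= cheb_norm - 1e-9
```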