All answers to this problem involve the following process:
Pick a point $a$ in the unit disk such that $T(a)=0$. Then, $T(a^*)=\infty$ if $a^*$ is the symmetric point $a^*=\frac{1}{\bar{a}}$ of a. Clearly, such a transformation is of the form $T(z)=\frac{z-a}{1-z\bar{a}}$. We might as well place a constant C in there: $T(z)=C\frac{z-a}{1-z\bar{a}}$. Then, most answers magically jump to replacing $C=e^{i\theta}$ in the formula and claiming the job is done.
Why does a function built so that it sends a point inside the unit disk to zero, and its symmetric point to $\infty$, be the one that preserves the unit disk? What's the motivation behind approaching this problem in that manner? Could one proceed bruteforce: with $|\frac{az+b}{cz+d}|<1$ and arrive at the same function? For some reason this seems to me like somebody pulled a rabbit out of a hat. Once given the formula, I understand why it does the job, but why would somebody think that choosing a point inside the disk and building a function that sends it to zero create a mobius transformation that does the job?
Define $T_a(z) = \frac{z-a}{\overline{a}z - 1}.$ This maps points in the unit disc to points in the unit disc, since $|z| < 1$ implies $$|z-a|^2 - |1 - \overline{a}z|^2 = (|z|^2 - 1)(1 - |a|^2) < 0$$ and therefore $$|T_a(z)|^2 = \frac{|z-a|^2}{|\overline{a}z - 1|^2} < 1.$$
Also, you can check that $T_a$ is its own inverse: $T_a(T_a(z)) = z.$
Given any other automorphism $f$ of the unit disc, $T_{f(0)} \circ f$ is an automorphism of the unit disc with $T_{f(0)} \circ f(0) = 0,$ so Schwarz's lemma implies $T_{f(0)} \circ f(z) = Cz$ for some constant $C$ with $|C| = 1,$ i.e. $C = e^{i \theta}.$ Therefore, $$f(z) = T_{f(0)} \circ T_{f(0)} \circ f(z) = T_{f(0)}(Cz) = C \frac{z - f(0) / C}{1 - \overline{f(0)} Cz} = C \frac{z-a}{\overline{a}z - 1}$$ with $a = \frac{f(0)}{C}.$
The intuition behind $T_a$ involves replacing $\mathbb{C}$ by projective space: using coordinates $[z_1:z_2]$, where $z_1,z_2$ are not both zero and we understand two projective points as equal if they define the same line: i.e. $$[z_1:z_2] = [w_1:w_2] \; \mathrm{if} \; z_1 = \lambda w_1, \; z_2 = \lambda w_2 \; \mathrm{for} \; \mathrm{some} \; \lambda \in \mathbb{C}.$$ Essentially, you get the usual complex numbers $z = [z:1]$ and a "point at infinity" $[1:0].$ In this interpretation, M\"obius transformations are just linear maps: $$\begin{pmatrix} a & b \\ c & d \end{pmatrix} [z:1] := [az+b : cz+d] = \Big[ \frac{az+b}{cz+d} : 1 \Big].$$