I was reading a proof in Mathematical Statistics by Rice establishing that the correlation coefficient, $\rho$, is such that $-1 \leq \rho \leq 1$. Here is the proof:
I have a few questions:
What was the reasoning behind taking the sum of the random variables, besides "it works out"? I.e., why $\frac{X}{\sigma_{X}} + \frac{Y}{\sigma_{Y}}$?
Why the choice to normalize the random variables? I.e., why $\frac{X - 0}{\sigma_{X}}$?
Why are we allowed to assume that $\mu_{X} = 0$? The same goes for $Y$.
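For context, the normalized sum in the first question is usually motivated by the following variance computation. This is only a sketch of the standard argument, assuming $\sigma_{X}$ and $\sigma_{Y}$ are finite and nonzero; the book's own presentation may differ in detail:

$$\begin{aligned}
0 \leq \mathbb{V}\left[\frac{X}{\sigma_{X}} \pm \frac{Y}{\sigma_{Y}}\right]
&= \frac{\mathbb{V}[X]}{\sigma_{X}^{2}} + \frac{\mathbb{V}[Y]}{\sigma_{Y}^{2}} \pm \frac{2\,\operatorname{cov}(X,Y)}{\sigma_{X}\sigma_{Y}}\\
&= 1 + 1 \pm 2\rho = 2(1 \pm \rho)
\end{aligned}$$

The $+$ case gives $\rho \geq -1$ and the $-$ case gives $\rho \leq 1$. Variances and covariances are unchanged by subtracting constants, so the means drop out of this computation.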

Alternative proof (it's just a morning brainstorming...):
Suppose that $a+bX$ is just an estimate of $Y$, so that $Y=a+bX$ does not hold with probability 1.
As is well known, this is the linear regression of $Y$ on $X$.
The estimate of the slope is $\hat{b}=\rho\cdot \frac{\sigma_{Y}}{\sigma_{X}}$.
Now let's calculate
$$\begin{aligned}
\rho=\frac{\operatorname{cov}(X,Y)}{\sigma_{X}\sigma_{Y}}
&=\frac{\mathbb{E}[(a+bX)X]-\mathbb{E}[X]\mathbb{E}[a+bX]}{\sigma_{X}\sigma_{Y}}\\
&=\frac{a\mathbb{E}[X]+b\mathbb{E}[X^2]-a\mathbb{E}[X]-b\mathbb{E}^2[X]}{\sigma_{X}\sigma_{Y}}\\
&=\frac{b\mathbb{V}[X]}{\sigma_{X}\sigma_{Y}}=b\frac{\sigma_{X}}{\sigma_{Y}}
\end{aligned}$$
Now it is self-evident that if
$$\mathbb{P}[Y=a+bX]=1$$
(that is, $Y=a+bX$ almost surely), we can replace $b$ with $\hat{b}$ without error. Then $\sigma_{Y}=|b|\sigma_{X}$, so $\rho=b\frac{\sigma_{X}}{\sigma_{Y}}=\frac{b}{|b|}$, which proves that $\rho=1$ when $b>0$.
A similar argument holds if the line is decreasing ($b<0$), in which case $\rho=-1$.
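As a rough numerical sanity check of the argument above, here is a minimal sketch in Python with NumPy. The helper `correlation`, the constants `a` and `b`, and the noise level are hypothetical choices made only for illustration. It checks that an exact linear relation with negative slope gives a sample correlation of $-1$, and that for noisy data the least-squares slope matches $\rho\cdot\frac{\sigma_{Y}}{\sigma_{X}}$.

```python
import numpy as np

def correlation(x, y):
    """Sample correlation: cov(x, y) / (sigma_x * sigma_y), population (ddof=0) form."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).mean() / (x.std() * y.std()))

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)

# Hypothetical intercept and (negative) slope, chosen only for illustration.
a, b = 1.5, -0.7

# Case 1: Y = a + bX holds exactly, so the correlation should be b / |b| = -1.
y_exact = a + b * x
rho_exact = correlation(x, y_exact)

# Case 2: Y is only approximately linear in X, so |rho| < 1.
noise = rng.normal(scale=2.0, size=x.size)
y_noisy = a + b * x + noise
rho_noisy = correlation(x, y_noisy)

# The least-squares slope should agree with rho * sigma_Y / sigma_X.
slope_from_rho = rho_noisy * y_noisy.std() / x.std()
slope_least_squares = float(np.polyfit(x, y_noisy, 1)[0])

print(rho_exact, rho_noisy, slope_from_rho, slope_least_squares)
```

Changing the sign of `b` should flip `rho_exact` to $+1$, matching the increasing-line case.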