Source: Linear Algebra by Friedberg et al. (4 edn 2002). p. 261.
This is germane to Linear Algebra by Lay (4 edn 2011). p. 270. Section 5.1. Theorem 2.
$\bbox[,10px,border:5px solid gray]{\text{Theorem 5.5}} \;$ Let $T$ be a linear operator on a vector space V, and let $\lambda_{1},\ \lambda_{2},\ \ldots,\ \lambda_{k}$ be distinct eigenvalues of T. If $v_{1},\ v_{2},\ \ldots,\ v_{k}$ are eigenvectors of $T$ such that $\lambda_{i}$ corresponds to $v_{i} \forall 1\leq i\leq k$ , then $\{v_{1},\ v_{2},\ \ldots,v_{k}\}$ is linearly independent.
$\bbox[,10px,border:5px solid gray]{\text{Prove by mathematical induction on } k.} \;$ Suppose that $k=1$. Then $v_{1}\neq 0$ since $v_{1}$ is an eigenvector, and hence $\{v_{1}\}$ is linearly independent.
The Induction Hypothesis: Assume that the theorem holds for $k-1$ distinct eigenvalues, where $k-1\geq 1$.
We wish to show that $\{v_{1},\ v_{2},\ \ldots,\ v_{k}\}$ is linearly independent. Suppose that $a_{1},\ a_{2},\ \ldots$ , $a_{k}$ are scalars such that $a_{1}v_{1}+a_{2}v_{2}+\cdots+a_{k}v_{k} = \mathbf{ 0 }. \tag{1}$ Apply $\color{red}{T} \color{forestgreen}{-\lambda_{k}I}$ to $(1)$:
\begin{align} \color{red}{a_{1}T(v_{1}) + \cdots + a_{k-1}T(v_{k-1}) + a_{k}T(v_{k})} & \\ \color{forestgreen}{{} - a_{1}\lambda_k v_{1} - \cdots - a_{k-1}\lambda_k v_{k-1} - a_{k}\lambda_k v_{k}} & = \mathbf{ 0 }. \end{align}
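Then substitute $T(v_i) = \lambda_i v_i$ into the red sum and collect terms; the $v_k$ term drops out because its coefficient is $a_k(\lambda_k - \lambda_k) = 0$: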
$ a_{1}(\lambda_{1}-\lambda_{k})v_{1}+a_{2}(\lambda_{2}-\lambda_{k})v_{2}+\cdots+a_{k-1}(\lambda_{k-1}-\lambda_{k})v_{k-1}= \mathbf{ 0 }.$
I omit the rest of the proof, which is irrelevant to my questions. This proof stratagem is also used here.
$1.$ How can you divine this trick of applying $\color{orangered}{T} \color{forestgreen}{-\lambda_{k}I}$ ? What's its intuition?
$2.$ Why not multiply by $\color{orangered}{T} \color{forestgreen}{-\lambda_{k}}$? Why's $I$ needed in $\color{forestgreen}{-\lambda_{k}I}$, when $I(v_i) = v_i$ ?
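Well, you want to use the fact that $v_{k}$ is an eigenvector, that is, $(T-\lambda_{k}I)v_{k}=0$. Furthermore, you know by the induction hypothesis that $\{v_{1},\ldots,v_{k-1}\}$ is linearly independent. Applying the linear map $T-\lambda_{k}I$ to both sides of $(1)$ therefore eliminates $v_{k}$ and leaves a linear relation among the $v_{i}$ with $i<k$. By the induction hypothesis its coefficients $a_{i}(\lambda_{i}-\lambda_{k})$ must all vanish, and since the eigenvalues are distinct, $\lambda_{i}-\lambda_{k}\neq 0$, so $a_{i}=0$ for every $i<k$.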
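To see the trick on a concrete case (an illustrative example that is not from the textbook; the matrix $A$ and the vectors below are chosen here purely for illustration), take $k=2$ and let $T$ act on $\mathbb{R}^2$ by $A=\begin{pmatrix}2&1\\0&3\end{pmatrix}$, with $\lambda_1=2$, $v_1=(1,0)^T$ and $\lambda_2=3$, $v_2=(1,1)^T$. Then

\begin{align} A-3I &= \begin{pmatrix}-1&1\\0&0\end{pmatrix}, \\ (A-3I)v_2 &= \begin{pmatrix}-1+1\\0\end{pmatrix} = \mathbf{0}, \\ (A-3I)v_1 &= \begin{pmatrix}-1\\0\end{pmatrix} = (2-3)v_1. \end{align}

So applying $A-3I$ to $a_1v_1+a_2v_2=\mathbf{0}$ leaves $-a_1v_1=\mathbf{0}$, which forces $a_1=0$, and then $a_2v_2=\mathbf{0}$ forces $a_2=0$.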
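It doesn't make sense to form the difference $T-\lambda_{k}$: $T$ is a linear operator, while $\lambda_{k}$ is a scalar, and you can only subtract objects of the same kind. $\lambda_{k}I$ is the operator $v\mapsto\lambda_{k}v$ (a diagonal matrix once a basis is chosen), so $T-\lambda_{k}I$ is again a linear operator.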
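A small check, with an illustrative matrix that is not taken from the textbook, shows what goes wrong in matrix terms if the $I$ is dropped and the scalar is naively subtracted from every entry. Let $A=\begin{pmatrix}2&1\\0&3\end{pmatrix}$, which has the eigenvector $v=(1,1)^T$ for the eigenvalue $3$. Subtracting $3I$ changes only the diagonal and annihilates $v$, while subtracting $3$ entrywise does not:

\begin{align} (A-3I)v &= \begin{pmatrix}-1&1\\0&0\end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}, \\ \begin{pmatrix}2-3&1-3\\0-3&3-3\end{pmatrix}v &= \begin{pmatrix}-1&-2\\-3&0\end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}-3\\-3\end{pmatrix} \neq \mathbf{0}. \end{align}

Only the first reading reproduces $T(v)-\lambda v$, which is why the $I$ is written.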