I'm going through the proof of Markov's inequality, which is stated as
For a non-negative random variable $X$ with expectation $E(X)=\mu$, and any $\alpha > 0$, $P[X \geq \alpha] \leq \frac{E(X)}{\alpha}$.
So, to understand what this was trying to say in the first place, I rephrased it as "the probability that a non-negative r.v. $X$ takes on a value of at least $\alpha$ is upper bounded by $\frac{E(X)}{\alpha}$".
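To convince myself the statement is at least plausible, here is a quick numerical check (a hypothetical sketch of my own, not from the book) against a fair six-sided die:

```python
# Sanity check of Markov's inequality on a fair six-sided die
# (a made-up example; exact arithmetic via fractions).
from fractions import Fraction

# Distribution: each face 1..6 with probability 1/6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

E = sum(k * p for k, p in pmf.items())              # E(X) = 7/2
alpha = 5
lhs = sum(p for k, p in pmf.items() if k >= alpha)  # P[X >= 5] = 1/3
rhs = E / alpha                                     # E(X)/alpha = 7/10

print(lhs, "<=", rhs, ":", lhs <= rhs)  # prints: 1/3 <= 7/10 : True
```

Here $E(X) = 7/2$ and $P[X \geq 5] = 1/3 \leq 7/10$, so the bound holds in this case.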
Now, my book goes into the proof in the following manner:
$E(X) = \sum_a a \times P[X=a]$. This line makes perfect sense.
$E(X) \geq \sum_{a \geq \alpha} a \times P[X=a]$ why did they introduce an inequality here? The book also says that this is a crucial step where we have used the fact that $X$ takes on only non-negative values. I don't understand where that property is being used. I tried plugging in an r.v. that took negative values and didn't see anything out of the ordinary - why is it a constraint in this theorem?
$E(X) \geq \alpha \sum_{a \geq \alpha} P[X=a]$ how can you pull $\alpha$ out of the summation?
$E(X) \geq \alpha P[X \geq \alpha]$ how did the inequality change places?
$$E(X)=\sum_{k}k \cdot Pr[X=k]$$

Up to here, no problem. Note that the sum runs over the support of $X$, which means that the $k$ are all the possible values of $X$, and the hypothesis of non-negativity allows you to say that $k \geq 0$. Thus you can remove any number of terms from the sum and the remaining sum will be smaller (or equal), because the terms you removed are non-negative. So we have

$$E(X)=\sum_{k}k \cdot Pr[X=k]\geq \sum_{k\geq\alpha}k \cdot Pr[X=k]$$

Now, each remaining $k$ is at least $\alpha$. If you substitute every $k$ with an $\alpha$, you obtain a sum that is no larger:

$$E(X)=\sum_{k}k \cdot Pr[X=k]\geq \sum_{k\geq\alpha}k \cdot Pr[X=k]\geq \sum_{k\geq\alpha}\alpha \, Pr[X=k]=\alpha\sum_{k\geq\alpha}Pr[X=k]=\alpha \, Pr[X\geq\alpha]$$

Here you can pull $\alpha$ out of the summation because it is a constant that does not depend on the summation index $k$.
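To make this concrete, here is a small sanity check (a hypothetical Python sketch; both pmfs are made up, not from the question) that verifies each inequality in the chain for a non-negative r.v., and shows the very first inequality failing once a negative value is allowed:

```python
# Check the chain E(X) >= sum_{k>=a} k*Pr[X=k] >= a*Pr[X>=a] step by step,
# then see what goes wrong with a negative value. Made-up distributions.
from fractions import Fraction

def markov_chain_holds(pmf, alpha):
    """True iff every inequality in the chain holds for this pmf and alpha."""
    full = sum(k * p for k, p in pmf.items())                 # E(X)
    kept = sum(k * p for k, p in pmf.items() if k >= alpha)   # sum over k >= alpha
    tail = alpha * sum(p for k, p in pmf.items() if k >= alpha)  # alpha * Pr[X >= alpha]
    return full >= kept >= tail

# Non-negative r.v. on {0,...,4}: every step of the chain holds.
nonneg = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10),
          3: Fraction(3, 10), 4: Fraction(1, 10)}
print(markov_chain_holds(nonneg, 3))   # True

# R.v. taking -10 or +10, each with probability 1/2: E(X) = 0, but the
# truncated sum is 5, so dropping terms did NOT make the sum smaller and
# the very first inequality already fails.
signed = {-10: Fraction(1, 2), 10: Fraction(1, 2)}
print(markov_chain_holds(signed, 10))  # False
```

The second example is exactly why non-negativity is a constraint: the dropped terms must be $\geq 0$ for the truncation step to give a lower bound.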