I was reading the proof of Kruskal Algorithm Reference.
What I am not understanding is why we assume that Kruskal always chooses the smallest edges? Maybe it did not choose one of the smallest edges because it could form a cycle? So in the proof here (and actually in many other ones)Reference, at the end of the proof, they conclude that $w(e_i) = w(e_0)$ since Kruskal chose $e_i$ which implies $w(e_i) \leq w(e_0)$, but maybe Kruskal didn't choose $e_0$ because it could form a cycle even though $w(e_0) \leq w(e_i)$?
If $w(e_0)< w(e_i)$ and $e_0$ was not chosen then it means that it created a cycle with a subtree of $T$. This means that you will find an edge in $T$ that is not in $T_0$ and with strict smaller weight than $e_i$ which contradict the definition of $e_i$.