Carathéodory's theorem states that
If $x \in \mathbb R^d$ and $x \in \text{CH}(P)$, then $x$ can be written as a convex combination of at most $d+1$ points in $P$.
The proof goes as follows:
Let $x$ be a point in the convex hull of $P$. Then $x$ is a convex combination of finitely many points in $P$: $$x=\sum_{j=1}^{k}\lambda_j x_j$$
where every $x_j$ is in $P$, every $\lambda_j$ is (w.l.o.g.) positive, and ${\displaystyle \sum _{j=1}^{k}\lambda _{j}=1}$. Suppose $k > d + 1$ (otherwise, there is nothing to prove).
I think that if $k \le d+1$, then $x$ is already written as a convex combination of at most $d+1$ points in $P$, and that is why, in that case, there is nothing to prove.
Then, the vectors $x_2 − x_1, ..., x_k − x_1$ are linearly dependent,
I think the linear dependence of these vectors holds because they belong to $\mathbb R^d$: if we choose $m$ of them with $m > d$, they cannot be linearly independent, and here there are $k-1 > d$ of them.
So there are real scalars $\mu_2, \dots, \mu_k$, not all zero, such that
$${\displaystyle \sum _{j=2}^{k}\mu _{j}(\mathbf {x} _{j}-\mathbf {x} _{1})=\mathbf {0} .}$$
If $\mu_1$ is defined as
$${\displaystyle \mu _{1}:=-\sum _{j=2}^{k}\mu _{j}}$$ then
$${\displaystyle \sum _{j=1}^{k}\mu _{j}\mathbf {x} _{j}=\mathbf {0} }\;;\,\;\; {\displaystyle \sum _{j=1}^{k}\mu _{j}=0}$$ and not all of the $\mu_j$ are equal to zero. Since the $\mu_j$ sum to zero and are not all zero, at least one $\mu_j > 0$. Then, $$\mathbf{x} = \sum_{j=1}^k \lambda_j \mathbf{x}_j-\alpha\sum_{j=1}^k \mu_j \mathbf{x}_j = \sum_{j=1}^k (\lambda_j-\alpha\mu_j) \mathbf{x}_j$$ for any real $\alpha$. In particular, the equality will hold if $\alpha$ is defined as
$${\displaystyle \alpha :=\min _{1\leq j\leq k}\left\{{\tfrac {\lambda _{j}}{\mu _{j}}}:\mu _{j}>0\right\}={\tfrac {\lambda _{i}}{\mu _{i}}}.} $$
This is where I don't understand: all we know is that some of the $\mu_j$'s are nonzero, so some of them can still be zero or negative. Based on what condition can we define such an $\alpha$?
Since all the $\lambda_j$'s are nonnegative, the $\alpha$ defined in the proof is always nonnegative. Therefore, when $\mu_j$ is zero or negative, we have $\lambda_j-\alpha\mu_j\ge\lambda_j\ge0$, i.e., the new $j$-th coefficient remains nonnegative.
When $\mu_j$ is positive, $\lambda_j-\alpha\mu_j$ becomes smaller than $\lambda_j$. However, it is still nonnegative, because $\alpha$ is defined as the minimum of the ratios $\frac{\lambda_i}{\mu_i}$ over all positive $\mu_i$'s, so $\alpha \le \frac{\lambda_j}{\mu_j}$ and hence $\lambda_j - \alpha\mu_j \ge 0$.
Therefore the new coefficients are all nonnegative, regardless of the sign of each $\mu_j$. Since they still sum to $1$, the new combination is a convex one. Moreover, the coefficient at the minimising index $i$ becomes exactly zero, so $x$ is now written using at most $k-1$ points; repeating the argument brings the count down to $d+1$.
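The elimination step in the proof is entirely constructive, so it may help to see it run numerically. Below is a minimal sketch (names like `caratheodory_reduce` are my own, not from any library) that, given a convex combination of $k > d+1$ points, finds the $\mu_j$'s via a null-space computation, chooses $\alpha$ exactly as in the proof, and returns a convex combination of strictly fewer points representing the same $x$:

```python
import numpy as np

def caratheodory_reduce(points, weights, tol=1e-12):
    """One elimination step of the proof: given x = sum_j weights[j] * points[j]
    with k > d+1 points (points is (k, d), weights positive and summing to 1),
    return an equivalent convex combination using fewer points."""
    k, d = points.shape
    if k <= d + 1:
        return points, weights  # nothing to prove, as in the text

    # The k-1 vectors x_j - x_1 live in R^d with k-1 > d, so they are
    # linearly dependent: take mu_2..mu_k from the null space of the
    # d x (k-1) matrix whose columns are these differences.
    diffs = (points[1:] - points[0]).T
    _, _, vh = np.linalg.svd(diffs)
    mu_rest = vh[-1]                                   # satisfies diffs @ mu_rest ~ 0
    mu = np.concatenate(([-mu_rest.sum()], mu_rest))   # mu_1 := -sum, so sum(mu) = 0

    # sum(mu) = 0 and mu not all zero forces a positive entry;
    # flip signs if the positive part happens to be empty.
    if mu.max() <= tol:
        mu = -mu

    # alpha := min over positive mu_j of lambda_j / mu_j, attained at some index i.
    pos = mu > tol
    alpha = (weights[pos] / mu[pos]).min()

    new_weights = weights - alpha * mu       # new j-th coefficient, all >= 0
    new_weights[new_weights < tol] = 0.0     # coefficient at index i is exactly 0
    keep = new_weights > 0
    return points[keep], new_weights[keep]
```

For instance, the centre of the unit square is a convex combination of its four corners ($k=4$, $d=2$); one reduction step rewrites it using at most three of them while keeping the weights convex and the represented point unchanged.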