If we have Sturm's sequence of polynomials, $p_0=p, p_1, \ldots, p_m$, for a given polynomial $p$, then the number of real roots of $p$ in a half-open interval $(a,b]$ is $W(a)-W(b)$, where $W$ is the function that takes a real number $x$ and returns the number of sign changes in Sturm's sequence evaluated at $x$.
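For concreteness, take $p(x) = x^2 - 1$. Its Sturm sequence is $p_0 = x^2 - 1$, $p_1 = 2x$, $p_2 = 1$ (the negated remainder of $p_0$ divided by $p_1$, since $x^2 - 1 = \frac{x}{2}\cdot 2x - 1$). At $x = -2$ the values are $(3, -4, 1)$, so $W(-2) = 2$; at $x = 2$ they are $(3, 4, 1)$, so $W(2) = 0$; and indeed $p$ has $W(-2) - W(2) = 2$ roots in $(-2, 2]$.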
In order to prove the theorem, we watch what happens to $W$ as $x$ moves from left to right along the $x$-axis. We pick some interval $(a-\epsilon, a+\epsilon)$ on which none of the polynomials in Sturm's sequence is zero, except possibly at $a$. We separate the cases where $a$ is a zero of some $p_i$ with $i>0$, and where $p_0(a)=0$.
In the first case, we prove that the number of sign changes in the triple $(p_{i-1}, p_i, p_{i+1})$ stays the same as $x$ moves across $a$. What I don't understand is how this proves that the number of sign changes stays the same not just when we count it for the triple $(p_{i-1}, p_i, p_{i+1})$, but for the whole sequence. What if there is some $p_j$, $j \neq i$, that is also zero at $a$? We can apply the result proven for the triple surrounding $p_j$, but it's not obvious to me that if $W$ is unchanged when evaluated over such triples, then $W$ is unchanged when evaluated over the whole sequence.
Also, in the second case, we prove that the number of sign changes drops by $1$ as we cross the zero of $p_0$, but we only prove it for the pair $(p_0, p_1)$. How do we move from there to counting the number of sign changes in the whole sequence?
I've searched online and all the proofs are the same: they skip the part I'm talking about (or maybe it's obvious and I don't see it).
Let $\sigma(u,v) = 1$ if $u < 0 < v$ or $v < 0 < u$, and $\sigma(u,v) = 0$ otherwise. Then for a sequence $\{y_i\}_{i=0}^N$ of non-zero values, $$\sum_{i=1}^N \sigma(y_{i-1}, y_i)$$ gives the number of sign variations in the sequence $\{y_i\}$. Unfortunately, this doesn't work when some of the $y_i$ are zero, but I'll work around that.
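A minimal Python sketch of this count (the helper names `sigma` and `variations` are my own, not standard):

```python
def sigma(u, v):
    """1 if u and v have strictly opposite signs, 0 otherwise."""
    return 1 if (u < 0 < v) or (v < 0 < u) else 0

def variations(ys):
    """Number of sign variations in a sequence of non-zero values.

    Sums sigma over consecutive pairs, i.e. sum_{i=1}^{N} sigma(y_{i-1}, y_i).
    """
    return sum(sigma(a, b) for a, b in zip(ys, ys[1:]))

# The sequence (3, -1, -2, 5) changes sign twice: + -> - and - -> +.
print(variations([3, -1, -2, 5]))  # 2
```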
In particular, if $P = \{p_i\}_{i=0}^N$ is any sequence of non-zero polynomials, then $$W_P(x) = \sum_{i=1}^N \sigma(p_{i-1}(x), p_i(x))$$ except where one or more $p_i(x) = 0$, and that happens only at a finite number of isolated points. If $p_{i-1}$ and $p_i$ are never $0$ on some interval, then neither changes sign there, so $\sigma(p_{i-1}, p_i)$ is constant on the interval. If $p_i(c) = 0$ for some $c$, there is an open interval about $c$ on which none of $p_{i-1}, p_i, p_{i+1}$ is zero, except $p_i$ at $c$ itself. So to the left of $c$, and to the right of $c$, the number of sign variations in this triple is $$T_i(x) := \sigma(p_{i-1}(x), p_i(x)) + \sigma(p_i(x), p_{i+1}(x))$$ At $c$ itself, the number of sign variations is $$T_i(c) := \sigma(p_{i-1}(c), p_{i+1}(c))$$ since $p_i(c) = 0$. The basic result is
Lemma: if $P = \{p_i\}_{i=0}^N$ is a sequence of non-zero polynomials satisfying the conditions (1) no two adjacent polynomials $p_{i-1}, p_i$ have a common zero, and (2) whenever $p_i(c) = 0$ for some $0 < i < N$, the triple count $T_i$ is constant on a neighborhood of $c$,

Then $W_P(x)$ is constant on all intervals that do not include a zero of $p_0$ or $p_N$.
Proof: As noted above, the $\sigma(p_{i-1}, p_i)$ are all constant on intervals that do not contain zeros of any of the polynomials, so their sum $W_P(x)$ is also constant on those intervals. The only places where it can change value are the points $c$ where at least one of the polynomials is $0$. Since there are only finitely many such roots, they are isolated from each other.
Let $(a,b)$ be an interval not including any zeros of the two end polynomials and containing only one zero $c$ of the remaining polynomials. Near $c$ we can divide the indices into two sets $A = \{i\mid p_{i-1}(c) = 0\text{ or } p_i(c) = 0\}, B = \{1, \ldots, N\} \setminus A$. Then $$W_P(x) = \sum_{i\in A} \sigma(p_{i-1}(x), p_i(x)) + \sum_{i\in B} \sigma(p_{i-1}(x), p_i(x))$$ Since neither polynomial in the sum over $B$ is $0$ near $c$, every term, and therefore the sum, is constant. Since no polynomials that are $0$ at $c$ are adjacent, we can rewrite the sum over $A$ as
$$\sum_{i\in A} \sigma(p_{i-1}(x), p_i(x)) = \sum_{p_i(c) = 0} \left[\sigma(p_{i-1}(x), p_i(x)) + \sigma(p_i(x), p_{i+1}(x))\right] = \sum_{p_i(c) = 0} T_i(x)$$ But by hypothesis, $T_i(x)$ is constant near $c$ for each $i$ with $p_i(c) = 0$. So the sum over $A$, and therefore $W_P$, are both constant near $c$. Since $W_P$ is constant between any two zeros of the inner polynomials, and also in neighborhoods of those zeros, it must be constant over the entire interval $(a,b)$. QED
The only reason the argument doesn't work for zeros of $p_0$ and $p_N$ is that there is no polynomial on one side to form one of the triples.
Now, given a square-free polynomial $p_0$, the Sturm sequence satisfies the recursion $$p_{i-1} + p_{i+1} = q_ip_i$$ for some polynomials $q_i$. If $p_i(c) = 0$ and either of $p_{i-1}(c)$ or $p_{i+1}(c)$ is also zero, then the third polynomial is zero as well. Thus if the Sturm sequence has two adjacent polynomials that are $0$ at $c$, then every polynomial in the sequence must also be equal to $0$ at $c$. This includes $p_0$ itself and $p_1 = p_0'$. But a square-free polynomial and its derivative cannot share a zero. So this cannot occur. Therefore no two adjacent polynomials in the Sturm sequence share a common zero.
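For concreteness, take $p_0 = x^3 - x$. Then $p_1 = 3x^2 - 1$, and polynomial division gives $$x^3 - x = \frac{x}{3}(3x^2 - 1) - \frac{2x}{3}, \qquad 3x^2 - 1 = \frac{9x}{2}\cdot\frac{2x}{3} - 1,$$ so the Sturm sequence is $\left(x^3 - x,\; 3x^2 - 1,\; \tfrac{2}{3}x,\; 1\right)$. The recursion reads $p_0 + p_2 = \tfrac{x}{3}\,p_1$ and $p_1 + p_3 = \tfrac{9x}{2}\,p_2$, and one can check that no two adjacent members share a zero.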
Now if $p_i(c) = 0$, we must have $p_{i+1}(c) \ne 0, p_{i-1}(c) \ne 0$, but $p_{i-1}(c) + p_{i+1}(c) = q_i(c)p_i(c) = 0$. Therefore $p_{i+1}(c) = -p_{i-1}(c) \ne 0$, so there must be some neighborhood of $c$ in which $p_{i+1}$ and $p_{i-1}$ are of opposite signs. In this neighborhood, except at $c$ itself, $p_i$ must agree in sign with one or the other. Thus there is exactly one sign variation among the three polynomials anywhere in the neighborhood. That is, $T_i$ is constant on the neighborhood.
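To see this concretely in the Sturm sequence $\left(x^3 - x,\; 3x^2 - 1,\; \tfrac{2}{3}x,\; 1\right)$ of $p_0 = x^3 - x$: the inner polynomial $p_2 = \tfrac{2}{3}x$ vanishes at $c = 0$, where $p_1(0) = -1$ and $p_3(0) = 1$ have opposite signs. Near $0$ the signs of $(p_1, p_2, p_3)$ are $(-, -, +)$ for $x < 0$ and $(-, +, +)$ for $x > 0$, one variation either way, and at $0$ itself $T_2(0) = \sigma(-1, 1) = 1$. So $T_2 \equiv 1$ near $0$.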
So Sturm sequences satisfy both conditions of the lemma, and $W_P$ is constant between zeros of $p_0$ and $p_N$. The final piece of the puzzle is this: for the Sturm sequence of a square-free polynomial, $p_N$ is never $0$. Since $p_N$ is the last polynomial in the sequence, the next remainder must be $0$, which means $p_{N-1} = q_N p_N$. So if $p_N(c) = 0$, then also $p_{N-1}(c) = q_N(c)p_N(c) = 0$. And as indicated above, this means that all polynomials in the sequence are $0$ at $c$, contradicting the fact that $p_0$ is square-free.
Since $p_N$ has no zeros, $W_P$ can only change values at the zeros of $p_0$.
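As a numerical sanity check, here is a self-contained Python sketch of the whole construction. The coefficient-list helpers, the tolerance `EPS`, and the convention of dropping zero values when counting variations are my own choices, not part of the proof; it counts the roots of the square-free polynomial $x^3 - x$ in $(-2, 2]$:

```python
EPS = 1e-9  # tolerance standing in for exact rational arithmetic

def deriv(p):
    """Derivative of p, given as a coefficient list, highest degree first."""
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]

def poly_mod(a, b):
    """Remainder of polynomial a divided by polynomial b."""
    a = list(a)
    while len(a) >= len(b):
        f = a[0] / b[0]
        for i in range(1, len(b)):
            a[i] -= f * b[i]
        a = a[1:]  # the leading term cancels exactly
    return a

def sturm(p):
    """Sturm sequence: p0 = p, p1 = p', p_{i+1} = -(p_{i-1} mod p_i)."""
    seq = [list(p), deriv(p)]
    while True:
        r = poly_mod(seq[-2], seq[-1])
        if all(abs(c) < EPS for c in r):
            return seq
        seq.append([-c for c in r])

def peval(p, x):
    """Evaluate p at x by Horner's rule."""
    v = 0.0
    for c in p:
        v = v * x + c
    return v

def W(seq, x):
    """Number of sign variations of the sequence at x (zero values dropped)."""
    vals = [peval(q, x) for q in seq]
    vals = [v for v in vals if abs(v) > EPS]
    return sum(1 for a, b in zip(vals, vals[1:]) if a * b < 0)

def count_roots(p, a, b):
    """Number of distinct real roots of square-free p in (a, b]."""
    seq = sturm(p)
    return W(seq, a) - W(seq, b)

# p(x) = x^3 - x has roots -1, 0, 1, all in (-2, 2].
print(count_roots([1, 0, -1, 0], -2, 2))  # 3
```

For $x^3 - x$ the computed sequence is $\left(x^3 - x,\; 3x^2 - 1,\; \tfrac{2}{3}x,\; 1\right)$, with $W(-2) = 3$ and $W(2) = 0$, so the count is $3$ as expected.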
The argument breaks down when $p_0$ is not square-free. I have not worked out how the proof must change in this case. Wikipedia indicates that the result in the non-square-free case is only a little more restrictive.