Let $x_1\leq x_2\leq\cdots\leq x_n$ be a sequence of real numbers. If we want to group the numbers into $k$ groups such that the sum of the within-group squared errors is minimized, it seems intuitive that the optimal clustering should consist of contiguous groups: $$\{x_1,...,x_{j_1}\}, \{x_{j_1+1},...,x_{j_2}\},...,\{x_{j_{k-1}+1},...,x_{n}\}$$ But how can we prove this rigorously? I've been thinking about it for a while, and I believe it is not as easy as it seems.
By the sum of within-group squared errors, I mean the following: if $G_1,...,G_k$ is a partition of $\{1,...,n\}$, the sum of within-group squared errors is $$\sum_{j=1}^k\sum_{i\in G_j}(x_i - \bar{x}_{G_j})^2$$ where $\bar{x}_{G_j}$ is the average of the numbers in group $G_j$.
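For intuition, the claim can at least be checked by brute force on small instances: minimize the objective above over *all* partitions into $k$ nonempty groups, and compare with the minimum over contiguous partitions of the sorted data only. A sketch in Python (the function names are mine, not from any library):

```python
from itertools import combinations, product
import random

def sse(groups):
    """Total within-group sum of squared deviations from each group's mean."""
    total = 0.0
    for g in groups:
        m = sum(g) / len(g)
        total += sum((x - m) ** 2 for x in g)
    return total

def best_over_all_partitions(xs, k):
    """Brute force: minimum SSE over every partition of xs into k nonempty groups."""
    n = len(xs)
    best = float("inf")
    for labels in product(range(k), repeat=n):
        if len(set(labels)) < k:  # every group must be nonempty
            continue
        groups = [[xs[i] for i in range(n) if labels[i] == j] for j in range(k)]
        best = min(best, sse(groups))
    return best

def best_contiguous(xs, k):
    """Minimum SSE over partitions of the sorted data into k contiguous runs."""
    ys = sorted(xs)
    n = len(ys)
    best = float("inf")
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        groups = [ys[bounds[j]:bounds[j + 1]] for j in range(k)]
        best = min(best, sse(groups))
    return best

random.seed(0)
xs = [random.random() for _ in range(7)]
a = best_over_all_partitions(xs, 3)
b = best_contiguous(xs, 3)
print(abs(a - b) < 1e-9)  # True if the unrestricted optimum is attained contiguously
```

Of course this is evidence, not a proof, but it is a quick sanity check of the conjecture.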

Show that if a solution is not of this form, it can be strictly improved. That is, prove the contrapositive of "if optimal, then contiguous."
Edit: I agree that this approach is not so simple to flesh out. This link mentions that the following paper provides a proof: Fisher, W. D. (1958). On grouping for maximum homogeneity. Journal of the American Statistical Association, 53, 789-798. Indeed, the Appendix contains a proof that uses the same approach I sketched: assume there are points $x_r < x_s < x_t$ with $x_r$ and $x_t$ in one group and $x_s$ in a different group, and show that any such grouping can be strictly improved.
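The key computation behind such an exchange argument is the following identity: if $u\in A$ and $v\in B$ are swapped, the objective changes by $$\Delta = 2(u-v)(\bar{x}_A - \bar{x}_B) - (u-v)^2\left(\tfrac{1}{|A|} + \tfrac{1}{|B|}\right),$$ so whenever $u > v$ and $\bar{x}_A \leq \bar{x}_B$, the swap strictly decreases the objective. Applied to $x_r < x_s < x_t$ with $x_r, x_t \in A$ and $x_s \in B$: if $\bar{x}_A \leq \bar{x}_B$, swap $x_t$ with $x_s$; otherwise swap $x_s$ with $x_r$. A small Python sketch checking this numerically on random instances (the setup and names are mine):

```python
import random

def sse(group):
    m = sum(group) / len(group)
    return sum((x - m) ** 2 for x in group)

def swap(A, B, u, v):
    """Return copies of A and B with u (from A) and v (from B) exchanged."""
    return [x for x in A if x != u] + [v], [x for x in B if x != v] + [u]

rng = random.Random(1)
for _ in range(1000):
    # Force an interleaving x_r < x_s < x_t with x_r, x_t in A and x_s in B,
    # then pad both groups with a few extra random points.
    r, s, t = sorted(rng.random() for _ in range(3))
    A = [r, t] + [rng.random() for _ in range(rng.randrange(3))]
    B = [s] + [rng.random() for _ in range(rng.randrange(3))]
    before = sse(A) + sse(B)
    if sum(A) / len(A) <= sum(B) / len(B):
        A2, B2 = swap(A, B, t, s)  # move the largest point toward the smaller-mean group
    else:
        B2, A2 = swap(B, A, s, r)  # symmetric case
    after = sse(A2) + sse(B2)
    assert after < before          # the chosen swap is a strict improvement
```

Since one of the two swaps always applies, any grouping containing such an interleaved triple is not optimal, which is exactly the contrapositive argument.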