Problem:
Let $m_1, m_2 \in \mathbb{Z}_+$, let $m = m_1 + m_2$, and let $y_1, \dots, y_m$ be $m$ real numbers.
Define:
$\mu = \frac1m\sum_{i=1}^m y_i,\ \ \mu_1 = \frac{1}{m_1}\sum_{i=1}^{m_1} y_i, \ \ \mu_2 = \frac{1}{m_2}\sum_{i=1+m_1}^{m} y_i$
Show that:
$\sum_{i=1}^m(y_i-\mu)^2 = \sum_{i=1}^{m_1}(y_i - \mu_1)^2 + \sum_{j=1+m_1}^m(y_j - \mu_2)^2 + \frac{m_1m_2}{m}(\mu_1 -\mu_2)^2$
Here $\mu_1$ and $\mu_2$ are the means of the two child nodes produced by a split in a decision tree, and $\mu$ is the mean of the parent node.
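Before attempting a proof, it can help to check the claimed identity numerically, with both within-group sums squared. The snippet below is only a sanity check under that reading of the statement, not a proof; the function name `split_identity_sides` and the random test data are made up for illustration.

```python
import random


def split_identity_sides(y, m1):
    """Return (lhs, rhs) of the sum-of-squares identity for a split after m1 items."""
    m = len(y)
    m2 = m - m1
    left, right = y[:m1], y[m1:]

    mu = sum(y) / m          # mean of all m values
    mu1 = sum(left) / m1     # mean of the first m1 values
    mu2 = sum(right) / m2    # mean of the remaining m2 values

    # Left-hand side: total sum of squared deviations from the overall mean.
    lhs = sum((v - mu) ** 2 for v in y)

    # Right-hand side: within-group sums plus the between-group term.
    rhs = (
        sum((v - mu1) ** 2 for v in left)
        + sum((v - mu2) ** 2 for v in right)
        + m1 * m2 / m * (mu1 - mu2) ** 2
    )
    return lhs, rhs


if __name__ == "__main__":
    random.seed(0)
    y = [random.uniform(-10, 10) for _ in range(12)]
    lhs, rhs = split_identity_sides(y, m1=5)
    # The two sides should agree up to floating-point rounding.
    print(abs(lhs - rhs) < 1e-9)
```

Trying a few different values of `m1` and different data should give agreement each time, which at least suggests that the squared form of the identity is the one to aim for.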
Solution:
$\sum_{i=1}^m(y_i - \mu)^2 = \sum_{i=1}^m\left(y_i^2 - 2 y_i\mu + \mu^2\right) \\
= \sum_{i=1}^{m}\left(y_i^2 - 2 y_i\left(\frac{m_1\mu_1 + m_2 \mu_2}{m_1+m_2}\right) + \left(\frac{m_1\mu_1 + m_2 \mu_2}{m_1+m_2}\right)^2\right) \\
= \sum_{i=1}^{m_1}\left(y_i^2 - 2y_i\frac{m_1\mu_1}{m_1+m_2} + \left(\frac{m_1\mu_1}{m_1+m_2}\right)^2 -2y_i\frac{m_2\mu_2}{m_1+m_2} +\left(\frac{m_2\mu_2}{m_1+m_2}\right)^2 + 2\frac{m_1 m_2\mu_1\mu_2}{(m_1+m_2)^2}\right) \\
+\ \sum_{j=1+m_1}^{m}\left(y_j^2 - 2y_j\frac{m_2\mu_2}{m_1+m_2} + \left(\frac{m_2\mu_2}{m_1+m_2}\right)^2 -2y_j\frac{m_1\mu_1}{m_1+m_2} +\left(\frac{m_1\mu_1}{m_1+m_2}\right)^2 + 2\frac{m_1 m_2\mu_1\mu_2}{(m_1+m_2)^2}\right)\\
=\sum_{i=1}^{m_1}\left(y_i - \mu_1\frac{m_1}{m}\right)^2 + \sum_{j=1+m_1}^{m}\left(y_j - \mu_2\frac{m_2}{m}\right)^2 + \sum_{i=1}^{m_1}\left(-2y_i\frac{m_2\mu_2}{m} + \left(\frac{m_2\mu_2}{m}\right)^2\right) + \sum_{j=1+m_1}^{m}\left(-2y_j\frac{m_1\mu_1}{m} + \left(\frac{m_1\mu_1}{m}\right)^2\right) + \frac{2m_1m_2\mu_1\mu_2}{m}$
I feel like I'm on the right track, but I'm missing something obvious that would clean it up.
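One standard way to avoid the messy expansion, offered as a sketch rather than as the intended solution, is to add and subtract the group mean inside each square instead of substituting for $\mu$ right away. It uses only the definitions above and the fact that deviations from a group's own mean sum to zero.

For the first group, write $y_i - \mu = (y_i - \mu_1) + (\mu_1 - \mu)$:

$\sum_{i=1}^{m_1}(y_i - \mu)^2 = \sum_{i=1}^{m_1}(y_i - \mu_1)^2 + 2(\mu_1 - \mu)\sum_{i=1}^{m_1}(y_i - \mu_1) + m_1(\mu_1 - \mu)^2 \\
= \sum_{i=1}^{m_1}(y_i - \mu_1)^2 + m_1(\mu_1 - \mu)^2$

since $\sum_{i=1}^{m_1}(y_i - \mu_1) = m_1\mu_1 - m_1\mu_1 = 0$. The same argument for the second group gives

$\sum_{j=1+m_1}^{m}(y_j - \mu)^2 = \sum_{j=1+m_1}^{m}(y_j - \mu_2)^2 + m_2(\mu_2 - \mu)^2$

Now substitute $\mu = \frac{m_1\mu_1 + m_2\mu_2}{m}$, which gives $\mu_1 - \mu = \frac{m_2}{m}(\mu_1 - \mu_2)$ and $\mu_2 - \mu = \frac{m_1}{m}(\mu_2 - \mu_1)$, so

$m_1(\mu_1 - \mu)^2 + m_2(\mu_2 - \mu)^2 = \frac{m_1m_2^2 + m_2m_1^2}{m^2}(\mu_1 - \mu_2)^2 = \frac{m_1m_2(m_1 + m_2)}{m^2}(\mu_1 - \mu_2)^2 = \frac{m_1m_2}{m}(\mu_1 - \mu_2)^2$

Adding the two group sums then yields the claimed decomposition.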