Why is $\sum x(1-x)$ equal to $1-\sum x^2$?


I'm going through Python Machine Learning and I'm at the Gini impurity section, where they define the Gini impurity as

$I_g(t) = \sum_{i=1}^c p(i|t) (1 - p(i|t))$

where p(i|t) is the proportion of samples at a particular node t that belong to class i, and c is the number of classes. Fine, seems reasonable enough. But then they go on to simplify the formula into this:

$I_g(t) = 1 - \sum_{i=1}^c p(i|t)^2$

And I cannot, for the life of me, figure out how they arrived at this result. Am I making some incorrect assumption about how p(i|t) works? Can I not treat p(i|t) like any ordinary variable?

Best answer

Note that $\sum_{i=1}^c p(i|t)=1$, because the class proportions at a node must add up to one. That is where the $1$ in the simplification comes from.

\begin{align}I_g(t) &= \sum_{i=1}^c p(i|t) (1 - p(i|t)) \\&= \sum_{i=1}^c (p(i|t) - p(i|t)^2) \\&= \sum_{i=1}^c p(i|t) - \sum_{i=1}^cp(i|t)^2\\&=1- \sum_{i=1}^cp(i|t)^2 \end{align}
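
If it helps to see this numerically, here is a small Python sketch that evaluates both forms on an example. The function names and the proportions are made up for illustration and are not from the book. It also shows that the two forms only agree when the proportions sum to 1.

```python
import math


def gini_product_form(p):
    """Gini impurity as sum of p_i * (1 - p_i)."""
    return sum(pi * (1 - pi) for pi in p)


def gini_square_form(p):
    """Gini impurity as 1 - sum of p_i ** 2."""
    return 1 - sum(pi ** 2 for pi in p)


# Hypothetical class proportions at one node; they sum to 1.
proportions = [0.5, 0.3, 0.2]
a = gini_product_form(proportions)
b = gini_square_form(proportions)
print(a, b, math.isclose(a, b))

# Values that do NOT sum to 1: the two forms no longer agree,
# which is why the step sum(p) = 1 is essential.
not_normalized = [0.5, 0.3]
print(gini_product_form(not_normalized), gini_square_form(not_normalized))
```

So yes, you can treat p(i|t) like an ordinary variable; the only extra fact you need is that the p(i|t) sum to 1 over the classes.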