I'm having trouble understanding the definition of the variance of a Bernoulli distribution. I thought that the variance was the sum of the squared distances of each data point from the mean, divided by the number of data points. However, I see a different definition of the variance for Bernoulli distributions:
In general, it is useful to think about a Bernoulli random variable as a random process with only two outcomes: a success or failure. Then we build our mathematical framework using the numerical labels 1 and 0 for successes and failures, respectively. If p is the true probability of a success, then the mean of a Bernoulli random variable X is given by:
$$ \mu = E[X] = P(X = 0) \cdot 0 + P(X = 1) \cdot 1 $$ $$ = (1 - p) \cdot 0 + p \cdot 1 = 0 + p = p $$
Similarly, the variance of X can be computed:
$$ \sigma ^2 = P(X = 0)(0 - p)^2 + P(X = 1)(1 - p)^2 $$
$$ = (1 - p)p^2 + p(1 - p)^2 $$
$$ = p(1 - p) $$
What is going on above? That doesn't seem like the standard definition of variance. Why are we taking the probability of $X = 0$ and multiplying it by the squared difference of p from 0?
I suspect you are confusing the sample variance with the variance. The average squared deviation from the sample mean is an *estimator* of $\text{Var}(X)$, not $\text{Var}(X)$ itself.
Variance is defined as $$ \mathbb E[(X-\mu)^2] $$ where $\mu = \mathbb E [X]$. In the case of a Bernoulli random variable $\mu = p$, so you immediately get $$ \mathbb E[(X-p)^2] = \mathbb P(X = 0)(0-p)^2 + \mathbb P(X = 1)(1-p)^2, $$ which simplifies to $(1-p)p^2 + p(1-p)^2 = p(1-p)\big(p + (1-p)\big) = p(1-p)$.
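As a quick numeric sanity check (just a sketch; the values `p = 0.3` and `n = 100_000` are arbitrary choices, not from the post), you can simulate Bernoulli draws, compute the average squared deviation the question describes, and see that it approaches the theoretical value $p(1-p)$:

```python
import random

random.seed(0)
p = 0.3        # true probability of success
n = 100_000    # number of simulated draws

# Simulate n Bernoulli(p) outcomes labeled 1 (success) and 0 (failure).
draws = [1 if random.random() < p else 0 for _ in range(n)]

# The estimator from the question: mean squared deviation from the sample mean.
mean = sum(draws) / n
sample_var = sum((x - mean) ** 2 for x in draws) / n

# The theoretical variance of a Bernoulli(p) random variable.
theoretical_var = p * (1 - p)

print(sample_var, theoretical_var)
```

The two numbers agree to a couple of decimal places, illustrating that the sample variance estimates the quantity $\mathbb E[(X-p)^2] = p(1-p)$.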