Average squared Euclidean distance between vectors of coin flips


I have a biased coin with the probability of flipping heads as $p$. I have a room of $n$ people and they each have this same coin. I ask everybody to flip their coins, and record their results as an $n$-dimensional vector of ones (heads) and zeros (tails) like [1 0 0 1 0 ...]

I do this many times, and I measure the average squared Euclidean distance between any two of the resulting $n$-dimensional vectors.

The result seems to be that, on average, the squared Euclidean distance between any two of these vectors is:

$2np(1-p)$

Why is this the case? This looks to me like twice the variance of a binomial distribution, but I have no idea why or how to derive that...


Context: I'm trying to understand a paper on a theoretical model of brain activity http://dx.doi.org/10.1016/j.neuron.2014.07.035 and I simplified the problem to coin flips for ease of explanation. :) I ran some simulations on the coin flip problem and it seems to check out.
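A simulation like the one mentioned above can be sketched in a few lines of NumPy (the parameters $n = 20$, $p = 0.3$, and the trial count are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, trials = 20, 0.3, 200_000

# two independent "rooms" of n biased coin flips per trial (True = heads)
x = rng.random((trials, n)) < p
y = rng.random((trials, n)) < p

# squared Euclidean distance between each pair of 0/1 vectors
sq_dist = ((x.astype(int) - y.astype(int)) ** 2).sum(axis=1)

print(sq_dist.mean())        # empirical average, close to...
print(2 * n * p * (1 - p))   # ...the claimed value 2np(1-p) = 8.4
```

With 200,000 trials the empirical mean lands within a few hundredths of $2np(1-p)$.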


BEST ANSWER

Let $X_1,\ldots,X_n,Y_1,\ldots,Y_n$ be i.i.d. random variables with the Bernoulli distribution with parameter $p$. Then \begin{align*} \mathbb{E} \left[\sum_{i=1}^n (X_i - Y_i)^2\right] &= \sum_{i=1}^n \mathbb{E} \left[ (X_i - Y_i)^2 \right] \\ &= \sum_{i=1}^n \mathbb{E} \left[ X_i^2 - 2X_iY_i + Y_i^2\right] \\ &= \sum_{i=1}^n \left( \mathbb{E} [ X_i^2 ] - \mathbb E [2X_iY_i] + \mathbb E [Y_i^2]\right) \\ &= \sum_{i=1}^n \left( \mathbb{E} [ X_i] - 2 \mathbb E [X_i] \mathbb E [Y_i] + \mathbb E [Y_i]\right) \\ &= \sum_{i=1}^n \left(p - 2 p^2 + p\right) \\ &= 2np(1-p), \end{align*} where we use the linearity of expectation, the independence of $X_i$ and $Y_i$, and that $X_i^2 = X_i$ for Bernoulli random variables.
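As a sanity check on this derivation, the expectation can also be computed exactly by enumerating all $2^{2n}$ pairs of outcome vectors for a small $n$ — a sketch in Python, with $n = 3$ and $p = 0.3$ chosen purely for illustration:

```python
from itertools import product

n, p = 3, 0.3

def prob(v):
    # probability of one particular vector of flips (1 = heads, 0 = tails)
    out = 1.0
    for b in v:
        out *= p if b else (1 - p)
    return out

# exact expectation of the squared distance, summed over all outcome pairs
expected = sum(
    prob(x) * prob(y) * sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    for x in product([0, 1], repeat=n)
    for y in product([0, 1], repeat=n)
)

print(expected)              # exact value from enumeration
print(2 * n * p * (1 - p))  # the formula: 2 * 3 * 0.3 * 0.7 = 1.26
```

The two printed values agree (up to floating-point error), matching the closed form above.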

ANOTHER ANSWER

Let $(S_1,\dots, S_n)$ and $(T_1,\dots, T_n)$ be independent random vectors whose entries are i.i.d. Bernoulli($p$). For $i=1$ to $n$, define the random variable $X_i$ by $X_i=1$ if $S_i\neq T_i$ and $X_i=0$ otherwise. Since each coordinate is $0$ or $1$, we have $(S_i-T_i)^2=X_i$, so our sum of squares is $X_1+\cdots+X_n$.

We want $E(X_1+\cdots+X_n)$, which by the linearity of expectation is $E(X_1)+\cdots+E(X_n)$.

We have $\Pr(X_i=1)=2p(1-p)$ (head, tail or tail, head). Thus $E(X_i)=2p(1-p)$ and our required expectation is $n \cdot 2p(1-p) = 2np(1-p)$.