Expected number of connected components on a line

193 Views Asked by At

This problem is from a quant interview.

Consider a line of $n$ adjacent colorless squares. Color in each individual square black with probability 3/4 or white with probability 1/4, independent of all other squares. A connected component is a maximal sequence of adjacent squares all with the same color. For example, for $n=10$ BBWBWWWBBW has 6 connected components. Find the expected number of connected components in our line.

Here's my attempt.

Let $X_1,\ldots,X_{n+1}$ be iid random variables that model the colors of each square.
Let $S_n$ be the random variable that counts the number of connected components. I'm trying to find a recurrence relation on $E[S_n]$.

$\begin{align} E[S_{n+1}] &= E[S_{n+1} 1_{X_{n+1}= X_{n}}] + E[S_{n+1} 1_{X_{n+1}\neq X_{n}}] \\ &= E[S_{n} 1_{X_{n+1}= X_{n}}] + E[(1+S_{n}) 1_{X_{n+1}\neq X_{n}}] \\ &= E[E[S_{n} 1_{X_{n+1}= X_{n}}|X_1,\ldots,X_n]] + E[E[(1+S_{n}) 1_{X_{n+1}\neq X_{n}}|X_1,\ldots,X_n]] \\ &= E[S_{n}\big(\frac 34 1_{X_n=B} + \frac 14 1_{X_n=W}\big)] + E[(1+S_{n}) \big(1-\frac 34 1_{X_n=B} - \frac 14 1_{X_n=W}\big)]. \end{align}$

I cannot proceed further because $S_n$ and $X_n$ are not independent...

2

There are 2 best solutions below

0
On BEST ANSWER

Exploiting the hint given in the comment section, we indeed have $S_n=1+\sum_{i=1}^{n-1} 1_{X_{i+1}\neq X_i} $, thus $$E[S_n] = 1 + \sum_{i=1}^{n-1} P(X_{i+1}\neq X_i) = 1 + \sum_{i=1}^{n-1} [1-(3/4)^2-(1/4)^2] = 1+\frac{3(n-1)}8.$$

0
On

Let the probabilities of a black square and the white square be, respectively, $p$ and $q$ (where in this problem $p=\frac{3}{4}$ and $q=\frac{1}{4}$).

As @jschnei said in their comment, one extra component happens every time when $X_{i+1}\ne X_i$. The probability for that to happen is $2pq$ ($pq$ that it will be white followed by black, plus $qp$ that it will be black followed by white).

Let

$$Y_i:=\begin{cases}1&X_{i+1}\ne X_i\\0&X_{i+1}=X_i\end{cases}$$

We are really being asked to calculate $E(1+\sum_{i=1}^{n-1}Y_i)$, which is $1+\sum_{i=1}^{n-1}E(Y_i)=1+(n-1)\cdot 2pq$. In our case it is $1+\frac{3}{8}(n-1)$. The crucial insight here is that we have used linearity of expectation (to conclude $E(\sum_i Y_i)=\sum_i E(Y_i)$) which is valid even for random variables that are not necessarily independent.