I was trying to solve this question. The following is a modified shorter question:
- You have $n$ blocks (numbered 1 through n) with positive integral weights $w_1, w_2, \ldots, w_n$. You are also given integers $a_1$ through $a_n$, where $a_i$ is 1 if you like the i-th block and 0 otherwise.
- You will perform $m$ operations. In each operation:
- the probability that you pick the i-th block is proportional to its weight, i.e., $p_i = \frac{w_i}{\sum_i{w_i}}$
- you will pick only one block, and increment its weight by 1 if you like that block, or by -1 if you do not.
- Find the expected weight of each block after all the operations are finished.
My approach: repeat the following steps, $m$ times:
- Compute $S = \sum\limits_{i=1}^nw_i$.
- For each $i$, update $w_i$ to the following value: $w_i = \begin{cases} \displaystyle w_i + \frac{w_i}{S},& \text{if } a_i = 1 \\ \displaystyle w_i - \frac{w_i}{S},& \text{otherwise}\end{cases}$
Basically, this is updating the expected value of all the blocks, based on their expectation values at the previous step.
To me this method feels very intuitive, and I do not have a mathematically rigorous way of proving why this is wrong (I know it's wrong because it failed tests). Could anyone help prove concretely why this is wrong?
Moreover, this approach gives the right answer if all $a_i$ are 1 (that is, the weights always gets incremented) What difference does all $a_i$ being one cause, to produce right answers?
The expected value depends on each of the states, and not just the summarized expected value obtained at the previous step.
As an example, consider starting with $w _ 1 = w_2 = 1, a_ 1 = 1, a _ 2 = 0 $, and taking the first step.
Half of the time, you get $ (w_1, w_2 ) = (2,1)$ and the other half of the time you get $(1,0)$. The expected value then becomes $ (\frac{3}{2}, \frac{1}{2}) $.
Clearly, at the next step, the adjustments of weights isn't equal across all cases. We actually end up with weighted expected value of $\frac{2}{6} (3,1) + \frac{1}{6} (2,0) + \frac{1}{2} (2,0) = ( \frac{7}{3}, \frac{1}{3})$, whereas OP's approach yields $(\frac{3}{2} + \frac{3}{4}, \frac{1}{2} - \frac{1}{4} ) (\frac{9}{4}, \frac{1}{4})$.