The problem is regarding probability. The problem is given in detail here.
In an $m \times n$ grid, $2mn -m -n$ matchsticks are placed at the inner boundaries between squares on the grid.
We can play a chance experiment where each matchstick can be removed with a certain probability $p$. We can define the ratio between number of cells with area less than or equal to 3 and the total number of cells pre-experiment($m \times n$) as score. We need to find the expectation of this score for a given grid specified by $m$, $n$ and a given probability $p$.
It is sufficient if we calculate the expected value of the score within an absolute error of $10^{-9}$ (say $\epsilon$).
A working code for this has been posted here.
My reasoning behind the code is,
- For an $m$ by $n$ grid, there are $n_{ms} = 2mn -m -n$ matchsticks (provided in the problem). For each stick, there are two possibilities - it can either be removed or can be retained (with a probability of $p$ or $1-p$). Therefore, there are $2^{n_{ms}}$ possible states of the Grid post experiment.
- The expectation of the score is (from definition), $$E[score] = \sum x.Pr(score = x)$$ As I am unable to come up with an estimate of the probability of landing on a particular score. I have decided to go the other way around by searching through all the possibilities as, $$E[score] = \sum_{i=0}^{2^{n_{ms}}-1} score_i \times Pr(case_i)$$ Here every case is one of the possibilities listed in Step 1.
- To do this, I can generate numbers from $0$ till $2^{n_{ms}}-1$ in binary. If each digit in the binary number represents the presence/absence of a particular matchstick, then if I remove/retain matchsticks according to the binary string (I have a function to do this) I will have simulated the entire space of possibilities. For every case I can compute the score ($score_i$) with the corresponding Grid.
- For a particular case in step 3 (say, case $i$), there will be $n_r$ sticks that are removed and $n_{ms}-n_r$ sticks that are retained (basically the number of 1s and 0s in the binary representation of $i$). The probability of this particular case to occur is $p^{n_r}(1-p)^{n_{ms} - n_r}$ (which is nothing but $Pr(case_i)$). Now finally to compute the expected score, we just need to list the different scores and their corresponding probabilities to plug into the expression in Step 2.
- I run my code to find the score for a particular instance ($i$), I add this score ($score_i$) to a list. I find the probability of this particular instance occurring (say $Pr_i$) and add it to another list. If the $score_i$ is present in the list of scores, then I use the index to access the corresponding probability and add $Pr_i$ to it. Finally I get the expected value from the expression in step 2.
This works, but, very inefficiently. Even for trivial cases where $p$ is either 0 or 1, the algorithm has to go through the entire range of possibilities.
One way to improve it is to compute only the cases which are reasonably likely to occur (trading off the absolute error for speed). This is where I do not know if my assumptions are valid.
The probability of a given occurance $i$ is, $$ Pr_i = p^{n_{1s}}(1-p)^{n_{ms}-n_{1s}}$$ where $n_1s$ is the number of 1s in the binary representation of $i$.
The number of cases for a particular $n_{1s}$ is $n_{ms} \choose n_{1s},n_{ms}-n_{1s}$.
As, $$score_i \leq 1$$
We get, $$E[score|n_{1s}] \leq p^{n_{1s}}(1-p)^{n_{ms}-n_{1s}}\frac{n_{ms}!}{n_{1s}!\times(n_{ms}-n_{1s})!}$$
Can I exclude these cases as long as, $$E[score|n_{1s}] < \epsilon$$
Even more efficient: The following is for generic $m,n>3$; the cases where $m$ or $n$ are 3 or less need to be treated separately, but are easier.
The expected number of cells of shape $1\times 1$ is found by noting that the cell can be in the interior, on a side, or in a corner. When discussing expected values (as opposed to the distribution of the number of $1\times 1$ cells) we can treat each potential cell as independent of the others. In each case, $4$, $3$, or $2$ sides have to have specific outcomes. So the expected number of such cells is $$ e_{1\times 1}(m,n,p) = (m-2)(n-2) (1-p)^4 + (2(m-2)+2(n-2) (1-p)^3 + 4(1-p)^2 $$ The meaning of the second term, for example, is that a cell touching a boundary can do this in $m-2$ places along the top and bottom, or in $n-2$ places along the left and right sides, and for such a cell to occur, three matches needed to remain in place.
The expected number of cells of shape $1\times 2$ is found by noting that the cell can be in the interior, on a side along its short dimension, on a side along its long dimension, or in a corner. And its orientation can be long side vertical or long side horizontal. This time, there must the the required number of non-removed sticks, and the middle stick has to have been removed. $$ e_{1\times 2}(m,n,p) = ((m-3)(n-2) + (m-2)(n-3))p (1-p)^6 + (2(m-2)+2(n-2) p(1-p)^5 + (2(m-3)+2(n-3) p(1-p)^4 + 8p(1-p)^3 $$ The meaning of the second term, for example, is that a cell touching a boundary along its long side can do this in $m-3$ places along the top and bottom, or in $n-3$ places along the left and right sides, and for such a cell to occur, four matches needed to remain in place and the middle match needs to have been removed.
We can see what starts to happen if the grid is too small; cases where the cell spans from one border to the other would be miscounted in the above formula.
Moving on, the expected number of cells of shape $1\times 3$ is found by noting that the cell can be in the interior, on a side along its short dimension, on a side along its long dimension, or in a corner. And its orientation can be long side vertical or long side horizontal. This time, there must the the required number of non-removed sticks, and the two middle sticks have to have been removed. $$ e_{1\times 2}(m,n,p) = ((m-4)(n-2) + (m-2)(n-4))p^2 (1-p)^8 + (2(m-2)+2(n-2) p^2(1-p)^7 + (2(m-4)+2(n-4) p^2(1-p)^5 + 8p^2(1-p)^4 $$ The last case is for cells of a $3$-box $L$ shape. Here, there are four possible orientations of the cell instead of two, and the box can be influenced by the sides and corners by having a short edge along a side, a long edge along a side, two long edges in a corner, or two short edges in a corner. (The latter happens, numbering the grid boxes from $(1,1)$ to $(m,n)$ if the cell covers boxes $(1,2), (2,2), (2,1)$.) $$ e_{L}(m,n,p) = 4(m-3)(n-3)p^2 (1-p)^8 + (4(m-3)+4(n-3) p^2(1-p)^7 + (4(m-3)+4(n-3) p^2(1-p)^6 + 4p^2(1-p)^6 + 4p^2(1-p)^4 $$ The answer you want is $$ e_{\leq 3}(m,n,p) = e_{1\times 1}(m,n,p) + e_{1\times 2}(m,n,p) +e_{1\times 3}(m,n,p) +e_{L}(m,n,p) $$ This answer is exact and can be provided to double precision accuracy in microseconds. For example, $e_{\leq 3}(10,7,\frac12) =\frac{4957}{512}$ so the score in that case would be $\frac{4957}{35840}\approx 0.138$. Here you are affected by boundaries pretty noticeably; the score for a very large grid and $p=\frac12$ is about $0.084$.
BTW, you can use these functions to prove that if you want to count only cells of size exactly $3$, in the limit of very large grids, the value of $p$ that produces the highest score is $\frac15$. At that point, you get a 3-cell score of about $0.0403$.