Best subset with exactly one success

102 Views Asked by At

This is an interview question that I was asked, but I totally couldn't figure it out:

Given N items = {a,b,c,d,e...}, each with a probability {$P_a$,$P_b$,$P_c$,$P_d$,$P_e$...} of succeeding.

Given that you can select any subset of items (i.e {b,c,e}), what is the highest probability you could get such that exactly ONE of the item in this subset succeed.

Say you select {b,c,e} the probability is $P_b$(1-$P_c$)(1-$P_e$) + Pc*(1-$P_b$)(1-$P_e$) + $P_e$(1-$P_b$)*(1-$P_c$)

If there exist any item x with probability 1, it is best to select only that item.

How can we find out the maximum possible probability given that you can choose any subset?

1

There are 1 best solutions below

2
On

The best strategy is, indeed, to construct a subset by starting with the most probable event, and adding events in decreasing order of probability, until a particular sum attains $1$.

For any event $e$, let $p_e$ be the probability of success, and $q_e = 1-p_e$ the probability of failure. As a convenient abuse of notation, for any subset of events $E$, let $p_E$ be the probability of exactly one success amongst the events in $E$, and let $q_E \not= 1-p_E$ be the probability of exactly no successes amongst the events in $E$. We then have

$$ p_E = \left( \prod_{e \in E} q_e \right) \left(\sum_{e \in E} \frac{p_e}{q_e} \right) $$

Then $E$ is maximal, in the sense that it is not advantageous to add another event to $E$, if for any event $e' \not\in E$, we have

$$ p_{E'} \leq p_E $$

where $E' = E \cup \{e'\}$. We find

\begin{align} p_{E'} & = p_E q_{e'} + q_E p_{e'} \\ & = p_E - p_E p_{e'} + q_E p_{e'} \end{align}

so therefore $p_{E'} \leq p_E$ whenever

$$ p_E \geq q_E $$

Note that this condition does not depend on the event $e'$. Since

$$ q_E = \prod_{e \in E} q_e $$

the condition is equivalent to

$$ \sum_{e \in E} \frac{p_e}{q_e} \geq 1 $$

I'll continue on with this approach momentarily.


OK, I've had dinner. Still digesting, so maybe this is hasty, but I think one can justify the greedy approach (taking the largest probability events first) by observing that

$$ \frac{\partial p_E}{\partial p_e} = \frac{1}{(1-p_e)^2} \prod_{e' \in E \not= e} q_{e'} \geq 0 $$

so as long as you keep the number of events the same, you always get a better result by swapping a lower-probability event for a higher-probability event. However, you may get to a point where you're better off deleting an event—and that is precisely when deleting the lowest-probability event leaves the above sum still greater than or equal to $1$.