Picking the better outcomes from sets of Bernoulli trials

37 Views Asked by At

Consider $k \in \mathbb{N}$ Bernoulli trials, with each independently having a success chance of $0<p<1$ and failure chance of $1-p$. The expected number of successes should be $k*p$ for $k$ trials.

Now I perform these $k$ trials $n \in \mathbb{N}$ times, with $n$ being even. Again, the total expected number of successes should be $n*k*p$.

However, what happens when I pick the $n/2$ most successful sets of $k$ trials, with a set of $k$ trials being more successful than another when it has a higher number of successes.

The expected number of successes should be at least $n/2*k*p$, but I have the feeling a better bound can be achieved? Sadly, I do not have a good starting point on how to achieve this...Thank you for your help :-)

1

There are 1 best solutions below

1
On

Your 'feeling' seems to be correct: Suppose $n = 8,\,k = 10,\,p = 1/3,$ so that without the advantage of picking the best four, you should average a score of $(n/2)*k*p = 26.667.$ Let's try it in R statistical software a few times and see what happens:

n = 8;  k = 20;  p = 1/3      # constants
x = sort(rbinom(n, k, p)); x  # n random samples from BINO(k,p)
##  5  5  5  6  8  9 10 10    # list them
s = sum(x[(n/2):n]);  s       # sum the 'best half'
## 43                         # better than 26.667

Three more runs gave $S$ values 46, 38, and 37.

Over a million runs I get $E(S) \approx 39.05$ and $SD(S) \approx 4.25.$ So your idea certainly works out in practice for the particular parameters I chose. I will leave it to you to formulate a 'bound' or mean and find a general analytic solution.

m = 10^6;  s = numeric(m)
n = 8;  k = 20;  p = 1/3
for (i in 1:m) {
  x = sort(rbinom(n, k, p))
  s[i] = sum(x[(n/2):n])  }
mean(s);  sd(s)
## 39.04539
## 4.246463

Below is a histogram of the simulated distribution of $S$ along with the PDF of $Norm(\mu = 39.05, \sigma=4.25),$ which, on account of the skewness of $S,$ is not an excellent fit.

enter image description here