Probability of choosing a subset of elements where each element has a different probability

831 Views Asked by At

I am trying to write a C++ program to do this but nobody on Stackoverflow can seem to help me so I thought I'd try to do it myself with some help from you guys.

My post on Stackoverflow can be found here: https://stackoverflow.com/questions/22727196/calculating-a-conditional-probability-that-is-similar-to-a-binomial-sum

My exact question would be, if I had 6 people where the probabilities of each person saying yes is given respectively by $(\frac{1}{2},\frac{1}{3},\frac{1}{4},\frac{1}{5},\frac{1}{6},\frac{1}{7})$ respectively, what is the closed form summation form of the probability that 2 of them say yes?

I can calculate the probability of this directly by hand but I am having difficulty finding a closed form expression.

4

There are 4 best solutions below

0
On BEST ANSWER

Let us first generalize, so we don't have to deal with actual numbers. I assume the numbers in your question are just examples, and not the actual numbers you want to use.

Let us say you have $n$ persons. Let the probability that person $i$ says yes be $p_i$. E.g., the probability of person 1 saying yes is $p_1$. Of course, it must follow that the probability of person $i$ saying no is $1 - p_i$. We want to know the probability of exactly $k$ people saying yes.

We can describe all scenarios by a sequence of zeros and ones, where the $i$:th digit is zero if person $i$ says no and one if person $i$ says yes.

E.g., if we have three persons and person 1 says yes, person 2 says no and person 3 says yes, this can be described as the configuration $(1,0,1)$ or ("yes", "no", "yes"). The probability of the configuration $(1,0,1)$ happening in this case is $p_1(1-p_2)p_3$.

Thus, in general, if we are given a configuration $\bar x = (x_1,x_2,\dots,x_n)$, where as before $x_i$ is either 0 or 1, we can easily compute the probability of this configuration happening. Let the probability of a configuration $\bar x$ be $P(\bar x)$.

Now, mathematically we can define the function $l(\bar x) = \sum_{i=1}^n x_i$ to be the number of ones in the configuration $\bar x$ and give the solution: $$p(\textrm{exactly $n$ people say yes}) = \sum_{\bar x: l(\bar x) = k} P(\bar x)$$ i.e., we sum the probabilities of the configurations where we have exactly $k$ ones.

Programmatically, we can use a divide-and-conquer approach. Let us say we want to know the probability of exactly $k$ people saying yes. The first person can say either yes or no (unless $p_1 = 1$). The same for the second person, and so on. Also, if $n-k$ persons have said no, the only way $k$ persons will say yes is if the remaining persons say yes, so in this case we can just compute $p_{n-k}p_{n-k+1}\cdots p_n$.

This can be expressed in code (general function but with your example as example):

#include <iostream>

double probability(double*,unsigned int,unsigned int);

int main()
{
    double yesprobs[6] = {1.0/2.0,1.0/3.0,1.0/4.0,1.0/5.0,1.0/6.0,1.0/7.0};
    double p = probability(yesprobs,6,2);
    std::cout << "The probability of exactly 2 people saying yes is: " << p << std::endl;

    return 0;
}

double probability(double* yesprobabilities, unsigned int numberOfPeople, unsigned int yesNumber)
{
    double kprobability = 0;
    // Not enough people!
    if(numberOfPeople < yesNumber)
    {
        kprobability = 0;
    }
    // n == k, the only way k people will say yes is if all the remaining people say yes.
    else if(numberOfPeople == yesNumber)
    {
        kprobability = 1;
        for(int i = 0; i < numberOfPeople; ++i)
        {
            kprobability = kprobability * yesprobabilities[i];
        }
    }
    else if(yesprobabilities[0] == 1)
    {
        kprobability += probability(&yesprobabilities[1],numberOfPeople-1,yesNumber-1);
    }
    else
    {
        // The first person says yes, k - 1 of the other persons have to say yes.
        kprobability += yesprobabilities[0] * probability(&yesprobabilities[1],numberOfPeople-1,yesNumber-1);
        // The first person says no, k of the other persons have to say yes.
        kprobability += (1 - yesprobabilities[0]) * probability(&yesprobabilities[1],numberOfPeople-1,yesNumber);
    }
    return kprobability;
}

Special case: $p_1 = p_2 = \dots = p_n$

In the special case where all the probabilities are the same $p_1 = p_2 = \dots = p_n$ (and independent of each other) we can call the probability that a person answers yes just $p$.

In this case we have a sequence of Bernoulli trials, which are well-studied in probability theory. The probability of exactly $k$ successes (people answering yes) is: $$\binom{n}{k}p^k(1-p)^{n-k}$$ where $\binom{n}{k}$ is a binomial coefficient.

1
On

$$\begin{align}p=&\sum_{2\le i<j\le7}\frac1i\frac1j\prod_{2\le k\le 7\atop k\notin\{i,j\}}\left(1-\frac1k\right)\\&=\prod_{k=2}^7\left(1-\frac1k\right)\cdot \sum _{i=2}^6\frac{\frac1i}{1-\frac1i}\sum_{j=i+1}^7\frac{\frac1j}{1-\frac1j}\\ &=\prod_{k=2}^7\left(1-\frac1k\right)\cdot \sum _{i=2}^6\frac{1}{i-1}\sum_{j=i+1}^7\frac{1}{j-1}\end{align}$$

2
On

Assuming the answers are independent, you are looking for the coefficient of $x^2$ in the expansion of $$(x+1)(x+2)(x+3)(x+4)(x+5)(x+6)/7!$$

which expands to $$({x}^{6}+21\,{x}^{5}+175\,{x}^{4}+735\,{x}^{3}+1624\,{x}^{2}+1764\,x+720)/5040$$

so giving a probability of two yeses of $\frac{1624}{5040}=\frac{29}{90}\approx 0.3222$ and you could find the probabilities of other numbers of yeses a similar way by looking at the other coefficients.

You really need a computer algebra system for questions like this.

0
On

Let $p_i$ be the probability of person $i$ saying yes, and let $q_k$ be the probability of exactly $k$ out of $n$ persons saying yes. Then,

$$ q_k = \left[\prod_{i=1}^n(1-p_i + p_ix)\right]_k, $$ where $[f(x)]_k$ is the coefficient of $x^k$ in the series expansion of $f(x)$.

In your question, $p_i = \frac{1}{i}$, and $k=2$. So,

$$ q_2 = \left[\prod_{i=2}^{n+1}\left(1-\frac{1}{i} + \frac{x}{i}\right)\right]_2. $$

One can find $q_2$ by taking the second derivate of the product with respect to $x$ and setting $x=0$, which gives \begin{align*} q_2 &= \sum_j\frac{1}{j}\sum_{i\neq j}\frac{1}{i}\prod_{l\neq i,j}\left(1-\frac{1}{l}\right) &=\frac{1}{n+1}\sum_{j=2}^{n+1}\sum_{i=2,i\neq j}^{n+1}\frac{1}{(i-1)(j-1)}. \end{align*}