More "formal" description for an experiment


I have an experiment with $n$ elements. Each one has its own probability $p_i \in [0,1]$ of being chosen (the probabilities can sum to more than 1). Finally, $m$ of these $n$ elements are chosen according to their probabilities.

Question: I would like to give a more formal description of this experiment, using some "technical" terms from statistics and probability. Is that possible?

Explanation of the process: To run this experiment, I'm using the `sample` function in R with its `prob` argument, which gives the weight of each element. I don't know exactly how it works; I've only read that "probabilities are applied sequentially, that is the probability of choosing the next item is proportional to the weights amongst the remaining items."

I'm not sure of the exact algorithm used in `sample`, but I imagine that weighted sampling with `sample` works something like this:

  • Elements are picked one by one, at random, until $m$ elements have been chosen.
  • I pick an element and check whether it is actually chosen, based on its probability.
  • If it is chosen, it is removed from the list of elements, so that it cannot be chosen again.
  • If not, I pick another random element and repeat the process.
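The procedure imagined in the bullets above can be sketched in Python as follows. This is a hypothetical re-implementation of the question's description, not R's code, and `rejection_sample` is an illustrative name. If each pick is uniform over the remaining elements and is kept with probability $p_i$, then an accepted pick lands on element $i$ with probability $p_i/\sum_{j\ \text{remaining}} p_j$, so this scheme gives the same distribution as the sequential rule quoted from the documentation, with the $p_i$ acting as relative weights.

```python
import random

def rejection_sample(items, p, m, rng=random):
    """Sketch of the procedure described in the question (hypothetical helper):
    pick a remaining element uniformly at random, keep it with probability p[i],
    otherwise pick again; stop once m distinct elements have been kept."""
    if sum(1 for pi in p if pi > 0) < m:
        # otherwise the loop could run forever once only zero-probability elements remain
        raise ValueError("need at least m elements with positive probability")
    remaining = list(range(len(items)))
    chosen = []
    while len(chosen) < m:
        i = rng.choice(remaining)      # uniform pick among elements not yet chosen
        if rng.random() < p[i]:        # element i is "chosen" with probability p[i]
            chosen.append(items[i])
            remaining.remove(i)        # remove it so it cannot be chosen again
    return chosen

# Example: p proportional to the weights 2, 1, 3 used in the answer's illustration
print(rejection_sample(["A", "B", "C"], [2/3, 1/3, 1.0], 2))
```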



This is successive sampling with weights (by default without replacement); the weights can add up to $1$ but do not need to, and any positive numbers will work. Without replacement, it is related to the ideas behind Wallenius' noncentral hypergeometric distribution. Running `?sample` will show you the documentation.
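A minimal Python sketch of successive sampling without replacement, following the rule quoted from `?sample`: each item is drawn with probability proportional to its weight among the items not yet drawn. This is not R's source code; `successive_sample` is a hypothetical name, and Python's `random.choices` is used only for the single weighted draw at each step.

```python
import random

def successive_sample(items, weights, size, rng=random):
    """Successive weighted sampling without replacement (hypothetical helper).
    At each step one item is drawn with probability proportional to its weight
    among the items that have not been drawn yet."""
    if size > len(items):
        raise ValueError("size cannot exceed the number of items")
    pool = list(zip(items, weights))
    sample = []
    for _ in range(size):
        # random.choices normalises the weights, so they need not sum to 1
        k = rng.choices(range(len(pool)), weights=[w for _, w in pool], k=1)[0]
        item, _ = pool.pop(k)          # the drawn item leaves the pool
        sample.append(item)
    return sample

print(successive_sample(["A", "B", "C"], [2, 1, 3], 2))
```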

As an illustration, consider `sample(c("A","B","C"), size=2, prob=c(2,1,3))`, which gives a weighted sample of two of $A,B,C$ using the relative weights $2,1,3$:

  • The first item may be $A$ with probability $\frac{2}{2+1+3}=\frac13$
    • The second item may then be $B$ with conditional probability $\frac{1}{1+3}=\frac14$, making the joint probability of $(A,B)$ $\frac1{12}$
    • The second item may then be $C$ with conditional probability $\frac{3}{1+3}=\frac34$, making the joint probability of $(A,C)$ $\frac1{4}$
  • The first item may be $B$ with probability $\frac{1}{2+1+3}=\frac16$
    • The second item may then be $A$ with conditional probability $\frac{2}{2+3}=\frac25$, making the joint probability of $(B,A)$ $\frac1{15}$
    • The second item may then be $C$ with conditional probability $\frac{3}{2+3}=\frac35$, making the joint probability of $(B,C)$ $\frac1{10}$
  • The first item may be $C$ with probability $\frac{3}{2+1+3}=\frac12$
    • The second item may then be $A$ with conditional probability $\frac{2}{2+1}=\frac23$, making the joint probability of $(C,A)$ $\frac1{3}$
    • The second item may then be $B$ with conditional probability $\frac{1}{2+1}=\frac13$, making the joint probability of $(C,B)$ $\frac1{6}$
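The tree above can be enumerated exactly. A small Python sketch, assuming integer weights so that `fractions.Fraction` gives exact results; `ordered_probabilities` is a hypothetical helper. Each ordered sample's probability is the product of the conditional probabilities along its branch.

```python
from fractions import Fraction
from itertools import permutations

def ordered_probabilities(weights, size):
    """Exact probability of every ordered sample under successive sampling.
    weights: dict mapping item -> positive integer weight (hypothetical helper)."""
    total = sum(weights.values())
    probs = {}
    for seq in permutations(weights, size):
        remaining, p = total, Fraction(1)
        for item in seq:
            p *= Fraction(weights[item], remaining)   # conditional probability at this step
            remaining -= weights[item]                # the drawn item leaves the pool
        probs[seq] = p
    return probs

probs = ordered_probabilities({"A": 2, "B": 1, "C": 3}, 2)
for seq, p in probs.items():
    print(seq, p)
```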

If you are not interested in the order of the outcome, then the probabilities of the unordered samples (adding up to $1$) are:

  • the probability of $\{A,B\}$ being the sample is $\frac1{12}+\frac{1}{15}=\frac3{20}$
  • the probability of $\{A,C\}$ being the sample is $\frac1{4}+\frac{1}{3}=\frac7{12}$
  • the probability of $\{B,C\}$ being the sample is $\frac1{10}+\frac{1}{6}=\frac4{15}$
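Dropping the order amounts to summing the probabilities of all orderings of the same set. A Python sketch under the same assumptions (integer weights; `unordered_probabilities` is a hypothetical helper name):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations

def unordered_probabilities(weights, size):
    """Exact probability of each unordered sample (a set of items), obtained by
    summing the successive-sampling probabilities of all of its orderings."""
    total = sum(weights.values())
    probs = defaultdict(Fraction)                     # Fraction() == 0
    for seq in permutations(weights, size):
        remaining, p = total, Fraction(1)
        for item in seq:
            p *= Fraction(weights[item], remaining)   # conditional probability at this step
            remaining -= weights[item]
        probs[frozenset(seq)] += p                    # forget the order
    return dict(probs)

sets = unordered_probabilities({"A": 2, "B": 1, "C": 3}, 2)
for s, p in sets.items():
    print(sorted(s), p)
```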

The probabilities of individual items being sampled (adding up to $2$, since two items are sampled) are:

  • For $A$: $\frac3{20}+\frac7{12}=\frac{11}{15}\approx 0.733$
  • For $B$: $\frac3{20}+\frac4{15}=\frac{5}{12}\approx 0.417$
  • For $C$: $\frac7{12}+\frac4{15}=\frac{17}{20}=0.85$
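For samples of size $2$ the same inclusion probabilities can also be written in closed form. Writing $w_i$ for the weights and $W=\sum_j w_j$ (notation introduced here for this derivation), item $i$ is sampled either on the first draw, or on the second draw after some other item $j$ was drawn first:

$$\pi_i = \frac{w_i}{W} + \sum_{j\neq i}\frac{w_j}{W}\cdot\frac{w_i}{W-w_j} = \frac{w_i}{W}\left(1+\sum_{j\neq i}\frac{w_j}{W-w_j}\right).$$

With $w=(2,1,3)$ and $W=6$ this reproduces the values above:

$$\pi_A=\tfrac{2}{6}\left(1+\tfrac{1}{5}+\tfrac{3}{3}\right)=\tfrac{11}{15},\qquad \pi_B=\tfrac{1}{6}\left(1+\tfrac{2}{4}+\tfrac{3}{3}\right)=\tfrac{5}{12},\qquad \pi_C=\tfrac{3}{6}\left(1+\tfrac{2}{4}+\tfrac{1}{5}\right)=\tfrac{17}{20}.$$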

Simulation confirms this interpretation, for example:

```r
set.seed(2022)
sims <- replicate(10000, sample(c("A","B","C"), size=2, prob=c(2,1,3)))
table(sims)
# sims
#    A    B    C 
# 7345 4171 8484
```