Expected Members of a Randomly Selected Subset

265 Views Asked by At

I have a set $A$ of size $n$, and want to construct a random subset $S \subseteq A$. Each element of $A$ has a probability of being chosen to be in $S$, $\forall_{a \in A} \,\; p_a = Prob(a \in S)$.

Is there an existing notion for the expected composition of $S$?

For example, suppose $A = \{0.25, 0.75\}$ (with members labeled by their probability of being chosen). My intuition is that $E[S] = \{0.75\}$. I get this by finding the expected number of elements of $S$, $E[|S|] = 1$, and then guessing that the highest probability member will be the expected choice.

Is there any notion that formalizes something like my intuition?

Thanks for any help,

Federico

1

There are 1 best solutions below

0
On BEST ANSWER

I don't think there's really a definitive answer to your question, as others have stated in the comments. However, what you are talking about may be related to the notion of Maximum Likelihood Estimators in the context of classification (and other applied statistical fields including machine learning). Of course here we are not figuring the most likely parameters, but rather the most likely set. This is a technique used commonly in Naive Bayes, and the like, because we figure out the most likely label a certain instance of data may have.

There is no conventionally defined "expected value" function on such a process, because expected value of a random variable specifically refers to random variables that are $\Omega \mapsto \mathbb{R}$ (conventionally, of course. There will be variants). However, if your resultant of the process is a set, naturally there is no conventional notion of how to multiply sets by a probability, or add them like in the expected value function we are familiar with.

Back to my point on MLE; you may be trying to find out a "most likely set", if you will. In applications such as machine learning, you have some probabilities and are trying to figure out the most likely labels for a certain collection of data. Of course, depending on the application there are numerous ways of doing this. For more on MLE, there's always the wikipedia course, or several edX resources available online.

For example, you can select an element to be in the set (assuming all their probabilities are independent) if it has greater than 0.5 probability, or rather select the expected value of elements with greatest probability to be in the set (see comments).

Hope this helps you gain some kind of idea as to what your idea means. Not really a math thing, just a kind of design thing (as these sets usually have meaning).