The question asks:
Each week you sample 100 people and ask them their opinion on a given subject. They either approve (1) or disapprove (0). We are interested in finding the average approval rate. Assume each person has the same probability of approving (doesn't make sense in real life, but yeah).
I need to describe what my random variable would be here (i.e., the random variable for the average approval rate). Then, I need to describe what distribution the variable would follow and why.
I am stuck between two choices for my random variable.
1. X could be 1 or 0 (approve/disapprove) for each of the 100 individuals.
2. X could range from 0 to 100; so, P(X = 50) would be the probability that exactly 50 people approve.
I think X should follow a binomial distribution because, across N independent trials each with approval probability p, we're interested in finding the mean. It makes more sense to pick the first choice for my random variable in this case, but I'm not sure why. How would my model differ if I picked the second choice? In general, I'm confused about the relationship between the random variable and the model.
Thanks!
Between your two choices, the natural one is the second: let $X$ be the number of people who approve. This is a classic example of a binomial distribution with $n = 100$ and $p$ some fixed probability of approval, and the average approval rate is then simply $X/100$.
You could choose your first option, but then you would need 100 random variables, one for each person's response in the survey. Each of these is a Bernoulli variable with the same parameter $p$. When you add them up to get the total number of "yes" responses, the distribution of the sum will be exactly the binomial distribution in option 2, provided the responses are independent.
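If you want to convince yourself of this numerically, here is a minimal simulation sketch using only the Python standard library. The value `p = 0.6`, the number of simulated weeks, the seed, and the function names are all my own assumptions for illustration; nothing in the question fixes them.

```python
import random
from math import comb


def simulate_counts(n=100, p=0.6, weeks=20000, seed=0):
    """Simulate `weeks` surveys of n people each.

    Each person is one Bernoulli(p) draw (option 1); the weekly total of
    approvals is their sum. Returns the empirical frequency of each total
    from 0 to n.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = [0] * (n + 1)
    for _ in range(weeks):
        total = sum(1 if rng.random() < p else 0 for _ in range(n))
        counts[total] += 1
    return [c / weeks for c in counts]


def binomial_pmf(k, n, p):
    """Exact Binomial(n, p) probability of k successes (option 2)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Assumed illustrative values: p = 0.6 is hypothetical, not from the question.
n, p = 100, 0.6
empirical = simulate_counts(n, p)
exact = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Largest disagreement between simulated frequencies and the binomial pmf.
max_gap = max(abs(e - x) for e, x in zip(empirical, exact))

# Average approval rate across all simulated weeks; should be close to p.
mean_rate = sum(k * f for k, f in enumerate(empirical)) / n

print(f"largest gap between simulation and binomial pmf: {max_gap:.4f}")
print(f"simulated average approval rate: {mean_rate:.4f}")
```

With enough simulated weeks, the gap should shrink toward zero and the simulated approval rate should settle near `p`, which is the sense in which the two options describe the same model.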
To be explicit: It doesn't "matter" which one you choose, but the binomial is the more natural option, since you will be dealing with it anyway after summing the Bernoulli variables.
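To tie this back to the average approval rate the question asks about, here is a short sketch of how the pieces fit together. The symbols $X_i$, $X$, and $\hat{p}$ are my own notation, and I am assuming the 100 responses are independent:

$$
X_i \sim \mathrm{Bernoulli}(p), \qquad X = \sum_{i=1}^{100} X_i \sim \mathrm{Binomial}(100, p), \qquad \hat{p} = \frac{X}{100}
$$

$$
E[\hat{p}] = \frac{E[X]}{100} = \frac{100p}{100} = p, \qquad \mathrm{Var}(\hat{p}) = \frac{\mathrm{Var}(X)}{100^2} = \frac{100\,p(1-p)}{100^2} = \frac{p(1-p)}{100}
$$

So the random variable is what you choose to measure (one person's answer, the weekly count, or the weekly rate), and the model is the distribution that follows from that choice.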