Calculating a sample's representativeness to confirm/refute a given hypothesis?


Why hello! I'm fairly new to statistics, which is why I'm somewhat confused as to how I can approach this problem in a scientific way.

The problem: Experiments are conducted to find the probabilities of several possible outcomes; let's say they are $X_1$ to $X_4$. There is no other outcome, so the combined probabilities must sum to $P = 1$ ($100\%$). It is, however, unknown how exactly this total is divided: it might be an equal $25\%$ each, or any other split, say anywhere between $0.01$ and $0.96$ per outcome.

The goal is to ultimately find out, at a given significance level ($1\%$, $3\%$, $5\%$... the exact value doesn't matter), how the total probability is distributed among $X_1$ to $X_4$.

The hypothesis: It is thought that $X_1$ to $X_4$ have an equal probability of $25\%$ each. This is most likely wrong, but the hypothesis remains to be refuted by means of "proving" a different probability for each by observing a sufficiently large sample (the experiment mentioned above, which can be repeated indefinitely if need be).

Where I require your aid: Well... I have read up on everything I thought relevant to the problem, but I remain uncertain how to "formalize" everything. The calculations should be the lesser problem afterwards. When exactly will the observation sample size be large enough for a given confidence level? What kind of "mean" can I calculate from the observation if the individual events are different results and not simple numbers?

Say, $X_1 \rightarrow$ "The computer catches fire" and $X_2 \rightarrow$ "You win tomorrow's lottery" (random examples). They are not related to each other (unlike, say, $X_1 \rightarrow 1/\text{Heads}$, $X_2 \rightarrow 0/\text{Tails}$), so I fail to see how I can apply the formulae available for the mean, deviation, error and all the other statistical quantities I read about.


A couple preliminary clarifications/remarks on your problem:

  1. From your wording, it appears that these probabilities are not only exhaustive but also mutually exclusive, so that one, and only one, outcome can occur in each experimental trial, correct?

  2. What kind of confidence level do you want? I assume you want a set of family-wise confidence intervals, so that there is a, say, 95% confidence that all of the intervals cover the true probability of each occurrence.

I will assume the above in my answer.

When you are testing multiple outcomes that are related in some way (here, they are exhaustive, so their probabilities add to 1), you want to use the categorical distribution (and, over repeated trials, the multinomial distribution), not the normal approximation or the t-distribution used for single-outcome statistics, as you have indicated. You are trying to force-fit a multi-dimensional problem into a one-dimensional testing scheme; the multinomial will give you the needed flexibility. As a basic primer on categorical data, see this.
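For intuition, repeated trials of such a categorical variable can be simulated as a single multinomial draw; a minimal sketch in Python with numpy, assuming the hypothesized equal probabilities (the seed and sample size here are arbitrary, for illustration only):

```python
import numpy as np

# Simulate N = 300 mutually exclusive, exhaustive trials under the
# hypothesized equal split p_i = 0.25 (values are illustrative).
rng = np.random.default_rng(seed=42)
counts = rng.multinomial(300, [0.25, 0.25, 0.25, 0.25])

print(counts)        # counts of the four outcomes in one experiment
print(counts.sum())  # always 300: exactly one outcome per trial
```

Because exactly one outcome occurs per trial, the four counts always sum to $N$; the multinomial captures exactly this constraint, which four independent binomials would not.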

To test the hypothesis that the probabilities are equal, you can use the chi-square goodness-of-fit test, although it relies on a multivariate normal approximation to the set of probabilities estimated from the experiments. To get confidence intervals, see this.
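As a sketch of what such a test looks like in practice, scipy's `chisquare` compares observed counts to the expected counts under the null hypothesis (by default, equal frequencies); the counts below are made up purely for illustration:

```python
from scipy.stats import chisquare

# Hypothetical counts from N = 300 trials; under the null p_i = 0.25,
# each expected count is 300/4 = 75 (chisquare's default is equal
# expected frequencies, so f_exp need not be passed).
observed = [80, 70, 85, 65]
stat, p_value = chisquare(observed)

print(stat, p_value)  # small statistic, large p-value: no evidence
                      # against the equal-probability hypothesis here
```

A small statistic (large p-value) means the data are consistent with the null; it does not prove the probabilities are equal.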

I think the links and above explanations will get you largely where you want to go.


Assumption 1 is correct; one, and only one, event occurs in any given experimental trial.

Assumption 2 I took to mean a set of confidence intervals, one for each possible outcome, which would then allow one to say with the required certainty (e.g., 95%) that ALL estimated probabilities are as close to the true probabilities (again, one per event) as required. If so, this is what I want, too. But from what I read, the "certainty" only refers to the procedure: in 95% of repeated experiments, all of the intervals would cover the true values. How does this enable one to say how close to the truth the estimated probabilities are?
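For what it's worth, one simple way to get such simultaneous intervals is to take ordinary Wald intervals for each proportion and split the error probability across them (a Bonferroni correction). This is only a rough sketch assuming scipy is available; Goodman and Sison-Glaz intervals are the more refined standard choices:

```python
import math
from scipy.stats import norm

def bonferroni_wald_intervals(counts, conf=0.95):
    """Simultaneous Wald intervals for multinomial proportions.

    The error probability alpha = 1 - conf is split across the k
    categories, so all k intervals jointly cover their true
    probabilities with confidence >= conf (conservative).
    """
    n = sum(counts)
    k = len(counts)
    alpha = 1 - conf
    z = norm.ppf(1 - alpha / (2 * k))  # per-interval critical value
    intervals = []
    for c in counts:
        p_hat = c / n
        half = z * math.sqrt(p_hat * (1 - p_hat) / n)
        intervals.append((max(0.0, p_hat - half), min(1.0, p_hat + half)))
    return intervals

for lo, hi in bonferroni_wald_intervals([105, 105, 67, 23]):
    print(f"[{lo:.3f}, {hi:.3f}]")
```

If the hypothesized value $0.25$ falls outside one of the joint intervals, the equal-probability hypothesis can be rejected at that family-wise level.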

For what it's worth, I understand that I have to use a different model, and I'd now start like this:

Let $X_i$ be the (categorical) random variable that represents the outcome of trial $i$, with $1 \le i \le N$, where $N$ is the number of trials performed.

That is, each $X_i$ takes one of the values $1$ to $4$ (the four events in the example above), and the $N$ values $X_1, \ldots, X_N$ make up the whole sample data. Since each $X_i$ can assume only one of the 4 possible outcomes, the property "mutually exclusive" is thusly included in my "model".

To rephrase the main problem, let me quote Wikipedia:

Certain factors may affect the confidence interval size including size of sample, level of confidence, and population variability. A larger sample size normally will lead to a better estimate of the population parameter.

Now, what I am interested in is: how large must the sample size be for me to be 90%, 95%, ... sure that my estimated probabilities for all events are "correct", i.e., that the error is small enough to trust the calculated percentages? I am also not sure how to choose the margin of error (usually called $e$ in the formulae); is a value of $0.1$ small enough?
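For a single proportion, a common conservative rule of thumb is $n \ge z^2 \, p(1-p)/e^2$ with $p = 0.5$ (the worst case, since $p(1-p)$ is maximized there). A small sketch assuming scipy; for all four proportions at once, the confidence level would additionally have to be adjusted, e.g., via Bonferroni:

```python
import math
from scipy.stats import norm

def required_sample_size(e, conf=0.95, p=0.5):
    """Smallest n so a proportion estimate is within +/- e of the truth
    at the given confidence level; p = 0.5 is the conservative default."""
    z = norm.ppf(1 - (1 - conf) / 2)  # two-sided critical value
    return math.ceil(z**2 * p * (1 - p) / e**2)

print(required_sample_size(0.1))   # margin 0.1 at 95%  -> 97 trials
print(required_sample_size(0.05))  # halving e roughly quadruples n -> 385
```

So with $e = 0.1$ at 95% confidence, fewer than 100 trials per proportion suffice in the worst case; a tighter margin drives $n$ up quadratically.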

To use real sample data, these might be the outcomes of $N = 300$ trials:

$X_1$: 105 times
$X_2$: 105 times
$X_3$: 67 times
$X_4$: 23 times

The last two counts make it, in my opinion, very unlikely that all events have an equal probability of $25\%$, especially considering the sample size. But how can I be "sure"?
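Assuming these counts, the equal-probability hypothesis can be checked directly with the chi-square goodness-of-fit test suggested above; a sketch with scipy (equal expected counts of $300/4 = 75$ are the default):

```python
from scipy.stats import chisquare

observed = [105, 105, 67, 23]        # the N = 300 trials above
stat, p_value = chisquare(observed)  # null: all p_i equal, expected 75 each

print(stat, p_value)  # statistic ~ 60.9, p-value far below any usual
                      # significance level: reject equal probabilities
```

That is, if all four probabilities really were $0.25$, counts this lopsided would essentially never occur, which quantifies the intuition that the hypothesis is wrong.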

The sample mean of $X$ comes out to $\approx 2$, with an estimated sample variance of $0.88$. Now, taking $z = 2$ for a 95% confidence level and an error margin of $0.1$ (to each side of the interval), a very rough estimate of the required number of trials would be:

$$n \ge \left(\frac{2 \cdot 0.88}{0.1}\right)^2 \approx 310$$

This is only a very rough estimate, but wouldn't it mean that a sample size of roughly 300 is already large enough to be sufficiently (95%!?) sure that the calculated probability for each event is indeed within the confidence interval of $e = 0.1$?

But even then, I don't see why exactly, and I also think the error only refers to $X$ itself, which does not represent any percentage but the average (mean) value of $\approx 2$. That is, the mean of an infinite number of trials should lie somewhere in the interval $[\approx 1.9,\ \approx 2.1]$, and we can say that, and only that, with the required certainty of 95%?

It doesn't really solve the problem, from what I understand.

PS. Since I did not calculate multiple confidence intervals, I believe all of this must be horribly wrong. I apologize for not understanding the "advanced" links you posted about them. The goodness-of-fit test was only required to disprove the hypothesis that $p_i = 0.25$ for all events $i$, was it not?

If so, I was hoping I could simply "disprove" it by stating with close-to certainty that some other set of percentages, very different from an equal share, gives the true probabilities (indirectly, that is).