How should I choose my sample size?


The following is only to make sure I don't have a misunderstanding. My real question is at the end.

I am in the following scenario: I have a website where people can subscribe by clicking a button. I can make the button either blue or red. To test which is better, I show some users the blue version and some users the red version. Then I count how many users subscribe in each group.

My $H_0$: The color of the button has no effect on the probability of subscription.

I want to have a probability of at most $0.1\%$ that I falsely reject $H_0$, so my significance level is $\alpha = 0.001$.

Now I might decide to test this with 50 users in both groups and get the following result:

         subscribed          not subscribed |
"blue"   24                  26             |  50
"red"    23                  27             |  50
--------------------------------------------------
         47                  53             | 100

We have 100 random variables:

\begin{align} X_{i} \sim Bin(1, p) &\text{ with } i=1, \dots, n\\ Y_{j} \sim Bin(1, q) &\text{ with } j=1, \dots,m\\ H_0:&\qquad p =q \end{align}

The higher the observed difference $T(X_1, \dots, X_n, Y_1, \dots, Y_m):= \left|\frac{\sum X_i}{n} - \frac{\sum Y_j}{m}\right|$, the more reasonable the assumption that $p \neq q$ becomes. So the test decision ("Testentscheid" in German; I'm not sure whether "test decision" or "decision rule" is the usual English term) should be:

$$ \begin{cases} T > c &H_0 \text{ is rejected on the significance level } \alpha\\ T \leq c & H_0 \text{ cannot be rejected on the significance level } \alpha \end{cases} $$

In case $p = q$, the probability of making a wrong decision (falsely rejecting $H_0$) is

\begin{align} &P\left(\left|\frac{\sum X_i}{n} - \frac{\sum Y_j}{m}\right| > c\right)\\ =& P\left(\frac{\sum X_i}{n} - \frac{\sum Y_j}{m} > c\right) + P\left(\frac{\sum X_i}{n} - \frac{\sum Y_j}{m} < -c\right) \end{align}

(I'm currently stuck here, but I am relatively confident that this is the right way to calculate $c$ and thus make the decision at this significance level.)
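One way to get a number for $c$ is a normal approximation, sketched below in Python. This is only an approximation under stated assumptions, not an exact calculation: the unknown common value of $p = q$ under $H_0$ is replaced by the pooled estimate from the table, and the difference of the two sample proportions is treated as approximately normal. The function name `critical_value` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(pooled_p, n, m, alpha):
    """Approximate c such that P(T > c) is about alpha when p = q.

    Assumption: the difference of sample proportions is roughly normal
    with mean 0 and variance pooled_p * (1 - pooled_p) * (1/n + 1/m).
    """
    # Two-sided test: put alpha / 2 in each tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sqrt(pooled_p * (1 - pooled_p) * (1 / n + 1 / m))
    return z * standard_error


# Counts from the table above.
n, m = 50, 50
blue_subscribed, red_subscribed = 24, 23

# Pooled estimate of the common subscription probability under H_0.
pooled = (blue_subscribed + red_subscribed) / (n + m)

c = critical_value(pooled, n, m, alpha=0.001)
T = abs(blue_subscribed / n - red_subscribed / m)
reject = T > c

print(f"T = {T:.3f}, c = {c:.3f}, reject H_0: {reject}")
```

With 50 users per group, the threshold $c$ comes out far larger than the observed difference of $0.02$, so $H_0$ is not rejected for this table.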

My Question

How do I decide which sample size to take? I guess with bigger sample sizes I can always make better statements, but usually it costs something to increase the sample size. So is 100 reasonable? 1000? I guess this mainly depends on the expected difference between $p$ and $q$?
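To make the dependence on the expected difference concrete, here is a minimal sketch of the textbook sample-size formula for a two-sided two-proportion z-test. It needs inputs that are not fixed anywhere above and are pure assumptions here: guessed values for $p$ and $q$, and a desired power (the probability of detecting the difference if it is real), set to $0.8$ as a common convention. The function name `sample_size_per_group` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p, q, alpha=0.001, power=0.8):
    """Approximate users needed per group to detect p vs. q.

    Assumptions: equal group sizes, normal approximation, two-sided test.
    The default power of 0.8 is a convention, not part of the question.
    """
    if p == q:
        raise ValueError("p and q must differ; no sample size detects a zero difference")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls false rejections
    z_beta = NormalDist().inv_cdf(power)           # controls missed differences
    p_bar = (p + q) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p * (1 - p) + q * (1 - q))
    ) ** 2
    return ceil(numerator / (p - q) ** 2)


# Hypothetical effect sizes, chosen only to show the scaling.
n_5_points = sample_size_per_group(0.50, 0.45)
n_2_points = sample_size_per_group(0.48, 0.46)

print(n_5_points, n_2_points)
```

Under these assumptions, a difference of 5 percentage points needs roughly 3,400 users per group, and a difference of 2 percentage points roughly 21,000. The required size grows like $1/(p-q)^2$, so halving the difference to detect roughly quadruples the sample.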

Also: What should I do if I am sure there is a difference in this scenario, but would like to state its size and direction? I guess "confidence interval" is the right term to look for?
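For reference, a minimal sketch of one such interval, the Wald (normal-approximation) confidence interval for $p - q$, using the counts from the table. It assumes the samples are large enough for the normal approximation to be reasonable; other interval constructions exist and may behave better for small samples. The function name `wald_interval` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes_x, n, successes_y, m, confidence):
    """Approximate confidence interval for p - q (Wald interval).

    Assumption: normal approximation with unpooled standard error.
    """
    p_hat = successes_x / n
    q_hat = successes_y / m
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n + q_hat * (1 - q_hat) / m)
    difference = p_hat - q_hat
    return difference - z * standard_error, difference + z * standard_error


# Blue vs. red from the table, at confidence 1 - alpha = 0.999.
lower, upper = wald_interval(24, 50, 23, 50, confidence=0.999)

print(f"p - q is estimated to lie in [{lower:.3f}, {upper:.3f}]")
```

The sign of the interval gives the direction (entirely positive would favour blue, entirely negative red) and its endpoints bound the size of the effect. For this table the interval contains $0$, which matches not rejecting $H_0$.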