How do I compare rates of error between two different sample sizes?


I'm unsure how to normalize for two different variables.

Person A makes 20 pastries total, whereas Person B makes 50.

5 of those pastries, so 25%, are sampled from Person A; 10 for Person B, for a sample of 20%.

The pastry chef determines from the samples that 2 of Person A's pastries are subpar, compared to 5 for Person B.

Therefore the chef infers that 50% of Person B's pastries are substandard, compared to 40% for Person A. But that seems like shallow reasoning, since Person B made more than twice as many pastries as A.

Thus, how do I normalize to compare Person A and Person B taking into account sampling size and rate of error?


There are 2 best solutions below

2
On BEST ANSWER

You are asking to compare two rates.

The error rate of $A$ was measured to be $2$ out of $5$, i.e. $40\%$.

The error rate of $B$ was measured to be $5$ out of $10$, i.e. $50\%$.

That's it. Neither the sample size nor the total production size influences these ratios, and there is no need (or possibility) to normalize.

The only difference size makes is in the dispersion (variance) of the results, which is larger for a smaller sample, i.e. a smaller sample gives a less precise estimate of the true error rate.
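To make the point about dispersion concrete, here is a minimal Python sketch of the usual binomial standard error of a sample proportion, $\sqrt{\hat p(1-\hat p)/n}$. The function name `proportion_se` is just an illustrative choice, and the sketch assumes simple binomial sampling (it ignores the fact that the pastries are drawn without replacement from small batches of 20 and 50).

```python
import math

def proportion_se(defects: int, n: int) -> float:
    """Estimated standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    p = defects / n
    return math.sqrt(p * (1 - p) / n)

# Person A: 2 subpar out of 5 sampled; Person B: 5 subpar out of 10 sampled.
se_a = proportion_se(2, 5)
se_b = proportion_se(5, 10)

print(f"A: 0.40 +/- {se_a:.3f}")
print(f"B: 0.50 +/- {se_b:.3f}")
```

The estimate for A comes with the larger standard error because it rests on fewer sampled pastries, even though both rates are computed the same way.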

4
On

You say you want to compare the performances of A and B. Then you need to do a test of the null hypothesis $H_0: \pi_A = \pi_B$ against $H_a: \pi_A \ne \pi_B,$ where $\pi_A$ and $\pi_B$ are the true error rates for A and B, respectively.

Your small samples. Two kinds of tests are in common use to judge whether two such proportions differ. One assumes that the two sample proportions are approximately normally distributed. That is not a safe assumption with so little data. Here are results from Minitab.

  MTB > PTwo 5 2 10 5.

  Test and CI for Two Proportions 

  Sample  X   N  Sample p
  1       2   5  0.400000
  2       5  10  0.500000

  Difference = p (1) - p (2)
  Estimate for difference:  -0.1
  95% CI for difference:  (-0.629553, 0.429553)
  Test for difference = 0 (vs not = 0):  
     Z = -0.37  P-Value = 0.711

  * NOTE * The normal approximation may be 
  inaccurate for small samples.
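For readers without Minitab, the normal-approximation numbers above can be approximated with a short Python sketch. It assumes the unpooled standard error $\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2}$, which is consistent with the Z and confidence interval shown in the output; the function name `two_proportion_z` is hypothetical, and its arguments are in (defects, sample size) order rather than Minitab's (sample size, defects) order.

```python
import math

def two_proportion_z(x1, n1, x2, n2, z_crit=1.959964):
    """Two-sided z-test and 95% CI for p1 - p2, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = diff / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci = (diff - z_crit * se, diff + z_crit * se)
    return z, p_value, ci

# Person A: 2 subpar of 5 sampled; Person B: 5 subpar of 10 sampled.
z, p_value, ci = two_proportion_z(2, 5, 5, 10)
print(f"Z = {z:.2f}, P-value = {p_value:.3f}")
print(f"95% CI for difference: ({ci[0]:.4f}, {ci[1]:.4f})")
```

As with the Minitab note, the normal approximation behind this calculation is not trustworthy for samples of 5 and 10.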

The second test is 'Fisher's Exact Test'. It is based on the hypergeometric distribution. Minitab's procedure for comparing proportions also does that test. Results are below:

 Fisher's exact test: P-Value = 1.00
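The exact test can also be sketched directly from the hypergeometric distribution. The version below assumes the common two-sided convention of summing the probabilities of all tables that are no more likely than the observed one (with the margins held fixed); the function name `fisher_exact_two_sided` is illustrative, and Minitab's internal procedure may differ in detail.

```python
from math import comb

def fisher_exact_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher exact p-value for a 2x2 table with fixed margins."""
    total_x = x1 + x2              # total number of subpar items
    denom = comb(n1 + n2, total_x)

    def pmf(k):
        # Hypergeometric probability that k of the subpar items fall in sample 1.
        return comb(n1, k) * comb(n2, total_x - k) / denom

    k_min = max(0, total_x - n2)
    k_max = min(n1, total_x)
    p_obs = pmf(x1)
    # Sum over tables at most as probable as the observed one
    # (small tolerance guards against floating-point ties).
    tail = sum(pmf(k) for k in range(k_min, k_max + 1)
               if pmf(k) <= p_obs * (1 + 1e-7))
    return min(1.0, tail)

p_small = fisher_exact_two_sided(2, 5, 5, 10)
print(f"Fisher's exact test: P-value = {p_small:.2f}")
```

Here the observed table (2 subpar in A's sample, 5 in B's) is the single most probable table given the margins, so every table counts as "at least as extreme" and the p-value is 1.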

So, apparently, you do not have enough data to say that A's 40% error rate is meaningfully different from B's 50% error rate.

However, if you are interested more generally in methods to compare two such proportions, please look in your statistics text for details of these two tests.

Hypothetical large samples. In particular, suppose you have about 100 times as much data for each individual, and with similar error rates. Then here are results from Minitab:

 MTB > PTwo 500 201 1000 498.

 Test and CI for Two Proportions 

 Sample    X     N  Sample p
 1       201   500  0.402000
 2       498  1000  0.498000

 Difference = p (1) - p (2)
 Estimate for difference:  -0.096
 95% CI for difference:  (-0.148984, -0.0430161)
 Test for difference = 0 (vs not = 0):  
     Z = -3.55  P-Value = 0.000

 Fisher's exact test: P-Value = 0.000

With this increased amount of data, we would have plenty of evidence to judge the two error rates different. It is not simply the size of the difference that is persuasive, but the amount of evidence to make sure the rates (still about 40% and 50%) are reliable.
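A rough way to see how the amount of data drives the conclusion is to hold the observed rates fixed at exactly 40% and 50% and scale both sample sizes by a common factor $k$. This is an idealized sketch (real samples would not reproduce the same rates exactly), it uses the same unpooled normal approximation as the Z statistics above, and `z_for_scale` is a hypothetical helper name.

```python
import math

def z_for_scale(k, p_a=0.4, p_b=0.5, n_a=5, n_b=10):
    """Z statistic for p_a - p_b when both sample sizes are multiplied by k."""
    se = math.sqrt(p_a * (1 - p_a) / (n_a * k) + p_b * (1 - p_b) / (n_b * k))
    return (p_a - p_b) / se

for k in (1, 10, 29, 100):
    z = z_for_scale(k)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    print(f"k = {k:3d}: n_A = {5 * k:4d}, n_B = {10 * k:4d}, "
          f"Z = {z:6.2f}, P-value = {p_value:.4f}")
```

Because the standard error shrinks like $1/\sqrt{k}$, the same 10-point difference produces a Z statistic that grows like $\sqrt{k}$. Under these assumptions, $|Z|$ first exceeds 1.96 at roughly a 29-fold increase in both samples.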