Is method A really better than method B?

Suppose I have two methods, method A and method B. We evaluate the performance of these algorithms in 100 consecutive experiments, each with either a positive (1) or a negative (0) outcome.

Method A achieves 84 positive outcomes and method B achieves 56 positive outcomes.

Is there a measure that tells us how sure we can be that A didn't perform better just by chance?

My idea would be the following:

Fit a binomial distribution to the results produced by A (ML estimate). Then calculate the likelihood of the results of B being produced by that distribution ... although I don't really know how that is done.

Are there any suggestions on how to arrive at a really meaningful statement about whether A is better than B, and not just by chance?

Am I on the right track?

There is 1 best solution below


Yes, you are on the right general track. But I don't think you need to derive the test using ML; most texts show a test with a normal and/or chi-squared test statistic. I can't be sure about those details without knowing more about what you have been studying lately. The Comment by @lulu suggests the two methods are not the same. I agree; here are two (similar) formal tests.

First, see test comparing 2 binomial proportions for the equations and theory. (Or look in your textbook.)

Second, here is Minitab output for your data, which indicates you should reject $H_0: p_A = p_B.$

Test and CI for Two Proportions 

Sample   X    N  Sample p
1       84  100  0.840000
2       56  100  0.560000

Difference = p (1) - p (2)
Estimate for difference:  0.28
95% CI for difference:  (0.159053, 0.400947)
Test for difference = 0 (vs ≠ 0):  Z = 4.32  P-Value = 0.000
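
As a check, the numbers in this output can be reproduced by hand. This is only a sketch: it assumes Minitab used the pooled estimate of the common proportion for the test statistic and the separate (unpooled) estimates for the confidence interval, which is what the printed values are consistent with. The symbols $\hat p,$ $\hat p_A,$ and $\hat p_B$ are notation introduced here.

$$\hat p = \frac{84 + 56}{100 + 100} = 0.70, \qquad Z = \frac{\hat p_A - \hat p_B}{\sqrt{\hat p(1-\hat p)\left(\frac{1}{100} + \frac{1}{100}\right)}} = \frac{0.28}{\sqrt{0.0042}} \approx 4.32$$

$$0.28 \pm 1.96\sqrt{\frac{0.84 \cdot 0.16}{100} + \frac{0.56 \cdot 0.44}{100}} = 0.28 \pm 1.96\sqrt{0.003808} \approx 0.28 \pm 0.121$$

The second line gives roughly $(0.159,\, 0.401),$ matching the printed interval.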

From R statistical software, a somewhat similar chi-squared test is shown below, also rejecting the null hypothesis. (Your text may show this test instead of the one above, or in addition to it.)

MAT = matrix(c(84, 56, 16, 44), nrow=2)
MAT
     [,1] [,2]
[1,]   84   16
[2,]   56   44

prop.test(MAT)

    2-sample test for equality of proportions with continuity correction

data:  MAT
X-squared = 17.357, df = 1, p-value = 3.097e-05
alternative hypothesis: two.sided
95 percent confidence interval:
 0.1490526 0.4109474
sample estimates:
prop 1 prop 2 
  0.84   0.56 

Note: $\sqrt{17.357} \approx 4.166,$ which is close to, but not the same as, $Z = 4.32$ from Minitab. I think the main difference is that R uses a 'continuity correction'.
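
To see where the gap comes from, here is the arithmetic written out. It is a sketch assuming the usual Yates form of the correction, with the pooled estimate $\hat p = 140/200 = 0.70$; the symbol $Z_c$ is notation introduced here for the corrected statistic.

$$Z_c = \frac{|\hat p_A - \hat p_B| - \frac{1}{2}\left(\frac{1}{100} + \frac{1}{100}\right)}{\sqrt{\hat p(1-\hat p)\left(\frac{1}{100} + \frac{1}{100}\right)}} = \frac{0.28 - 0.01}{\sqrt{0.0042}} \approx 4.166, \qquad Z_c^2 = \frac{0.0729}{0.0042} \approx 17.357$$

Without the correction, the same formula gives $0.28/\sqrt{0.0042} \approx 4.32$ and $0.0784/0.0042 \approx 18.667,$ so the uncorrected chi-squared statistic is just the square of the Z statistic from the normal test.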