Calculating p-value using chi-squared algorithm for an industrial process


I've had a problem assigned to me that ought to have gone to a statistician or process control expert, but it is what it is.

Gist of the problem: I need to calculate a p-value from product yield data "using chi-squared algorithm", but results seem to be internally inconsistent and I don't know why.

Clients are requesting that we calculate a limit for defects using a "chi-squared algorithm", with no more specificity than that. When the p-value exceeds a threshold, it should trigger additional monitoring for the process.

So I've given myself a crash course on chi-squared distributions, with questionable success. I can calculate the p-values using two different methods and they agree, so that's great. But internal consistency breaks when I compare p-values calculated from actual defect counts with those calculated from defect percentages.

For example, suppose I have the following defect counts, no-defect counts, and expected counts:

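| DEF | NODEF | Expected DEF | Expected NODEF |
|---|---|---|---|
| 20 | 980 | 15 | 985 |
| 35 | 1965 | 30 | 1970 |
| ... | ... | ... | ... |

The same data as percentages of lot size:

| DEF | NODEF | Expected DEF | Expected NODEF |
|---|---|---|---|
| 2% | 98% | 1.5% | 98.5% |
| 1.75% | 98.25% | 1.5% | 98.5% |
| ... | ... | ... | ... |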

The p-values I get from the counts and the p-values I get from the percentages of total lot size seem like they should be the same, since the input data is essentially the same, but they end up completely different.

It seems to me that since the only thing changing is the input magnitude, there should be no difference. But I'm wrong, either in my methods or in my assumptions.

Can anyone put me on the right path for either?

Best answer

It's a $\chi^2$ test.

Applying it to your table, you get

$$\chi_{(1)}^2=\frac{(20-15)^2}{15}+\frac{(980-985)^2}{985}+\frac{(35-30)^2}{30}+\frac{(1965-1970)^2}{1970}\approx 2.54$$
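The sum above can be reproduced with a few lines of Python. This is a minimal sketch; the two lists simply hold the four observed and expected cells from the question's table, in the same order as the terms of the formula.

```python
# Observed and expected cells, in the same order as the formula above:
# lot 1 defects, lot 1 non-defects, lot 2 defects, lot 2 non-defects.
observed = [20, 980, 35, 1965]
expected = [15, 985, 30, 1970]


def chi_squared(obs, exp):
    """Pearson's statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


stat = chi_squared(observed, expected)
print(round(stat, 4))  # 2.5381
```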

The degrees of freedom are calculated as $DoF=(k-1)\times(h-1)=(2-1)\times(2-1)=1$, where $k$ and $h$ are the numbers of rows and columns of the table, respectively.

To get the p-value you need a calculator (printed tables are often not enough because they list only a few discrete values).

Nowadays any calculator (I simply use Excel) includes the major distributions, so in Excel you get

DISTRIB.CHI(x; Degrees of Freedom) = DISTRIB.CHI(2.54;1) $\approx 11.1\%$

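My Excel is in Italian, so your function name may differ: in an English-language Excel the right-tailed chi-squared function is CHISQ.DIST.RT(2.54, 1) (older name: CHIDIST), with commas instead of semicolons.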
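If a spreadsheet is not at hand, the same right-tail probability can be computed with Python's standard library. This sketch only covers 1 degree of freedom: there the $\chi^2$ variable is the square of a standard normal $Z$, so $P(\chi^2_{(1)} > x) = P(|Z| > \sqrt{x}) = \operatorname{erfc}(\sqrt{x/2})$. The helper name `chi2_sf_df1` is made up for illustration; for other degrees of freedom, `scipy.stats.chi2.sf(x, df)` handles the general case.

```python
import math


def chi2_sf_df1(x):
    """Right-tail p-value P(X > x) for a chi-squared variable with 1 DoF.

    With 1 DoF, X = Z^2 for a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


p = chi2_sf_df1(2.54)
print(f"{p:.2%}")  # about 11.1%
```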

The percentages you posted are not the right ones, so the test cannot give the same result. To reproduce the previous test, each percentage must be taken relative to the total number of observations: for example, the first observed cell is $\frac{20}{3000}=0.006667$.

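Do the same calculation with these proportions, then multiply the result by 3000 (the total number of observations), and you get the same result, about 2.5381. This works because dividing every observed and expected cell by 3000 divides each term $(O-E)^2/E$, and therefore the whole statistic, by 3000.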
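The rescaling can be checked numerically. This sketch computes the statistic three ways: from the counts, from proportions of the grand total (then multiplied back by $N = 3000$), and from the per-lot percentages posted in the question. The last version drops the lot sizes entirely, which is why it gives a much smaller number.

```python
observed = [20, 980, 35, 1965]
expected = [15, 985, 30, 1970]


def chi_squared(obs, exp):
    """Pearson's statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


n_total = sum(observed)  # 3000 observations across both lots

from_counts = chi_squared(observed, expected)

# Every cell as a fraction of the grand total, e.g. 20 / 3000 = 0.006667.
obs_frac = [o / n_total for o in observed]
exp_frac = [e / n_total for e in expected]
from_fractions = chi_squared(obs_frac, exp_frac)

# Per-lot percentages as posted in the question (lot sizes are lost).
obs_pct = [2.0, 98.0, 1.75, 98.25]
exp_pct = [1.5, 98.5, 1.5, 98.5]
from_lot_percentages = chi_squared(obs_pct, exp_pct)

print(round(from_counts, 4))               # 2.5381
print(round(from_fractions * n_total, 4))  # 2.5381
print(round(from_lot_percentages, 4))      # about 0.21, not 2.54
```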

PS: by definition, the chi-squared test has to be calculated with observed counts, not percentages, and each cell should have a count of at least $5$ (the usual rule of thumb is stated for expected counts).
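A quick guard for the cell-count rule can be run before trusting the p-value. `cells_below_minimum` is a hypothetical helper written for illustration; it checks expected counts, following the common convention, with 5 as the default threshold. The second call shows why proportions fail the rule even when the underlying counts are fine.

```python
def cells_below_minimum(expected, minimum=5):
    """Return indices of cells whose expected count is below `minimum`.

    The chi-squared approximation is usually considered unreliable when
    expected counts are small; 5 is the customary threshold.
    """
    return [i for i, e in enumerate(expected) if e < minimum]


print(cells_below_minimum([15, 985, 30, 1970]))  # [] -> every cell passes

# The same expected values as fractions of 3000 all look "too small".
print(cells_below_minimum([0.005, 0.3283, 0.01, 0.6567]))  # [0, 1, 2, 3]
```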