Compare two random sampling approaches

I want to understand if samples obtained from the following two approaches are statistically equivalent.

Situation: I have $N$ items of different weights. A filter is designed to screen items against a weight threshold. After running all $N$ items through the filter, $Z$ objects are left.

Approach 1: I randomly take $N'$ objects out of the original $N$ objects and send them through the weight filter. $n$ objects are left after the filter. Call this sample 1.

Approach 2: I randomly take $n$ objects from the $Z$ filtered objects. Call this sample 2.

I think samples 1 and 2 are equivalent (I am not sure), but I cannot figure out the math. Can anybody help, please?

Some context on why I am trying to do this: the ultimate goal is to estimate some properties of the filtered objects (the population of $Z$) using sampling. But the filtering process needed to get all $Z$ objects is expensive. If we can test only a smaller population (approach 1), we can save time and money.

Thanks.

Edit: changed the filter method.

There are 2 best solutions below

Best answer

Assigning values to the variables may make it clearer. Suppose there are 1000 objects, and that 60% of them will be cut off by the filter.

Approach 1: You take a random sample of 25 objects, which are (hopefully) representative of the 1000 objects, so that about 60% of them will be cut off by the filter. You will then be left with about 10 objects.

Approach 2: Applying the filter first leaves 400 objects. If you then take a random sample of 10 objects, the sample will be representative of the 400 filtered objects.

To be statistically equivalent, an object must have an equal chance of being selected in the sample regardless of the method. Suppose an object has a weight that will not be cut off by the filter. In approach 1, it has a 25/1000 chance of being selected, whereas in approach 2 it has a 10/400 chance of being selected. These are equal (both are 1/40), and thus the methods are equivalent.
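The inclusion-probability argument can be checked with a small simulation. This is only a sketch under assumed details: the weights `0..999`, the `passes` rule, and the function name `inclusion_frequencies` are all hypothetical, chosen so that exactly 400 of 1000 objects survive as in the example. Note that in approach 1 the number of survivors varies around 10 from run to run, while approach 2 always returns exactly 10.

```python
import random
from fractions import Fraction

N, N_PRIME, Z, n = 1000, 25, 400, 10   # numbers from the example above

# Exact inclusion probabilities for one object that passes the filter.
p1 = Fraction(N_PRIME, N)   # approach 1: chance of being among the 25 drawn
p2 = Fraction(n, Z)         # approach 2: chance of being among the 10 drawn
assert p1 == p2 == Fraction(1, 40)

def passes(weight):
    """Hypothetical filter: an item survives if its weight is below 400."""
    return weight < Z

def inclusion_frequencies(trials, seed=0):
    """Estimate how often one fixed surviving object lands in each sample."""
    rng = random.Random(seed)
    weights = list(range(N))          # assumed weights 0..999, so 400 pass
    survivors = [w for w in weights if passes(w)]
    target = 0                        # one fixed object that passes the filter
    hits1 = hits2 = 0
    for _ in range(trials):
        # Approach 1: draw N' from all N, then filter the draw.
        sample1 = [w for w in rng.sample(weights, N_PRIME) if passes(w)]
        hits1 += target in sample1
        # Approach 2: filter everything first, then draw n survivors.
        sample2 = rng.sample(survivors, n)
        hits2 += target in sample2
    return hits1 / trials, hits2 / trials

if __name__ == "__main__":
    f1, f2 = inclusion_frequencies(20000)
    print(f1, f2)   # both should be close to 1/40 = 0.025
```

Both estimated frequencies should land near 0.025, matching the exact fractions computed at the top of the block.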

Second answer

As long as your sample in version 1 is random over all $N$ items and you choose items and test them until you have $n$ that pass the filter, you are fine. Version 1 is just like Version 2 except you don't test the non-selected items. Depending on your protocol, it might also be convenient to test a batch of items that gives a high probability of passing more than $n$, then select $n$ from those that pass.
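A minimal sketch of the "test until $n$ pass" protocol, assuming the items can be visited in a uniformly random order and that `passes` stands in for the real (expensive) filter. The function name `sample_until_n_pass` and the example weights are hypothetical.

```python
import random

def sample_until_n_pass(items, passes, n, rng):
    """Test items in uniformly random order, stopping once n have passed.

    Returns the items that passed and the number of items tested.
    If fewer than n items pass overall, every passing item is returned.
    """
    order = list(items)
    rng.shuffle(order)            # random order over all N items
    kept, tested = [], 0
    for item in order:
        tested += 1               # each test is one use of the costly filter
        if passes(item):
            kept.append(item)
            if len(kept) == n:
                break
    return kept, tested

if __name__ == "__main__":
    rng = random.Random(0)
    weights = range(1000)                     # assumed weights 0..999
    kept, tested = sample_until_n_pass(weights, lambda w: w < 400, 10, rng)
    print(len(kept), tested)
```

The second return value shows how many filter tests were actually spent, which is the cost this protocol tries to keep well below $N$.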