I have a large set of data that shows how things (let's call them cars) have performed over the years on various tests (say, crash test, braking, gas mileage, etc.). For each test, I take all the cars that went through it and divide the number of failures by the total number of trials to estimate the likelihood of failing that test.
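For concreteness, here's a minimal sketch of the estimate I'm computing (the car attributes and numbers below are made up):

```python
# Hypothetical test records: (color, cylinders, passed)
records = [
    ("red", 8, True), ("red", 8, False), ("red", 8, True),
    ("yellow", 4, False), ("yellow", 4, False), ("yellow", 4, True),
]

def failure_rate(records, color=None, cylinders=None):
    """Estimated P(failure) = failures / trials for the matching subgroup."""
    trials = [r for r in records
              if (color is None or r[0] == color)
              and (cylinders is None or r[1] == cylinders)]
    failures = sum(1 for r in trials if not r[2])
    return failures / len(trials) if trials else None

print(failure_rate(records, color="red"))     # 1 failure out of 3 trials
print(failure_rate(records, color="yellow"))  # 2 failures out of 3 trials
```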
I am able to grab a subset of data from this set of cars and look at things like, how did all the red cars with eight cylinders do on these tests, or how did the yellow cars with four cylinders do on all tests, etc.
Naturally, as I start naming more features that I want to look at out of the dataset, the sample size of cars will get smaller and smaller. This will reduce the power of my failure rate estimate (failures/total_trials).
So, I have an ideal number of samples that need to go through a test in order to reach a t-test power, significance level, and effect-size sensitivity that I have deemed sufficient (or at least acceptable). The next thing I did was subtract the number of samples I have already tested from this target (say, 50 observations gives us the desired power, significance level, etc.). This shows, for each group of cars (by color, cylinders, or whatever we separate them into groups by), how many more cars need to go through a test to reach a sample size that gives us the desired estimator power/significance level/effect size.
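In code, this "how many more samples does each subgroup need" step looks roughly like the following (the target of 50 and the per-group counts are placeholders):

```python
TARGET_N = 50  # sample size giving the power/significance/effect size I deemed acceptable

# Observed trial counts per subgroup (made-up numbers)
counts = {"red": 45, "yellow": 12, "blue": 60}

# Positive deficit = subgroup still needs more observations; 0 = already has enough
deficits = {group: max(TARGET_N - n, 0) for group, n in counts.items()}
print(deficits)  # {'red': 5, 'yellow': 38, 'blue': 0}
```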
So, when picking cars and attributes of that car that should go into a certain test, I will choose cars that I have low samples of in order to increase the sample sizes of subgroups of cars that have a lacking sample size.
So far, this is all I've done. The problem is as follows: we're also interested in catching problems with the cars via the tests! The way I see it, if yellow cars fail a certain test more often than red cars, then I'd like to put yellow cars in the test, because if a yellow car passes despite yellow cars failing more often, I'd be happier about that result than if a red car passed. Red cars often pass anyway, so even when a red car passes, I'd feel none the wiser about whether there's a problem or not.
But the problem is that if I have a low sample of red cars, my current decision-making algorithm says to load up on red cars simply because there aren't a lot of red car observations. I can see how this would be helpful, or even optimal, when the sample sizes are very different (with a low sample of red cars and a high sample of yellow cars, the red cars may well be more likely to fail a test and we just don't have enough observations of red cars to know it yet). But when the sample sizes are close to the same, I would still be forced to sample more of the least-sampled cars.
I guess what I'm asking is: what is the best way to choose which car factors go into a certain test, so that I both round out the data set and test the cars with factors that are more likely to fail? Weighted sums? Something fancier?
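To illustrate the "weighted sums" idea I have in mind, each subgroup could be scored by combining its normalized sample-size deficit with its estimated failure rate; the weights and numbers below are placeholders, and choosing them sensibly is exactly what I'm unsure about:

```python
TARGET_N = 50
W_DEFICIT, W_FAILURE = 0.5, 0.5  # how much to value coverage vs. catching failures

# (trials so far, failures so far) per subgroup -- made-up numbers
stats = {"red": (45, 2), "yellow": (40, 10)}

def score(trials, failures):
    deficit = max(TARGET_N - trials, 0) / TARGET_N    # normalized need for more samples
    fail_rate = failures / trials if trials else 1.0  # pessimistic when unobserved
    return W_DEFICIT * deficit + W_FAILURE * fail_rate

# Highest score = next subgroup to put through the test
ranked = sorted(stats, key=lambda g: score(*stats[g]), reverse=True)
print(ranked)  # ['yellow', 'red']
```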
Thanks for sticking with it to here. :)