Estimating a confidence interval from a sample where multiple replies are possible


I’m conducting a study of student dietary preferences. In particular, I want to know how many meals are consumed on campus and, of those, how many are healthy, how many include vegetables, soft drinks, and so on. Suppose the student body is 10,000 and I draw a random sample of 200 students. Each respondent reports what he or she ate yesterday, which boils down to something like this table:

         Meal 1                              Meal 2
    Id   item 1   item 2   item 3   drink    item 1   item 2   item 3   drink
    1    Salad    None     None     coffee   None     None     None     None
    2    Burger   Fries    None     coke     Pizza    None     None     coke

For convenience’s sake, I then rearrange the data so that each row represents one meal:

    Id   meal_num item 1   item 2   item 3   drink
    1    1        Salad    None     None     coffee
    2    1        Burger   Fries    None     coke
    2    2        Pizza    None     None     coke

This form makes it much easier to count the meals with the properties I’m interested in. For instance, I can say that 21% of meals included salad, 64% were accompanied by a soft drink, etc.
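
A minimal Python sketch of the reshaping and counting, assuming each respondent’s record has already been parsed into a list of meals. The record layout, the `to_long` and `share_of_meals` helpers, and the `SOFT_DRINKS` set are hypothetical, chosen only for illustration; meals whose columns are all None are simply not listed.

```python
# Hypothetical wide-format records: one dict per respondent, each meal stored
# as an (items, drink) tuple. Empty ("None") meals are omitted.
wide = [
    {"id": 1, "meals": [(["Salad"], "coffee")]},
    {"id": 2, "meals": [(["Burger", "Fries"], "coke"), (["Pizza"], "coke")]},
]

# Assumption: which drinks count as soft drinks.
SOFT_DRINKS = {"coke"}


def to_long(wide_rows):
    """Return one row per meal: (id, meal_num, items, drink)."""
    long_rows = []
    for row in wide_rows:
        for meal_num, (items, drink) in enumerate(row["meals"], start=1):
            long_rows.append((row["id"], meal_num, items, drink))
    return long_rows


def share_of_meals(long_rows, predicate):
    """Fraction of meals for which predicate(items, drink) is true."""
    hits = sum(1 for _, _, items, drink in long_rows if predicate(items, drink))
    return hits / len(long_rows)


meals = to_long(wide)
salad_share = share_of_meals(meals, lambda items, drink: "Salad" in items)
soda_share = share_of_meals(meals, lambda items, drink: drink in SOFT_DRINKS)
print(salad_share, soda_share)
```
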
From the first table I can easily estimate the number of meals consumed on campus and a confidence interval for that estimate: I can compute the variance of the number of meals eaten per respondent, choose an appropriate distribution, and so on.
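
As a sketch of that calculation, assuming simple random sampling of students without replacement and a normal approximation (reasonable for n = 200): the estimated total is N * ybar, with standard error N * sqrt((1 - n/N) * s^2 / n), where ybar and s^2 are the sample mean and variance of meals per student. The function name and the per-student counts are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def total_meals_ci(meals_per_student, population_size, level=0.95):
    """Estimate total on-campus meals with a normal-approximation CI.

    meals_per_student: meals reported by each sampled student
    (assumed to be a simple random sample without replacement).
    """
    n = len(meals_per_student)
    ybar = mean(meals_per_student)
    s = stdev(meals_per_student)
    fpc = 1 - n / population_size  # finite population correction
    se_total = population_size * math.sqrt(fpc * s**2 / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    total = population_size * ybar
    return total, (total - z * se_total, total + z * se_total)


# Hypothetical sample of 200 students: 30 ate no campus meals, 80 ate one, 90 ate two.
counts = [0] * 30 + [1] * 80 + [2] * 90
total, (low, high) = total_meals_ci(counts, population_size=10_000)
print(round(total), round(low), round(high))
```
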
What I don’t know is how to find a confidence interval for the estimated proportion of meals with a given property. Is it OK to treat the second table as a sample in its own right (with the sample size equal to the total number of meals reported by respondents)?

Answer

As I understand your explanation, the second table does not represent a random sample of meals, because two meals eaten by the same subject are not independent. If a subject has one 'healthy' meal, is he/she more or less likely to have a second 'healthy' meal? The direction of the effect is hard to know, but either way the two meals are clearly not independent.
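
One standard way to respect this dependence (a survey-sampling technique, not something suggested in this answer) is to keep the student as the sampling unit and treat the share of meals as a ratio estimate: p = sum(y_i) / sum(m_i), where m_i is the number of meals student i reported and y_i the number of those meals with the property. Its approximate (linearized) standard error is built from per-student residuals y_i - p * m_i rather than from individual meals. The sketch compares it with the naive binomial standard error that treats the second table as an independent sample; the function name and data are hypothetical.

```python
import math
from statistics import NormalDist


def naive_and_cluster_ci(per_student, population_size, level=0.95):
    """per_student: (meals, hits) pairs, one per sampled student, where
    hits = number of that student's meals with the property.

    Returns (p_hat, naive_se, cluster_se, cluster_ci).
    """
    n = len(per_student)
    total_meals = sum(m for m, _ in per_student)
    total_hits = sum(h for _, h in per_student)
    p_hat = total_hits / total_meals  # ratio estimate: share of meals

    # Naive SE: pretends every meal is an independent draw.
    naive_se = math.sqrt(p_hat * (1 - p_hat) / total_meals)

    # Linearized ratio-estimator SE: the student is the sampling unit.
    m_bar = total_meals / n
    resid_ss = sum((h - p_hat * m) ** 2 for m, h in per_student)
    fpc = 1 - n / population_size  # finite population correction
    cluster_se = math.sqrt(fpc * resid_ss / (n - 1) / n) / m_bar

    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (p_hat - z * cluster_se, p_hat + z * cluster_se)
    return p_hat, naive_se, cluster_se, ci


# Hypothetical data: 100 students with 2 meals each; half ate two healthy
# meals and half ate none, i.e. strong within-student dependence.
sample = [(2, 2)] * 50 + [(2, 0)] * 50
p, naive_se, cluster_se, ci = naive_and_cluster_ci(sample, population_size=10_000)
print(p, round(naive_se, 4), round(cluster_se, 4))
```

In this deliberately extreme example every student’s two meals agree, so the student-level standard error comes out larger than the naive one; with weaker within-student dependence the gap would shrink.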

If you are most interested in meals on campus, then ask each subject about his/her most recent meal on campus. Or ask each subject about his/her main meal on campus yesterday. If you are interested in lunches on campus, then ask each subject about yesterday's lunch on campus, if any.
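
If each subject reports a single meal, the responses come from different people and can reasonably be treated as independent, so a standard interval for a binomial proportion applies, with n equal to the number of subjects who actually ate on campus. A minimal sketch using the Wilson score interval (one common choice, picked here for illustration); the counts are hypothetical, and the finite population correction is ignored because 200 is small relative to 10,000.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical: 200 subjects each report one on-campus meal; 42 of those include salad.
low, high = wilson_interval(42, 200)
print(round(low, 3), round(high, 3))
```
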

If you are most interested in the health of subjects eating on campus, then consider whether all of a subject's meals on campus yesterday, taken together, have a 'healthy' profile. It is also unclear whether subjects eat off campus as well, and whether and how you want to take that into account.


The first step in such a survey should be to decide exactly what you want to know: what are the appropriate experimental units (people or meals), and how do you get a meaningful random sample of them?

Also, you need to consider how to get accurate information about the experimental units. (How many people will remember exactly what they had for lunch yesterday? What counts as 'lunch'? Does the ice cream bar eaten after lunch on the way to class count? Does the double espresso while waiting for friends to join for lunch count? If they do remember, are people willing to give an honest answer? Or will they try to 'improve' the quality of their meal? Or will they tell you what they think you want to hear?)

Now that you have obtained data, possibly without such advance planning, try to think how you can use all or part of your data in a way that might lead to a meaningful answer.