Estimate counts with different sample sizes


Given an arbitrary time period, let's say one week (but it could be five days, one month, etc.), I have a sample from a population. My sample consists of shoppers at a store. For week one my sample size is 1000: 880 people I see once, 85 people I see twice, 30 people I see three times, and 5 people I see four times.

For simplicity, suppose I have four more samples, each of a different size with a different distribution of counts. For example,

Week   Sample size   One time   Two times   Three times   Four times
1      1000           880         85          30             5
2      1170           990        103          70             7
3      1300          1155        145          90            10
4       965           840         82          25             8
5      1325          1120        115          79            11

Is there a mathematical way to put the counts from weeks 2, 3, 4, and 5 on the same scale as week 1, i.e., a way to estimate how many one-time, two-time, three-time, and four-time visitors I would have seen if the sample size had been 1000 instead of 1170, 1300, 965, and 1325 respectively?

I have a large amount of data if many samples are needed to construct estimates.

There are 1 best solutions below

On BEST ANSWER

When you say you have a sample of $1000$, did you ask $1000$ different people how many times they visited the store, or did you observe people in the store and find that you had $1000$ unique individuals? If you observed people in the store, it might be better to call the week $1$ data a sample of $1 \cdot 880+2 \cdot 85+3 \cdot 30+4 \cdot 5=1160$ observations.

The most naive approach would be to assume that each observation picks a random person from those who visit the store that week, and that the observations are independent. In that case you would expect the number of times you see each person to follow a Poisson distribution. You could then try to fit $\lambda$, the average number of times you see each person. Your data falls off too slowly to fit, which is not surprising. For a Poisson distribution the ratio of two-time to one-time counts is $\lambda/2$ and the ratio of three-time to two-time counts is $\lambda/3$; in week $1$ the first ratio is $85/880 \approx 0.10$, so the second should be about $0.06$, but it is $30/85 \approx 0.35$. There are probably some people who visit the store many times in the week, and they are the ones you see four times. You could try to use your data to estimate the variation in the number of visits.

It seems curious that the three-time counts are so low in weeks $1$ and $4$ compared to the rest.
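
The Poisson check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a definitive method: it assumes that people seen zero times never appear in the data, so it fits a zero-truncated Poisson (a choice made here, not something the answer specifies), and it finds $\lambda$ by bisection. The function names are hypothetical.

```python
import math

# Week 1 counts from the question: {times seen: number of people}
week1 = {1: 880, 2: 85, 3: 30, 4: 5}


def total_observations(counts):
    """Total sightings: each person contributes as many observations as visits."""
    return sum(k * n for k, n in counts.items())


def fit_truncated_poisson(counts, tol=1e-12):
    """Estimate lambda for a zero-truncated Poisson (assumption: zero-visit
    people are unobserved) by solving lambda / (1 - exp(-lambda)) = sample mean."""
    people = sum(counts.values())
    mean = total_observations(counts) / people
    lo, hi = 1e-9, 50.0
    # Bisection works because lambda / (1 - exp(-lambda)) increases with lambda.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / (1 - math.exp(-mid)) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def expected_counts(lam, people, max_k):
    """Expected number of people seen k times, k = 1..max_k, under the fit."""
    norm = 1 - math.exp(-lam)  # probability of being seen at least once
    return {
        k: people * math.exp(-lam) * lam ** k / math.factorial(k) / norm
        for k in range(1, max_k + 1)
    }


lam = fit_truncated_poisson(week1)
fitted = expected_counts(lam, sum(week1.values()), max(week1))
for k in sorted(week1):
    print(f"seen {k}x: observed {week1[k]:5d}, Poisson fit {fitted[k]:8.1f}")
```

Under these assumptions the fitted counts for three and four sightings come out on the order of 13 and 1 people for week 1, well below the observed 30 and 5, which is the sense in which the data falls off too slowly for a Poisson model.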