How can I calculate the accuracy of an estimate based on a subset of data?


For example, say I'm trying to determine the average number of steps taken in a day for a population of 100M people. I know the exact number of steps for a random subset of 25M of those people, and the average for that subset is 2500 steps per day.

How accurate is my 2500 steps per day figure when applied to the whole 100M-person group? Assume the standard deviation of daily steps in the 25M sample is 20k.

Say I only had data for 100 or 1000 people of the 100M group. How does the accuracy change?
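
To make the question concrete, here is a minimal sketch of the calculation I think applies. It assumes the subset is a simple random sample drawn without replacement, that "accuracy" means the standard error of the sample mean (and a roughly 95% margin of error using z = 1.96), and that the finite population correction is appropriate because 25M is a large fraction of 100M. The function names `standard_error` and `margin_of_error` are my own, not from any library.

```python
import math


def standard_error(sample_sd, n, population_size):
    """Standard error of a sample mean for a simple random sample
    drawn without replacement from a finite population.

    Assumption: sample_sd is the standard deviation observed in the sample.
    """
    # Finite population correction: shrinks the error as n approaches N.
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return sample_sd / math.sqrt(n) * fpc


def margin_of_error(sample_sd, n, population_size, z=1.96):
    """Approximate margin of error; z = 1.96 corresponds to ~95% confidence
    under a normal approximation (an assumption, reasonable for large n)."""
    return z * standard_error(sample_sd, n, population_size)


if __name__ == "__main__":
    population = 100_000_000
    sd = 20_000  # standard deviation from the question, reused for every n

    for n in (100, 1_000, 25_000_000):
        se = standard_error(sd, n, population)
        moe = margin_of_error(sd, n, population)
        print(f"n = {n:>10,}: standard error = {se:,.2f}, 95% margin = +/- {moe:,.2f}")
```

Under these assumptions the 25M sample gives a margin of roughly ±7 steps, while 100 people gives roughly ±3,900 and 1000 people roughly ±1,240, since the error shrinks with the square root of the sample size. Reusing the same 20k standard deviation for the smaller samples is itself an assumption; a real sample of 100 would have its own, noisier estimate.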