Dividing data of a given distribution

Given a dataset with distribution $p$ (e.g. uniform or normal), if we divide it into $n$ parts of equal size, is it valid to say that each part still has distribution $p$?

There is 1 best solution below

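Not necessarily. The question here is how you divide the dataset. Theoretically, if you assign the points to parts at random, each part is itself a random sample from the same population, so it still follows $p$ (its histogram will resemble the original, up to sampling noise). Note that the Central Limit Theorem does not make the sample itself normal; it says only that the sample mean is approximately normal for large samples.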

Suppose I have 100 data points, the heights of 100 students. Suppose also that they are left-skewed, so the tail is on the left, meaning that a small proportion of students are much shorter than the rest.

Now, you want to divide them into 5 equal parts of 20 students each. If you sort by height and pick the last 20 (tallest) students, the distribution of that part might NOT be left-skewed. It might look roughly uniform over a narrow range, or even be constant (all of them are 170 cm tall). Clearly this part does not reflect the original distribution.
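The contrast between a random split and a split by sorted order can be checked numerically. The following is a minimal sketch, not a definitive test: it assumes NumPy, uses synthetic heights (190 cm minus an exponential tail, an arbitrary choice made only to get a left-skewed shape), uses 100,000 points instead of 100 so that the summary statistics are stable, and relies on a hypothetical helper `skewness` written for this illustration.

```python
import numpy as np

def skewness(x):
    """Sample skewness (third standardized moment); negative means left-skewed."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered**3).mean() / (centered**2).mean() ** 1.5

rng = np.random.default_rng(0)

# Assumed synthetic data: left-skewed "heights" in cm (not real measurements).
heights = 190.0 - rng.exponential(scale=10.0, size=100_000)

# Split 1: shuffle first, then cut into 5 equal parts (a random partition).
random_parts = np.array_split(rng.permutation(heights), 5)

# Split 2: sort first, then cut into 5 equal parts (the last part is the tallest 20%).
sorted_parts = np.array_split(np.sort(heights), 5)

print(f"full data: mean={heights.mean():.2f} std={heights.std():.2f} skew={skewness(heights):.2f}")
for name, parts in [("random split", random_parts), ("sorted split", sorted_parts)]:
    print(name)
    for part in parts:
        print(f"  mean={part.mean():7.2f}  std={part.std():6.2f}  skew={skewness(part):6.2f}")
```

Under these assumptions, the parts of the random split should each show roughly the same mean, spread, and negative skew as the full data, while the tallest part of the sorted split should be squeezed into a narrow range with little skew left.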

Hope this helps.