Probability that the Median is Below a Certain Number?

Suppose there is a university with 10,000 students. I visit this university, randomly select 150 students, and measure their heights. I find that the average height of these 150 students is 168 cm.

In Statistics and Probability, a popular question is whether we can use the average height of this sample of 150 students to estimate the average height of all students within the university. For example, we can use concepts such as the "Central Limit Theorem" (https://en.wikipedia.org/wiki/Central_limit_theorem) and the "Weak Law of Large Numbers" (https://en.wikipedia.org/wiki/Law_of_large_numbers). Provided the sample size is large enough and the students were randomly selected, based on these concepts, we can make statements such as "based on our sample, on average there is a 0.95 probability that 168 cm ± 3 cm contains the true average height of all university students".
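To make the kind of statement above concrete, here is a minimal sketch of a normal-approximation interval for the mean. The heights are simulated (I am assuming a mean of 168 cm and a standard deviation of 10 cm purely for illustration, since I have not posted my actual data), and the helper name `mean_confidence_interval` is my own. It also ignores any finite-population correction for sampling 150 out of 10,000.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% interval for the population mean (CLT-based)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * se, mean + z * se


# Simulated stand-in for the 150 measured heights (assumed values, not real data)
rng = random.Random(0)
heights = [rng.gauss(168, 10) for _ in range(150)]

low, high = mean_confidence_interval(heights)
print(f"sample mean = {statistics.fmean(heights):.1f} cm")
print(f"approx. 95% interval = ({low:.1f}, {high:.1f}) cm")
```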

Suppose now I look at the sample data I collected and notice the following:

  • 43 % of the sampled students had a height between 150 cm and 161 cm
  • 28 % of the sampled students had a height greater than 170 cm

My Question: Just as I was previously able to make inferences about the average height of all students based on my sample data using the "Central Limit Theorem" - can I also use the "Central Limit Theorem" to make inferences about whether:

  • In the population of all students, 43 % of students will have a height between 150 cm and 161 cm?
  • In the population of all students, 28 % of students will have a height greater than 170 cm?

Initially, two thoughts come to mind:

  • As far as I understand, the "Central Limit Theorem" applies to all probability distributions - but only applies to the "expectation" (i.e. average) of probability distributions. The quantities I am interested in making inferences on (i.e. 43 % of all students had a height between 150 cm and 161 cm; 28 % of all students had a height greater than 170 cm) are not considered as "averages". Therefore, I think that the "Central Limit Theorem" will not apply to these quantities.

  • I was reading about the "Bootstrap Method" (https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) and this might be applicable to my question. For example, from the 150 students I sampled, I could randomly draw 150 students with replacement and calculate the percentage of students with heights greater than 170 cm. I could then repeat this random resampling process many times and create a "bell-curve" of the percentage of students with heights greater than 170 cm in each random sample (e.g. sample 1 = 21 %, sample 2 = 25 %, sample 3 = 19 %... sample n = 24 %). If I take the average and standard deviation of all these percentages, I could now make a statement such as "on average, I estimate that 22.7 % ± 1.8 % of all students within the population have a height greater than 170 cm". This in theory might allow me to partly answer the questions of interest I described earlier.
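The resampling procedure in the second bullet could be sketched roughly like this. The heights here are simulated (mean 165 cm, standard deviation 9 cm are assumptions for illustration only, not my real measurements), and `bootstrap_proportion` is a name I made up for this sketch.

```python
import random
import statistics


def bootstrap_proportion(sample, threshold, n_resamples=2000, seed=0):
    """Return the share of heights above `threshold` in each bootstrap resample."""
    rng = random.Random(seed)
    n = len(sample)
    proportions = []
    for _ in range(n_resamples):
        # Draw n observations from the original sample, with replacement
        resample = rng.choices(sample, k=n)
        proportions.append(sum(h > threshold for h in resample) / n)
    return proportions


# Simulated stand-in for the 150 measured heights (assumed values, not real data)
rng = random.Random(1)
heights = [rng.gauss(165, 9) for _ in range(150)]

observed = sum(h > 170 for h in heights) / len(heights)
props = bootstrap_proportion(heights, threshold=170)
boot_mean = statistics.fmean(props)
boot_sd = statistics.stdev(props)

print(f"observed share above 170 cm: {observed:.3f}")
print(f"bootstrap mean: {boot_mean:.3f}, bootstrap sd: {boot_sd:.3f}")
```

The standard deviation of the resampled percentages plays the role of the "± 1.8 %" in my example statement; the same function could be adapted for the 150 cm to 161 cm range by changing the condition inside the sum.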

Can someone please provide comments on this - is my understanding of the above correct? Is there a more "theoretical" approach (i.e. not involving simulation) to answer such questions?

Thanks!