Short: With the standard normal curve (i.e. mean=0, std dev=1), does the top 10% form its own normal distribution with expectation about 1.74 and standard deviation about .40?
Long: I wrote a little program that takes the numbers 0.005,.015,.025,...,0.995, which is 100 numbers. I take the top 10 (.905-.995), find their z-score (1.31,1.37,...,2.58), sum them and divide by 10 and get 1.7447. In a similar manner, I go about finding the variance and get .1578, or a standard deviation of about .3972.
All seems well so far. I then take the lower 90 numbers and calculate the expectation and get -.194. This works because if I use the law of iterated expectations, the expectation turns out to be 0, which we expect from the standard curve.
I then use the law of total variance and get the unusual answer of 1.118, instead of 1. The expectation of the two variances is .658 and the variance of the two expectations is .460.
This doesn't seem so bad; after all, I'm approximating with 100 numbers. So then I ran it again with 1000 numbers and then with 10,000 numbers. It seems to be converging on a variance of about 1.116, which I did not expect.
Any ideas on the discrepancy or a better way to figure this out?
TIA, Cary
You are correct that it is good enough to solve the problem for the standard normal distribution and then extend results to more general distributions from there.
The top 10% of a normal distribution cannot be normal. It is a right-skewed distribution. In the illustration below, it is the part of the distribution to the right of the vertical red line.
You can use a simulation program to approximate the mean and variance of the values in a standard normal distribution above the 90th percentile, which is at $1.281552.$
However, you will need a very large simulated sample to get a good approximation. In the R program below, I take a sample of a million standard normal observations (vector z), throw away the lower 90% and find the mean and standard deviation of the remaining 10% in the right tail of the distribution in the vector x. The approximate mean is 1.76, the approximate variance is 0.171.
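For readers without R, the same experiment can be sketched in Python using only the standard library. This is not the R program referred to above: the seed, the sample size variable `n`, and the name `tail` are assumptions made for illustration, and the exact figures will differ slightly from run to run and from the R results.

```python
import random
from statistics import NormalDist, fmean, median, pvariance

random.seed(2024)  # arbitrary seed, chosen only for reproducibility

# 90th percentile of the standard normal, about 1.281552
cutoff = NormalDist().inv_cdf(0.90)

# Draw standard normal values and keep only those above the cutoff.
n = 1_000_000
tail = [z for z in (random.gauss(0.0, 1.0) for _ in range(n)) if z > cutoff]

print(len(tail))        # roughly n / 10 observations are retained
print(fmean(tail))      # sample mean of the top 10%
print(pvariance(tail))  # sample variance of the top 10%
print(median(tail))     # should land near the 95th percentile, 1.645
```

Filtering at the theoretical cutoff, rather than at the sample's own 90th percentile, is a design choice here; it is why the number of retained values is only approximately 100,000.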
As a check on the simulation, I also found the median as 1.646, while the exact answer has to be the 95th percentile of standard normal, which is 1.645. Typical of many right-skewed distributions, notice that the mean is somewhat larger than the median.
Based on the 99,630 (approximately 100,000) retained observations, the simulation should be accurate to about two decimal places.
If this is a problem in a course, maybe you are supposed to use a method other than simulation to get exact answers. If you want to read more about this type of problem, the term to search for is 'truncated normal distribution'.
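As one way to get exact values, the standard closed-form moments of a one-sided truncated standard normal can be evaluated directly. The sketch below, in Python rather than R, assumes the usual formulas: for truncation to $z > a$ the mean is $\varphi(a)/(1-\Phi(a))$ and the variance is $1 + a\mu - \mu^2$ with $\mu$ that mean; the lower part uses the same expressions with $-\varphi(a)/\Phi(a)$. All variable names are illustrative. It also recombines the two parts with the law of total variance, as attempted in the question.

```python
from statistics import NormalDist

std = NormalDist()        # standard normal
p = 0.10                  # fraction kept in the upper tail
a = std.inv_cdf(1 - p)    # cutoff, about 1.281552
phi_a = std.pdf(a)        # density at the cutoff

# Upper tail (z > a): mean and variance of the truncated normal.
mean_top = phi_a / p                        # about 1.755
var_top = 1 + a * mean_top - mean_top**2    # about 0.169

# Lower part (z < a): same formulas with the sign of the density term flipped.
mean_low = -phi_a / (1 - p)                 # about -0.195
var_low = 1 + a * mean_low - mean_low**2

# Law of total variance: expected within-group variance plus the
# variance of the group means (the overall mean is 0).
within = p * var_top + (1 - p) * var_low            # about 0.658
between = p * mean_top**2 + (1 - p) * mean_low**2   # about 0.342

print(mean_top, var_top)
print(within, between, within + between)
```

Under these formulas the within-group term agrees with the .658 reported in the question and the two terms sum to 1, which suggests that the discrepancy there lies in the variance-of-expectations term rather than in the approximation.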
Addendum: Here is a method of numerical integration in R for the mean of this truncated normal that does not use simulation.
That is, if $f(z) = 10\varphi(z)$ for $z \in (1.281552, \infty),$ as in my Comment prompted by @Marco Bellocchi, then the code above provides an evaluation of $$\int_{1.281552}^\infty zf(z)\, dz = 1.754983.$$ So we see that the value 1.75691 above from simulation is correct within the margin of simulation error.
I suppose that mathematical software such as Matlab also does such numerical approximations of integrals.
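For comparison, the same kind of numerical approximation can be sketched in plain Python. This is not the R code mentioned in the Addendum: it swaps in a hand-written composite Simpson's rule, and the helper `simpson`, the finite upper limit `upper`, and the number of subintervals are all assumptions made for illustration. The infinite upper limit is replaced by `a + 12`, beyond which the normal density is negligible.

```python
from statistics import NormalDist

std = NormalDist()
a = std.inv_cdf(0.90)  # cutoff, about 1.281552


def f(z):
    """Density of the standard normal truncated to z > a: 10 * phi(z)."""
    return 10 * std.pdf(z)


def simpson(g, lo, hi, n=10_000):
    """Composite Simpson's rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        # Odd interior nodes get weight 4, even interior nodes weight 2.
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


upper = a + 12  # stand-in for infinity; the tail beyond this is negligible

area = simpson(f, a, upper)                          # should be close to 1
mean_tail = simpson(lambda z: z * f(z), a, upper)    # close to 1.754983

print(area, mean_tail)
```

Checking that `area` comes out near 1 is a quick sanity test that $f$ really is a density on the tail before trusting the value of the mean.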