I have a theoretical question.
Given a set of data, whose 95th percentile is X:
If I normalize the data using z-score normalization, i.e. (data - mean)/std, is the 95th percentile of the normalized set equal to:
- (X-mean)/std
- 1.95
- I cannot say a-priori, and need to assess the 95th percentile again
- Other?
Thank you.
I think I know what you're asking, but not 100% sure. Let me show you an example, and maybe we can discuss it from there. I'm assuming that you're dealing with a sample from a normal population.
Original sample. Here is a sample of size $n = 50$ drawn at random from the normal population with mean $\mu = 100$ and standard deviation $\sigma = 15,$ rounded to two decimal places and sorted from smallest to largest.
The 95th percentile of these data should lie between observations 47 and 48 on this list. That is, between 122.57 and 124.52. The 95th percentile as found by R is about 123.64. (Different software programs have slightly different ways of finding percentiles, so don't worry about the exact value.)
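To make the "between observations 47 and 48" statement concrete, here is a minimal Python sketch (standard library only) of the linear-interpolation rule that, as far as I know, is the default in R's `quantile()` (type 7) and in NumPy. The sample is a hypothetical one drawn with a fixed seed, not the data listed above, and `percentile_type7` is a helper name made up for this illustration. For $n = 50$ the interpolation position is $h = (n - 1)(0.95) = 46.55$ counting from 0, i.e. between the 47th and 48th smallest values counting from 1.

```python
import random
import statistics

def percentile_type7(data, p):
    """Hypothetical helper: linear-interpolation percentile (R type 7 style).

    The 0-based position is h = (n - 1) * p; the result interpolates
    linearly between the two order statistics on either side of h.
    """
    xs = sorted(data)
    n = len(xs)
    h = (n - 1) * p
    lo = int(h)               # floor of h, since h >= 0
    hi = min(lo + 1, n - 1)   # guard for p = 1
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

# Hypothetical sample: 50 draws from Norm(100, 15), rounded to two places.
random.seed(1)
x = [round(random.gauss(100, 15), 2) for _ in range(50)]
xs = sorted(x)

p95 = percentile_type7(x, 0.95)
# The 47th and 48th smallest values (1-based) bracket the 95th percentile.
print(xs[46], p95, xs[47])

# statistics.quantiles(..., method="inclusive") uses the same rule;
# its default method="exclusive" uses a different one, so it can differ.
inclusive = statistics.quantiles(x, n=100, method="inclusive")[94]
exclusive = statistics.quantiles(x, n=100)[94]
print(p95, inclusive, exclusive)
```

The `exclusive` value interpolates between different order statistics, which is one concrete instance of software packages computing percentiles in slightly different ways.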
Original population. Now, if I look for the 95th percentile of the distribution $\mathsf{Norm}(\mu = 100, \sigma = 15),$ I get 124.67. Just as the sample mean, 98.50, is close to 100 and the sample standard deviation, 16.48, is close to 15, the sample 95th percentile, 123.64, is close to 124.67; but none of the sample statistics exactly matches its corresponding population parameter. (If I had taken a sample larger than 50, the matches would tend to be better.)
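The population percentile can be checked with Python's standard-library `statistics.NormalDist` (a tool chosen for this sketch; in R, `qnorm(0.95, 100, 15)` gives the same number). The simulation part is a hypothetical illustration of the parenthetical remark about larger samples: it compares the sample 95th percentile of a small and a very large simulated sample with the population value. The seed, the sample sizes, and the helper name `sample_p95` are arbitrary choices, not anything from the answer.

```python
import random
import statistics
from statistics import NormalDist

# 95th percentile of the population distribution Norm(mu = 100, sigma = 15).
q95_pop = NormalDist(mu=100, sigma=15).inv_cdf(0.95)
print(round(q95_pop, 2))  # 124.67

def sample_p95(size, seed):
    """Linear-interpolation (type 7 style) 95th percentile of a simulated sample."""
    rng = random.Random(seed)
    x = [rng.gauss(100, 15) for _ in range(size)]
    return statistics.quantiles(x, n=100, method="inclusive")[94]

# Arbitrary sizes and seed: the small sample is only roughly right,
# the very large one should land close to the population value.
for size in (50, 100_000):
    est = sample_p95(size, seed=3)
    print(size, round(est, 2), "difference:", round(est - q95_pop, 2))
```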
Here is a histogram of the sample along with the density function of the population. The 95th percentiles are shown as a blue line for the sample and a red line for the population. Sample and population 95th percentiles are close to each other but not exactly the same.
Standardized sample. Now if I standardize these data (that's what you're calling 'normalize'), by subtracting the sample mean and dividing by the sample standard deviation (and round again for manageable numbers), I get the following list:
In particular, the standardized version of the 95th percentile of the x's is $z = (123.64 - \bar{x})/s \approx 1.52,$ where $\bar{x}$ and $s$ are the sample mean and sample standard deviation.
If I take the 95th percentile of the z's, I get about 1.52.
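The agreement between these two calculations is not a coincidence. Since $s > 0,$ the map $z = (x - \bar{x})/s$ is an increasing linear transformation, so it keeps the data in the same order, and an interpolated percentile is a weighted average of two neighboring order statistics (weights summing to 1), which a linear transformation passes straight through. So the 95th percentile of the z's is exactly $(X - \bar{x})/s,$ apart from rounding. A minimal Python sketch on a hypothetical simulated sample (not the data listed above; `p95` is an illustrative helper name):

```python
import random
import statistics

def p95(data):
    # Linear-interpolation (R type 7 style) 95th percentile.
    return statistics.quantiles(data, n=100, method="inclusive")[94]

# Hypothetical sample: 50 draws from Norm(100, 15), not the answer's data.
random.seed(2)
x = [random.gauss(100, 15) for _ in range(50)]

xbar = statistics.mean(x)
s = statistics.stdev(x)            # sample SD (n - 1 denominator), like R's sd()
z = [(xi - xbar) / s for xi in x]  # z-scores

X = p95(x)
print(p95(z))          # 95th percentile of the standardized data
print((X - xbar) / s)  # standardized 95th percentile of the original data
```

The same holds for any percentile rule that picks or linearly interpolates order statistics, which corresponds to the first option listed in the question, (X - mean)/std.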
Standard normal distribution. Also, if I find the 95th percentile of the standard normal distribution (approximately the standardized version of the population distribution), I get 1.64. That's close to the sample value 1.52, but not exactly, because samples don't exactly emulate the populations from which they are chosen.
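For the distributions themselves the correspondence is exact, because the population is standardized with the true $\mu$ and $\sigma$ rather than with estimates from a sample. A small check using Python's standard-library `statistics.NormalDist` (chosen here for the sketch):

```python
from statistics import NormalDist

mu, sigma = 100, 15
q_pop = NormalDist(mu, sigma).inv_cdf(0.95)  # about 124.67
q_std = NormalDist().inv_cdf(0.95)           # standard normal, about 1.645

print(round(q_std, 2))       # 1.64
print((q_pop - mu) / sigma)  # equals q_std up to floating-point rounding
```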
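Here is a corresponding figure for the standardized sample and the standard normal density. Relative to the spread of the data, the sample and population 95th percentiles are not quite as close together now (1.52 vs. 1.64, compared with 123.64 vs. 124.67 on a scale where $\sigma = 15$), because the sample estimates $\bar{x}$ and $s,$ rather than $\mu$ and $\sigma,$ were used to standardize the sample.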
Summary. So depending on which 95th percentile you're talking about in your question, the answer might be essentially an exact match (comparing the original and standardized sample, or comparing the original and standardized distributions) or only a rough approximation (comparing a sample 95th percentile with a population 95th percentile).