What formula would I use (using Excel, preferably) to determine the 'average' 10th percentile rank value if I have four different and yet related 10th percentile values? Can I just average the values I have for four different 10th percentiles and use that number as the 'average percentile'? Most importantly, is there such a thing as an average percentile? Or, can I use a range of numbers in the same percentile to determine an 'overall percentile'?
I've written out really basic example of what I'm looking at in case the question itself was unclear. Note that I'm also using really simple numbers and I ONLY have percentile values and not the raw data used to assign the percentile values.
Category X
$50 = 10th percentile
$60 = 25th percentile
$70 = 50th percentile
Category Y
$80 = 10th percentile
$90 = 25th percentile
$100 = 50th percentile
Category Z
$110 = 10th percentile
$120 = 25th percentile
$130 = 50th percentile
I need to figure out the percentile rank for X number. The rank must take into account all categories and since the purpose of this work project is to make the data simple, I need to assign X an overall percentile rank.
At this point I am thinking that for each survey, you could interpolate any missing percentiles, then take a weighted average of given and interpolated values for each percentile to get percentile estimates for the mixture of all the surveys. The interpolation part will take some work but here is a link that describes the process and gives an example. Try doing this by hand with your own data to develop an understanding of the calculations before trying to code anything in Excel.
A more elaborate process is based on simulated salary data. You could fit statistical distributions to each of your samples, then draw new samples from those fits, in proportion to your original sample sizes. There are various strategies for doing this. The new sample points would represent your best guess of actual salaries. Combine the new samples together and get your percentiles directly from the combined simulated data. This approach would require the assistance of a statistician to develop the distribution estimates and possibly a programmer if you want to put something like this in production.
If you only have to do this once the interpolation method is the easiest way to go and Excel might be sufficient. If you have to do this repeatedly, and you have a lot of surveys to work with, it might be worth it to invest in the more elaborate approach. I'm not sure Excel would make the best platform in that case.
If I can come up with any additional ideas I will update my answer or maybe another reader can suggest improvements.