Determine the distribution underlying several given scores and their percentiles


I am trying to build a FICO score calculator that estimates one's FICO score given one's percentile from another credit report. FICO score data is kept fairly secret, but the following information is publicly available for 2012:

  • A score of 750 had a percentile of 62.8%
  • A score of 800 had a percentile of 81.7%
  • A score of 850 had a percentile of 95.5%
  • The average score was 689.
  • The median score was 723.

I tried to guess the standard deviation by computing Z-scores for the known data and passing them to Excel's NORMSDIST function. The "Percentile" row below is calculated with NORMSDIST, while the "Actual" row reflects the percentiles listed above:

[Image: table comparing the NORMSDIST "Percentile" row with the "Actual" row]

The problem I ran into is this: when I choose the standard deviation needed to place one of the given data points at the correct percentile, the other given data points end up at incorrect percentiles. This indicates that the known score/percentile data do not fit a normal distribution. How do I find a distribution that fits the given data?

Best answer

The above result means that a normal distribution cannot fit your data adequately. That is because the percentiles of your data do not match the percentiles you would get if the distribution were normal, whatever standard deviation you choose.
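To make the mismatch concrete, here is a minimal sketch that assumes a normal distribution centred on the published mean (689) and backs out the standard deviation each published point would require. The helper name `implied_sigma` is just for illustration. If the data were normal, the three values would roughly agree.

```python
from statistics import NormalDist

# Published 2012 figures from the question.
MEAN = 689
POINTS = [(750, 0.628), (800, 0.817), (850, 0.955)]  # (score, cumulative fraction)


def implied_sigma(score, percentile, mean=MEAN):
    """Standard deviation a normal distribution with this mean would need
    so that `score` falls exactly at `percentile`."""
    z = NormalDist().inv_cdf(percentile)  # Z-score for that percentile
    return (score - mean) / z


sigmas = [implied_sigma(score, pct) for score, pct in POINTS]

for (score, pct), sigma in zip(POINTS, sigmas):
    print(f"score {score} at {pct:.1%} -> implied sigma {sigma:.1f}")
```

The implied standard deviations differ widely from one point to the next, which is the same inconsistency you saw in Excel.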

To determine an appropriate distribution for your data, note that the mean (689) is less than the median (723). This suggests that the distribution is negatively skewed, i.e. it has a longer left tail. So you should look for a negatively skewed continuous distribution. You should also check, if possible, whether the underlying distribution can be assumed to be unimodal or bimodal. (Most commonly used distributions are unimodal.)

One standard example of a known distribution (by known I mean one with a closed analytical form) with these characteristics is the Weibull distribution with shape parameter $\gg 1$ (for example, $30$ or more). A statistical package such as Minitab or SPSS (or Excel) can calculate the percentiles of the Weibull distribution for different values of the shape and scale parameters. For the shape parameter, start with a value of around $30$ or more and adjust it by trial and error. For the scale parameter, use a value around $750$. Strictly, the scale parameter should be set to the mode of the sample (if known), assuming the distribution is unimodal, because for a large shape parameter the Weibull mode is approximately equal to the scale. Because of the negative skewness, the mode should be a little larger than the median ($723$), hence the suggested value of $750$.
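If you would rather script the trial and error than do it by hand, the following is a rough sketch using only the Weibull CDF, $F(x) = 1 - e^{-(x/\lambda)^k}$, with shape $k$ and scale $\lambda$. It treats the median as a fourth data point (723 at 50%) and scans a coarse grid for the pair with the smallest squared error. The grid ranges, the step sizes and the least-squares criterion are my own assumptions, not something the published data dictate.

```python
import math

# Published 2012 figures from the question: (score, cumulative fraction).
# The median (723) is included as the 50th percentile.
TARGETS = [(723, 0.500), (750, 0.628), (800, 0.817), (850, 0.955)]


def weibull_cdf(x, shape, scale):
    """Two-parameter Weibull CDF: 1 - exp(-(x / scale) ** shape)."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def sse(shape, scale):
    """Sum of squared differences between fitted and published percentiles."""
    return sum((weibull_cdf(x, shape, scale) - p) ** 2 for x, p in TARGETS)


def grid_search():
    """Scan shape 2.0..60.0 (step 0.5) and scale 700..800 (step 1)."""
    best = None
    for i in range(4, 121):
        shape = i / 2
        for scale in range(700, 801):
            err = sse(shape, scale)
            if best is None or err < best[0]:
                best = (err, shape, scale)
    return best


best_err, best_shape, best_scale = grid_search()
print(f"best shape = {best_shape}, best scale = {best_scale}, SSE = {best_err:.5f}")
print(f"SSE for the starting guess (shape 30, scale 750) = {sse(30, 750):.5f}")

# Implied mean of the fitted Weibull, to compare with the published mean (689).
implied_mean = best_scale * math.gamma(1 + 1 / best_shape)
print(f"implied mean = {implied_mean:.1f}")

for x, p in TARGETS:
    print(f"score {x}: fitted {weibull_cdf(x, best_shape, best_scale):.3f}, actual {p:.3f}")
```

Treat whatever it reports as a starting point only: four points cannot pin down the whole distribution, and comparing the implied mean with 689 is a useful sanity check on the left tail.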