Finding a percentile in Excel

493 Views Asked by At

I am writing a program using excel and I need to find the percentage of a number. I have a few sample numbers and their percentages.

  • 2.3% - 5.013368
  • 5.0% - 5.238858
  • 10.0% - 5.480078
  • 25.0% - 5.908320
  • 50.0% - 6.423700
  • 75.0% - 6.984281
  • 90.0% - 7.530756
  • 95.0% - 7.878150
  • 97.7% - 8.233311

I need a formula that I can put into excel with a new number (ex. 6.544) and it returns the correct percentage (ex. 62.45%).

I don't really know the math terms very well and I do apologize. I hope I was able to describe my problem well. And also I don't know what all the math symbols are. I tried to look this up but it just kept giving me math equations that I can't follow.

1

There are 1 best solutions below

2
On

Well you might want to first plot the percentages you already have and see what it looks like. Does it seem linear ? Maybe exponential ?

If it looks linear (or almost linear), a simple linear regression will do the trick (http://en.wikipedia.org/wiki/Linear_regression). If it has another form, then you might need some more "complicated" functions for your regression to fit your data points. See non-linear regressions.

Edit: If what you plotted has an "S" shape as you said, then you might want to try a sigmoid function (http://en.wikipedia.org/wiki/Sigmoid_function)

You could either work with percentages as numbers between $0$ and $1$ and then use this form:

$$f(x)=\frac{1}{1+e^{-x/T}}$$

Or you want to work directly with your numbers from $0$ to $100$ and try this form instead:

$$f(x)=\frac{100}{1+e^{-x/T}}$$

The $T$ is a constant you can use to tweak the function and make it fit your data.

(By the way, the logistic function the Wikipedia articles talks about is just the special case $T=1$)

If you work with excel, you can write the function one of these ways (depending on which of the two forms you chose to use):

1/(1+EXP(-CX/CT)) or 100/(1+EXP(-CX/CT))

In this formula, replace the CX by the cells where you have your numbers and the CT by the cell where you put your parameter $T$.

Now for the method you should use to fit your data, you have several choices. I would recommend two depending on what maths you're willing to do.

  • First method:

Tweak the parameter $T$ manually until your functions looks like your data (can often be time intensive and repetitive but yet doesn't require any further knowledge)

  • Second method: The "Least squares".

The idea is to minimize the average squared distance from the data points to the predicted points (the function's curve). This image (found on the web) makes it easier to understand graphically (imo):

enter image description here

Don't pay attention to the shape of the curve, it's just here to illustrate the method ;)

  • The pairs $(x_1,y_1)$, $(x_2,y_2)$ all the way up to $(x_n,y_n)$ are the pairs $(number,percentage)$ you have (seems like you have $9$ of them so in your case $n=9$)
  • The black curve is the plotted function you are looking for (the sigmoid in your case).
  • The values $D_1$, $D_2$, ..., $D_n$ are the distances from the points to this curve (the error you make in a sense)

$$D_i=y_i-f(x_i)$$

Now what you want is take the average of these distances and change the parameter $T$ to make this average as small as possible (i.e: the curve as close as possible to the points)

However if you do this, the negative errors and the positive ones might just cancel each other out and you will get an average distance of $0$ although your curve could be pretty far from all your points. This is why we take the squared values of these distances to compute the average (and also because it emphasizes the big distances compared to the smaller ones).

So your average can be written as:

$$\displaystyle D_{avg}=\frac{1}{n}\sum_{i=1}^n D_i^2=\frac{1}{n}\sum_{i=1}^n (x_i-f(x_i))^2$$

Which, if you write $f(x)$ explicitly, becomes:

$$\large\displaystyle D_{avg}=\frac{1}{n}\sum_{i=1}^n (x_i-\frac{100}{1+e^{-x_i/T}})^2$$

The idea now is to find the value of $T$ for which this $D_{avg}$ is minimum. Try to google "Least squares excel" to see how to do this in excel ;)