Splitting up a series of numbers: How to determine when a value increase is significant?


I have 100 input values:

$0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 8, 8, 10, 16, 18, 22, 22, 35, 50, 50$

Each value represents a response time, i.e. the number of minutes it took for some customer service agent to respond to an email from a customer. So the first value $0$ indicates that the customer only waited $0$ minutes for a response.

I need a formula to find out how many fast, medium-fast and slow response times there are. In other words, I want to split my input values into $3$ pools and then count how many values fall into each pool.
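Once the two cut-off values are known, counting the pools is simple. A minimal sketch, assuming each cut-off is the largest value still included in the lower pool (the helper name `count_pools` is hypothetical):

```python
from typing import Iterable, Tuple


def count_pools(values: Iterable[float], fast_max: float, medium_max: float) -> Tuple[int, int, int]:
    """Count how many values fall into each pool, given two cut-off values.

    fast:        value <= fast_max
    medium-fast: fast_max < value <= medium_max
    slow:        value > medium_max
    """
    fast = medium = slow = 0
    for v in values:
        if v <= fast_max:
            fast += 1
        elif v <= medium_max:
            medium += 1
        else:
            slow += 1
    return fast, medium, slow
```

For example, `count_pools([0, 1, 3, 4, 10, 11, 50], 3, 10)` returns `(3, 2, 2)`.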

The complicating factor is that I have to decide where to make the cuts based on the overall steepness of the slope. There is no fixed definition of fast, medium-fast and slow. The first cut (between fast and medium-fast) should occur where the slope starts to steepen more drastically than before. The second cut (between medium-fast and slow) should occur where an even more dramatic increase in steepness occurs.

[Figure: graphical representation of the input values]

In the above example, common sense would probably define fast as $0$-$3$, because there are many instances of $0$, $1$, $2$ and $3$. $4$-$8$ or $4$-$10$ look like common-sense choices for medium-fast. But how can something like this be determined mathematically? If response times were generally faster, customers would come to expect that, so an even smaller increase towards the end should trigger a cut.

Please note that my use case doesn't allow me to use math programs like R. I need a formula that uses basic math. The end result of the formula should not be the number of values in each pool, but rather the cut-off values (e.g. $3$ and $10$).

Best answer:

You may want to look at clustering. As you noted, defining what is "medium" or "slow" is fuzzy, and there are different ways to approach this; correspondingly, there are various metrics for defining goodness of clusters.

A start might be $k$-means clustering. In your example, your data is one-dimensional, and $k=3$.
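A minimal sketch, using Python only for illustration: for one-dimensional data the $k$-means objective (total squared distance of each value to its group mean) can be minimized exactly, rather than with the usual iterative $k$-means algorithm, whose result depends on the starting centroids. With $k=3$ that means trying every pair of cut positions in the sorted list; prefix sums keep each check down to a few additions, which is also why it stays within "basic math". The function name `best_two_cuts` is hypothetical, and this exact search is a swapped-in technique, not something the answer specifies.

```python
from typing import List, Tuple


def best_two_cuts(values: List[float]) -> Tuple[float, float]:
    """Exact 1-D k-means with k = 3: try every pair of cut positions.

    Returns (largest "fast" value, largest "medium-fast" value),
    i.e. cut-off values in the sense of the question.
    """
    xs = sorted(values)
    n = len(xs)
    # Prefix sums of x and x^2 give any segment's squared error in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(a: int, b: int) -> float:
        # Sum of squared deviations of xs[a:b] from their mean.
        total = s[b] - s[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    # Only cut between two different values, so tied response times stay together.
    cuts = [i for i in range(1, n) if xs[i - 1] < xs[i]]
    if len(cuts) < 2:
        raise ValueError("need at least three distinct values")

    best = None
    for pos, i in enumerate(cuts):
        for j in cuts[pos + 1:]:
            cost = sse(0, i) + sse(i, j) + sse(j, n)
            if best is None or cost < best[0]:
                best = (cost, i, j)
    _, i, j = best
    return xs[i - 1], xs[j - 1]
```

For example, `best_two_cuts([0, 0, 1, 1, 10, 10, 11, 50, 51])` returns `(1, 11)`. Because squared distances let a few large values dominate, running it on the question's data will not necessarily reproduce the common-sense cut-offs of $3$ and $10$.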


Note that it might be tedious to run these algorithms by hand on any given dataset. I am suggesting them in case you want to explore the problem further.