How do I find if a series of integers is (somehow) regular

42 Views Asked by At

[DISCLAMIER]: please be patient and low profile: no math background here :)

I have a database of transactions, i.e. a list of purchases made by different customers.

What I have to figure out is which customers buy more regularly and which not, a numeric indicator for each customer telling his reliability — sort of the lower the value, the higher the probability he will purchase again in his usual time frame, and vice versa.

Dates are stored as integers counting days since 01/01/1900. For sake of simplicity let's assume all transactions are from year 1900. Actual numbers are all greater than 40.000 — year 2010 onwards.

Let's assume these 3 ultra-regular customers purchase dates:

1) 0, 7, 14, 21, 28
2) 0, 14, 28, 42, 56, 70
3) 0, 1, 28, 29, 56, 57

Customer 1 has a purchase each week, customer 2 has 1 purchase each 2 weeks, customer 3 has 2 purchases every 2 weeks.

Now let's compare with this very-occasional, very unregular customers:

4) 0, 12, 15, 80, 88, 99
5) 0, 20, 22, 48, 88, 90
6) 0, 1, 2, 29, 86, 87

I'd expect a very low value — 0? — for the first 3 customers and higher values for the last 3.

How do I calculate this indicator using standard excel-available formulas? (am I asking too much???)

1

There are 1 best solutions below

1
On

There are many, many measures for such things but the question is, which of them is applicable to your model? Here's one take on this. Focus on a particular customer $C$. You have a sequence of times at which they buy something. One extremely unsophisticated way of modeling their buying is via something akin to a random waiting time: Suppose a customer $C$ buys something at time $t=0$, how long do you have to wait for them to buy again? This amounts to looking at the distribution of time differences, call them $T$, between successive buy times and plotting a histogram of waiting times between buys and their frequency. This gives you a rough distribution for how long you need to wait. For example if you want to figure out the chance that customer $C$ will buy a product in the next 20 days, this amounts to calculating $P(T\leq 20)$ which is just the sum of the number of buys with $T\leq 20$ divided by the total number of buys.

This method has one disadvantage: it essentially assumes that the waiting time is independent of past history. Here's a silly example: suppose a customer only buys stuff on Saturday and Sunday. You see waiting times of 1 day and 6 days. So it's a bit silly to assume a random process here. However, it's very simple to implement, and if it gives you the results you want, then great.

If you want to read more on this, look at Poisson Processes.

On the other hand, if you're really looking for strictly periodic patterns, then you should consult Autocorrelation processes.