How to know if it is best to use Markov Chains to model a customer state space


My problem concerns a stochastic state-space model. The states are customer labels: customers enter the space in a particular state, and that discrete label changes over time. There is a log of each customer's trace (trajectory), based on the feasible transitions allowed. Understanding customer behaviour is important, as is summarizing the long-run activity given the initial observations. What basic assumptions must be satisfied to model this data as a Markov chain and predict/extrapolate from its behaviour?

I have to predict the next state of a customer's payment behaviour given past states. These states are strings like "overdue", "paid", "1st month overdue", "2nd month overdue", "In a payment plan", "Cancelled", "Up to date" and others, adding up to 13 possible states. In almost all cases a transition between states takes a single step. I was thinking of using a Markov chain for this problem, but I'm not sure the Markov property is satisfied, so I can't use the mathematical tool with no worries. This naturally raised another question: do people always check that the Markov property is satisfied BEFORE they use it? Or do they just use it, check the results, and call it good enough if it works?

The bottom line of all this is:

- How do I know when to use Markov chains in a particular scenario?
- If a Markov chain doesn't apply in this case, which other mathematical tool could be useful, and is there a Python (or other language) library for it?


There are 2 best solutions below


Both things happen, in my opinion. You can also consider higher order Markov chains to allow more information to be used by the model.

If your system is such that the transition dynamics depend only on the current state, then you can feel confident using a Markov chain. Of course, if you are only able to observe a small subset of the state (i.e. you are dealing with something more like a partially observable Markov decision process), this can manifest as a need for higher-order dynamics modelling. I suspect this may be true of data as complex as yours. By this I mean that if you could see ALL the information you could possibly want about the state, a Markov chain would likely work better than if you had less information.

One way to validate your chain would be to look at the log-likelihood of the chain on the data (e.g. see here).
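As a minimal sketch of that validation (assuming trajectories are encoded as lists of integer state ids, which is my own encoding choice, not something from the question), you can fit a transition matrix by counting and score it by the summed log transition probabilities:

```python
import numpy as np

def transition_matrix(sequences, n_states):
    """Maximum-likelihood transition matrix estimated by counting observed steps."""
    counts = np.zeros((n_states, n_states))
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    # Small pseudo-count avoids all-zero rows and log(0) for unseen transitions
    counts += 1e-6
    return counts / counts.sum(axis=1, keepdims=True)

def log_likelihood(sequences, P):
    """Sum of log transition probabilities over all observed steps."""
    return sum(np.log(P[a, b]) for seq in sequences for a, b in zip(seq[:-1], seq[1:]))

# Toy data: two customer trajectories over 3 states labelled 0..2
data = [[0, 1, 1, 2], [0, 0, 1, 2]]
P = transition_matrix(data, 3)
```

Comparing this log-likelihood on held-out trajectories, or across model variants (first-order vs. higher-order), gives a concrete way to judge fit.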

If you want to check for the presence of the Markov property, this is non-trivial. See here for some ideas on how to do it. This page also has a bunch of ways to test it.
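One rough, hand-rolled check along these lines (my own sketch, not a formal test from the linked pages): fit first- and second-order models by counting and compare their log-likelihoods. A large likelihood-ratio statistic, judged against a chi-square distribution with the appropriate degrees of freedom, suggests a first-order chain is insufficient:

```python
import numpy as np
from collections import defaultdict

def order_k_loglik(seq, k):
    """Log-likelihood of an order-k Markov model fitted to one sequence by counting."""
    ctx_counts = defaultdict(lambda: defaultdict(int))
    for i in range(k, len(seq)):
        ctx_counts[tuple(seq[i - k:i])][seq[i]] += 1
    ll = 0.0
    for i in range(k, len(seq)):
        ctx = tuple(seq[i - k:i])
        total = sum(ctx_counts[ctx].values())
        ll += np.log(ctx_counts[ctx][seq[i]] / total)
    return ll

# Toy sequence: a large statistic here would hint at second-order structure
seq = [0, 1, 0, 1, 0, 1, 2, 0, 1, 0, 1, 2] * 10
lr = 2 * (order_k_loglik(seq, 2) - order_k_loglik(seq, 1))
```

This is only a heuristic sketch; the pages linked above discuss more careful tests.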

Often (at least in science) people know from the underlying process (or can at least approximate it) that the Markov property should hold, given the information contained in the state. Otherwise, for reasons of computational feasibility, one often has no choice but to assume the Markov property.


The Markov property is $\Pr(X_{n+1}=x \mid X_n=x_n, X_{n-1}=x_{n-1}, \dots, X_0=x_0) = \Pr(X_{n+1}=x \mid X_n=x_n)$, or more compactly $P(X_{n+1}\mid X_n) = P(X_{n+1}\mid X_n, X_{n-1}, \dots)$, and this simply says that the state you want the probability for depends only upon the current state. A transition matrix $T_{ij}$ (or $P_{ij}$) gives you the $n^2$ transition probabilities for $n$ states; with your 13 states that is a $13 \times 13$ matrix. The rows of this matrix must each sum to 1, so that the outgoing probabilities from every state form a probability mass distribution. In terms of the customers, the Markov property defines their movement as dependent only upon their current plan/option, from which they move into a new state with some probability, e.g. $P(\text{'cancelled'}\mid\text{'overdue'})$, which is estimated simply by counting the number of overdue customers in your database and seeing what proportion of those cancel next. A Markov chain simulation would then randomly initialize the customers and use the transition matrix to simulate a sequence of state changes. After many iterations you can take the average over final states, because the average becomes stable (stationary).
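A minimal sketch of that counting estimate and simulation, using a hypothetical four-state subset of the 13 labels and made-up trajectories (both are illustrative assumptions, not the real log):

```python
import numpy as np

states = ["up_to_date", "overdue", "paid", "cancelled"]  # illustrative subset
idx = {s: i for i, s in enumerate(states)}

# Each row of the log is one customer's trajectory of state labels (made up here)
trajectories = [
    ["up_to_date", "overdue", "paid", "up_to_date"],
    ["up_to_date", "overdue", "cancelled"],
    ["up_to_date", "up_to_date", "paid"],
]

# Count observed transitions, then normalize each row to get P(next | current)
n = len(states)
counts = np.zeros((n, n))
for traj in trajectories:
    for a, b in zip(traj[:-1], traj[1:]):
        counts[idx[a], idx[b]] += 1
row_sums = counts.sum(axis=1, keepdims=True)
P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# P('cancelled' | 'overdue') = proportion of overdue customers who cancel next
p_cancel = P[idx["overdue"], idx["cancelled"]]

# Simulate one customer forward from an initial state
rng = np.random.default_rng(0)
state = idx["up_to_date"]
for _ in range(10):
    if P[state].sum() == 0:  # state with no observed exits (effectively absorbing)
        break
    state = rng.choice(n, p=P[state])
```

Averaging the final states of many such simulated customers approximates the long-run (stationary) behaviour the question asks about.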

Now, unless there are cycles (loops) in your graph of transitions between states, this is unlikely to be very interesting, as your problem then has a tree structure. If that is the case, I would recommend decision trees. These are used frequently in business applications and provide a lot of intuition about the data classifications and category-transition parameters, and there are good Python tutorials on implementing them.
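A minimal sketch of that alternative using scikit-learn's `DecisionTreeClassifier`, with made-up features (current state id, months as a customer) and made-up next-state targets; the feature choice is my assumption, not something from the question:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: each row is (current_state_id, months_as_customer),
# and the target is the next state id observed in the log
X = [[0, 1], [0, 5], [1, 2], [1, 8], [2, 3], [2, 7], [0, 2], [1, 3]]
y = [1, 2, 2, 3, 0, 0, 1, 2]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
pred = clf.predict([[0, 1]])  # predicted next state for a new customer
```

Unlike a Markov chain, this lets you condition the prediction on extra covariates beyond the current state.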

*(You can make the model higher order by conditioning on a longer past; such approaches can treat the combination of past states as a single joint label.)
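A quick sketch of that joint-label trick (my own illustration): re-encode each window of the last $k$ states as a tuple, which turns an order-$k$ chain into an ordinary first-order chain over tuple-valued states:

```python
def to_joint_states(seq, k=2):
    """Re-encode an order-k chain as a first-order chain over tuples of the last k states."""
    return [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]

seq = ["up_to_date", "overdue", "overdue", "paid"]
joint = to_joint_states(seq)
# → [('up_to_date', 'overdue'), ('overdue', 'overdue'), ('overdue', 'paid')]
```

The cost is a state space that grows exponentially in $k$ (here up to $13^k$ joint labels), which is why low orders are usually preferred.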