My problem involves a stochastic state-space model. Customers enter the system in a particular state, and that discrete label changes over time. A log records the trace (trajectory) of each customer through the feasible transitions. I want to understand customer behaviour and to summarize long-run activity given the initial observations. What basic assumptions must be satisfied in order to model this data as a Markov chain and predict/extrapolate from its behaviour?
I have to predict the next state of a customer's payment behaviour given past states. The states are strings such as "overdue", "paid", "1st month overdue", "2nd month overdue", "In a payment plan", "Cancelled", "Up to date" and others, adding up to 13 possible states. In almost all cases a transition between states can be made in a single step. I was thinking of using a Markov chain for this problem, but I'm not sure the Markov property is satisfied, so I can't use the mathematical tool with no worries. This naturally raises another question: do people always check that the Markov property holds BEFORE they use it? Or do they just use it, check the results, and call it good enough if it works?
The bottom line of all this is:

- How do I know when to use Markov chains in a particular scenario; and
- If a Markov chain doesn't apply in this case, which other mathematical tool could be useful, and is there a Python (or other language) library for it?
In my opinion, both things happen in practice. You can also consider higher-order Markov chains, which let the model use more of the past.
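As a rough sketch of what that looks like (the state names and traces below are made up for illustration, not your real data), a higher-order chain can be fit with exactly the same machinery as a first-order one by treating the last k observed labels as a single history tuple:

```python
from collections import defaultdict

def fit_chain(traces, order=1):
    """Estimate transition probabilities from a list of state sequences.

    For order k, the chain's "state" is the tuple of the last k observed
    labels, so a higher-order chain is just a first-order chain over tuples.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for i in range(len(trace) - order):
            history = tuple(trace[i:i + order])
            counts[history][trace[i + order]] += 1
    # Normalize counts into conditional probabilities P(next | history).
    probs = {}
    for history, nxt in counts.items():
        total = sum(nxt.values())
        probs[history] = {s: c / total for s, c in nxt.items()}
    return probs

# Hypothetical traces using a few of the 13 states
traces = [
    ["Up to date", "1st month overdue", "Paid"],
    ["Up to date", "1st month overdue", "2nd month overdue"],
    ["Up to date", "Up to date", "1st month overdue"],
]
first_order = fit_chain(traces, order=1)
second_order = fit_chain(traces, order=2)
```

With `order=2` the model conditions on pairs of states, at the cost of needing enough data to estimate many more transition probabilities (up to 13^2 histories in your case).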
If your system is such that the transition dynamics depend only on the current state, then you can feel confident using a Markov chain. Of course, if you can only observe a small subset of the state (i.e. you are dealing with something more like a partially observable Markov process), this can manifest itself in a need for higher-order dynamics modelling; I suspect this may be the case for data as complex as yours. In other words, if you could see ALL the information you could possibly want about the state, then a Markov chain would likely work better than it would with less information.
One way to validate your chain is to look at the log-likelihood it assigns to the data (e.g. see here).
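As a minimal sketch of that idea (the `trans` dict below is a hypothetical fitted transition table, not your actual model), the log-likelihood of held-out traces is just the sum of log transition probabilities along each trace:

```python
import math

def log_likelihood(traces, trans, smoothing=1e-9):
    """Sum of log P(next | current) over all observed transitions.

    `trans` maps current state -> {next state: probability}; a tiny
    smoothing constant avoids log(0) for transitions never seen in
    training. Higher (less negative) values mean a better fit.
    """
    ll = 0.0
    for trace in traces:
        for cur, nxt in zip(trace, trace[1:]):
            p = trans.get(cur, {}).get(nxt, 0.0)
            ll += math.log(p + smoothing)
    return ll

# Hypothetical fitted transition probabilities
trans = {
    "Up to date": {"Up to date": 0.8, "1st month overdue": 0.2},
    "1st month overdue": {"Paid": 0.5, "2nd month overdue": 0.5},
}
ll = log_likelihood([["Up to date", "1st month overdue", "Paid"]], trans)
```

Comparing this quantity on held-out data across candidate models (e.g. first-order vs second-order) is a simple way to see which one describes the transitions better.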
Checking for the presence of the Markov property is non-trivial. See here for some ideas on how to do it; this page also lists a number of ways to test it.
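One standard concrete check is a likelihood-ratio test of a first-order fit against a second-order fit: if the second order does not fit significantly better, the first-order (Markov) assumption looks reasonable. A rough self-contained sketch (the example traces are made up; compare the statistic against a chi-square table or `scipy.stats.chi2.sf`):

```python
import math
from collections import defaultdict

def transition_counts(traces, order, skip):
    """Count transitions with history length `order`, starting each trace
    at target index `skip` so that different orders see the same targets."""
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for j in range(skip, len(trace)):
            counts[tuple(trace[j - order:j])][trace[j]] += 1
    return counts

def max_log_likelihood(counts):
    # Plugging the MLE probabilities back in: sum of n_ij * log(n_ij / n_i).
    ll = 0.0
    for hist, nxt in counts.items():
        total = sum(nxt.values())
        for n in nxt.values():
            ll += n * math.log(n / total)
    return ll

def order_test_statistic(traces, states):
    """Likelihood-ratio statistic comparing order-1 vs order-2 fits.

    Under the null hypothesis that first order suffices, the statistic
    is asymptotically chi-square with m*(m-1)^2 degrees of freedom,
    where m is the number of states. Both fits use the same target
    transitions (from index 2 on), so the statistic is non-negative.
    """
    ll1 = max_log_likelihood(transition_counts(traces, 1, 2))
    ll2 = max_log_likelihood(transition_counts(traces, 2, 2))
    m = len(states)
    df = m * (m - 1) ** 2
    return 2.0 * (ll2 - ll1), df

# Hypothetical two-state traces
traces = [["A", "B", "A", "B", "A"], ["B", "A", "A", "B", "B"]]
stat, df = order_test_statistic(traces, {"A", "B"})
```

Note that with 13 states the degrees of freedom are large (13 * 12^2 = 1872), so you need a substantial number of observed transitions before the asymptotic chi-square approximation is trustworthy.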
Often (at least in science) people know from the underlying process that the Markov property should hold, given the information contained in the state, or can at least justify it as an approximation. Otherwise, for reasons of computational feasibility, one often has no choice but to assume the Markov property.