Suppose that I have a sequence of discrete distributions: $$ p_j = (p_{1j},...,p_{Cj}), \: j=1...D,\\ p_{ij}>0 \:\: \forall i,j,\: \sum_{k=1}^Cp_{kj}=1\:\:\forall j. $$ I suppose that these distributions are not independent, but are the state probabilities of variables that form a Markov chain: $$ P(\xi_j = i)=p_{ij}. $$ Then consecutive distributions are linked via a (column-stochastic) transition matrix $M$: $$ p_{j+1} = Mp_j \:\: \forall j=1...D-1. $$ I want some estimate of the transition matrix $M$. The reason is that I solve an ill-posed optimization task for the $p_{ij}$. The values are formally independent, but from the infinite set of solutions I want to find one that has some temporal structure rather than chaotic transitions between the states. I want to add a regularizer that penalises chaotic transitions - for example, one that pushes the columns of $M$ towards sparsity (that is, $\sum_{j=1}^{C} \text{KL}(M_{\cdot,j}\,\|\,\text{Uniform}(1...C))\to\max$, which rewards peaked, low-entropy columns).
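To make the setup concrete, here is a minimal sketch (my own illustration, not part of the question) of one obvious candidate: estimate $M$ by least squares from the overdetermined system $p_{j+1} \approx M p_j$, and evaluate the proposed KL-to-uniform regularizer on a column-stochastic matrix. All names (`M_true`, `kl_to_uniform`, the dimensions) are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
C, D = 4, 50

# Ground-truth chain for the demo: a random column-stochastic matrix.
M_true = rng.random((C, C))
M_true /= M_true.sum(axis=0, keepdims=True)

# Propagate a uniform initial distribution: p_{j+1} = M p_j.
P = np.empty((C, D))
P[:, 0] = 1.0 / C
for j in range(1, D):
    P[:, j] = M_true @ P[:, j - 1]

# Least-squares estimate of M: solve P[:, :-1].T @ M.T ≈ P[:, 1:].T.
M_hat, *_ = np.linalg.lstsq(P[:, :-1].T, P[:, 1:].T, rcond=None)
M_hat = M_hat.T

def kl_to_uniform(M, eps=1e-12):
    """sum_j KL(M[:, j] || Uniform(1..C)); large when columns are peaked."""
    C = M.shape[0]
    M = np.clip(M, eps, None)
    M = M / M.sum(axis=0, keepdims=True)
    return float(np.sum(M * (np.log(M) + np.log(C))))

print(kl_to_uniform(M_true))
```

Note that this plain least-squares estimate reproduces the observed transitions but is not constrained to be column-stochastic or non-negative; that is exactly why the question asks for a cleaner expression or a better-behaved regularizer.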
I would be very grateful for ideas on a simple closed-form expression for the estimate of $M$, or for an alternative regularizer that encourages temporal structure.