I'm trying to figure out how to minimize the following function: $$ f(P^*) = -\sum_{i}\sum_{j}P_{ij}\log(\frac{[P^*(I-\nu P^*)^{-1}]_{ij}}{P_{ij}}) $$
where the sum runs over the entries with $P_{ij} \neq 0$. Here $P$ is a given, properly normalized transition matrix (of some Markov chain) and $\nu$ is a positive real number.
The constraints are that $P^*$ is also a normalized transition matrix, and that $[P^*(I-\nu P^*)^{-1}]_{ij}= 0$ implies $P_{ij} = 0$.
Both the function and the constraints seem unwieldy. How would one go about finding the optimal $P^*$?
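For concreteness, the objective itself is straightforward to evaluate numerically. A minimal numpy sketch (the helper name `f` is my own; the sum is restricted to the entries with $P_{ij}\neq 0$, as stated):

```python
import numpy as np

def f(Pstar, P, nu):
    """Objective: -sum over {ij : P_ij != 0} of P_ij * log([P*(I - nu P*)^{-1}]_ij / P_ij)."""
    n = P.shape[0]
    M = Pstar @ np.linalg.inv(np.eye(n) - nu * Pstar)
    mask = P > 0                      # restrict the sum to nonzero entries of P
    return -np.sum(P[mask] * np.log(M[mask] / P[mask]))
```

As a sanity check, at $\nu = 0$ and $P^* = P$ the matrix inside the logarithm reduces to $P$ itself, so the objective is exactly zero.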
The stochastic matrix $(P^*)$ can be constructed from an arbitrary matrix $(A)$ and the all-ones matrix $(J)$ as follows $$\eqalign{ A,J &\in {\mathbb R}^{n\times n} \qquad J_{ik} = {\tt1} \\ P^* &= \frac{A}{JA} \qquad \big({\rm Elementwise\,Division}\big) \\ v &= P^*u \qquad \big({\rm Stochastic\,Vectors}\big) \\ }$$ NB: Some authors define the stochastic property in terms of left multiplication, in which case the above formulas would change to $$\eqalign{ P^* &= \frac{A}{AJ}, \qquad v^T &= u^TP^* \\ }$$ For convenience, define the matrices $$\eqalign{ dP^* &= \frac{JA\odot dA-A\odot J\,dA}{JA\odot JA} \\ Q &= (I-\nu P^*)^{-1} \quad\implies\quad dQ = \nu Q\,dP^*Q \\ R &= \frac{P}{P^*Q} \\ S &= -\frac{\big(I+\nu Q^T{P^*}^T\big)\,RQ^T}{JA\odot JA} \\ }$$ Write the cost function and calculate its gradient with respect to the unconstrained $A$ matrix. $$\eqalign{ f &= -P:\log\left(\frac{P^*Q}{P}\right) \\ df &= -P:\left(\frac{dP^*Q+P^*dQ}{P^*Q}\right) \\ &= -R:\big(dP^*Q+\nu P^*Q\,dP^*Q\big) \\ &= -\Big(RQ^T + \nu\,(P^*Q)^TRQ^T\Big):dP^* \\ &= S:\big(JA\odot dA-A\odot J\,dA\big) \\ &= (S\odot JA):dA - J(S\odot A):dA \\ &= \Big(S\odot JA - J(S\odot A)\Big):dA \\ \frac{\partial f}{\partial A} &= S\odot JA - J(S\odot A) \\ }$$ Setting the gradient to zero yields the equation $$\eqalign{ S\odot JA = J(S\odot A) }$$ which must be solved for $A$.
However, the matrix $S$ (as well as $R,Q,P^*$) is a function of $A$, so this equation is nonlinear. A closed-form algebraic solution is highly unlikely, so an iterative numerical solution is your best option.
Since the gradient is known, I'd suggest a gradient-descent method to solve for the $A$ that minimizes $f$. Once you have that, you can construct $P^*$.
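A numpy sketch of the objective together with its analytic gradient with respect to $A$, following the derivation above (the helper name `f_grad_A` is mine; it assumes the column-normalization convention $P^*=A/(JA)$, and the sign conventions were double-checked against finite differences):

```python
import numpy as np

def f_grad_A(A, P, nu):
    """Return f and df/dA for the column-normalized P* = A / (JA), A > 0."""
    n = P.shape[0]
    J = np.ones((n, n))
    JA = J @ A                        # each column of JA holds that column sum of A
    Pstar = A / JA
    Q = np.linalg.inv(np.eye(n) - nu * Pstar)
    M = Pstar @ Q                     # M = P* (I - nu P*)^{-1}
    mask = P > 0
    fval = -np.sum(P[mask] * np.log(M[mask] / P[mask]))
    R = np.where(mask, P / M, 0.0)
    # S = -(I + nu Q^T P*^T) R Q^T / (JA o JA)
    S = -(np.eye(n) + nu * Q.T @ Pstar.T) @ R @ Q.T / (JA * JA)
    grad = S * JA - J @ (S * A)       # df/dA = S o JA - J(S o A)
    return fval, grad
```

The returned pair can be fed to any gradient-based optimizer; keep in mind that the division by $JA$ requires the column sums of $A$ to stay nonzero.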
In the preceding, the symbol $\odot$ denotes the elementwise/Hadamard product and a colon represents the trace/Frobenius product, i.e. $$A:B = {\rm Tr}(A^TB)$$
Update
The $A$ matrix used above is actually not unconstrained, but is itself subject to a non-negativity constraint. However, the derivation can be salvaged by introducing a truly unconstrained matrix $X$ and defining $A$ in terms of it. $$\eqalign{ A &= \exp(X) \qquad \big({\rm Elementwise\,Exponential}\big) \\ dA &= A\odot dX \\ }$$ Then in the final expression for the differential, perform a change of variables from $A\to X$ $$\eqalign{ df &= \Big(S\odot JA - J(S\odot A)\Big):dA \\ &= \Big(S\odot JA - J(S\odot A)\Big):\big(A\odot dX\big) \\ &= A\odot\Big(S\odot JA - J(S\odot A)\Big):dX \\ \frac{\partial f}{\partial X} &= A\odot\Big(S\odot JA - J(S\odot A)\Big) \\ }$$ Use this gradient to solve for the optimal $X$ which minimizes $f$, then calculate the corresponding $A$, and ultimately $P^*$.
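Putting the pieces together, here is a minimal gradient-descent sketch with the $A=\exp(X)$ parameterization. The helper names (`f_grad_X`, `solve`), the fixed step size, and the iteration count are my own illustrative choices; in practice one would use a line search or an off-the-shelf optimizer such as `scipy.optimize.minimize`:

```python
import numpy as np

def f_grad_X(X, P, nu):
    """Return f and df/dX, where A = exp(X) and P* = A / (JA)."""
    n = P.shape[0]
    J = np.ones((n, n))
    A = np.exp(X)                     # elementwise exponential keeps A positive
    JA = J @ A
    Pstar = A / JA
    Q = np.linalg.inv(np.eye(n) - nu * Pstar)
    M = Pstar @ Q
    mask = P > 0
    fval = -np.sum(P[mask] * np.log(M[mask] / P[mask]))
    R = np.where(mask, P / M, 0.0)
    S = -(np.eye(n) + nu * Q.T @ Pstar.T) @ R @ Q.T / (JA * JA)
    gA = S * JA - J @ (S * A)         # df/dA
    return fval, A * gA               # chain rule: df/dX = A o df/dA

def solve(P, nu, steps=500, lr=0.1):
    """Plain fixed-step gradient descent on X; step size and count are ad hoc."""
    X = np.zeros_like(P)
    for _ in range(steps):
        _, G = f_grad_X(X, P, nu)
        X -= lr * G
    A = np.exp(X)
    return A / (np.ones_like(A) @ A)  # the optimized P*, columns summing to 1
```

Note that $P^*$ is invariant under a positive rescaling of each column of $A$, so the minimizing $X$ is not unique even though the resulting $P^*$ is well defined.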