Let $\mathcal{S}$ be a finite set, and suppose that $P=\{P_{i,j}:i,j\in \mathcal{S}\}$ is a stochastic matrix, i.e., \begin{align} P_{i,j} &\geq 0 \quad \text{for all }i,j\in\mathcal{S},\\ \sum\limits_{j\in\mathcal{S}}P_{i,j}&=1\quad \text{for all }i\in\mathcal{S}. \end{align}
Given an integer $d\geq 1$, let $P^d$ denote the $d$th power of the matrix $P$. Also, let $P^d_{i,j}$ denote the $(i,j)$th entry of the matrix $P^d$.
I am interested in solving the following optimisation problem.
\begin{align} &\hspace{3cm}\max\,\,\sum\limits_{i,j\in \mathcal{S}} P_{i,j}^d\,\,\log Q_{i,j}^d\\ &\text{subject to}\\ &\hspace{3cm} Q_{i,j}\geq 0\quad \text{for all }i,j\in\mathcal{S},\\ &\hspace{3cm} \sum\limits_{j\in\mathcal{S}} Q_{i,j}=1\quad \text{for all }i\in\mathcal{S}. \end{align}
In other words, I would like to determine the best stochastic matrix $Q$ that maximises the above expression. Note the presence of the $d$th power of the stochastic matrix $Q$ in the objective function.
I tried the usual Lagrange multiplier method, in which I used the following formula for the partial derivative of $Q^d_{k,l}$ with respect to $Q_{i,j}$ (taken from the Matrix Cookbook):
\begin{align} \frac{\partial Q^d_{k,l}}{\partial Q_{i,j}}=\sum\limits_{r=0}^{d-1}Q^{r}_{k,i}\,\cdot \,Q^{d-1-r}_{j,l}. \end{align}
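As a quick sanity check (not part of the derivation), the derivative formula above can be verified numerically with central finite differences. The sketch below uses a random square matrix (the formula holds for any square matrix, not just stochastic ones), and the particular indices $k,l,i,j$ are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3
Q = rng.random((n, n))  # arbitrary test matrix

def matpow(M, p):
    return np.linalg.matrix_power(M, p)  # p = 0 gives the identity

def dQd(Q, d, k, l, i, j):
    # Analytic formula: d(Q^d)_{k,l} / dQ_{i,j}
    #   = sum_{r=0}^{d-1} (Q^r)_{k,i} * (Q^{d-1-r})_{j,l}
    return sum(matpow(Q, r)[k, i] * matpow(Q, d - 1 - r)[j, l]
               for r in range(d))

# Central finite-difference approximation of the same derivative
k, l, i, j = 1, 2, 0, 3
eps = 1e-6
E = np.zeros((n, n)); E[i, j] = eps
fd = (matpow(Q + E, d)[k, l] - matpow(Q - E, d)[k, l]) / (2 * eps)

assert abs(fd - dQd(Q, d, k, l, i, j)) < 1e-6
print("OK")  # prints "OK"
```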
However, I have had no success so far. The Lagrangian is a very complicated expression in terms of the entries of $Q^d$.
When $d=1$, I know that the answer is $Q=P$. I am tempted to conjecture that the same holds for $d>1$ as well, but I am unable to prove it.
Can anyone please provide some hints on how to solve the above optimisation problem?
I was able to obtain a solution to this problem by looking at it from the perspective of projections. Fix $i\in \mathcal{S}$, and note that \begin{align} \sum\limits_{j\in\mathcal{S}}P_{i,j}^d\,\,\log Q_{i,j}^d &= \sum\limits_{j\in\mathcal{S}}P_{i,j}^d\,\,\log \frac{Q_{i,j}^d}{P_{i,j}^d}+\sum\limits_{j\in\mathcal{S}}P_{i,j}^d\,\,\log P_{i,j}^d\nonumber\\ &=-D(P_{i,\cdot}^d\|Q_{i,\cdot}^d)-H(P_{i,\cdot}^d), \end{align} where for any two probability distributions $\mu$ and $\nu$ on $\mathcal{S}$, the term $D(\mu\|\nu)$ denotes the Kullback-Leibler (KL) divergence between $\mu$ and $\nu$, and $H(\cdot)$ denotes the Shannon entropy functional.
From the above equation, since the entropy terms do not depend on $Q$, maximising the expression \begin{align} \sum\limits_{i\in\mathcal{S}}\sum\limits_{j\in\mathcal{S}}P_{i,j}^d\,\,\log Q_{i,j}^d \end{align} is equivalent to minimising the sum $\sum_{i\in\mathcal{S}} D(P_{i,\cdot}^d\|Q_{i,\cdot}^d)$. Each KL divergence term is non-negative, and vanishes if and only if $Q_{i,\cdot}^d=P_{i,\cdot}^d$. Choosing $Q=P$ gives $Q^d=P^d$, which makes every term zero simultaneously, so the maximiser is indeed $Q=P$.
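The conclusion can also be checked empirically. The short experiment below, a sketch with helper names of my own (`random_stochastic`, `objective`), draws a random stochastic matrix $P$ and confirms that no random stochastic candidate $Q$ beats $Q=P$ on the objective.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 2

def random_stochastic(n, rng):
    # Random matrix with non-negative entries and unit row sums
    M = rng.random((n, n))
    return M / M.sum(axis=1, keepdims=True)

P = random_stochastic(n, rng)
Pd = np.linalg.matrix_power(P, d)

def objective(Q):
    # sum_{i,j} (P^d)_{i,j} * log((Q^d)_{i,j})
    Qd = np.linalg.matrix_power(Q, d)
    return np.sum(Pd * np.log(Qd))

best = objective(P)
# By the KL argument, objective(Q) <= objective(P) for every stochastic Q;
# the small tolerance guards against floating-point noise.
assert all(objective(random_stochastic(n, rng)) <= best + 1e-12
           for _ in range(1000))
print("Q = P attains the maximum over 1000 random candidates")
```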