What is the relation between $\operatorname{tr}(\sigma \log(\sigma))$ and its projection?


Let $\sigma \in \mathbb R^{n\times n}$ be a positive semi-definite matrix, where $n$ is an even number, and let $V_1$ and $V_2$ be the projections \begin{equation*} V_1 = \begin{pmatrix} I_{\frac{n}{2},\frac{n}{2}} & 0_{\frac{n}{2},\frac{n}{2}} \end{pmatrix}, \qquad V_2 = \begin{pmatrix} 0_{\frac{n}{2},\frac{n}{2}} & I_{\frac{n}{2},\frac{n}{2}} \end{pmatrix}, \end{equation*}

where $I_{n,n}$ and $0_{n,n}$ are the identity matrix and the zero matrix of size $n\times n$, respectively.

Now what is the relation between the two quantities:

  1. $\operatorname{tr}(\sigma \log(\sigma))$, and
  2. $\operatorname{tr}(V_1 \sigma V_1^T \log(V_1 \sigma V_1^T)) + \operatorname{tr}(V_2 \sigma V_2^T \log(V_2 \sigma V_2^T)) $

Is there a generalization of this relation when the projection involves more blocks than just $V_1$ and $V_2$?

The $\log$ here is the matrix logarithm; for positive semi-definite $\sigma$ the traces are understood with the convention $0 \log 0 = 0$, i.e., restricted to the support of $\sigma$.


There are 2 answers below.

First answer:

$\def\bb{\mathbb}$ Instead of the two partitions in your example, let's generalize to $m$ partitions.

Use the Kronecker product $(\otimes)$ to define matrix analogues of the standard basis vectors $e_i\in{\bb R}^{m}$ $$E_1 = e_1\otimes I_k,\quad E_2 = e_2\otimes I_k, \quad\ldots\quad E_m = e_m\otimes I_k$$ where $k=\frac nm$, assuming that $n$ is divisible by $m$.

Use the $\{E_i\}$ matrices to extract the $k\times k$ blocks along the diagonal of $\sigma$ $$\sigma_1 = E_1^T\sigma E_1,\quad \sigma_2 = E_2^T\sigma E_2,\quad\ldots\quad \sigma_m = E_m^T\sigma E_m$$ This gives rise to the generalized functions $$F_m = \sum_{i=1}^m\,{\rm Tr}\Big(\sigma_i\log(\sigma_i)\Big)$$ Using this notation, your first function is $F_{1}$ and your second is $F_{2}$. The data-processing argument in Rammus's answer shows that $F_{1}\ge F_{m}$ for every $m$; more generally, $F_{m}\ge F_{m'}$ whenever the $m'$-blocks refine the $m$-blocks (for instance when $m$ divides $m'$), since the finer pinching can be applied after the coarser one, giving chains such as $$F_{1}\ge F_{2}\ge F_{4}\ge F_{8}\ge\ldots$$
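As a quick numerical sanity check (a sketch using NumPy; the helper names `tr_x_log_x` and `F` are my own), one can build the $E_i$ with `np.kron`, evaluate $F_m$, and confirm the refinement chain $F_1 \ge F_2 \ge F_4$ for a random positive-definite $\sigma$:

```python
import numpy as np

def tr_x_log_x(M):
    """Tr[M log M] via the eigenvalues of a symmetric positive definite M."""
    w = np.linalg.eigvalsh(M)
    return float(np.sum(w * np.log(w)))

def F(sigma, m):
    """F_m = sum_i Tr[sigma_i log sigma_i], sigma_i = E_i^T sigma E_i."""
    n = sigma.shape[0]
    k = n // m                          # block size; assumes m divides n
    total = 0.0
    for i in range(m):
        e_i = np.zeros((m, 1)); e_i[i] = 1.0
        E_i = np.kron(e_i, np.eye(k))   # n x k analogue of a basis vector
        total += tr_x_log_x(E_i.T @ sigma @ E_i)
    return total

rng = np.random.default_rng(0)
n = 12
A = rng.standard_normal((n, n))
sigma = A @ A.T + np.eye(n)             # positive definite

F1, F2, F4 = F(sigma, 1), F(sigma, 2), F(sigma, 4)
assert F1 >= F2 - 1e-9 and F2 >= F4 - 1e-9   # refinement chain
```

Note that $F(\sigma, 1)$ reduces to $\operatorname{tr}(\sigma\log\sigma)$ itself, since $E_1 = I_n$.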


Another approach avoids the summation by using the all-ones matrix $J_k\in{\bb R}^{k\times k}$, the identity matrix $I_m\in{\bb R}^{m\times m}$ and the Kronecker/Hadamard $(\otimes/\odot)$ products to create a block-diagonal matrix $$\eqalign{ H_{m} &= I_m\otimes J_k \,\in\, {\bb R}^{n\times n} \\ B_{m} &= H_m\odot \sigma \\ }$$ Now the generalized functions can be written as $$F_m = {\rm Tr}\Big(B_m\;\log(B_m)\Big)$$ Note that $B_m$ is positive definite whenever $\sigma$ is, by the Schur product theorem, so the logarithm is well defined. The advantage of this form is that it requires only a single evaluation of the log function, and the gradient is easy to calculate $$\frac{\partial F_m}{\partial \sigma} = I_n + H_m\odot\log(H_m\odot \sigma)$$
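The equivalence of the two forms, and the stated gradient, can both be checked numerically (a sketch with NumPy/SciPy; the variable names and the finite-difference step are my own choices):

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(1)
n, m = 8, 2
k = n // m
A = rng.standard_normal((n, n))
sigma = A @ A.T + np.eye(n)                  # positive definite

H = np.kron(np.eye(m), np.ones((k, k)))      # H_m = I_m (x) J_k
B = H * sigma                                # B_m = H_m (.) sigma

def tr_x_log_x(M):
    w = np.linalg.eigvalsh(M)
    return float(np.sum(w * np.log(w)))

# Single-log form vs. the block-by-block sum
F_hadamard = tr_x_log_x(B)
F_blocks = sum(tr_x_log_x(sigma[i*k:(i+1)*k, i*k:(i+1)*k]) for i in range(m))
assert abs(F_hadamard - F_blocks) < 1e-9

# Gradient check: dF/dsigma = I + H (.) log(H (.) sigma), via central differences
G = np.eye(n) + H * np.real(logm(B))
D = rng.standard_normal((n, n)); D = D + D.T     # symmetric direction
t = 1e-6
fd = (tr_x_log_x(H * (sigma + t*D)) - tr_x_log_x(H * (sigma - t*D))) / (2*t)
assert abs(fd - np.sum(G * D)) < 1e-4 * max(1.0, abs(fd))
```

The two quantities agree because $B_m$ is block diagonal, so its spectrum is the union of the spectra of the diagonal blocks.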

Second answer:

The matrix functional you are interested in can be expressed in terms of the Umegaki divergence $$ D(\rho \| \sigma ) = \frac{\mathrm{Tr}[\rho (\log(\rho) - \log(\sigma))]}{\mathrm{Tr}[\rho]}. $$ See Equation 4.61 of the linked book for the full definition. In particular, $\mathrm{Tr}[\sigma \log \sigma] = \mathrm{Tr}[\sigma]\,D(\sigma \| I)$.
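The identity $\mathrm{Tr}[\sigma \log \sigma] = \mathrm{Tr}[\sigma]\,D(\sigma \| I)$ follows directly from $\log I = 0$; a small numerical check (a sketch; the helper name `umegaki` is my own):

```python
import numpy as np
from scipy.linalg import logm

def umegaki(rho, sig):
    """D(rho || sig) = Tr[rho (log rho - log sig)] / Tr[rho]."""
    val = np.trace(rho @ (logm(rho) - logm(sig)))
    return float(np.real(val)) / float(np.trace(rho))

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))
sigma = A @ A.T + np.eye(n)          # positive definite

lhs = float(np.real(np.trace(sigma @ logm(sigma))))
rhs = float(np.trace(sigma)) * umegaki(sigma, np.eye(n))
assert abs(lhs - rhs) < 1e-6
```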

The Umegaki divergence satisfies the data processing inequality: for each trace-preserving completely positive map $\mathcal{E}$ we have $$ D(\rho\|\sigma) \geq D(\mathcal{E}(\rho) \| \mathcal{E}(\sigma)). $$ Again I refer to the linked book for further details.

Now the thing to notice is that for any collection of linear operators $\{V_1, \dots, V_m\}$ satisfying $\sum_i V_i^* V_i = I$, the map $\rho \mapsto \sum_{i} V_i \rho V_i^*$ is completely positive and trace preserving. (I am using $^*$ to denote the adjoint, i.e. the conjugate transpose, which reduces to the transpose when everything is real.) The two projections above satisfy $V_1^T V_1 + V_2^T V_2 = I$, and the orthogonal projectors $P_i = V_i^T V_i$ obey $\sum_i P_i^* P_i = \sum_i P_i = I$. Let $\mathcal{F}$ be the pinching map defined by the action $$\sigma \mapsto P_1 \sigma P_1 + P_2 \sigma P_2,$$ which zeroes out the off-diagonal blocks of $\sigma$: $\mathcal{F}(\sigma)$ is block diagonal with diagonal blocks $V_1 \sigma V_1^T$ and $V_2 \sigma V_2^T$. In particular $\mathcal{F}$ is trace preserving and unital, $\mathcal{F}(I) = I$. Then we have

$$ \begin{aligned} \mathrm{Tr}[\sigma \log \sigma] &= \mathrm{Tr}[\sigma]\, D(\sigma \|I) \\ &\geq \mathrm{Tr}[\mathcal{F}(\sigma)]\, D(\mathcal{F}(\sigma) \| \mathcal{F}( I)) \\ &= \mathrm{Tr}[\mathcal{F}(\sigma)]\, D(\mathcal{F}(\sigma) \| I ) \\ &= \mathrm{Tr}[\mathcal{F}(\sigma) \log \mathcal{F}(\sigma)] \\ &= \mathrm{Tr}[V_1 \sigma V_1^T \log V_1 \sigma V_1^T] + \mathrm{Tr}[V_2 \sigma V_2^T \log V_2 \sigma V_2^T], \end{aligned} $$

where the inequality is data processing (together with trace preservation, $\mathrm{Tr}[\mathcal{F}(\sigma)] = \mathrm{Tr}[\sigma]$), the next line uses unitality, and the last line holds because the spectrum of the block-diagonal matrix $\mathcal{F}(\sigma)$ is the union of the spectra of its blocks. Given the scope of the data processing inequality there is definitely room for adding further orthogonal projectors as the dimension of the space grows. For reference, maps whose action is a sum of orthogonal-projector sandwiches are sometimes referred to as pinching maps.
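The final inequality is easy to test numerically for the two projections in the question (a sketch using NumPy; `tr_x_log_x` is my own helper, and the random positive-definite $\sigma$ is just one test instance):

```python
import numpy as np

def tr_x_log_x(M):
    """Tr[M log M] via the eigenvalues of a symmetric positive definite M."""
    w = np.linalg.eigvalsh(M)
    return float(np.sum(w * np.log(w)))

rng = np.random.default_rng(3)
n = 10; h = n // 2
A = rng.standard_normal((n, n))
sigma = A @ A.T + np.eye(n)                      # positive definite

V1 = np.hstack([np.eye(h), np.zeros((h, h))])    # (I 0)
V2 = np.hstack([np.zeros((h, h)), np.eye(h)])    # (0 I)

lhs = tr_x_log_x(sigma)
rhs = tr_x_log_x(V1 @ sigma @ V1.T) + tr_x_log_x(V2 @ sigma @ V2.T)
assert lhs >= rhs - 1e-9                         # Tr[s log s] >= block sum
```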