I am new to semi-definite programming and I am trying to follow through the optimization described in
http://cvxopt.org/userguide/spsolvers.html#example-covariance-selection
The problem is to approximate a given covariance matrix $Y$ by a sparse positive definite matrix $K$ with a specified sparsity pattern, i.e. $K_{ij}=0$ for $(i,j)\not\in S$. The objective is to minimize the KL-divergence between $K$ and $Y$:
$$ \begin{array}{ll} \mbox{minimize} & -\log\det K + \mathbf{tr}(KY) \\ \mbox{subject to} & K_{ij}=0,\quad (i,j) \not \in S.\end{array} $$
I understand that $-\log\det K$ acts as a logarithmic barrier that keeps $K$ positive definite.
Next, the page proceeds by assuming that $K$ can be expressed as
$$ K(x) = E_1\mathbf{diag}(x)E_2^T+E_2\mathbf{diag}(x)E_1^T $$
where $x$ collects the non-zero elements in the lower-triangular part of $K$, with the diagonal elements scaled by $1/2$, and
$$ E_1 = \left[ \begin{array}{cccc} e_{i_1} & e_{i_2} & \cdots & e_{i_q} \end{array}\right], \qquad E_2 = \left[ \begin{array}{cccc} e_{j_1} & e_{j_2} & \cdots & e_{j_q} \end{array}\right], $$
where $(i_k, j_k)$ are the positions of the non-zero entries in the lower-triangular part of $K$. With this notation, the constrained problem reduces to the unconstrained optimization problem
$$ \begin{array}{ll}\mbox{minimize} & f(x) = -\log\det K(x) + \mathbf{tr}(K(x)Y).\end{array} $$
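For concreteness, here is how I read the objective $f(x)$ in code (a numpy sketch of my understanding; the helper `build_E` and the argument `S`, a list of 0-based lower-triangular positions, are names I made up):

```python
import numpy as np

def build_E(n, positions):
    """Columns are the standard basis vectors e_k for the given 0-based indices."""
    E = np.zeros((n, len(positions)))
    E[positions, np.arange(len(positions))] = 1.0
    return E

def f(x, Y, S):
    """Objective -log det K(x) + tr(K(x) Y) for a sparsity pattern S.

    x holds the entries of K at the positions in S, with diagonal
    entries pre-scaled by 1/2 (so that symmetrization recovers them).
    """
    n = Y.shape[0]
    E1 = build_E(n, [i for i, _ in S])
    E2 = build_E(n, [j for _, j in S])
    K = E1 @ np.diag(x) @ E2.T + E2 @ np.diag(x) @ E1.T
    sign, logdet = np.linalg.slogdet(K)
    if sign <= 0:
        return np.inf  # outside the domain of the log-det barrier
    return -logdet + np.trace(K @ Y)
```

For instance, with $S=\{(0,0),(1,1)\}$ and $x=(1/2,1/2)$ we get $K=I$, so $f(x)=-\log\det I+\mathbf{tr}(Y)$.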
Can someone provide more insight into the construction of $E_1$ and $E_2$?
Since $K$ is symmetric, we first deal with the lower-triangular part. Multiplying by $E_1$ and $E_2^T$ places the non-zero elements in their respective positions; adding the transposed term then makes the matrix symmetric. To see why this works, consider the matrix
$$M=\begin{bmatrix} p &q &0 \\ q &s &r\\ 0 &r &0 \end{bmatrix}.$$
The positions of the non-zero elements I choose are $p=(1,1)$, $s=(2,2)$, $q=(1,2)$, $r=(3,2)$.
As you can see, I have not restricted them to the lower-triangular part; they can be anywhere. This is just a general example.
$$E_1=\begin{bmatrix} 1 &0 &1 &0 \\ 0 &1 &0 &0 \\ 0 &0 &0 &1 \end{bmatrix}, \qquad \mathbf{diag}(x)=\begin{bmatrix} p &0 &0 &0 \\ 0 &s &0 &0 \\ 0 &0 &q &0 \\ 0 &0 &0 &r \end{bmatrix}, \qquad E_2^T=\begin{bmatrix} 1 &0 &0 \\ 0 &1 &0 \\ 0 &1 &0 \\ 0 &1 &0 \end{bmatrix}.$$
Multiplying out $E_1\,\mathbf{diag}(x)\,E_2^T$ gives
$$\begin{bmatrix} p &q &0 \\ 0 &s &0\\ 0 &r &0 \end{bmatrix}.$$
Adding the transposed term $E_2\,\mathbf{diag}(x)\,E_1^T$ then symmetrizes the matrix. Note that the diagonal entries $p$ and $s$ appear in both terms and so get added twice, which is exactly why the diagonal elements of $x$ are scaled by $1/2$ in the original construction. This explains the construction of the matrices $E_1$ and $E_2$.
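To sanity-check this numerically, here is a small numpy sketch reproducing the example above (the variable names `idx_i`, `idx_j`, `x_half` and the numeric values of $p,q,r,s$ are mine):

```python
import numpy as np

n = 3
p, s, q, r = 2.0, 3.0, 0.5, 1.0  # arbitrary example values

# Positions (i_k, j_k) of the chosen entries, 0-based:
# p=(1,1), s=(2,2), q=(1,2), r=(3,2) in the 1-based notation above.
idx_i = [0, 1, 0, 2]
idx_j = [0, 1, 1, 1]

# Columns of E1 and E2 are the standard basis vectors e_{i_k}, e_{j_k}.
E1 = np.zeros((n, 4))
E2 = np.zeros((n, 4))
E1[idx_i, np.arange(4)] = 1.0
E2[idx_j, np.arange(4)] = 1.0

x = np.array([p, s, q, r])
A = E1 @ np.diag(x) @ E2.T  # places each x_k at position (i_k, j_k)

# Symmetrizing doubles the diagonal, so halve x on diagonal positions:
x_half = np.array([p / 2, s / 2, q, r])
K = E1 @ np.diag(x_half) @ E2.T + E2 @ np.diag(x_half) @ E1.T

M = np.array([[p, q, 0.0], [q, s, r], [0.0, r, 0.0]])
print(np.allclose(K, M))  # True
```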