Joint optimization of precision matrices for common sparsity pattern


This question is motivated by the paper of Cai (2016) on the joint estimation of multiple ($K$) precision matrices from $K$ datasets.

Let $X^{(k)} \sim N(\mu^{(k)}, \Sigma^{(k)})$ be a $p$-dimensional random vector for the $k$th group. The precision matrix of $X^{(k)}$, denoted by $\Omega^{(k)} = (\omega_{ij}^{(k)})$, is the inverse of the covariance matrix $\Sigma^{(k)}$. Assume that the $X^{(k)}$ are independent of each other. Suppose there are $n_k$ independent and identically distributed samples from $X^{(k)}$: $\{X_j^{(k)}, 1 \leq j \leq n_k\}$, and let $n = n_1 + \cdots + n_K$. The sample covariance matrix for each group is denoted by $\hat{\Sigma}^{(k)}$.
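For concreteness, this setup can be simulated in a few lines; the sketch below (the dimension, the group sizes, and the identity population covariance are arbitrary illustrative choices, not from the paper) draws the $K$ samples and computes $\hat{\Sigma}^{(k)}$ and the weights $w_k = n_k/n$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_ks = 5, [50, 80, 100]   # dimension and per-group sample sizes (illustrative)
n = sum(n_ks)

Sigma_hats, weights = [], []
for n_k in n_ks:
    # n_k i.i.d. draws from N(0, I_p) for group k (any SPD covariance would do)
    X = rng.standard_normal((n_k, p))
    Sigma_hats.append(np.cov(X, rowvar=False, bias=True))  # sample covariance \hat\Sigma^{(k)}
    weights.append(n_k / n)                                # weight w_k = n_k / n
```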

The goal is to simultaneously estimate the precision matrices $\Omega^{(k)}$ for $1\leq k \leq K$. The following optimization problem is proposed:

$$\min_{\Omega^{(1)},\dots,\Omega^{(K)}} \sum_{k=1}^{K} w_k \sum_{i,j} \big|\omega_{ij}^{(k)}\big| \quad \text{subject to} \quad \max_{1 \le i, j \le p} \left( \sum_{k=1}^{K} w_k \left[ \big(\hat{\Sigma}^{(k)} \Omega^{(k)} - I\big)_{ij} \right]^2 \right)^{1/2} \le \lambda_n,$$

where $w_k = n_k/n$ is the weight for the kth group and $\lambda_n$ is a tuning parameter.
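Assuming the constraint takes the CLIME-style form $\max_{i,j} \big(\sum_k w_k [(\hat{\Sigma}^{(k)} \Omega^{(k)} - I)_{ij}]^2\big)^{1/2} \le \lambda_n$ (my reading of the paper, so treat the exact form as an assumption), its left-hand side can be evaluated numerically with NumPy:

```python
import numpy as np

def groupwise_l2_constraint(Sigma_hats, Omegas, weights):
    """Max over (i, j) of the weighted element-wise group l2 norm of the
    residuals Sigma_hat^(k) @ Omega^(k) - I (a sketch of the constraint's
    left-hand side, under the assumed formulation)."""
    p = Sigma_hats[0].shape[0]
    # residual R^(k) = Sigma_hat^(k) @ Omega^(k) - I, one per group
    residuals = [S @ O - np.eye(p) for S, O in zip(Sigma_hats, Omegas)]
    # element-wise group l2 norm: sqrt(sum_k w_k * (R^(k)_{ij})^2) at each (i, j)
    group_norm = np.sqrt(sum(w * R**2 for w, R in zip(weights, residuals)))
    return group_norm.max()
```

For instance, if every $\Omega^{(k)}$ is the exact inverse of its $\hat{\Sigma}^{(k)}$, each residual is zero and the constraint value is $0$; a feasible set with $\lambda_n > 0$ relaxes this, leaving room for the $\ell_1$ objective to zero out entries.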

It is stated that the "objective function is used to encourage the sparsity of all K precision matrices", which makes sense to me: an $\ell_1$ penalty is imposed on each of the $K$ precision matrices, and this penalty drives entries to zero. However, I do not understand the intuition behind the next statement: "The constraint is imposed on the maximum of the element-wise group l2 norm to encourage the groups to share a common sparsity pattern". In particular, how does this constraint encourage a common sparsity pattern, and what is the role of taking the maximum over the $(i,j)$ entries?