Lagrangian dual and matrix constraints


I am starting to work with matrix calculus and I am trying to write the correct dual for the following minimization problem: $$\begin{equation*} \begin{aligned} & \underset{X}{\min} & & f_0(X) \\ & \text{subject to} & & X \preccurlyeq 1, \\ & & & \textbf{1}X=\textbf{1} \end{aligned} \end{equation*}$$

With the first inequality I mean that all the elements of the matrix are less than or equal to one (is that correct?), and with the second that every column should sum to one (the first $\textbf{1}$ is a row vector with as many elements as the rows of $X$, and the second is a row vector with as many elements as the columns of $X$). What I want to model here is that $X$ is a matrix of conditional probabilities: every column corresponds to a different conditioning event, and the entries of that column are the probabilities.
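As a concrete illustration (a small NumPy sketch I am adding, not part of the original question), here is a matrix of the intended kind and a numeric check of both constraints:

```python
import numpy as np

# Hypothetical example: 3 outcomes, 2 conditioning events.
# Column j holds the conditional distribution given event j.
X = np.array([[0.2, 0.5],
              [0.3, 0.1],
              [0.5, 0.4]])

# First constraint: every entry X_ij <= 1.
assert np.all(X <= 1)

# Second constraint: 1 X = 1, i.e. each column sums to one.
ones_row = np.ones(X.shape[0])                  # row vector, length = #rows of X
assert np.allclose(ones_row @ X, np.ones(X.shape[1]))
print("constraints satisfied")
```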

Then I would like to write the dual for it, but I am not sure about the dimensions of the Lagrange multipliers: can it be written like this? $g(\lambda,\nu)= \inf_X(f_0(X)+\lambda(X-1)+\nu(\textbf{1}X-\textbf{1}))$. If so, what should the dimensions of $\lambda$ and $\nu$ be? Or should it be a sum like this: $g(\lambda,\nu)= \inf_X(f_0(X)+\sum_{i,j}\lambda_{i,j}(X_{i,j}-1)+\sum_j\nu(1^T\textbf{x}_j-1))$? And is then $\lambda$ a matrix and $\nu$ a vector?

Or even, are they the same thing? Thanks and sorry for the banality of the question.

Best answer:

The notation $X \preccurlyeq 1$ is not standard for this purpose. A reader could mistake it for the semidefinite order and think it means that $1-X$ has to be positive semidefinite (and there is the further ambiguity of subtracting a scalar from a matrix). I would write it as $X_{ij} \leq 1 \; \forall i,j$.

Neither of your duals is correct. The problem in the first one is that $\lambda(X-1)$ is not a scalar; in the second one $\nu$ carries no index, so $\nu(1^T\textbf{x}_j-1)$ is a vector, whereas you need one scalar multiplier $\nu_j$ per column constraint. The right formulation is: $$g(\lambda,\nu)= \inf_X\Big(f_0(X)+\sum_{i,j}\lambda_{i,j}(X_{i,j}-1)+\sum_j\nu_j(1^T\textbf{x}_j-1)\Big)$$ You can shorten it a bit with the trace operator: $$g(\lambda,\nu)= \inf_X\big(f_0(X)+\operatorname{tr}(\lambda^T(X-\textbf{E}))+(\textbf{1}X-\textbf{1})\nu\big),$$ where $\textbf{E}$ is the matrix of which each element is 1; note $\operatorname{tr}(\lambda^T(X-\textbf{E}))=\sum_{i,j}\lambda_{i,j}(X_{i,j}-1)$.
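To convince yourself of the two identities used here, a quick numeric check (my own NumPy sketch with arbitrary random data, not part of the original answer) compares the elementwise sums with the trace and vector forms:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
X = rng.random((m, n))        # arbitrary matrix variable
lam = rng.random((m, n))      # multiplier matrix, same shape as X
nu = rng.random(n)            # one multiplier per column constraint
E = np.ones((m, n))           # matrix of all ones
ones_m = np.ones(m)

# sum_{ij} lam_ij (X_ij - 1)  ==  tr(lam^T (X - E))
elementwise = np.sum(lam * (X - E))
trace_form = np.trace(lam.T @ (X - E))
assert np.isclose(elementwise, trace_form)

# sum_j nu_j (1^T x_j - 1)  ==  (1 X - 1) nu
col_sums = sum(nu[j] * (ones_m @ X[:, j] - 1) for j in range(n))
vector_form = (ones_m @ X - np.ones(n)) @ nu
assert np.isclose(col_sums, vector_form)
print("identities verified")
```

The transpose on $\lambda$ matters: `np.trace(lam @ (X - E))` would pair $\lambda_{ji}$ with $(X-E)_{ij}$ (and fails dimensionally when $X$ is not square).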