Solving Quadratic Matrix Programming (Trace of a Vector Multiplication) with Boundary Constraints

422 Views Asked by At

I intend to solve for vector $ x \in \mathbb{R}^{N \times 1} $ by solving the following optimization problem

\begin{align} \arg \min_{x} Tr( (\mathbf{K} \mathbf{W})^T \mathbf{P} ( \mathbf{K} \mathbf{W})) - 2Tr( \mathbf{P} \mathbf{K} \mathbf{W}) \end{align} \begin{align} \text{subject to } & x_{i}^{min} \leq x_{i} \leq x_{i}^{max} \\ \end{align} where $Tr()$ is the trace operator, $\mathbf{P} \in \mathbb{R}^{M \times M}$, $\mathbf{W}=\mathbf{A}diag(\mathbf{B} x)$, and $\mathbf{W} \in \mathbb{R}^{D \times M}$, and $\mathbf{K} \in \mathbb{R}^{M \times D}$.

$\mathbf{A} \in \mathbb{R}^{D \times M}$ and $\mathbf{B} \in \mathbb{R}^{M \times N}$ are both positive metrices.

How do I solve it as an inequality constrained optimization problem for $X$ ?

2

There are 2 best solutions below

3
On BEST ANSWER

The problem is given by:

$$\begin{aligned} \arg \min_{x} \quad & \operatorname{Tr} \left( {\left( K A \operatorname{diag} \left( B x \right) \right)}^{T} \left( K A \operatorname{diag} \left( B x \right) \right) \right) - 2 \operatorname{Tr} \left( P K A \operatorname{diag} \left( B x \right) \right) \\ \text{subject to} \quad & {a}_{i} \leq {x}_{i} \leq {b}_{i} \; \forall i \end{aligned}$$

The problem here is the $ \operatorname{diag} \left( \cdot \right) $ operator which makes it hard to infer the gradient.

Yet:

$$ \operatorname{diag} \left( B x \right) = I \circ \left( \boldsymbol{1} {\left( B x \right)}^{T} \right) = I \circ \left( \boldsymbol{1} {x}^{T} {B}^{T} \right) $$

Where $ \circ $ is the Hadamard Product.

Now you can plug it in and use some Matrix Calculus to find the gradient (It seems that using the Frobenius Norm will be useful).
Once you have the gradient you can solve it easily with Projected Gradient Descent Method.

Probably due to use of the Trace Operator you can get better equivalent forms of the problem that takes advantage of $ A $ and $ B $ being Positive definite matrices. As since they are solving for $ y = B x $ is like solving for $ x $.

Remark:

I think the question: Given $ y = A x $ where $ A $ is PD matrix and it is known that $ {a}_{i} \leq {x}_{i} \leq {b}_{i} $ what can be said on $ y $ (Namley how it is bounded) deserves its own question.

Update

Thinking of it, one could calculate the gradient of the Frobenius norm directly:

$$ \frac{\mathrm{d} }{\mathrm{d} x} \frac{1}{2} {\left\| A \operatorname{diag} \left( B x \right) \right\|}_{F}^{2} = {B}^{T} \operatorname{diag} \left( {A}^{T} A \operatorname{diag} \left( B x \right) \right) $$

3
On

The problem is given by:

$$\begin{aligned} \arg \min_{x} \quad & \operatorname{Tr} \left( {\left( K A \operatorname{diag} \left( B x \right) \right)}^{T} P \left( K A \operatorname{diag} \left( B x \right) \right) \right) - 2 \operatorname{Tr} \left( P K A \operatorname{diag} \left( B x \right) \right) \\ \text{subject to} \quad & {a}_{i} \leq {x}_{i} \leq {b}_{i} \; \forall i \end{aligned}$$ in order to solve this problrm with Projected gradient descent we need to take the derivative of the first and second term with respect to $x$. for the first term, we can take the advantage of matrix $P$ which is positive-definite matrix and can be wrritten as $P=C^{T}C$. the first term then can be treated as frobenius norm:

$$\begin{aligned} \operatorname{Tr} \left( {\left( K A \operatorname{diag} \left( B x \right) \right)}^{T} P \left( K A \operatorname{diag} \left( B x \right) \right) \right) = {\left\| K A \operatorname{diag} \left( B x \right) C \right\|}_{F}^{2} \end{aligned}$$

then the gradient of the first term can be calculated as

$$\begin{aligned} 2{B}^{T} \operatorname{diag} \left( {(K A)}^{T} (K A) \operatorname{diag} \left( B x \right) CC^{T}\right) \end{aligned}$$

if we consider the second term to be $T=- 2 \operatorname{Tr} \left( P K A \operatorname{diag} \left( B x \right) \right)$ we then have:

$dT=-2(PKA)^{T}:dX$ in which $dX=diag(Bx)$. using the properties of frobenius product we can write:

$dT=diag(-2(PKA)^{T}):di$ and $di=Bdx$ so:

$dT=B^{T}diag(-2A^{T}K^{T}(P)):dx$ and

$dT/dx=-2B^{T}diag(A^{T}K^{T}(P))$

overall the gradient of the above equation can be calculated as:

$$\begin{aligned} 2{B}^{T} \operatorname{diag} \left( {(K A)}^{T} (K A) \operatorname{diag} \left( B x \right) CC^{T}\right) -2B^{T}diag(A^{T}K^{T}P) \end{aligned}$$

I appreciate if you can check and find out whether I am in a right direction or not?