How to solve this optimization problem where the constraint requires adjacent differences of the solution to be integer powers of 2?


I want to solve the following optimization problem: \begin{equation} \begin{aligned} &\min_{\bf{w}} (y-\bf{w}^{T}\bf{x})^2 , \\ &\text{subject to} \quad \forall i, |w_i-w_{i+1}|=2^{r_i}. \end{aligned} \end{equation} Here, $\bf{x}$ and $\bf{w}$ are both $N$-dimensional column vectors, $y$ is a scalar and $r_i$ is a constant. The constraint means that the absolute value of the difference between any two adjacent elements of $\bf{w}$ must be an integer power of 2. For example, if $N=3$, then $\bf{w}$$=[w_1,w_2,w_3]^{T}$, and it must satisfy $|w_1-w_2|=2^{r_1}, |w_2-w_3|=2^{r_2}$.

My opinion

Let's take $N=3$ again.

Case 1: $r_i$ is known.

In this case I think we can use an SGD-type method on the penalized problem $$ \min_{\bf{w}} \|\bf{w}^{T}\bf{x}-y\|^2_2 + \sum_{i=1}^{2}\lambda_i \left(|w_i-w_{i+1}|-2^{r_i}\right)^2 $$ and obtain an approximate solution by adjusting the values of $\lambda_i$.
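For the $N=3$ instance, this penalized problem can be minimized with plain gradient descent (there is only one data point, so "SGD" degenerates to full-gradient descent). A minimal NumPy sketch, assuming a single fixed penalty weight $\lambda_i = \lambda$ and hypothetical values for $\bf{x}$, $y$, $r_i$; the function names are illustrative, not from any library:

```python
import numpy as np

def penalized_objective(w, x, y, r, lam):
    """Data term (w.x - y)^2 and penalty pushing |w_i - w_{i+1}| toward 2^{r_i}."""
    d = w[:-1] - w[1:]                      # adjacent differences d_i = w_i - w_{i+1}
    fit = (w @ x - y) ** 2
    pen = lam * np.sum((np.abs(d) - 2.0 ** r) ** 2)
    return fit, pen

def grad(w, x, y, r, lam):
    """Gradient of fit + penalty (subgradient at the kinks d_i = 0)."""
    d = w[:-1] - w[1:]
    g = 2.0 * (w @ x - y) * x               # gradient of the data term
    gp = 2.0 * lam * (np.abs(d) - 2.0 ** r) * np.sign(d)
    g[:-1] += gp                            # each d_i depends on w_i ...
    g[1:] -= gp                             # ... and on w_{i+1}
    return g

def solve_case1(x, y, r, lam=10.0, lr=1e-2, steps=20000, seed=0):
    w = np.random.default_rng(seed).normal(size=x.shape)
    for _ in range(steps):
        w -= lr * grad(w, x, y, r, lam)
    return w

# Hypothetical N = 3 example: |w1 - w2| = 2^1, |w2 - w3| = 2^0
x = np.array([1.0, 2.0, -1.0])
y = 3.0
r = np.array([1.0, 0.0])
w = solve_case1(x, y, r)
```

With a finite $\lambda$ the constraint is only met approximately; increasing $\lambda$ tightens it at the cost of a harder landscape.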

Case 2: $r_i$ is unknown.

In this case the problem is knotty: we don't know the exact distance between $w_i$ and $w_{i+1}$, only that it is an integer power of $2$. Can anyone help me?



BEST ANSWER

Since my last comment pushed this through to a full answer, let me post it as such with a fuller explanation.

Case 1: You know the magnitudes $|w_{i+1} - w_i|$. Define $d_i = w_{i+1} - w_i$, so the $|d_i|$ are known, but not their signs. Then for all $i > 1$, $w_i = w_1 +\sum_{k=1}^{i-1} d_k$. For convenience, let $D_i = w_i - w_1 = \sum_{k=1}^{i-1} d_k$ (with $D_1 = 0$). Now

$${\bf w^Tx} = \sum_i w_ix_i = \sum_i w_1x_i + \sum_i D_ix_i = w_1A + B$$ where $A = \sum_i x_i$ and $B = \sum_i D_ix_i$. We have two cases:

  • $A \ne 0$: Then we can choose to make all the $d_i$ positive, so $B$ is known, and set $w_1 = \dfrac {y-B}A$ to obtain ${\bf w^Tx}-y = A\dfrac {y-B}A + B - y = 0$ as the minimum.
  • $A = 0$: Then ${\bf w^Tx}-y = B - y$. Now $$B = \sum_{i=1}^N D_ix_i = 0x_1+\sum_{i=2}^N \sum_{k=1}^{i-1} d_kx_i = \sum_{k=1}^{N-1}d_k\sum_{i=k+1}^N x_i = \sum_{k=1}^{N-1}\pm B_k$$ where $B_k = |d_k|\left|\sum_{i=k+1}^N x_i\right|$ are all known values, and the only options are whether to use $+$ or $-$ for each. Sort the $B_k$ from largest to smallest. If $-y \le 0$, choose to add the largest $B_k$, otherwise subtract it. If the result is $\le 0$, choose to add the second largest $B_k$, otherwise subtract. Continue in this way until all the signs are chosen. The square of the final result will be $\min_{\bf w}({\bf w^Tx} - y)^2$.
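Both branches of Case 1 can be sketched in NumPy. The function name `solve_known_gaps` is hypothetical; the greedy sign choice follows the procedure described above:

```python
import numpy as np

def solve_known_gaps(x, y, abs_d):
    """Minimize (w.x - y)^2 when the gap sizes |w_{i+1} - w_i| = abs_d[i]
    are known but their signs are not. Returns (w, attained value)."""
    x = np.asarray(x, dtype=float)
    abs_d = np.asarray(abs_d, dtype=float)
    A = x.sum()
    T = np.cumsum(x[::-1])[::-1][1:]          # T_k = sum_{i>k} x_i
    if A != 0.0:
        # Take every gap positive; then w_1 = (y - B)/A makes w.x = y exactly.
        D = np.concatenate(([0.0], np.cumsum(abs_d)))   # D_i = w_i - w_1
        w = (y - D @ x) / A + D
        return w, (w @ x - y) ** 2
    # A == 0: w_1 is irrelevant; w.x - y = -y + sum_k d_k T_k.
    Bk = abs_d * np.abs(T)                    # size of each contribution
    sign = np.zeros(len(Bk))
    R = -y                                    # running residual
    for k in np.argsort(-Bk):                 # largest B_k first
        sign[k] = 1.0 if R <= 0 else -1.0     # add when <= 0, else subtract
        R += sign[k] * Bk[k]
    # d_k must have the sign that makes d_k * T_k = sign[k] * B_k.
    d = np.where(T >= 0, sign, -sign) * abs_d
    w = np.concatenate(([0.0], np.cumsum(d))) # pick w_1 = 0 arbitrarily
    return w, R ** 2
```

For example, `solve_known_gaps([1.0, 2.0, -1.0], 3.0, [2.0, 1.0])` exercises the $A \ne 0$ branch and attains $0$.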

Case 2: The differences $|w_{i+1} - w_i|$ are required to be integral powers of $2$, but the specific powers are not specified. Again, it will depend on whether $A = \sum_i x_i$ is $0$.

  • $A \ne 0$: Choose $w_{i+1} = w_i + 1$ for all $i$ (i.e., the $r_i = 0$). Proceed as in Case 1.

  • $A = 0$: Let $A_k = \left|\sum_{i=k+1}^N x_i\right|$, so $${\bf w^Tx}-y = -y + \sum_{k=1}^{N-1}\pm2^{r_k}A_k$$ Again, order the $A_k$ from largest to smallest. For the largest $A_k$, choose to add $2^{r_k}A_k$ if $-y \le 0$, and subtract otherwise. Pick $r_k$ to make the result as close to $0$, above or below, as possible. Then, using the result instead of $-y$, do the same for the second largest $A_k$, and so on. The square of the final result will be $\min_{\bf w}({\bf w^Tx} - y)^2$.
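A sketch of this greedy procedure for the $A = 0$ case, assuming the $r_k$ may be any integer (negative powers allowed). The helper `closest_power` and the cutoff `r_floor` are illustrative additions not in the answer; they handle the residual hitting $0$ exactly, where $r_k \to -\infty$ would otherwise be wanted:

```python
import math

def closest_power(target, a):
    """Integer r minimizing |target - 2**r * a|, for target, a > 0."""
    g = math.log2(target / a)
    return min((math.floor(g), math.ceil(g)),
               key=lambda r: abs(target - 2.0 ** r * a))

def greedy_case2(x, y, r_floor=-40):
    """Greedy step for Case 2 with A = sum(x) = 0: process the A_k
    largest first; add 2^{r_k} A_k when the running residual is <= 0,
    subtract otherwise, picking r_k to land as close to 0 as possible.
    Returns the list of (k, sign, r_k) choices and the final residual."""
    N = len(x)
    A_k = [abs(sum(x[k + 1:])) for k in range(N - 1)]
    order = sorted(range(N - 1), key=lambda k: -A_k[k])
    R = -y                        # running value of w.x - y
    choices = []
    for k in order:
        s = 1.0 if R <= 0 else -1.0
        if A_k[k] == 0.0:
            r = 0                 # term contributes nothing; any r works
        elif R == 0.0:
            r = r_floor           # stand-in for r -> -infinity
        else:
            r = closest_power(abs(R), A_k[k])
        choices.append((k, s, r))
        R += s * 2.0 ** r * A_k[k]
    return choices, R
```

On $\bf{x} = (1, -2, 1)$, $y = 3$ (so $A = 0$), the greedy picks $r = 1$ then $r = 0$ and reaches a residual of exactly $0$.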


Let $W_0$ be the set of vectors $\vec w =(w_1,\cdots,w_n)$ such that $|w_k-w_{k+1}|=2^{r_k}$. If $\vec w_0\in W_0$ then $\vec w_0+\lambda \vec v\in W_0$ for any $\lambda\in \mathbb{R}$, where $\vec v = (1,\cdots,1)$. Choosing $\vec w = \vec w_0+\lambda \vec v$ we have

$$ \min_{\vec w}(y-\vec w\cdot\vec x)^2 = \min_{\lambda,\vec w_0}(y-\vec w_0\cdot\vec x-\lambda\vec v\cdot\vec x)^2 $$

and, assuming $\vec v\cdot\vec x\neq 0$, the optimal $\lambda = \lambda^*$ is given by

$$ \lambda^* = \frac{(y-\vec w_0\cdot\vec x)\vec v\cdot\vec x}{(\vec v\cdot\vec x)^2} $$

so we have

$$ \min_{\vec w}(y-\vec w\cdot\vec x)^2 = \min_{\vec w_0}(y-\vec w_0\cdot\vec x-\lambda^*\vec v\cdot\vec x)^2 $$

NOTE

Here $\vec v\cdot \vec x = \sum_{k=1}^n x_k$, and the problem reduces to selecting the best $\vec w_0$ amongst the $2^{n-1}$ possible sign patterns of $\vec w_0$.
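This reduction can be checked by brute force for small $n$: enumerate the $2^{n-1}$ sign patterns of $\vec w_0$, shift each by the optimal $\lambda^*$ when $\vec v\cdot\vec x \ne 0$, and keep the best. A sketch assuming the gap sizes $2^{r_k}$ are passed in as a hypothetical `gaps` argument:

```python
import itertools
import numpy as np

def brute_force(x, y, gaps):
    """Try every sign pattern for the gaps of w0, apply the optimal
    shift lambda* along v = (1,...,1), and return (best value, best w)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    vx = x.sum()                                    # v . x
    best = (np.inf, None)
    for signs in itertools.product((1.0, -1.0), repeat=n - 1):
        d = np.array(signs) * np.array(gaps, dtype=float)
        w0 = np.concatenate(([0.0], np.cumsum(d)))  # one member of W_0
        if vx != 0.0:
            lam = (y - w0 @ x) / vx                 # optimal lambda*
            w = w0 + lam                            # w0 + lambda* v
        else:
            w = w0                                  # lambda has no effect
        val = (y - w @ x) ** 2
        if val < best[0]:
            best = (val, w)
    return best
```

When $\vec v\cdot\vec x \ne 0$ every sign pattern already attains $0$ (consistent with the other answer's Case 2); the enumeration only matters when $\vec v\cdot\vec x = 0$.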