I want to solve the following optimization problem: \begin{equation} \begin{aligned} &\min_{\bf{w}} (y-\bf{w}^{T}\bf{x})^2 , \\ &\text{subject to} \quad \forall i, |w_i-w_{i+1}|=2^{r_i}. \end{aligned} \end{equation} here, $\bf{x}$ and $\bf{w}$ are both $N$-dimension column vectors, $y$ is a scalar and $r_i$ is a constant. The constraint means that the absolute value of the difference between any two adjacent elements in $\bf{w}$ must be an integer power of 2. For example, if $N=3$, then $\bf{w}$$=[w_1,w_2,w_3]^{T}$, and it should satisfy that $|w_1-w_2|=2^{r_1}, |w_2-w_3|=2^{r_2}$.
My opinion
Let's take $N=3$ again.
Case 1: $r_i$ is known.
In this case I think we can use SGD method to solve the following problem $$ \min_{\bf{w}} ||\bf{w}^{T}\bf{x}-y||^2_2 + \sum_{i=1}^{2}\lambda_i ||(|w_i-w_{i+1}|-2^{r_i})||^2_{2} $$ We can obtain an approximate solutions by adjusting the value of $\lambda_i$ using SGD method.
Case 2: $r_i$ is unknown.
In this case the problem is knotty, since we don't know the detailed distance between $w_i$ and $w_{i+1}$ and what we know only is that the distance satisfies that it it an integer power of 2. Can anyone help me?
Since my last comment pushed this through to a full answer, let me post it as such with a fuller explanation.
Case 1: You know the differences between $|w_{i+1} - w_i|$. Define $d_i = w_{i+1} - w_i$, so the $|d_i|$ are known, but not their signs. Then for all $i > 1, w_i = w_1 +\sum_{k=1}^{i-1} d_k$. For convenience, let $D_i = w_i - w_1 = \sum_{k=1}^{i-1} d_k$, (including $D_1 = 0$). Now
$${\bf w^Tx} = \sum_i w_ix_i = \sum_i w_1x_i + \sum_i D_ix_i = w_1A + B$$ where $A = \sum_i x_i$ and $B = \sum_i D_ix_i$. We have two cases:
Case 2: The differences $|w_{i+1} - w_i|$ are required to be integral powers of $2$, but the specific powers are not specified. Again, it will depend on whether $A = \sum_i x_i$ is $0$.
$A \ne 0$: Choose $w_{i+1} = w_i + 1$ for all $i$ (i.e., the $r_i = 0$). Proceed as in Case 1.
$A = 0$: Let $A_k = \left|\sum_{i=k+1}^N x_i\right|$, so $${\bf w^Tx}-y = -y + \sum_{k=1}^{N-1}\pm2^{r_k}A_k$$ Again, order the $A_k$ from largest to smallest. For the largest $A_k$, choose to add $2^{r_k}A_k$ if $-y \le 0$, and subtract otherwise. Pick $r_k$ to make the result as close to $0$, above or below, as possible. Then, using the result instead of $-y$, do the same for the second largest $A_k$, and so on. The square of the final result will be $\min_{\bf w}({\bf w^Tx} - y)^2$.