Is there a method to optimize a problem whose regularizer is a kurtosis term?


I have recently been working on an optimization problem whose regularizer $g(x)$ is a kurtosis term. The overall optimization problem is as follows:

$$ \arg \min_x \frac12 \Vert Ax-b\Vert_2^2 + \lambda g(x) \tag{1}$$

where $A \in \mathbb{R}^{m\times n}, \ x \in \mathbb{R}^{n\times 1}, \ b \in \mathbb{R}^{m\times 1}$. Further, the regularizer is the kurtosis, $g(x) = K_4$, where

$$ K_4 = \frac{E_N (x^4)}{\Big(E_N(x^2)\Big)^2} $$

$$ E_N (x) = \frac 1N \sum_{i=1}^N x_i, \qquad E_N (x^4) = \frac 1N \sum_{i=1}^N x_i^4 $$

$$ \Rightarrow K_4 = \frac{N\sum_{i=1}^{N} x_i^4}{\Big(\sum_{i=1}^{N} x_i^2\Big)^2} = N \frac{\sum x^4}{\big(\sum x^2 \big)^2} $$

where $E_N(\cdot)$ denotes the sample mean. The optimization problem then becomes

$$ \begin{align} \arg \min_x \frac12 \Vert Ax-b\Vert_2^2 + \lambda N \frac{\sum x^4}{\big(\sum x^2 \big)^2} \end{align} $$
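As a quick numerical sanity check (my own illustrative sketch, not part of the original derivation), the kurtosis regularizer and objective (1) can be evaluated with NumPy. Note that $K_4$ ranges from $1$ (a flat vector, by the Cauchy–Schwarz inequality) to $N$ (a 1-sparse vector):

```python
import numpy as np

def kurtosis_reg(x):
    """K4 = N * sum(x^4) / (sum(x^2))^2, the normalized fourth moment."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return n * np.sum(x**4) / np.sum(x**2)**2

def objective(A, b, x, lam):
    """Objective (1): 0.5*||Ax - b||^2 + lam * K4(x)."""
    r = A @ x - b
    return 0.5 * (r @ r) + lam * kurtosis_reg(x)
```

For example, `kurtosis_reg(np.ones(4))` gives the minimum value $1$, while a 1-sparse vector of length 4 gives the maximum value $4$.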

Inspiration

Searching for a way to solve the problem above, I found Dilip Krishnan's paper [1], which addresses the related problem

$$ \begin{align} \arg \min_x \frac12 \Vert Ax-b\Vert_2^2 + \lambda \frac{\Vert x\Vert_1}{\Vert x\Vert_2} \end{align} $$

which the authors solve with an iterative method. Specifically, the denominator of the regularization term is fixed ($\Vert x \Vert_2$ is computed from the previous iterate and treated as a constant during the current stage), so the problem reduces to an $\ell_1$-regularized problem that can be solved with existing, mature algorithms.

More specifically, it would be

$$ x_k = \arg \min_{x} \frac12 \Vert Ax-b\Vert_2^2 + \lambda \frac{\Vert x \Vert_1}{C_{k-1}}$$

$$ = \arg \min_{x} \frac {C_{k-1}}{2} \Vert Ax-b\Vert_2^2 + \lambda \Vert x\Vert_1 $$

then update $C_k = \Vert x_{k} \Vert_2$ and go to the next iteration:

$$ x_{k+1} = \arg \min_{x} \frac {C_{k}}{2} \Vert Ax-b\Vert_2^2 + \lambda \Vert x\Vert_1 $$

Repeating this process yields the final solution $\hat x$.
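This alternating scheme can be sketched in a few lines of NumPy. The following is my own illustrative sketch (not code from the paper), assuming ISTA (proximal gradient with soft-thresholding) as the inner $\ell_1$ solver:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_over_l2_solve(A, b, lam, n_outer=20, n_inner=200):
    """Alternating scheme for  min 0.5||Ax-b||^2 + lam*||x||_1/||x||_2:
    fix C = ||x||_2 from the previous iterate, then solve the resulting
    subproblem  min (C/2)||Ax-b||^2 + lam*||x||_1  with ISTA."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]   # least-squares init
    for _ in range(n_outer):
        C = np.linalg.norm(x)
        if C == 0.0:
            break
        L = C * np.linalg.norm(A, 2) ** 2      # Lipschitz constant of smooth part
        for _ in range(n_inner):
            g = C * (A.T @ (A @ x - b))
            x = soft_threshold(x - g / L, lam / L)
    return x
```

With `lam = 0` the soft-threshold is the identity and the scheme reduces to plain least squares; for `lam > 0` the iterates shrink toward sparser vectors.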

Coming back to my problem, I apply the same iterative idea and treat the denominator of the regularizer as a constant:

$$ x_k = \arg \min_{x} \frac12 \Vert Ax-b\Vert_2^2 + \lambda N \frac{\sum x^4}{C_{k-1}} $$

$$ = \arg \min_{x} \frac{C_{k-1}}{2} \Vert Ax-b\Vert_2^2 + \lambda N \sum x^4 \tag{2} $$

Solve this subproblem with a suitable algorithm to get $x_k$. As in Dilip Krishnan's work, update $C_k = \big(\sum (x_k)^2 \big)^2$ and go to the next iteration:

$$ x_{k+1} = \arg \min_{x} \frac{C_{k}}{2} \Vert Ax-b\Vert_2^2 + \lambda N \sum x^4 $$

until convergence (i.e., until the error is acceptably small).
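The proposed outer/inner iteration might be sketched as follows. This is an illustrative implementation of my own (not an established algorithm), assuming plain gradient descent with a conservative fixed step size for the quartic subproblem (2):

```python
import numpy as np

def quartic_subproblem(A, b, lam_n, C, x0, n_iter=500):
    """Gradient descent on  (C/2)*||Ax-b||^2 + lam_n*sum(x^4)  (eq. 2),
    with C held fixed and lam_n = lambda * N.  The step size is a
    conservative 1/L estimate using the curvature at x0."""
    x = x0.copy()
    step = 1.0 / (C * np.linalg.norm(A, 2) ** 2
                  + 12.0 * lam_n * np.max(x ** 2) + 1e-12)
    for _ in range(n_iter):
        g = C * (A.T @ (A @ x - b)) + 4.0 * lam_n * x ** 3
        x = x - step * g
    return x

def kurtosis_reg_solve(A, b, lam, n_outer=15):
    """Outer loop of the proposed scheme: update C_k = (sum x_k^2)^2,
    then re-solve the quartic subproblem (2)."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]   # least-squares start
    n = A.shape[1]
    for _ in range(n_outer):
        C = np.sum(x ** 2) ** 2
        x = quartic_subproblem(A, b, lam * n, C, x)
    return x
```

Whether this fixed-denominator iteration converges to a minimizer of the original kurtosis-regularized objective is exactly the question posed below.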

Question

My question is: is this approach feasible? If yes, how can I solve eq. (2) efficiently? If not, why not, and what alternative method would work?

Thanks a lot!

Attempt

Solving problem

$$ x_{k+1} = \arg \min_{x} \frac{C_{k}}{2} \Vert Ax-b\Vert_2^2 + \lambda N \sum x^4 $$

Let $f(x) = f_1(x) + f_2(x)$, where

$$ f_1(x) = \frac{C_{k}}{2} \Vert Ax-b\Vert_2^2 $$

$$ f_2(x) = \lambda N \sum_i x_i^4 = \lambda N \big( \mathbb{1}^T x^{\odot 4}\big), \ \ \mathrm{where} \ \mathbb{1} \in \mathbb{R}^{n\times1} \ \text{and} \ x^{\odot 4} \ \text{is the elementwise fourth power} $$

Take the derivative of each term with respect to $x$:

$$ \frac{d}{dx}f_1(x) = {C_{k}} \Big( A^TAx-A^Tb \Big)\in \mathbb{R}^{n \times 1} $$

$$ \frac{d}{dx}f_2(x) = 4\lambda N \, x^{\odot 3} \in \mathbb{R}^{n \times 1}$$

where $x^{\odot 3}$ is the elementwise cube (the derivative of $\sum_i x_i^4$ acts elementwise; it is not $4(\mathbb{1}^T x^3)\mathbb{1}$). Furthermore, setting

$$ \frac{d}{dx}f(x) = 0 $$

$$ \Rightarrow C_k A^T Ax + 4\lambda N \, x^{\odot 3} = C_k A^T b $$

What can I do next?
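One possible next step (an editor's sketch, not from the original post) is Newton's method on the stationarity condition above: subproblem (2) is convex, and its Hessian $C_k A^TA + 12\lambda N\,\mathrm{Diag}(x^{\odot 2})$ is positive definite whenever $A^TA$ is invertible, so each Newton system is well posed:

```python
import numpy as np

def newton_quartic(A, b, lam_n, C, x0, n_iter=50, tol=1e-10):
    """Newton's method for the stationarity condition of eq. (2):
       C*A^T A x + 4*lam_n*x**3 = C*A^T b,
    with Jacobian  C*A^T A + 12*lam_n*Diag(x**2)."""
    AtA, Atb = A.T @ A, A.T @ b
    x = x0.copy()
    for _ in range(n_iter):
        F = C * (AtA @ x - Atb) + 4.0 * lam_n * x ** 3
        if np.linalg.norm(F) < tol:
            break
        J = C * AtA + 12.0 * lam_n * np.diag(x ** 2)
        x = x - np.linalg.solve(J, F)
    return x
```

For a well-conditioned $A$ this converges in a handful of iterations, at the cost of one $n \times n$ linear solve per step.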

Reference

[1] Krishnan, Dilip, et al. "Blind deconvolution using a normalized sparsity measure." CVPR 2011, pp. 233–240.

Answer

$ \def\a{\alpha} \def\b{\beta} \def\l{\lambda} \def\o{{\tt1}} \def\h{\odot} \def\bR#1{\big(#1\big)} \def\BR#1{\Big[#1\Big]} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_2} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\x#1{x^{\h#1}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracBR#1#2{\BR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $If $\h$ is the Hadamard product, then denote the Hadamard power of a vector as $$\eqalign{ \x3 = x\h x\h x \\ }$$

Given the cost function $$\eqalign{ \phi &= \tfrac12\frob{Ax-b}^2 \;+\; n\l\frac{\frob{\x2}^2}{\frob{x}^4} \\ }$$ the gradient is straightforward to calculate: $$\eqalign{ g(x) \,=\, \grad{\phi}{x} \;=\; A^T\LR{Ax-b} \;+\; \frac{4n\l}{\frob{x}^6}\LR{\frob{x}^2\:\x3-\frob{\x2}^2\:x} \\ }$$

For an initial guess, set $\l=0$ to obtain the least-squares solution: $\;x_0=A^+b$.
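As a sanity check on this closed form (an illustrative NumPy sketch, not part of the original answer), the gradient can be compared against central finite differences of the objective:

```python
import numpy as np

def phi_and_grad(A, b, x, lam):
    """phi(x) = 0.5||Ax-b||^2 + n*lam*sum(x^4)/(sum(x^2))^2
    and its gradient, following the answer's closed form."""
    n = x.size
    r = A @ x - b
    s2 = x @ x               # ||x||^2
    s4 = np.sum(x ** 4)      # ||x∘2||^2
    phi = 0.5 * (r @ r) + n * lam * s4 / s2 ** 2
    grad = A.T @ r + (4.0 * n * lam / s2 ** 3) * (s2 * x ** 3 - s4 * x)
    return phi, grad
```

Perturbing each coordinate by a small $\pm\varepsilon$ and differencing $\phi$ reproduces the analytic gradient to several digits.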

Then use Barzilai–Borwein (or another gradient-descent algorithm) to iterate toward the optimal vector using the previously derived gradient expression $$\eqalign{ &g_k = g(x_k) \\ &x_{k+\o} = x_k - \a_k\,g_k \\ &k = k + \o \\ }$$ The Barzilai–Borwein literature contains the formulas for the $\a_k$ step sizes.
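A minimal NumPy sketch of this procedure (my own, assuming the BB1 step size $\a_k = s_k^Ts_k / s_k^Ty_k$ with $s_k = x_{k+\o}-x_k$, $y_k = g_{k+\o}-g_k$, and a small safeguarded fallback step):

```python
import numpy as np

def bb_descent(A, b, lam, n_iter=500, tol=1e-8):
    """Barzilai-Borwein gradient iteration on
    phi(x) = 0.5||Ax-b||^2 + n*lam*sum(x^4)/(sum(x^2))^2,
    started from the least-squares solution x0 = pinv(A) @ b."""
    def grad(x):
        n = x.size
        s2 = x @ x
        s4 = np.sum(x ** 4)
        return A.T @ (A @ x - b) + (4.0 * n * lam / s2 ** 3) * (s2 * x ** 3 - s4 * x)

    x = np.linalg.pinv(A) @ b    # x0 = A^+ b  (the lam = 0 solution)
    g = grad(x)
    alpha = 1e-3                 # small fixed step until BB info is available
    for _ in range(n_iter):
        x_new = x - alpha * g
        g_new = grad(x_new)
        if np.linalg.norm(g_new) < tol:
            return x_new
        s, y = x_new - x, g_new - g
        sy = s @ y
        alpha = (s @ s) / sy if sy > 1e-16 else 1e-3   # BB1 step, safeguarded
        x, g = x_new, g_new
    return x
```

Note that BB iterations are non-monotone in general; a non-monotone line search can be added for extra robustness.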