Solving: $\underset{\alpha}{\text{min}} \; || \left( b - A(\alpha \circ x ) \right) ||_{2}^{2}$


I want to solve the following minimization problem:

$$ \underset{\alpha}{\text{min}} \; || \left( b - A(\alpha \circ x ) \right) ||_{2}^{2} $$

where $\alpha, x \in \mathbb{C}^{N}$ and $b \in \mathbb{C}^{M}$ and $A \in \mathbb{C}^{M \times N}$ and $\circ$ denotes the Hadamard product.
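As a quick numerical sanity check, here is a minimal numpy sketch of the setup (dimensions and random data are my own assumptions, not from the problem). It also verifies the identity $A(\alpha \circ x) = A\,{\rm Diag}(x)\,\alpha$, which lets the Hadamard product be absorbed into the matrix:

```python
import numpy as np

# Hypothetical small instance: M = 6 measurements, N = 4 unknowns.
rng = np.random.default_rng(0)
M, N = 6, 4
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
alpha = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def f(alpha):
    """Objective ||b - A(alpha o x)||_2^2 (o = Hadamard product)."""
    r = b - A @ (alpha * x)          # alpha * x is the elementwise product
    return np.real(np.vdot(r, r))    # r^H r is real and nonnegative

# The Hadamard product can be absorbed into the matrix:
# A(alpha o x) = A Diag(x) alpha.
Y = A @ np.diag(x)
assert np.allclose(A @ (alpha * x), Y @ alpha)
```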

This can be done using gradient descent. Since the objective function is convex and smooth, it suffices to find the $\alpha$ at which the gradient vanishes: $\frac{df}{d\alpha} = 0$.

To start, we rewrite the objective as:

$$ f(\alpha) = || \left( b - A(\alpha \circ x ) \right) ||_{2}^{2} = ( b - A(\alpha \circ x ))^{H}( b - A(\alpha \circ x )) = \epsilon^{H} \epsilon$$

Now we can compute the differential:

$$ f = \epsilon : \epsilon $$
$$ df = 2\epsilon : d\epsilon = 2\epsilon : d(b - A(\alpha \circ x)) = -2\epsilon : A(x \circ d\alpha) $$
$$ df = -2\, A^{H}\epsilon : x \circ d\alpha $$
$$ \boxed{df = -2\, A^{H}(\epsilon \circ x) : d\alpha} $$


Now this is where I am confused. According to the first identification theorem for differentials, the gradient should be:

$$ \boxed{\frac{df}{d \alpha} = -2 (\epsilon \circ x)^{H} A} $$

And so the gradient descent update is:

$$ \boxed{\alpha_{k+1} = \alpha_{k} + 2 \mu (\epsilon \circ x)^{H} A} $$

However, when minimizing $|| b - Ax ||_{2}^{2}$ with respect to real $x$, the gradient is:

$$ \frac{df}{dx} = -2 A^{T}(b - Ax)$$
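For the real-valued reference case, a central finite-difference check confirms that the gradient of $\|b - Ax\|_2^2$ is $-2A^T(b - Ax)$, i.e. $2A^T(Ax - b)$, with a minus sign coming from the chain rule. This sketch uses random real data of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 5, 3
A = rng.standard_normal((M, N))
b = rng.standard_normal(M)
x = rng.standard_normal(N)

f = lambda x: np.sum((b - A @ x) ** 2)
grad = -2 * A.T @ (b - A @ x)        # note the minus sign from the chain rule

# Central finite differences agree with the analytic gradient.
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
               for e in np.eye(N)])
assert np.allclose(grad, fd, atol=1e-5)
```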

So where did I go wrong? What should the gradient be?

Best Answer

Define the matrices
$$ X = {\rm Diag}(x), \qquad Y = AX $$
and reformulate the objective function to get rid of the Hadamard product (and the minus sign):
$$\eqalign{
\phi &= (Y\alpha-b)^*:(Y\alpha-b) \\
d\phi &= (Y\alpha-b)^*:Y\,d\alpha \;+\; (Y\alpha-b):Y^*\,d\alpha^* \\
&= Y^T(Y\alpha-b)^*:d\alpha \;+\; Y^H(Y\alpha-b):d\alpha^* \\
\frac{\partial\phi}{\partial\alpha} &= Y^T(Y\alpha-b)^*, \qquad
\frac{\partial\phi}{\partial\alpha^*} = Y^H(Y\alpha-b) \\
}$$
Set the gradient (it doesn't matter which one) to zero and solve for the optimal value:
$$\eqalign{
Y^HY\alpha &= Y^Hb \\
\alpha &= (Y^HY)^{-1}Y^Hb = Y^+b \\
}$$
Notes:
$\quad(b-Y\alpha)=\epsilon\;$ in your terminology
$\quad Y^+\;$ is the Moore-Penrose inverse
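A minimal sketch of complex gradient descent using the update $\alpha \leftarrow \alpha - \mu\, Y^H(Y\alpha - b)$ built from the gradient above. The dimensions, random data, and step-size rule $\mu < 1/\|Y\|_2^2$ are my own assumptions; the iterates converge to $Y^+b$:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 8, 4
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
Y = A @ np.diag(x)                      # absorbs the Hadamard product

alpha = np.zeros(N, dtype=complex)
mu = 0.9 / np.linalg.norm(Y, 2) ** 2    # step size below 1/||Y||_2^2
for _ in range(5000):
    # descend along the conjugate (Wirtinger) gradient Y^H (Y alpha - b)
    alpha = alpha - mu * (Y.conj().T @ (Y @ alpha - b))

alpha_star = np.linalg.pinv(Y) @ b      # closed-form Y^+ b
assert np.allclose(alpha, alpha_star, atol=1e-6)
```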

Or you can avoid the gradient calculations completely by noting that you want the least-squares solution of a linear system, i.e.
$$Y\alpha = b \quad\implies \alpha = Y^+b $$
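The direct least-squares route can be sketched in numpy as follows (random test data of my own choosing); the optimality condition $Y^H(Y\alpha - b) = 0$ then holds at the solution:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 8, 4
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)

Y = A @ np.diag(x)
alpha, *_ = np.linalg.lstsq(Y, b, rcond=None)  # least-squares solution Y^+ b

# The normal equations Y^H Y alpha = Y^H b are satisfied.
assert np.allclose(Y.conj().T @ (Y @ alpha - b), 0, atol=1e-8)
```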