Let $\mathbf{A} \in \mathbb{R}^{M \times N}$, $\mathbf{x} \in \mathbb{R}^{N}$, $\mathbf{b} \in \mathbb{R}^{M}$, $\mathbf{c} \in \mathbb{R}^{N}$, $\mathbf{p} \in {\mathbb{R}^+}^{M}$ and $\mathbf{q} \in {\mathbb{R}^+}^{N}$. Given fixed $\mathbf{A}$, $\mathbf{b}$, $\mathbf{c}$, $\mathbf{p}$ and $\mathbf{q}$, how to solve the following least-squares optimization problem? Does it have a closed-form solution? $$ \mathbf{x}^* = \arg\min_{\mathbf{x}} \| \mathbf{A}\mathbf{x}-\mathbf{b} \|^2_\mathbf{p} + \| \mathbf{x}-\mathbf{c} \|^2_\mathbf{q}, $$ where $\|\mathbf{x}\|^2_\boldsymbol{\tau} \triangleq \sum_j (x_j^2 / \tau_j)$.
How to find the minimum of $f(\mathbf{x}) = \| \mathbf{A}\mathbf{x}-\mathbf{b} \|^2_\mathbf{p} + \| \mathbf{x}-\mathbf{c} \|^2_\mathbf{q}$
Asked by Bumbble Comm. There are two solutions below.
Before we start deriving the solution, some facts and notations for brevity:
- Trace and Frobenius product relation $$\left\langle A, B C\right\rangle={\rm tr}(A^TBC) := A : B C$$
- Cyclic properties of the Trace/Frobenius product \begin{align} A : B C &= BC : A \\ &= A C^T : B \\ &= \text{etc.} \end{align}
For convenience, define two diagonal matrices (assuming $p>0$ and $q>0$ elementwise) $$D_p:= \operatorname{diag}\left(\frac{1}{p_1}, \ldots, \frac{1}{p_M}\right)$$ and $$D_q:= \operatorname{diag}\left(\frac{1}{q_1}, \ldots, \frac{1}{q_N}\right).$$
Remark: the weighted norm in the OP's question is a bit unusual, since it divides by the weights $\tau_j$ rather than multiplying by them.
Let \begin{align} f &= \| Ax - b\|_p^2 + \| x - c\|_q^2 \\ &= \left(Ax - b \right)^T D_p \left(Ax - b \right) + \left(x - c \right)^T D_q \left(x - c \right)\\ &= \left(Ax-b\right): D_p\left(Ax-b\right) + \left(x-c\right): D_q\left(x-c\right) \end{align}
Now we can obtain the differential first, and then the gradient (using the symmetry of $D_p$ and $D_q$): \begin{align} df &= A\,dx: D_p\left(Ax-b\right) + \left(Ax-b\right): D_pA\,dx + dx: D_q\left(x-c\right) + \left(x-c\right): D_q\,dx \\ &= 2A^T D_p\left(Ax-b\right):dx + 2D_q\left(x-c\right):dx \end{align}
Thus, the gradient is \begin{align} \frac{\partial f}{\partial x} = 2A^T D_p\left(Ax-b\right) + 2D_q\left(x-c\right). \end{align}
Setting the gradient to zero yields the solution: \begin{align} 2A^T D_p\left(Ax-b\right) + 2D_q\left(x-c\right) = 0 \Longrightarrow x = \left(A^TD_pA + D_q \right)^{-1} \left( A^T D_p b + D_q c \right). \end{align}
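As a quick numerical sanity check of this closed form (a sketch using NumPy with randomly generated data; here `Dp` and `Dq` are the diagonal weight matrices $\operatorname{diag}(1/p_i)$ and $\operatorname{diag}(1/q_j)$ defined above):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 3
A = rng.standard_normal((M, N))
b = rng.standard_normal(M)
c = rng.standard_normal(N)
p = rng.uniform(0.5, 2.0, M)   # positive weights
q = rng.uniform(0.5, 2.0, N)

Dp = np.diag(1.0 / p)          # D_p = diag(1/p_i)
Dq = np.diag(1.0 / q)          # D_q = diag(1/q_j)

# closed-form solution x = (A^T Dp A + Dq)^{-1} (A^T Dp b + Dq c)
x = np.linalg.solve(A.T @ Dp @ A + Dq, A.T @ Dp @ b + Dq @ c)

# the gradient 2 A^T Dp (Ax - b) + 2 Dq (x - c) should vanish at x
grad = 2 * A.T @ Dp @ (A @ x - b) + 2 * Dq @ (x - c)
print(np.allclose(grad, 0))    # True
```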
Welcome to MSE! Usually we encourage the OP to post whatever he/she/they have tried to solve the problem, so try to keep that in mind as you continue your (hopefully long-lasting and meaningful) journey on MSE.
Anyway, to your question. Note that $\|\mathbf{x}\|_{\boldsymbol{\tau}} = \|D_{\boldsymbol{\tau}} \mathbf{x}\|$, where $\|\cdot\|$ denotes the standard Euclidean norm in the appropriate dimension and $D_{\boldsymbol{\tau}} = \operatorname{diag}\left(\tau_1^{-1/2}, \tau_2^{-1/2}, \cdots, \tau_n^{-1/2}\right)$. So it suffices to find $$\mathbf{x}^* = \arg \min_{\mathbf{x} \in \mathbb{R}^n} \left[\|A \mathbf{x} - \mathbf{b}\|^2 + \|D\mathbf{x} - \mathbf{c}\|^2 \right] ~~~~~~~~~~~~~~~ (*)$$ for arbitrary constants $A$, $D$, $\mathbf{b}$, and $\mathbf{c}$. (To solve your problem, just take $A$ in $(*)$ to be $D_{\mathbf{p}} A$, take $\mathbf{b}$ in $(*)$ to be $D_{\mathbf{p}} \mathbf{b}$, take $D$ in $(*)$ to be $D_\mathbf{q}$, and take $\mathbf{c}$ in $(*)$ to be $D_{\mathbf{q}} \mathbf{c}$.) We can rewrite the objective function as $$\mathbf{x}^T A^T A \mathbf{x} - 2 \mathbf{b}^T A \mathbf{x} + \mathbf{b}^T \mathbf{b} + \mathbf{x}^T D^T D \mathbf{x} - 2 \mathbf{c}^T D \mathbf{x} + \mathbf{c}^T \mathbf{c} \\ = \mathbf{x}^T (A^T A + D^T D) \mathbf{x} - 2 (\mathbf{b}^T A + \mathbf{c}^T D)\mathbf{x} + \mathbf{b}^T \mathbf{b} + \mathbf{c}^T \mathbf{c}$$ Since the last two terms are constants, the problem is equivalent to minimizing $$\mathbf{x}^T (A^T A + D^T D) \mathbf{x} - 2(\mathbf{b}^T A + \mathbf{c}^TD) \mathbf{x}$$ Note that $A^T A + D^T D$ is a real symmetric positive definite matrix and can hence be orthogonally diagonalized (see the Spectral Theorem), say as $Q^T E Q$, with $Q$ an orthogonal matrix and $E$ a positive real diagonal matrix. We can then make a change of variables $\mathbf{y} \equiv Q \mathbf{x}$ to get the new objective function $$\mathbf{y}^T E \mathbf{y} - 2 \mathbf{d}^T \mathbf{y} = (\mathbf{y}^T - 2\mathbf{d}^T E^{-1}) E \mathbf{y}$$ where $\mathbf{d}^T = (\mathbf{b}^T A + \mathbf{c}^T D) Q^{-1} = (\mathbf{b}^T A + \mathbf{c}^T D) Q^T$.
Since the change of variables from $\mathbf{x}$ to $\mathbf{y}$ was a bijection from $\mathbb{R}^n \to \mathbb{R}^n$, to solve the original problem we can simply find $\mathbf{y} \in \mathbb{R}^n$ minimizing this modified objective function. But doing so is easy, because if $$\mathbf{y} = \begin{bmatrix} y_1 & \cdots & y_n \end{bmatrix}^T, ~~ 2 \mathbf{d}^T E^{-1} = \begin{bmatrix} f_1 & \cdots & f_n \end{bmatrix}, ~~ E = \text{diag}(e_1, \cdots, e_n)$$ then the objective function is simply $$\sum_{i = 1}^n e_i (y_i^2 - y_i f_i)$$ whose minimum (recall all the $e_i$ are positive) occurs when each $y_i = f_i / 2$ (we just minimize each individual quadratic term). To recover the answer to $(*)$, we simply remember that $\mathbf{x} = Q^{-1} \mathbf{y} = Q^{T} \mathbf{y}$ to obtain the final answer of $$\mathbf{x} = Q^T (\mathbf{d}^T E^{-1})^{T} = Q^T [E^{-1} Q (A^T \mathbf{b} + D^T \mathbf{c})] = Q^T E^{-1} Q (A^T \mathbf{b} + D^T \mathbf{c}) \\ = (A^T A + D^T D)^{-1} (A^T \mathbf{b} + D^T \mathbf{c})$$ You can check that this agrees with the one-variable least-squares solution by taking $D = 0$ and $\mathbf{c} = \mathbf{0}$. $\square$
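The diagonalization route above can be checked numerically against a direct solve of the normal equations for $(*)$ (a sketch using NumPy with random data; `np.linalg.eigh` supplies the orthogonal diagonalization $H = Q^T E Q$):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.standard_normal((m, n))
D = np.diag(rng.uniform(0.5, 2.0, n))  # any D with A^T A + D^T D positive definite
b = rng.standard_normal(m)
c = rng.standard_normal(n)

# orthogonal diagonalization  H = A^T A + D^T D = Q^T E Q
H = A.T @ A + D.T @ D
evals, V = np.linalg.eigh(H)           # H = V @ diag(evals) @ V.T
Q = V.T                                # so H = Q^T E Q with E = diag(evals)

d = Q @ (A.T @ b + D.T @ c)            # d = Q (A^T b + D^T c)
y = d / evals                          # y_i = f_i / 2 = d_i / e_i
x_diag = Q.T @ y                       # x = Q^T y

# direct solve of (A^T A + D^T D) x = A^T b + D^T c
x_direct = np.linalg.solve(H, A.T @ b + D.T @ c)
print(np.allclose(x_diag, x_direct))   # True
```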
Hence, the answer to the original question is \begin{align*} \mathbf{x}^* &= \left[(D_\mathbf{p} A)^T D_\mathbf{p} A + D_\mathbf{q}^T D_\mathbf{q} \right]^{-1} \left[ (D_\mathbf{p} A)^T D_\mathbf{p} \mathbf{b} + D_\mathbf{q}^T D_\mathbf{q} \mathbf{c}\right] \\ &= \boxed{(A^T D_\mathbf{p}^2 A + D_\mathbf{q}^2)^{-1} (A^T D_\mathbf{p}^2 \mathbf{b} + D_\mathbf{q}^2 \mathbf{c})} \end{align*}
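Since the objective is strictly convex, the boxed formula gives the unique global minimizer; a quick sanity check is that random perturbations around it never decrease the objective (a sketch using NumPy with random data; note $D_\mathbf{p}^2 = \operatorname{diag}(1/p_i)$ and $D_\mathbf{q}^2 = \operatorname{diag}(1/q_j)$):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 6, 4
A = rng.standard_normal((M, N))
b = rng.standard_normal(M)
c = rng.standard_normal(N)
p = rng.uniform(0.5, 2.0, M)
q = rng.uniform(0.5, 2.0, N)

Dp2 = np.diag(1.0 / p)   # D_p^2 = diag(1/p_i)
Dq2 = np.diag(1.0 / q)   # D_q^2 = diag(1/q_j)

# boxed closed-form solution
x_star = np.linalg.solve(A.T @ Dp2 @ A + Dq2, A.T @ Dp2 @ b + Dq2 @ c)

def f(x):
    """Weighted objective ||Ax - b||_p^2 + ||x - c||_q^2."""
    r, s = A @ x - b, x - c
    return np.sum(r**2 / p) + np.sum(s**2 / q)

# x* should not be beaten by any random perturbation
ok = all(f(x_star) <= f(x_star + 0.1 * rng.standard_normal(N)) for _ in range(100))
print(ok)   # True
```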