What is the gradient $\nabla F$ of $F(\vec{x}) = {\lVert A \vec{x} - b \rVert}_2^2$ with $A \in \mathbb{R}^{n \times d}$

77 Views Asked by At

What is the gradient $\nabla F$ of $F(\vec{x}) = {\lVert A \vec{x} - b \rVert}_2^2$ with $A \in \mathbb{R}^{n \times d}$

My work so far:

$\vec{x} \in \mathbb{R}^d$ and $b \in \mathbb{R}^n$.

\begin{align*} \nabla F &= \text{Vec} \left( \frac{\partial F}{\partial x_1}, \ldots, \frac{\partial F}{\partial x_d} \right) \\ \end{align*}

\begin{align*} \frac{\partial F}{\partial x_i} &= \end{align*}

From here, I'm stuck. How do I calculate that partial derivative?

3

There are 3 best solutions below

1
On BEST ANSWER

$\begin{align}D_u(F) (x)&=\lim_{t\to 0}\frac{F(x+tu)-F(x)}{t}\\&=\lim_{t\to 0}\frac{\|A(x+tu)-b\|^2-\|Ax-b\|^2}{t}\\&=\lim_{t\to 0}\frac{\color{red}{\|Ax-b\|^2}+t\langle Ax-b, Au\rangle+t\langle Au, Ax-b\rangle+t^2\|Au\|^2 -\color{red}{{\|Ax-b\|^2}}}{t}\\&=2\langle Ax-b,Au\rangle\\&=\langle 2A^t(Ax-b),u\rangle\\&=\langle\nabla{F},u\rangle\end{align}$

Notations:

$D_u(F) (x) $: the directional derivative of $F$ at $x$ along the direction of the unit vector $u$

$A^t$ : transpose of the matrix $A$

$\langle \cdot, \cdot\rangle$ : inner product


Note:

  1. $\|x\|^2=\langle x, x\rangle$

  2. Polynomials are differentiable.

  3. If $F$ is differentiable at $x$ then the directional derivative of $F$ at $x$ along the direction of any unit vector $u$ exists and satisfy $$D_u(F) (x) =\langle \nabla F, u\rangle$$

0
On

Just note that $$\|Ax - b\|_2^2 = \sum_{i= 1}^n\left(\sum_{j = 1}^d a_{ij}x_j - b_i\right)^2$$ Therefore, $$\partial_k \|Ax - b\|_2^2 = 2\sum_{i= 1}^n\left(a_{ik}\sum_{j = 1}^d a_{ij}x_j - b_i\right)$$ which can be rewritten as $$\nabla \|Ax - b\|_2^2 = 2 A^T(Ax - b).$$

0
On

$$F(\boldsymbol x)=|\mathbf A\boldsymbol x-\boldsymbol b|^2 \\ =(\mathbf A\boldsymbol x-\boldsymbol b)\cdot (\mathbf A\boldsymbol x-\boldsymbol b) \\ =(Ax-b)^i~(Ax-b)_i \\ =(A^{ij}x_j-b^i)(A_{ik}x^k-b_i) \\ =A^{ij}x_j A_{ik}x^k-b_iA^{ij}x_j-b^i A_{ik}x^k - b^ib_i \\ = A^{i}{}_jA_{ik}x^jx^k -2b^i x^j A_{ij}-b^ib_i$$

$$\implies \nabla_l F=\nabla_l\left(A^{i}{}_jA_{ik}x^jx^k -2b^i x^j A_{ij}-b^ib_i\right) \\ =A^i{}_j A_{ik}\left(x^k\color{blue}{\nabla_lx^j} +x^j\color{blue}{\nabla_lx^k}\right)-2b^iA_{ij}\color{blue}{\nabla_l x^j} -\color{red}{\nabla_l(b^ib_i)} \\ =A^i{}_j A_{ik}(x^k \color{blue}{\delta ^j_l}+x^j \color{blue}{\delta ^k_l})-2b^iA_{ij}\color{blue}{\delta ^j_l} \\ =A^i{}_l A_{ik}x^k+A^i{}_jA_{il }x^j -2b^iA_{il} \\ =A_{il}\left(A^i{}_kx^k +A^i{}_j x^j-2b^i\right) \\ =2A_{il}(A^i{}_jx^j-b^i)$$

$$\implies \boxed{\nabla F=2\mathbf A^\intercal (\mathbf A\boldsymbol x-\boldsymbol b)}$$