Finding the gradient and Hessian of $\frac{1}{2}\|Ax-By\|^2$

I'm trying to compute the gradient and Hessian of the following function

$$f(x,y) = \frac{1}{2}\|Ax-By\|^2$$

where $A$ and $B$ are $m \times n$ matrices, $x, y \in \mathbb{R}^n$, and $f: \mathbb{R}^{2n} \to \mathbb{R}$.

I honestly don't have a clue about the best way to proceed. Usually, to find the gradient, I would rewrite the function as sums and differentiate from there, but the square and the multiple vector arguments have me stumped. I am not looking for a solution but rather a hint on where to start.

Furthermore, am I right in thinking that $\nabla f(x,y)$ is a vector in $\mathbb{R}^{2n}$ consisting of the partial derivatives with respect to the components of $x$ and $y$, and that $\nabla^2 f(x,y)$ is a $2n \times 2n$ matrix?

Thank you in advance.

There are 2 best solutions below

6
On

An answer for the gradient.

Identifying vectors with column vectors (as you do):

$$f(x,y) := \frac{1}{2}\|Ax-By\|^2=\frac{1}{2}(Ax-By)^T(Ax-By)=\frac{1}{2}(x^TA^T-y^TB^T)(Ax-By)$$

$$=\frac{1}{2}\left(x^T(A^TA)x-\underbrace{(x^TA^TBy+y^TB^TAx)}_{2x^TA^TBy}+y^T(B^TB)y\right)\tag{1}$$

Let us now apply two classical results:

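1) For a symmetric matrix $M$ (which is the case for $A^TA$ and $B^TB$ in (1)), the gradient of $x^TMx$ with respect to $x$ is $2x^TM$, seen as a row vector. For a general square $M$ it is $x^T(M+M^T)$. Why? Consider the (Taylor) expansion, where $h$ is a vector increment and $h^TMx=x^TM^Th=x^TMh$ by symmetry: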

$$\underbrace{(x+h)^TM(x+h)}_{f(x+h)}=\underbrace{x^TMx}_{f(x)}+\underbrace{x^TMh+h^TMx}_{(2x^TM)h=f'(x).h}+\underbrace{h^TMh}_{\text{2nd order term}}$$

2) The gradient of $x^TMy$ with respect to $y$ is the row vector $x^TM$, for a similar reason.
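To spell out the "similar reason" (a short sketch in the same notation, with $h$ now an increment of $y$):

$$x^TM(y+h)=x^TMy+(x^TM)h$$

Here the expansion is exact: the expression is linear in $y$, so there is no second-order term, and the coefficient of $h$ is the row vector $x^TM$. The same argument with an increment of $x$ gives $(x+h)^TMy=x^TMy+(y^TM^T)h$, so the gradient of $x^TMy$ with respect to $x$ is $y^TM^T$. With $M=A^TB$ this is $y^TB^TA$, which is the form needed for the cross term in (1).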

Using these two results, the gradient of (1) is (indeed!) a $2n$-dimensional row vector:

$$(x^T(A^TA)-y^TB^TA,y^T(B^TB)-x^TA^TB)$$
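A formula like this is easy to sanity-check numerically. The following is a minimal sketch, assuming NumPy and randomly generated $A$, $B$, $x$, $y$ (the sizes, the seed and the helper names `f`, `grad_f`, `numeric_grad` are illustrative choices, not part of the question). It compares the analytic gradient, written as a column vector, with central finite differences.

```python
import numpy as np

# Illustrative sizes and random data (assumptions, not from the question).
rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))
x = rng.standard_normal(n)
y = rng.standard_normal(n)

def f(x, y):
    """f(x, y) = 1/2 * ||A x - B y||^2."""
    r = A @ x - B @ y
    return 0.5 * (r @ r)

def grad_f(x, y):
    """Analytic gradient, stacked as (df/dx, df/dy) and transposed to a column."""
    gx = A.T @ A @ x - A.T @ B @ y   # transpose of x^T A^T A - y^T B^T A
    gy = B.T @ B @ y - B.T @ A @ x   # transpose of y^T B^T B - x^T A^T B
    return np.concatenate([gx, gy])

def numeric_grad(x, y, eps=1e-6):
    """Central finite differences over the stacked variable z = (x, y)."""
    z = np.concatenate([x, y])
    g = np.zeros_like(z)
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = eps
        zp, zm = z + e, z - e
        g[i] = (f(zp[:n], zp[n:]) - f(zm[:n], zm[n:])) / (2 * eps)
    return g

# Largest componentwise disagreement between the two gradients.
max_err = np.max(np.abs(grad_f(x, y) - numeric_grad(x, y)))
print(max_err)
```

Since $f$ is quadratic, central differences are exact up to rounding, so the printed discrepancy should be tiny compared with the size of the gradient entries.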

Remarks:

1) Yes, the Hessian is indeed a $2n \times 2n$ matrix.

2) Expression (1) can also be obtained by writing:

$$f(x,y) := \frac{1}{2}\|Ax-By\|^2= \frac{1}{2}\begin{pmatrix}x^T & y^T\end{pmatrix}\begin{pmatrix}A^T\\-B^T\end{pmatrix}\begin{pmatrix}A & -B\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}$$

$$= \frac{1}{2}\begin{pmatrix}x^T & y^T\end{pmatrix}\begin{pmatrix}A^TA&-A^TB\\-B^TA&B^TB\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}$$

2
On

Concatenate the two vectors (and matrices) into a single column vector (and matrix).

$$z = \pmatrix{x\\y} \in{\mathbb R}^{2n\times 1} \quad\quad C = \Big[\,A\;\;{-B}\,\Big] \in{\mathbb R}^{m\times 2n}$$

Now the function is very simple, as are its derivatives.

$$\eqalign{ f &= \tfrac{1}{2}\|Cz\|^2 = \tfrac{1}{2}z^TC^TCz \\ \frac{\partial f}{\partial z} &= C^TCz \\ \frac{\partial^2 f}{\partial z^2} &= C^TC \\ }$$
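The stacked formulation translates directly into a few lines of NumPy. This is a hedged sketch with random test data (the sizes, seed and variable names are assumptions made for illustration). It builds $z$ and $C$, evaluates $C^TCz$ and $C^TC$, and checks that $C^TC$ coincides with the $2\times 2$ block matrix built from $A^TA$, $-A^TB$, $-B^TA$ and $B^TB$.

```python
import numpy as np

# Illustrative sizes and random data (assumptions, not from the question).
rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))
x = rng.standard_normal(n)
y = rng.standard_normal(n)

z = np.concatenate([x, y])   # stacked variable z = (x, y), length 2n
C = np.hstack([A, -B])       # C = [A  -B], shape (m, 2n)

# The stacked and the original expressions of f agree.
f_stacked = 0.5 * np.linalg.norm(C @ z) ** 2
f_direct = 0.5 * np.linalg.norm(A @ x - B @ y) ** 2

grad = C.T @ C @ z           # gradient with respect to z, as a column vector
hess = C.T @ C               # Hessian: constant, does not depend on z

# Block form of the same Hessian.
hess_blocks = np.block([[A.T @ A, -A.T @ B],
                        [-B.T @ A, B.T @ B]])

same_value = bool(np.isclose(f_stacked, f_direct))
same_hessian = bool(np.allclose(hess, hess_blocks))
print(same_value, same_hessian, hess.shape)
```

Note that the gradient here comes out as a column vector of length $2n$; it is the transpose of a row-vector gradient written in the $(x, y)$ block notation.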