Derivatives of least-squares cost function


I have $f(x) = \frac{1}{2}|| Ax - b ||^2 $ for $ x \in \mathbb{R}^n, b \in \mathbb{R}^m $ and $ A \in \mathbb{R}^{m \times n}$

I am trying to write this as a summation, compute the first and second derivatives, and then translate both results back into matrix notation.

Rewritten as a summation: $f(x) = \frac{1}{2}|| \sum_{i=1}^{n} \sum_{j=1}^{n} a_{i j} x_j - \sum_{i=1}^{n} b_i ||^2$

What I have trouble with is calculating the derivatives since I cannot find much on the subject.

  1. First derivative (not sure if this is correct): $\frac{\partial f(x)_i}{\partial x_k} = \frac{\partial}{\partial x_k}\frac{1}{2}||\sum_{i=j}^{n}(a_{i j} - b_i)||^2 = \frac{1}{2}||\sum_{i=j}^{n} a_{i j} \frac{\partial f(x)_j}{\partial x_k} + 0||^2$

$f(x) = \frac{1}{2}||A||^2$

I would really appreciate if someone could show me how to solve problems like this.



Best answer

The norm $\|\cdot\|$ applies to vectors; you are using norms where they do not make sense. What you actually have is

$$ f(x) =\frac 12 \|Ax -b\|^2 = \frac 12 \sum_{i=1}^m\left(\sum_{j=1}^n a_{ij} x_j - b_i\right)^2 $$

So, the derivatives are computed as

$$ \frac{\partial f}{\partial x_k} = \frac 12 \sum_{i=1}^m 2a_{ik}\left(\sum_{j=1}^n a_{ij}x_j - b_i\right)=\sum_{i=1}^m a_{ik}\left(\sum_{j=1}^n a_{ij}x_j - b_i\right) $$

Differentiating once more gives the second derivative $$ \frac{\partial^2 f}{\partial x_k \partial x_l} = \sum_{i=1}^m a_{ik} a_{il}, $$ which is the $(k,l)$ entry of $A^TA$.
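A quick numerical sketch (plain NumPy, with made-up random $A$, $b$, $x$) confirming that the component-wise sum above agrees with the matrix form $A^T(Ax-b)$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n))  # hypothetical test data
b = rng.standard_normal(m)
x = rng.standard_normal(n)

# Gradient component by component, exactly as in the summation above
grad_sum = np.array([
    sum(A[i, k] * (A[i, :] @ x - b[i]) for i in range(m))
    for k in range(n)
])

# Same gradient in matrix notation
grad_mat = A.T @ (A @ x - b)

print(np.allclose(grad_sum, grad_mat))  # True
```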


We can use matrix notation to get the results.

$$ \|Ax - b\|^2 = (Ax - b)^T(Ax - b) = (x^TA^T - b^T)(Ax - b) = $$ $$ x^TA^TAx - x^TA^T b - b^T Ax + b^T b = x^TA^TAx - 2 x^TA^T b + b^T b $$
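The expansion can be checked numerically on arbitrary data (a minimal sketch with hypothetical random inputs):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 2
A = rng.standard_normal((m, n))  # hypothetical test data
b = rng.standard_normal(m)
x = rng.standard_normal(n)

# Left side: the squared norm; right side: the expanded quadratic form
lhs = np.linalg.norm(A @ x - b) ** 2
rhs = x @ A.T @ A @ x - 2 * x @ A.T @ b + b @ b

print(np.allclose(lhs, rhs))  # True
```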

Now the derivatives: $$ \frac{\partial}{\partial x}b^T b = 0 $$

Since $x \mapsto x^TA^Tb$ is linear in $x$ (with constant vector $A^Tb$), $$ \frac{\partial}{\partial x}\,2 x^T A^T b = 2 A^T b $$

Since $A^TA$ is symmetric, the quadratic-form rule $\frac{\partial}{\partial x}x^TMx = 2Mx$ (for symmetric $M$) gives $$ \frac{\partial}{\partial x}\,x^TA^TAx = 2 A^TAx $$

Finally we get $$ \frac{1}{2}\frac{\partial}{\partial x}\|Ax - b\|^2 = A^TAx - A^T b $$ and differentiating once more gives the second derivative (Hessian) $A^TA$.

Here we used the transpose property $(AB)^T = B^TA^T$, together with the fact that $x^TA^Tb = b^TAx$ (both sides are scalars), to combine the two middle terms.
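As a sanity check, the closed-form gradient $A^TAx - A^Tb$ can be compared against central finite differences of $f$; a minimal NumPy sketch with hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))  # hypothetical test data
b = rng.standard_normal(m)
x = rng.standard_normal(n)

f = lambda v: 0.5 * np.linalg.norm(A @ v - b) ** 2

# Closed-form gradient from the derivation above
grad = A.T @ A @ x - A.T @ b

# Central finite differences along each coordinate direction
eps = 1e-6
fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

print(np.allclose(grad, fd, atol=1e-5))  # True
```

Because $f$ is quadratic, central differences are exact up to floating-point rounding, so the two agree to high accuracy.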