Matrix derivative $(Ax-b)^T(Ax-b)$


I am trying to find the minimum of $(Ax-b)^T(Ax-b)$ but I am not sure whether I am taking the derivative of this expression properly.

What I did is the following: \begin{align*} \frac{\delta}{\delta x_i}\left(\sum_i \sum_j (A_{ij}x_i-b_i)(A_{ij}x_j-b_j)\right)&= \sum_j A_{ij}(A_{ij}x_j-b_j) + \sum_i A_{ij}(A_{ij}x_j-b_i) \end{align*}

but I'm not quite sure if this is correct, or what the derivative would then be. Any help is appreciated.

There are 2 answers below.

Accepted answer

Perhaps some help computing the partial derivatives would be appreciated. It is always best to be explicit when one is a bit confused by heavy notation. Write $A = (a_{ij})$, $x = (x_1,\dots,x_n)^{\top}$ and $b = (b_1,\dots, b_m)^{\top}$, assuming $A$ is an $m \times n$ matrix. Then the $i^{\text{th}}$ component of $Ax-b$ is $$ (Ax-b)_i = \left( \sum_{j=1}^n a_{ij} x_j \right) - b_i, $$ so that $$ (Ax-b)^{\top} (Ax-b) = \sum_{i=1}^m (Ax-b)_i^2 = \sum_{i=1}^m \left( \left( \sum_{j=1}^n a_{ij} x_j \right) - b_i \right)^2. $$

Suppose you want to compute the derivative with respect to $x_k$, $1 \le k \le n$ (I choose $k$ because choosing $i$ or $j$ would clash with the subscripts already in use). Then $$ \frac{\partial}{\partial x_k} (Ax-b)^{\top} (Ax-b) = \sum_{i=1}^m \frac{\partial}{\partial x_k} \left( \left( \sum_{j=1}^n a_{ij} x_j \right) - b_i \right)^2 = \sum_{i=1}^m 2 \left( \left( \sum_{j=1}^n a_{ij}x_j \right) - b_i \right) a_{ik}. $$

In particular, letting $A_k = (a_{1k},a_{2k},\dots,a_{mk})$ denote the $k^{\text{th}}$ column of $A$, we get $$ \frac{\partial}{\partial x_k} (Ax-b)^{\top} (Ax-b) = 2 \langle A_k, Ax-b \rangle, $$ where $\langle - , - \rangle$ denotes the inner product. Collecting all $n$ components, the gradient is $2A^{\top}(Ax-b)$. You can use convexity arguments to show that any critical point is a minimizer in this case, although the minimizer will not always be unique, even when $m=n$: it suffices for $A$ to be singular for this to happen.
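As a quick numerical sanity check of the formula above (not part of the derivation itself), here is a short NumPy sketch with randomly generated $A$, $b$, $x$ that compares the analytic gradient $2A^{\top}(Ax-b)$ against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

# The objective f(x) = (Ax - b)^T (Ax - b)
f = lambda x: (A @ x - b) @ (A @ x - b)

# Analytic gradient: the k-th entry is 2 <A_k, Ax - b>, i.e. 2 A^T (Ax - b)
grad_analytic = 2 * A.T @ (A @ x - b)

# Central finite differences, one coordinate direction at a time
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

assert np.allclose(grad_analytic, grad_fd, atol=1e-5)
```

The two gradients agree to finite-difference accuracy, which is a cheap way to catch index or sign mistakes in derivations like this.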

Hope that helps,


I assume you are trying to minimize $\langle A x - b, A x - b\rangle.$ This is always nonnegative, so if $A$ is square and nonsingular, the minimum is $0$, attained at $x = A^{-1}b.$ Otherwise, the $i$-th component of the gradient is

$$\langle A_i, Ax - b\rangle + \langle A x - b, A_i\rangle = 2\langle A_i, Ax - b\rangle,$$

where $A_i$ is the $i$-th column of $A.$ What does this tell you?
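Setting every component of that gradient to zero gives $A^{\top}(Ax - b) = 0$, i.e. the normal equations $A^{\top}A\,x = A^{\top}b$. As an illustration (my own sketch, assuming $A$ has full column rank so $A^{\top}A$ is invertible), solving the normal equations agrees with NumPy's least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 3
A = rng.standard_normal((m, n))  # full column rank almost surely
b = rng.standard_normal(m)

# Solve the normal equations A^T A x = A^T b obtained from grad = 0
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Compare with the built-in least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_normal, x_lstsq)
# At the minimizer the gradient 2 A^T (Ax - b) vanishes
assert np.allclose(2 * A.T @ (A @ x_normal - b), 0, atol=1e-8)
```

When $A$ is rank-deficient, $A^{\top}A$ is singular and the minimizer is no longer unique, which is exactly the non-uniqueness noted in the accepted answer.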