Derivative of matrix using index notation


In my stats textbook, they define the following function:

$\mathbf{f} = \frac{1}{2}(\mathbf{A}\mathbf{x} - \mathbf{b})^2$,

where $\mathbf{A}$ is a matrix, $\mathbf{x}, \mathbf{b}$ are just vectors. They then say that:

$\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \mathbf{A}^{T}(\mathbf{A}\mathbf{x} - \mathbf{b})$

I tried to do this derivative using index notation. So, I defined $f$ as:

$f = \frac{1}{2} (A_{ij}x^{j} - b_{i})^2$,

I then took the derivative with respect to $x^k$ (I use commas to denote partial derivatives):

$f_{,k} = \delta^{j}_{k} A_{ij} (A_{ij}x^{j} - b_{i})$

Applying the contraction, I get:

$f_{,k} = A_{i}^{k} (A_{ij}x^{j} - b_{i})$

But I do not know whether $A_{i}^{k}$ represents $\mathbf{A}^T$. Does it?


There are 2 best solutions below


Your second equation can be rewritten by taking its $k$th component, viz. $$f_{,k}=(A^T)_{ki}(Ax-b)_i=(A^T)_{ki}(A_{ij}x_j-b_i).$$ Comparing this with your final equation gives $A_i^k=(A^T)_{ki}=A_{ik}$, so yes: $A_i^k$ is the $(k,i)$ entry of $\mathbf{A}^T$.
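
For completeness, here is a sketch of the index calculation with every dummy index kept distinct. It assumes the square is meant as the squared Euclidean norm, i.e. $f=\frac{1}{2}(A_{ij}x_j-b_i)(A_{il}x_l-b_i)$ with summation over the repeated indices $i$, $j$, $l$; the index $l$ is introduced here only so that no index appears more than twice in a term.

$$f_{,k}=\frac{1}{2}\left[A_{ij}\delta_{jk}(A_{il}x_l-b_i)+(A_{ij}x_j-b_i)A_{il}\delta_{lk}\right]$$

$$f_{,k}=\frac{1}{2}\left[A_{ik}(A_{il}x_l-b_i)+(A_{ij}x_j-b_i)A_{ik}\right]=A_{ik}(A_{ij}x_j-b_i)=(A^T)_{ki}(Ax-b)_i$$

The two terms in the bracket are equal after relabelling the dummy index, which is what cancels the factor $\frac{1}{2}$. The free index $k$ sits on the second slot of $A$, and that is exactly the statement that the matrix acting on $Ax-b$ is $A^T$.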


Some comments (I am not yet allowed to add them as a comment):
a) Your function $f=\frac{1}{2}(Ax-b)^2$ is not defined if $A$ is a matrix, since $Ax-b$ is then a vector. My guess is that it should be $f(x)=\frac{1}{2}(Ax-b)^T(Ax-b)$.
b) The key to differentiating a real-valued function $f$ with respect to a $K$-vector $x$ is:
b1) If $x$ is a column vector, then $\partial f/\partial x$ is a column vector with $\partial f/\partial x_i$ as its $i$-th element.
b2) $\partial x/\partial x^T = \partial x^T/\partial x = I_K$.
With these conventions, differentiating $f$ with respect to the vector $x$ yields the same result as element-by-element partial differentiation.
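
If you want to convince yourself numerically, a quick check along these lines may help. This is only a minimal sketch: it assumes $f(x)=\frac{1}{2}(Ax-b)^T(Ax-b)$, uses NumPy, and the helper names (`f`, `grad_f`, `numerical_grad`) and the random test sizes are made up for illustration. It compares the matrix formula $A^T(Ax-b)$, the index form $A_{ik}(Ax-b)_i$, and a central finite-difference approximation of each $\partial f/\partial x_k$.

```python
import numpy as np


def f(A, x, b):
    """Scalar objective: one half of the squared Euclidean norm of A x - b."""
    r = A @ x - b
    return 0.5 * (r @ r)


def grad_f(A, x, b):
    """Closed-form gradient in matrix notation: A^T (A x - b)."""
    return A.T @ (A @ x - b)


def numerical_grad(A, x, b, eps=1e-6):
    """Element-by-element partial derivatives via central differences."""
    g = np.zeros_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        g[k] = (f(A, x + step, b) - f(A, x - step, b)) / (2 * eps)
    return g


# Arbitrary test data (sizes chosen only for illustration).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)
b = rng.standard_normal(5)

analytic = grad_f(A, x, b)
numeric = numerical_grad(A, x, b)
# Index form f_{,k} = A_{ik} (A x - b)_i, summing over i.
index_form = np.einsum("ik,i->k", A, A @ x - b)

print(np.allclose(analytic, numeric, atol=1e-6))
print(np.allclose(analytic, index_form))
```

The `einsum` string `"ik,i->k"` mirrors the index expression directly: the summed index `i` is the first index of `A`, which is the same contraction that `A.T @ r` performs.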