Calculating $\nabla\bigg[ (z_i - c)^{T} \mathbf{A} (z_i - c) - 1 \bigg]$ using matrix calculus


I'm trying to calculate $\nabla\bigg[ (z_i - c)^{T} \mathbf{A} (z_i - c) - 1 \bigg]$ with respect to two variables: the matrix $\mathbf{A}$ and the vector $c$. I tried calculating it and ended up with $$\frac{\partial}{\partial c}\bigg[ (z_i - c)^{T} \mathbf{A} (z_i - c) - 1 \bigg] = -2 \mathbf{A} (z_i - c)$$ and $$\frac{\partial}{\partial \mathbf{A}}\bigg[ (z_i - c)^{T} \mathbf{A} (z_i - c) - 1 \bigg] = (z_i - c)^{T} \frac{\partial \mathbf{A}}{\partial \mathbf{A}} (z_i - c),$$ where I assume $\frac{\partial \mathbf{A}}{\partial \mathbf{A}}$ is a matrix of ones with the same dimensions as $\mathbf{A}$.
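To convince myself the $c$ gradient is right, I checked it against finite differences. A minimal sketch with made-up data (the dimension and random seed are arbitrary); note that $-2\mathbf{A}(z_i - c)$ assumes $\mathbf{A}$ is symmetric, since for a general $\mathbf{A}$ the gradient is $-(\mathbf{A} + \mathbf{A}^{T})(z_i - c)$:

```python
import numpy as np

# Finite-difference check of d/dc [(z - c)^T A (z - c) - 1] = -2 A (z - c).
# This form of the gradient assumes A is symmetric; for a general A it is
# -(A + A^T)(z - c), which is why A is symmetrized below.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                      # make A symmetric
z = rng.standard_normal(n)
c = rng.standard_normal(n)

def f(c):
    return (z - c) @ A @ (z - c) - 1

# central finite differences along each coordinate direction
eps = 1e-6
numeric = np.array([(f(c + eps * e) - f(c - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
analytic = -2 * A @ (z - c)
print(np.allclose(numeric, analytic, atol=1e-5))  # True
```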

The problem occurs when I try to optimize this with steepest descent and backtracking line search in NumPy. For some reason I get a scalar from $\frac{\partial}{\partial \mathbf{A}}\bigg[ (z_i - c)^{T} \mathbf{A} (z_i - c) - 1 \bigg]$, and I'm not sure how to apply that to my problem, since every element of the gradient with respect to $\mathbf{A}$ would then be equal. Do I have some of the matrix calculus wrong here, or is this a programming mistake? I'm currently computing it as np.dot(x, np.dot(A_diff, x)).
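Here is a minimal reproduction of what I mean (made-up numbers; A_diff is the all-ones matrix from my assumption above):

```python
import numpy as np

# With A_diff a matrix of ones, x @ (A_diff @ x) is the quadratic form
# x^T 1 x = (sum of x)^2 -- a single scalar, not an n x n matrix.
x = np.array([1.0, 2.0, 3.0])
A_diff = np.ones((3, 3))
val = np.dot(x, np.dot(A_diff, x))
print(val, np.shape(val))  # 36.0 ()
```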

I've tested the optimization algorithm and it works for $c$.

Best answer

You have an error in your derivative calculation. It actually holds that

$$ \nabla_{A} \left[ (z_i - c)^\top A (z_i - c) - 1 \right] = (z_i - c) (z_i - c)^\top, $$

which has the same dimensions as $A$. In NumPy this is just np.outer(z - c, z - c), assuming that z, c are the vectors $z_i, c$ respectively.
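For instance, with made-up vectors (the numbers are arbitrary, just to illustrate the shapes):

```python
import numpy as np

# The gradient with respect to A is the outer product (z - c)(z - c)^T,
# an n x n matrix of the same shape as A.
z = np.array([2.0, -1.0, 0.5])
c = np.array([1.0, 1.0, 1.0])
grad_A = np.outer(z - c, z - c)
print(grad_A.shape)  # (3, 3)
```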

Edit: To see why this is true, the most straightforward way is to write

$$ x^\top A x = \mathrm{tr}(x^\top A x) = \mathrm{tr}(xx^\top A) = \langle xx^\top, A \rangle \Rightarrow \frac{\partial x^\top A x}{\partial A} = xx^\top, $$ where the steps use the cyclic invariance of the trace and the fact that $\mathrm{tr}(A^\top B) = \langle A, B \rangle$, the inner product in $\mathbb{R}^{n \times n}$. You can also check the result against the table in the relevant Wikipedia entry on matrix calculus.
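These identities are also easy to confirm numerically; a small sketch on random data (dimension and seed are arbitrary):

```python
import numpy as np

# Check x^T A x = tr(x x^T A) = <x x^T, A> on random data.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

quad = x @ A @ x
via_trace = np.trace(np.outer(x, x) @ A)
via_inner = np.sum(np.outer(x, x) * A)  # Frobenius inner product <xx^T, A>
print(np.allclose(quad, via_trace), np.allclose(quad, via_inner))  # True True
```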

Yet another way to verify the above is to write the quadratic form above as

$$ x^\top A x = \sum_{i,j=1}^n x_{i} x_j A_{ij} \Rightarrow \frac{\partial x^\top A x}{\partial A_{ij}} = x_i x_j, $$

so when you assemble these entries into a matrix, you get precisely the outer product $x x^\top$, since $(xx^\top)_{ij} = x_i x_j$.
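You can confirm this entrywise with finite differences; a minimal sketch on random data (dimension and seed are arbitrary):

```python
import numpy as np

# Entrywise check: perturbing A[i, j] changes x^T A x at rate x[i] * x[j],
# so the gradient matrix is the outer product x x^T.
rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

eps = 1e-6
numeric = np.empty((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        numeric[i, j] = ((x @ (A + E) @ x) - (x @ (A - E) @ x)) / (2 * eps)

print(np.allclose(numeric, np.outer(x, x), atol=1e-5))  # True
```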