Matrix derivation definition


I am trying to compute a derivative, but I am obviously missing something.

I should point out that I have only learned the "formal" definition of the derivative in the 1D case, and am not familiar with Banach spaces or that degree of formalism.

I want to calculate $\frac{\partial b^T A b}{\partial b}$ where $A$ is $(k,k)$ and $b$ is $(k,1)$.

This is what I have tried:
\begin{align*}
f(b+h) &= (b+h)^T A (b+h)\\
&= b^T A b + h^T A b + b^T A h + h^T A h \\
&= f(b) + (b^T A^T h)^T + b^T A h + o(\| h \|) \\
&= f(b) + b^T A^T h + b^T A h + o(\| h \|) \quad \text{(1D, so I take the transpose)} \\
&= f(b) + (b^T A^T + b^T A) h + o(\| h \|) \\
&= f(b) + b^T (A^T + A) h + o(\| h \|) \\
f(b+h)-f(b) &= b^T (A^T + A) h + o(\| h \|)
\end{align*}
I will use an improper step (division by $h$), by analogy with the 1D case:
\begin{align*}
\frac{f(b+h)-f(b)}{h} &= b^T (A^T + A) + o(1) \\
\frac{\partial f(b)}{\partial b} &= b^T (A^T + A)
\end{align*}
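The expansion can be sanity-checked numerically. The sketch below is only an illustration: the matrix `A`, the vectors `b` and `h`, and the helper names `f` and `remainder` are arbitrary choices, not part of the question. It checks that what is left after subtracting the linear term $b^T(A^T+A)h$ is exactly $h^TAh$, and that this leftover divided by $\|h\|$ shrinks as $h$ is scaled down.

```python
import numpy as np

def f(A, b):
    """Quadratic form b^T A b, returned as a plain float."""
    return (b.T @ A @ b).item()

def remainder(A, b, h):
    """f(b+h) - f(b) minus the linear term b^T (A^T + A) h."""
    linear = (b.T @ (A.T + A) @ h).item()
    return f(A, b + h) - f(A, b) - linear

# Arbitrary example values (assumptions for illustration); A is not symmetric.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [4.0, 0.0, 2.0]])
b = np.array([[1.0], [-1.0], [2.0]])
h = np.array([[0.5], [1.0], [-1.0]])

# The leftover term should coincide with the quadratic term h^T A h.
quad = (h.T @ A @ h).item()
left_over = remainder(A, b, h)

# |remainder| / ||h|| for smaller and smaller multiples of h.
ratios = [abs(remainder(A, b, t * h)) / np.linalg.norm(t * h)
          for t in (1e-1, 1e-2, 1e-3)]
```

Since `ratios` decreases roughly in proportion to the scale factor, the $h^TAh$ term is indeed $o(\|h\|)$, which is the only property of it that the derivation uses.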

But I have found in several courses (for instance Here and Here) that I am supposed to find $(A+A^T)b$.

So I guess my idea is not "too bad", but I am missing a conceptual element (surely linked to my improper derivation?) because I obtain the wrong dimension.

So

  • Can someone correct me?
  • Can someone explain to me the intuition behind why my dimension is not the right one?

Thanks


There are 2 best solutions below

Best answer

Let's agree that vectors are written as columns.

Since $F(b)=b^TAb$ is a function from $\mathbb R^k$ to $\mathbb R$, the derivative (if it exists) will be a linear transformation $T\colon\mathbb R^k\to L(\mathbb R^k,\mathbb R)$ (for each $b\in\mathbb R^k$ the image $T(b)$ is a linear transformation from $\mathbb R^k$ to $\mathbb R$). The map $T$ is defined by requiring that $$ \lim_{h\to0}\frac{|F(b+h)-F(b)-T(b)h|}{\|h\|}=0 $$ (and if it exists it is unique). So, in your case we get $$T(b)h=b^T(A^T+A)h$$ and we usually write simply $$T(b)=b^T(A^T+A),$$ but we should still interpret it as a linear transformation. If we write $T(b)=(A^T+A)b$, which is the transpose of the row above, we are representing the same linear transformation by a column vector acting through the dot product, and so we would write $T(b)h=\big((A^T+A)b\big)\cdot h$.
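To make the row-versus-column point concrete, here is a small NumPy sketch. The values of `A`, `b` and `h` and the variable names are made up for illustration. It builds both representations and applies each of them to the same $h$.

```python
import numpy as np

# Arbitrary example values (assumptions for illustration).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [4.0, 0.0, 2.0]])
b = np.array([[1.0], [-1.0], [2.0]])
h = np.array([[0.5], [1.0], [-1.0]])

# Row form: the 1 x k matrix of the linear map h -> T(b) h.
row_form = b.T @ (A.T + A)

# Column form: a k x 1 vector shaped like b (the gradient).
col_form = (A + A.T) @ b

# Apply each representation to h: matrix product for the row,
# dot product for the column.
applied_row = (row_form @ h).item()
applied_col = np.vdot(col_form, h)
```

The two arrays hold the same numbers and differ only in shape, `(1, k)` against `(k, 1)`, so both give the same value of $T(b)h$. The dimension mismatch in the question comes down to which of the two conventions a given course uses.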

Another answer

Consider the case where $b$ is a matrix.

Use the Frobenius inner product, written with a colon and defined by $X:Y=\operatorname{tr}(X^TY)$, to write the scalar-valued function as

$$\eqalign{ f &= b:Ab }$$
and take its differential
$$\eqalign{ df &= db:Ab + b:A\,db \cr &= Ab:db + A^Tb:db \cr &= (A+A^T)\,b:db }$$
Since $df = \Big(\frac{\partial f}{\partial b}:db\Big)$, the gradient equals
$$\eqalign{ \frac{\partial f}{\partial b} &= (A+A^T)b }$$
This result is valid when $b$ is any $(k\times n)$ matrix. In particular, it is valid when $n=1$, i.e. when $b$ is a vector.
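As a rough check of the matrix case, the following sketch compares $(A+A^T)b$ with a central-difference gradient of $\operatorname{tr}(b^TAb)$. It is a sketch under assumptions: the helper names `f` and `numerical_gradient`, the step size `eps` and the example matrices are illustrative choices, not part of the answer.

```python
import numpy as np

def f(A, B):
    """Scalar function B : (A B) = trace(B^T A B)."""
    return np.trace(B.T @ A @ B)

def numerical_gradient(A, B, eps=1e-6):
    """Central differences, perturbing one entry of B at a time."""
    G = np.zeros_like(B)
    for idx in np.ndindex(*B.shape):
        E = np.zeros_like(B)
        E[idx] = eps
        G[idx] = (f(A, B + E) - f(A, B - E)) / (2 * eps)
    return G

# Arbitrary example values (assumptions for illustration): k = 3, n = 2.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [4.0, 0.0, 2.0]])
B = np.array([[1.0, 0.0],
              [-1.0, 2.0],
              [2.0, 1.0]])

closed_form = (A + A.T) @ B
max_error = np.max(np.abs(closed_form - numerical_gradient(A, B)))

# Vector case n = 1: keep only the first column of B.
b = B[:, :1]
max_error_vec = np.max(np.abs((A + A.T) @ b - numerical_gradient(A, b)))
```

Both `max_error` and `max_error_vec` should be at round-off level, and the gradient has the same shape as its argument, which is why this convention returns a column when $b$ is a column.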