Applying geometric calculus to a matrix expression


Let $f(\mathbf x) = \mathbf x^T A \mathbf x + \mathbf b^T\mathbf x + c$, where $A$ is a square matrix, $\mathbf x, \mathbf b$ are column matrices and $c$ is a constant. Then the gradient of this expression is $\nabla f(\mathbf x) = (A+A^T)\mathbf x + \mathbf b$. I see how to derive this by interpreting the gradient as the matrix operator $D$. I'm wondering whether we get the same answer if we use the geometric gradient defined by $\nabla = \sum_i \mathbf e^i \partial_i$. So I'd like to know how to use geometric calculus on this matrix expression. Is there a way to do this? Thanks.


There are 2 best solutions below


Looks like it: \begin{align} f(\mathbf{x}) &= \sum_{i,j} x_{i}a_{ij}x_{j} + \sum_{i} b_{i}x_{i} + c \\ &= \sum_{i} \left(a_{ii}x_{i}^2 + b_{i}x_{i}\right) + \sum_{i\ne j} x_{i}a_{ij}x_{j} + c \end{align} Then, since $\partial_k c = 0$ and the off-diagonal sums pick up only the terms with $i = k$ or $j = k$, \begin{align} \nabla f(\mathbf{x}) &= \sum_{k} \mathbf{e}^{k}\partial_{k}\left( \sum_{i} \left(a_{ii}x_{i}^2 + b_{i}x_{i}\right) + \sum_{i\ne j} x_{i}a_{ij}x_{j}\right) \\ &= \sum_{k} \mathbf{e}^{k}\left(2a_{kk}x_{k} + \sum_{j\ne k}a_{kj}x_{j} + \sum_{i\ne k}a_{ik}x_{i} + b_{k} \right) \\ &= \sum_{k} \mathbf{e}^{k}\left(\sum_{j}(a_{kj}+a_{jk})x_{j} + b_{k} \right) \\ &= \sum_{k} \mathbf{e}^{k}\left(((A+A^T)\mathbf{x})_{k} + b_{k} \right) \\ &= (A+A^T)\mathbf{x} + \mathbf{b} \end{align}
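The component formula can be sanity-checked numerically. The following is a minimal sketch in plain Python, not part of the original derivation: the function names (`quad_form`, `grad_formula`, `grad_numeric`) and the sample values of `A`, `b`, `c`, `x` are made up for illustration, and an orthonormal basis is assumed so that components can be stored in ordinary lists. It compares $(A+A^T)\mathbf x + \mathbf b$ against central differences of $f$.

```python
def quad_form(A, b, c, x):
    """Evaluate f(x) = x^T A x + b^T x + c using plain Python lists."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * x[i] for i in range(n))
    return quad + lin + c


def grad_formula(A, b, x):
    """Closed-form gradient: component k is sum_j (a_kj + a_jk) x_j + b_k."""
    n = len(x)
    return [
        sum((A[k][j] + A[j][k]) * x[j] for j in range(n)) + b[k]
        for k in range(n)
    ]


def grad_numeric(A, b, c, x, h=1e-6):
    """Approximate each partial derivative of f by a central difference."""
    grad = []
    for k in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[k] += h
        minus[k] -= h
        grad.append((quad_form(A, b, c, plus) - quad_form(A, b, c, minus)) / (2 * h))
    return grad


# Illustrative (hypothetical) data; A is deliberately non-symmetric.
A = [[1.0, 2.0, 0.5], [-1.0, 3.0, 4.0], [0.0, -2.0, 1.5]]
b = [0.5, -1.0, 2.0]
c = 7.0
x = [1.0, -2.0, 0.5]

exact = grad_formula(A, b, x)
approx = grad_numeric(A, b, c, x)
max_err = max(abs(e - a) for e, a in zip(exact, approx))
print("formula:", exact)
print("largest deviation from finite differences:", max_err)
```

Using a non-symmetric `A` matters here: with a symmetric matrix the check could not distinguish $(A+A^T)\mathbf x$ from $2A\mathbf x$.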


First, let's step away from explicitly talking about things as matrices and column/row vectors. What you have is basically

$$f(x) = x \cdot A(x) + b \cdot x + c$$

We can use a simple identity here: for any constant vector $a$, and $\nabla = \nabla_x$, we have

$$\nabla (a \cdot x) = a$$
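As a quick check of this identity, here is a sketch in components. It assumes a fixed basis $\mathbf e_i$ with reciprocal basis $\mathbf e^i$, writes $x = \sum_i x^i \mathbf e_i$, and uses $a_i = a \cdot \mathbf e_i$ (symbols introduced here only for illustration), so that $a \cdot x = \sum_i a_i x^i$:

$$\nabla (a \cdot x) = \sum_k \mathbf e^k \partial_k \Big( \sum_i a_i x^i \Big) = \sum_k \mathbf e^k a_k = a$$

The last step is just the expansion of $a$ in the reciprocal basis.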

Expanding the $x \cdot A(x)$ term by the product rule allows us to apply this identity:

$$\begin{align*}\nabla f &= \nabla (x \cdot A[x]) + \nabla (b \cdot x) + \nabla c \\ &= \dot \nabla (\dot x \cdot A[x]) + \dot \nabla (x \cdot A[\dot x]) + b + 0 \end{align*}$$

Here, we've used overdots to denote what is differentiated: $\dot \nabla$ differentiates only $\dot x$. This gives us some of the flexibility of index notation without actually breaking into components.

Now, use the relation $u \cdot M(v) = M^T(u) \cdot v$ for any linear operator $M$ and vectors $u,v$. This allows us to cast the $\dot \nabla(x \cdot A[\dot x])$ term into the more tractable $\dot \nabla (A^T[x] \cdot \dot x)$ instead, on which we can use the gradient identity established earlier. The result is then
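If it helps, the transpose relation can be verified in components. This sketch assumes an orthonormal basis, so that $M$ is represented by a matrix with entries $M_{ij}$ and $M^T$ by the transposed matrix:

$$u \cdot M(v) = \sum_{i,j} u_i M_{ij} v_j = \sum_j \Big( \sum_i M_{ij} u_i \Big) v_j = \sum_j \big(M^T(u)\big)_j v_j = M^T(u) \cdot v$$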

$$\begin{align*}\nabla f &= \dot \nabla (\dot x \cdot A[x]) + \dot \nabla (A^T[x] \cdot \dot x) + b \\ &= A(x) + A^T(x) + b \\ &= (A + A^T)(x) + b\end{align*}$$
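The two overdot terms can also be checked numerically by treating $x \cdot A(x)$ as a function of two separate slots and differentiating one slot at a time. The sketch below is plain Python with made-up sample data; the helper names (`bilinear`, `slot_gradient`, `mat_vec`, `mat_t_vec`) are hypothetical and an orthonormal basis is assumed. Differentiating the left slot should reproduce $A(x)$, and differentiating the argument of $A$ should reproduce $A^T(x)$.

```python
def bilinear(A, y, z):
    """g(y, z) = y . A(z): the quadratic term with its two slots kept separate."""
    n = len(y)
    return sum(y[i] * A[i][j] * z[j] for i in range(n) for j in range(n))


def slot_gradient(A, x, slot, h=1e-6):
    """Central-difference gradient of g at y = z = x, varying one slot only."""
    grad = []
    for k in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[k] += h
        minus[k] -= h
        if slot == 0:
            # Differentiate the left factor; A(x) is held fixed.
            diff = bilinear(A, plus, x) - bilinear(A, minus, x)
        else:
            # Differentiate the argument of A; the left factor is held fixed.
            diff = bilinear(A, x, plus) - bilinear(A, x, minus)
        grad.append(diff / (2 * h))
    return grad


def mat_vec(A, x):
    """Components of A(x)."""
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]


def mat_t_vec(A, x):
    """Components of A^T(x)."""
    n = len(x)
    return [sum(A[j][i] * x[j] for j in range(n)) for i in range(n)]


# Illustrative (hypothetical) data; A is deliberately non-symmetric.
A = [[1.0, 2.0, 0.5], [-1.0, 3.0, 4.0], [0.0, -2.0, 1.5]]
x = [1.0, -2.0, 0.5]

first_term = slot_gradient(A, x, 0)    # compare with A(x)
second_term = slot_gradient(A, x, 1)   # compare with A^T(x)
err_first = max(abs(u - v) for u, v in zip(first_term, mat_vec(A, x)))
err_second = max(abs(u - v) for u, v in zip(second_term, mat_t_vec(A, x)))
print("deviation of first term from A(x):", err_first)
print("deviation of second term from A^T(x):", err_second)
```

Keeping the two slots as separate arguments is the numerical counterpart of the overdot bookkeeping: each call perturbs only the factor that carries the dot.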

The keys to avoiding index notation are the gradient identity $\nabla (a \cdot x) = a$ and the overdot notation for keeping track of the product rule terms. The transpose identity $u \cdot M(v) = M^T(u) \cdot v$ is also crucial.