Differentiation of inner product with matrices


Let $n \in \mathbb{N}$, $n \neq 0$, let $A$ be a real $n \times n$ matrix, and let $\mathbf{c}$ be a vector in $\mathbb{R}^{n}$. Consider a real function $h: \mathbb{R} \longrightarrow \mathbb{R}$, $h \in C^{2}(\mathbb{R})$, and introduce the function $g: \mathbb{R}^{n} \longrightarrow \mathbb{R}$ defined by $$ g(\mathbf{x})=h\left(\langle A \mathbf{x}, A \mathbf{x}\rangle\right)- \langle \mathbf{c}, \mathbf{x} \rangle, \quad \forall \mathbf{x} \in \mathbb{R}^{n} . $$ I want to compute $\nabla g$ and $H(g)$, using only matrix and vector terms.

My partial attempt: $$ g^{\prime}(x)=h^{\prime}(\langle A x, A x\rangle) \cdot \text { term }-c $$

Now $ \langle A x, A x\rangle $ is a sum over $s$ of terms of the form:

$$ \left(\sum_{i=1}^{n} a_{s i} x_{i}\right)\left(\sum_{j=1}^{n} a_{s j} x_{j}\right)=\sum_{i=1}^{n} \sum_{j=1}^{n} a_{s i} a_{s j}\, x_{i} x_{j} $$

How can I continue? Is there a way to avoid such a detailed componentwise form?

Thank you

There are 3 answers below.

BEST ANSWER

The crux here is how to differentiate $f(x)=\langle Ax,Ax\rangle=\|Ax\|^2$. To do this, we just expand $$f(x+\eta)=\|A(x+\eta)\|^2=\|Ax\|^2+2\langle A\eta,Ax\rangle+\|A\eta\|^2=f(x)+2\eta^\top(A^\top Ax)+o(\|\eta\|).$$ So by the definition of the derivative, we have $\nabla f=2A^\top Ax$.
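As a sanity check (separate from the argument above), the identity $\nabla f = 2A^\top Ax$ can be compared against central finite differences; the matrix and the point below are arbitrary random choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

def f(x):
    # f(x) = <Ax, Ax> = ||Ax||^2
    return np.dot(A @ x, A @ x)

grad_analytic = 2 * A.T @ A @ x

# Central finite differences along each coordinate direction
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])

assert np.allclose(grad_analytic, grad_fd, atol=1e-4)
```

Since $f$ is quadratic, the central differences agree with $2A^\top Ax$ up to floating-point roundoff.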

ANSWER

It is considerably simpler if you stick to vector/matrix notations.

Let $z=\mathbf{Ax}:\mathbf{Ax}$, where the inner product is denoted by the colon operator. The differential reads $$ dg = h'(z)\, dz - \mathbf{c}:d\mathbf{x}. $$ Moreover, one can show that $$ dz = 2\, \mathbf{A}^T \mathbf{Ax}:d\mathbf{x}, $$ so the gradient is

$$ \frac{\partial g}{\partial \mathbf{x}} = 2 h'(z) \mathbf{A}^T \mathbf{A} \mathbf{x} -\mathbf{c} $$
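The same differential calculus also gives the Hessian $H(g)$ asked for in the question. Differentiating the gradient once more (a sketch, in the notation above):

```latex
% Differential of the gradient G(x) = 2 h'(z) A^T A x - c, with z = Ax:Ax
\[
dG = 2\, h'(z)\, \mathbf{A}^T \mathbf{A}\, d\mathbf{x}
   + 2\, h''(z)\, \big( 2\, \mathbf{A}^T \mathbf{A} \mathbf{x} : d\mathbf{x} \big)\, \mathbf{A}^T \mathbf{A} \mathbf{x},
\]
% Reading off the linear map acting on dx:
\[
H(g) = 2\, h'(z)\, \mathbf{A}^T \mathbf{A}
     + 4\, h''(z)\, \big( \mathbf{A}^T \mathbf{A} \mathbf{x} \big) \big( \mathbf{A}^T \mathbf{A} \mathbf{x} \big)^T .
\]
```

Note that $H(g)$ is symmetric, as it must be since $h \in C^{2}(\mathbb{R})$.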

ANSWER

We need to find $\Delta g$ in terms of $\Delta \mathbf x$.

$g(\mathbf x +\Delta \mathbf x)=$
$h\left(\langle A (\mathbf x +\Delta \mathbf x), A (\mathbf x +\Delta \mathbf x )\rangle\right)- \langle \mathbf{c}, \mathbf{x} +\Delta \mathbf{x} \rangle=$
$h\left(\langle A \mathbf x +A\Delta \mathbf x, A \mathbf x +A\Delta \mathbf x \rangle\right)- \langle \mathbf{c}, \mathbf{x} +\Delta \mathbf{x} \rangle=$
$h\left( \langle A \mathbf x, A \mathbf x \rangle + \langle A \mathbf x , A\Delta \mathbf x \rangle + \langle A\Delta \mathbf x, A \mathbf x \rangle + \langle A\Delta \mathbf x, A\Delta \mathbf x \rangle \right)- \left( \langle \mathbf{c}, \mathbf{x}\rangle + \langle \mathbf{c}, \Delta \mathbf{x} \rangle \right)=$

$\langle A\Delta \mathbf x, A\Delta \mathbf x \rangle$ is a second-order term, so in the limit we can ignore it.

We can use the fact that the inner product is symmetric to combine $\langle A \mathbf x , A\Delta \mathbf x \rangle + \langle A\Delta \mathbf x, A \mathbf x \rangle $ into $2\langle A \mathbf x , A\Delta \mathbf x \rangle$. By the definition of the derivative, if $h'$ exists, then to first order $h(u+\Delta u) \approx h(u) + h'(u)\Delta u$, so that gives us $h\left( \langle A \mathbf x, A \mathbf x \rangle + 2\langle A \mathbf x , A\Delta \mathbf x \rangle \right) \approx h\left( \langle A \mathbf x, A \mathbf x \rangle \right) + 2\, h'\left( \langle A \mathbf x, A \mathbf x \rangle \right)\langle A \mathbf x , A\Delta \mathbf x \rangle $.

With the $ -\left( \langle \mathbf{c}, \mathbf{x}\rangle + \langle \mathbf{c}, \Delta \mathbf{x} \rangle \right) $ part, since we're looking for $\Delta g = g(\mathbf x + \Delta \mathbf x) - g(\mathbf x)$, the constant $-\langle \mathbf{c}, \mathbf{x}\rangle$ cancels against the corresponding term of $g(\mathbf x)$, just as $h\left( \langle A \mathbf x, A \mathbf x \rangle \right)$ does; only $-\langle \mathbf{c}, \Delta \mathbf{x} \rangle $ contributes to the change in $g$. So

$\Delta g = 2\, h'\left( \langle A \mathbf x, A \mathbf x \rangle \right)\langle A \mathbf x , A\Delta \mathbf x \rangle - \langle \mathbf{c}, \Delta \mathbf{x} \rangle = \left\langle 2\, h'\left( \langle A \mathbf x, A \mathbf x \rangle \right) A^\top A \mathbf x - \mathbf{c},\, \Delta \mathbf{x} \right\rangle $, so $\nabla g = 2\, h'\left( \langle A \mathbf x, A \mathbf x \rangle \right) A^\top A \mathbf x - \mathbf{c}$.
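As a quick numerical check of the gradient $\nabla g = 2\, h'(\langle A\mathbf x, A\mathbf x\rangle)\, A^\top A\mathbf x - \mathbf{c}$ that follows from this expansion (illustrative only; $h=\sin$ and the random $A$, $\mathbf c$, $\mathbf x$ are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
c = rng.standard_normal(n)
x = rng.standard_normal(n)

# Illustrative choice of h; any C^2 function works here.
h, h_prime = np.sin, np.cos

def g(x):
    z = np.dot(A @ x, A @ x)  # <Ax, Ax>
    return h(z) - np.dot(c, x)

z = np.dot(A @ x, A @ x)
grad_analytic = 2 * h_prime(z) * (A.T @ A @ x) - c

# Central finite differences along each coordinate direction
eps = 1e-6
grad_fd = np.array([(g(x + eps * e) - g(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])

assert np.allclose(grad_analytic, grad_fd, atol=1e-4)
```

The same scheme works for any differentiable $h$ by swapping in its derivative for `h_prime`.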