Why does this vector derivation hold?


I have the following variables/matrices:

$$A \in \mathbb{R}^{m \times n} , \quad p \in \mathbb{R}^{n}, \quad \Sigma \in \mathbb{R}^{m \times m}, \quad w \in \mathbb{R}^{m}$$

where $\Sigma$ is a diagonal matrix. With these we define the function $S(p)$ as $$S(p) = (w + Ap)^{T} \Sigma^{-1} (w + Ap).$$

Since we would like to find the minimum of $S(p)$, we compute the first derivative with respect to $p$. According to my master's solution, this is $$\nabla S(p) = 2(Ap + w)^{T} \Sigma^{-1} A \overset{!}{=} 0.$$

However, I don't understand how they arrive at this result. Could somebody please explain the intermediate steps?

There are 2 answers below.

Best answer:
Let's take a look at the derivative with respect to the first coordinate, $p_1$.

First we apply the product rule. Then we note that the first summand is a scalar, so we may replace it by its transpose (the transpose of a scalar is the scalar itself); since $\Sigma^{-1}$ is symmetric, transposing makes the two summands identical. Finally, $\frac{\partial}{\partial p_1}(w+Ap) = A e_1$, where $e_1$ is the first standard basis vector.

\begin{aligned}\frac{\partial}{\partial p_1} S(p) &= \frac{\partial}{\partial p_1}\left( (w+Ap)^T \Sigma^{-1} (w+Ap) \right) \\ &= \left(\frac{\partial}{\partial p_1}(w+Ap)\right)^T \Sigma^{-1} (w+Ap) + (w+Ap)^T \Sigma^{-1} \frac{\partial}{\partial p_1}(w+Ap) \\ &= \left(\left(\frac{\partial}{\partial p_1}(w+Ap)\right)^T \Sigma^{-1} (w+Ap)\right)^T + (w+Ap)^T \Sigma^{-1} \frac{\partial}{\partial p_1}(w+Ap) \\ &= (w+Ap)^T \Sigma^{-1} \frac{\partial}{\partial p_1}(w+Ap) + (w+Ap)^T \Sigma^{-1} \frac{\partial}{\partial p_1}(w+Ap) \\ &= 2 (w+Ap)^T \Sigma^{-1} \frac{\partial}{\partial p_1}(w+Ap) \\ &= 2 (w+Ap)^T \Sigma^{-1} (A e_1) \end{aligned}

More generally, for the $i$-th coordinate we get $$\nabla_i S(p) = 2 (w+Ap)^T \Sigma^{-1} (A e_i),$$ where $e_i$ is the $i$-th standard basis vector. Collecting all coordinates into a row vector: $$\nabla S(p) = 2 (w+Ap)^T \Sigma^{-1} (A I) = 2 (w+Ap)^T \Sigma^{-1} A$$
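As a quick sanity check (my own addition, not part of the original answer), the closed-form gradient can be compared against a central finite-difference approximation with NumPy. The variables `A`, `w`, `Sigma` are random stand-ins for the symbols above:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
w = rng.standard_normal(m)
Sigma = np.diag(rng.uniform(0.5, 2.0, m))   # diagonal, positive definite
Sinv = np.diag(1.0 / np.diag(Sigma))        # Sigma^{-1}

def S(p):
    # S(p) = (w + A p)^T Sigma^{-1} (w + A p)
    r = w + A @ p
    return r @ Sinv @ r

def grad_S(p):
    # closed form from the answer: 2 (w + A p)^T Sigma^{-1} A
    return 2 * (w + A @ p) @ Sinv @ A

p = rng.standard_normal(n)
eps = 1e-6
# central finite differences, one coordinate at a time
fd = np.array([(S(p + eps * e) - S(p - eps * e)) / (2 * eps)
               for e in np.eye(n)])
print(np.max(np.abs(fd - grad_S(p))))  # maximum deviation from the closed form
```

The two gradients agree up to floating-point noise, which supports the coordinate-wise derivation above.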

Second answer:

I think there is a typo "$\Sigma \in \mathbb{R}^{m \times n}$": it must be "$\Sigma \in \mathbb{R}^{m \times m}$", since in the definition of $S(p)$ it is multiplied by vectors of the same dimension from both sides. Use the following vector differentiation rules ($\langle \cdot, \cdot \rangle$ is the dot product in $\mathbb{R}^n$):

  1. $\nabla_p \langle p, c \rangle = c$, where $c \in \mathbb{R}^n$ is a constant (w.r.t. $p$) vector.

  2. $\nabla_p \langle Ap, p \rangle = (A + A^T)p$, where $A \in \mathbb{R}^{n\times n}$ is a constant (w.r.t. $p$) matrix.

  3. $\langle x, y \rangle = x^T y$.
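These rules are easy to spot-check numerically; a minimal sketch with random test data of my own (the matrix is called `M` here to avoid clashing with the question's $A$):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
c = rng.standard_normal(n)        # constant vector for rule 1
M = rng.standard_normal((n, n))   # constant matrix for rule 2
p = rng.standard_normal(n)

def fd_grad(f, p, eps=1e-6):
    # gradient via central finite differences, one coordinate at a time
    return np.array([(f(p + eps * e) - f(p - eps * e)) / (2 * eps)
                     for e in np.eye(len(p))])

# rule 1: grad_p <p, c> = c
assert np.allclose(fd_grad(lambda q: q @ c, p), c, atol=1e-5)
# rule 2: grad_p <M p, p> = (M + M^T) p
assert np.allclose(fd_grad(lambda q: (M @ q) @ q, p), (M + M.T) @ p, atol=1e-5)
```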

So we can write

$$\begin{aligned} S(p) &= \langle \Sigma^{-1}(w + Ap),\, w + Ap\rangle \\ &= \langle \Sigma^{-1}w, w \rangle + \langle \Sigma^{-1}w, Ap \rangle + \langle \Sigma^{-1}Ap, w \rangle + \langle \Sigma^{-1}Ap, Ap \rangle \\ &= \langle \Sigma^{-1}w, w \rangle + 2 \langle A^T \Sigma^{-1}w, p\rangle + \langle A^T\Sigma^{-1}Ap, p\rangle, \end{aligned}$$ where the last step uses that $\Sigma$ (and hence $\Sigma^{-1}$) is symmetric.

Hence (using the rules above):

$$\begin{aligned} \nabla_p S(p) &= 2A^T\Sigma^{-1}w + \left(A^T\Sigma^{-1}A + (A^T\Sigma^{-1}A)^T\right)p \\ &= 2A^T\Sigma^{-1}w + 2A^T\Sigma^{-1}Ap \\ &= 2A^T\Sigma^{-1}(w + Ap) = \left(2(Ap + w)^T\Sigma^{-1}A\right)^T = 0. \end{aligned}$$

Transposing both sides, we recover the equality from your master's solution.
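Setting the gradient to zero and solving gives $p^{*} = -(A^T\Sigma^{-1}A)^{-1}A^T\Sigma^{-1}w$. A short numerical sketch of my own (assuming $A$ has full column rank, so that $A^T\Sigma^{-1}A$ is invertible):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 3
A = rng.standard_normal((m, n))
w = rng.standard_normal(m)
Sinv = np.diag(1.0 / rng.uniform(0.5, 2.0, m))  # Sigma^{-1}, diagonal

def S(p):
    # S(p) = (w + A p)^T Sigma^{-1} (w + A p)
    r = w + A @ p
    return r @ Sinv @ r

# stationary point: A^T Sigma^{-1} A p = -A^T Sigma^{-1} w
p_star = np.linalg.solve(A.T @ Sinv @ A, -A.T @ Sinv @ w)

# the gradient 2 A^T Sigma^{-1} (w + A p) vanishes at p_star
grad = 2 * A.T @ Sinv @ (w + A @ p_star)
print(np.max(np.abs(grad)))

# S is a convex quadratic, so any perturbation of p_star increases S
for _ in range(5):
    assert S(p_star + 0.1 * rng.standard_normal(n)) >= S(p_star)
```

This is a weighted least-squares problem, so the stationary point found here is exactly the usual normal-equations solution.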