Derivative of a quadratic cost function with respect to a vector


I am new to matrix calculus and have a question about finding the derivative of the cost function defined below with respect to $\theta$. The function is the quadratic form in the exponent of a multivariate normal (Gaussian) distribution.

$$\eqalign{ Q &= (x-H\theta)^T C^{-1} (x-H\theta) \cr &= (x^T -\theta^TH^T) C^{-1} (x-H\theta) \cr &= x^T C^{-1} x - x^T C^{-1} H \theta - \theta^T H^T C^{-1} x + \theta^T H^T C^{-1} H \theta \cr }$$

How do I differentiate the 3rd term, where $\theta$ appears transposed? And the 4th term, where $\theta$ appears twice, once transposed and once not? Any help is appreciated. Thanks.


There are 3 best solutions below

4
On BEST ANSWER

Rather than expanding the expression immediately, I find it simpler to define new variables (to reduce "clutter" in the function), differentiate, then substitute the original variables in the final steps.

Let
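$$\eqalign{ B &= C^{-1} \cr y &= H\theta-x \cr }$$
Write the function in terms of these variables and take the differential, using $dy = H\,d\theta$ and the fact that the scalar $dy^TBy$ equals its own transpose $y^TB^T\,dy$:
$$\eqalign{ Q &= y^TBy \cr dQ &= dy^TBy + y^TB\,dy \cr &= y^T(B^T + B)\,dy \cr &= y^T(B^T + B)H\,d\theta \cr }$$
Since $dQ=\frac{\partial Q}{\partial\theta}\,d\theta$, the derivative, written as a row vector, must be
$$\eqalign{ \frac{\partial Q}{\partial\theta} &= y^T(B^T + B)H \cr &= (H\theta-x)^T(C^{-T} + C^{-1})\,H \cr }$$
Its transpose, $H^T(C^{-T} + C^{-1})(H\theta-x)$, is the gradient as a column vector. When $C$ is symmetric this reduces to $2H^TC^{-1}(H\theta-x)$.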
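A quick numerical sanity check of this result can be sketched in Python with NumPy. Everything in the sketch is an assumption made for illustration: the dimensions, the random test data, and the helper names `quadratic_cost`, `analytic_gradient` and `numerical_gradient` are not part of the original answer. The matrix standing in for $C^{-1}$ is deliberately non-symmetric, so the $(C^{-T} + C^{-1})$ factor is actually exercised.

```python
import numpy as np

def quadratic_cost(theta, x, H, C_inv):
    """Q = (x - H theta)^T C^{-1} (x - H theta)."""
    r = x - H @ theta
    return r @ C_inv @ r

def analytic_gradient(theta, x, H, C_inv):
    """Column-vector form of the result: H^T (C^{-T} + C^{-1}) (H theta - x)."""
    y = H @ theta - x
    return H.T @ (C_inv.T + C_inv) @ y

def numerical_gradient(f, theta, eps=1e-6):
    """Central finite differences, one coordinate of theta at a time."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

# Hypothetical test data: 5 observations, 3 parameters.
rng = np.random.default_rng(0)
n, p = 5, 3
H = rng.normal(size=(n, p))
x = rng.normal(size=n)
theta = rng.normal(size=p)
C_inv = rng.normal(size=(n, n))  # stands in for C^{-1}; deliberately non-symmetric

g_analytic = analytic_gradient(theta, x, H, C_inv)
g_numeric = numerical_gradient(lambda t: quadratic_cost(t, x, H, C_inv), theta)

# Largest disagreement between the two gradients; expected to be tiny.
max_abs_diff = np.max(np.abs(g_analytic - g_numeric))
print(max_abs_diff)
```

Because $Q$ is quadratic in $\theta$, central differences are exact up to rounding error, so the two gradients should agree to many digits.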

0
On

Write

$$\begin{array}{rl} Q (x,\theta) &= (x - H \theta)^T C^{-1} (x - H \theta)\\ &= \begin{bmatrix} x\\ \theta\end{bmatrix}^T \begin{bmatrix} I\\ -H^T\end{bmatrix} C^{-1} \begin{bmatrix} I\\ -H^T\end{bmatrix}^T \begin{bmatrix} x\\ \theta\end{bmatrix}\\ &= \begin{bmatrix} x\\ \theta\end{bmatrix}^T \begin{bmatrix} C^{-1} & - C^{-1} H\\ -H^T C^{-1} & H^T C^{-1} H\end{bmatrix} \begin{bmatrix} x\\ \theta\end{bmatrix}\end{array}$$

Since the derivative of $y^T A y$ with respect to $y$ is $(A + A^T) y$, the derivative of $Q$ with respect to $(x,\theta)$, assuming that $C^{-1}$ is symmetric, is the following

$$2 \begin{bmatrix} C^{-1} & - C^{-1} H\\ -H^T C^{-1} & H^T C^{-1} H\end{bmatrix} \begin{bmatrix} x\\ \theta\end{bmatrix}$$

Thus, the derivative of $Q$ with respect to $\theta$ alone is

$$2 H^T C^{-1} (H \theta - x)$$
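
To spell out that last step: the derivative with respect to $\theta$ is the lower block row of the block matrix times the stacked vector, which, under the same symmetry assumption on $C^{-1}$, works out as

$$\eqalign{ \frac{\partial Q}{\partial\theta} &= 2\begin{bmatrix} -H^T C^{-1} & H^T C^{-1} H\end{bmatrix} \begin{bmatrix} x\\ \theta\end{bmatrix} \cr &= 2\left(H^T C^{-1} H \theta - H^T C^{-1} x\right) \cr &= 2 H^T C^{-1} (H \theta - x) \cr }$$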

0
On

I managed to solve it via the product rule, using a property I saw here (page 4), which states that $D[f(x)^Tg(x)] = g(x)^Tf'(x) + f(x)^Tg'(x)$. I have verified my answer against the solution posted by lynn.
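
As a sketch of how that rule can be applied here (the choice of $f$ and $g$ is an assumption, since the intermediate steps are not shown above), take $f(\theta) = x - H\theta$ and $g(\theta) = C^{-1}(x - H\theta)$, so that $f'(\theta) = -H$ and $g'(\theta) = -C^{-1}H$. Then

$$\eqalign{ DQ &= g^Tf' + f^Tg' \cr &= -(x-H\theta)^TC^{-T}H - (x-H\theta)^TC^{-1}H \cr &= (H\theta-x)^T(C^{-T} + C^{-1})\,H \cr }$$

which is the same row vector obtained with the differential approach in the accepted answer.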