How to derive and compute second derivative of matrix quadratic form?


I want to take the first and second derivative of the following matrix quadratic form,

$$Q = {\phi}(\beta)'W{\phi}(\beta),$$

where $\phi(\beta)$ is a $q \times 1$ vector, $\beta$ is a $k \times 1$ vector, and $W$ is a $q \times q$ symmetric matrix, so $Q$ is a scalar. I would therefore expect the first derivative of $Q$ w.r.t. $\beta$ to be a $k \times 1$ vector $\frac{\partial{Q}}{\partial{\beta}}$, and the second derivative to be a $k \times k$ matrix $\frac{\partial^2{Q}}{\partial{\beta}\partial\beta'}$.

Taking the first derivative of $Q$ w.r.t. $\beta$ yields,

$$\frac{\partial{Q}}{\partial{\beta}} = 2\frac{\partial{\phi}'}{\partial{\beta}}W\phi(\beta),$$
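
As a sanity check on the shapes and the formula, here is a small numerical sketch; the particular choice $\phi(\beta) = \sin(M\beta)$ and all sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
k, q = 3, 5

# A made-up smooth phi: R^k -> R^q, phi(beta) = sin(M beta)
M = rng.standard_normal((q, k))
Wh = rng.standard_normal((q, q))
W = Wh + Wh.T                                # symmetric q x q

phi = lambda b: np.sin(M @ b)                # q-vector
jac = lambda b: np.cos(M @ b)[:, None] * M   # q x k Jacobian, so d(phi')/d(beta) = jac(b).T
Q = lambda b: phi(b) @ W @ phi(b)            # scalar

beta = rng.standard_normal(k)

# First derivative: 2 * (d phi'/d beta) * W * phi, a k x 1 vector
g = 2 * jac(beta).T @ W @ phi(beta)

# Compare against central finite differences of Q
h = 1e-6
I = np.eye(k)
g_fd = np.array([(Q(beta + h*I[i]) - Q(beta - h*I[i])) / (2*h) for i in range(k)])
print(g.shape, np.allclose(g, g_fd, atol=1e-6))
```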

which is $k \times 1$, as expected. Differentiating a second time, I get

$$\frac{\partial^2{Q}}{\partial{\beta}\partial\beta'} = 2\left(\frac{\partial^2{\phi(\beta)'}}{\partial{\beta}\partial\beta'}W\phi(\beta) + \frac{\partial{\phi(\beta)'}}{\partial\beta}W\frac{\partial{\phi(\beta)}}{\partial\beta'}\right),$$

which should be $k \times k$. The second term inside the parentheses is $k \times k$, but the first one isn't. It doesn't quite make sense to me: $W\phi(\beta)$ is $q \times 1$, while I don't know what shape $\frac{\partial^2{\phi(\beta)'}}{\partial{\beta}\partial\beta'}$ should have ($\frac{\partial{\phi(\beta)'}}{\partial{\beta}}$ is $k \times q$, but how does differentiating such a matrix w.r.t. the $1 \times k$ vector $\beta'$ work? I don't see how the result would be compatible with the terms that follow it).

In any case, I know it is wrong, because I have a formula outlining what the correct form should be, but I don't understand how to arrive at it or how it works. According to this formula, the second derivative should be,

$$\frac{\partial^2{Q}}{\partial{\beta}\partial\beta'} = 2\left(({\phi}(\beta)'W\otimes I_k)\frac{\partial vec(\partial\phi(\beta)'/\partial\beta)}{\partial{\beta'}}+\frac{\partial{\phi(\beta)'}}{\partial\beta}W\frac{\partial{\phi(\beta)}}{\partial\beta'}\right).$$ Here, $vec(\partial\phi(\beta)'/\partial\beta)$ is the vectorisation of a $k \times q$ matrix, which would be, to my understanding, a $kq \times 1$ vector(?). Differentiating this vector w.r.t. the $1 \times k$ vector $\beta'$ would yield a $kq \times k$ matrix. Furthermore, following my understanding of the Kronecker product, $W \otimes I_k$ would yield a $kq \times kq$ matrix. All this would be fine, except that the term $\phi(\beta)'$ is still a $1 \times q$ vector and thus incompatible with the terms that follow -- unless we compute $\phi(\beta)'W$ first and then take its Kronecker product with $I_k$, i.e. a $1 \times q$ vector $\otimes$ a $k \times k$ matrix, but I can't find anywhere that says the Kronecker product works between a vector and a matrix (and if it does, what shape would the result have? How would I compute it?)
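
For what it's worth, the pieces of this formula can be checked numerically. The Kronecker product is defined for any two arrays, so $\phi(\beta)'W \otimes I_k$ (a $1 \times q$ row times $I_k$) is a $k \times kq$ matrix, which makes the whole first term $k \times k$. A sketch with a made-up $\phi(\beta)=\sin(M\beta)$, whose second-derivative blocks are easy to write down:

```python
import numpy as np

rng = np.random.default_rng(1)
k, q = 3, 4

# Made-up smooth phi: phi(beta) = sin(M beta); the Hessian of each
# component phi_i is  -sin((M beta)_i) * outer(M_i, M_i)
M = rng.standard_normal((q, k))
Wh = rng.standard_normal((q, q))
W = Wh + Wh.T                                # symmetric q x q

phi = lambda b: np.sin(M @ b)
J = lambda b: np.cos(M @ b)[:, None] * M     # q x k, so d(phi')/d(beta) = J(b).T  (k x q)
Q = lambda b: phi(b) @ W @ phi(b)

beta = rng.standard_normal(k)

# D = d vec(d phi'/d beta) / d beta'  is kq x k; block i holds the Hessian of phi_i
D = np.vstack([-np.sin(M @ beta)[i] * np.outer(M[i], M[i]) for i in range(q)])

# phi'W is 1 x q, so np.kron(phi'W, I_k) is k x kq, and the product below is k x k
H = 2 * (np.kron(phi(beta) @ W, np.eye(k)) @ D + J(beta).T @ W @ J(beta))

# Central finite-difference Hessian of Q for comparison
h = 1e-5
I = np.eye(k)
H_fd = np.array([[(Q(beta + h*I[a] + h*I[b]) - Q(beta + h*I[a] - h*I[b])
                   - Q(beta - h*I[a] + h*I[b]) + Q(beta - h*I[a] - h*I[b])) / (4*h*h)
                  for b in range(k)] for a in range(k)])
print(H.shape, np.allclose(H, H_fd, atol=1e-4))
```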

I don't know where I am going wrong, and I would appreciate it very much if someone could guide me through this conundrum.

Answer:

This answer works the concrete case in which $\phi(\beta) = Z^T\big(\exp(X\beta)-y\big)$, so that $Q = \phi(\beta)'W\phi(\beta) = p^TZWZ^Tp$ with $p$ as defined below. For typing convenience, define the following variables $$\eqalign{ &A = ZWZ^T, \quad &b = \beta, \quad &e = \exp(Xb), \quad &E = {\rm Diag}(e) \\ & &p = (e-y), \quad &r = Ap, \quad &R = {\rm Diag}(r) \\ }$$ and the differentials (note $dp = de$ since $y$ is constant) $$\eqalign{ dp &= de = e\odot(X\,db) = EX\,db \\ dr &= A\,dp = AEX\,db \\ }$$ where the elementwise/Hadamard product $(\odot)$ of vectors was replaced by ordinary matrix multiplication by a diagonal matrix. This is a standard trick, e.g. $$Er = e\odot r = r\odot e = Re$$

Write the quadratic form in terms of these new variables. Then find its differential and gradient. $$\eqalign{ Q &= p^TZWZ^Tp &= p^Tr \\ dQ &= p^Tdr + r^Tdp &= 2r^TEX\,db \\&= (2X^TEr)^Tdb \\ g=\;\frac{\partial Q}{\partial b} &= 2X^TEr &= 2X^TRe \\ }$$ Now find the differential of the gradient, and then the Hessian. $$\eqalign{ dg &= 2X^TE\,dr + 2X^TR\,de \\ &= 2X^TEAEX\,db + 2X^TREX\,db \\ &= 2X^T(EA + R)EX\,db \\ H =\; \frac{\partial g}{\partial b} &= 2X^T(EA + R)EX \\ }$$
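
As a sketch, the gradient and Hessian derived above can be checked against finite differences; all data and sizes below are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, q = 6, 3, 4                        # made-up sizes: X is n x k, Z is n x q
X = rng.standard_normal((n, k))
Z = rng.standard_normal((n, q))
Wh = rng.standard_normal((q, q))
W = Wh + Wh.T                            # symmetric
y = rng.standard_normal(n)
A = Z @ W @ Z.T                          # A = Z W Z^T

def Q(b):
    p = np.exp(X @ b) - y
    return p @ A @ p                     # Q = p^T A p

b = 0.1 * rng.standard_normal(k)
e = np.exp(X @ b)
E = np.diag(e)
p = e - y
r = A @ p
R = np.diag(r)

g = 2 * X.T @ E @ r                      # gradient:  2 X^T E r
H = 2 * X.T @ (E @ A + R) @ E @ X        # Hessian:   2 X^T (EA + R) E X

# Central finite-difference checks
h = 1e-6
I = np.eye(k)
g_fd = np.array([(Q(b + h*I[i]) - Q(b - h*I[i])) / (2*h) for i in range(k)])
h2 = 1e-4
H_fd = np.array([[(Q(b + h2*I[i] + h2*I[j]) - Q(b + h2*I[i] - h2*I[j])
                   - Q(b - h2*I[i] + h2*I[j]) + Q(b - h2*I[i] - h2*I[j])) / (4*h2*h2)
                  for j in range(k)] for i in range(k)])
print(np.allclose(g, g_fd, atol=1e-5),
      np.allclose(H, H_fd, atol=1e-3),
      np.allclose(H, H.T))
```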

The Hessian is symmetric since the matrices $(A,E,R,W)$ are all symmetric, and the matrices $(E,R)$ are diagonal (and therefore commute with each other).