I have a scalar function as follows: \begin{equation*} \begin{aligned} \ell(\beta, \Sigma, \mu, \Lambda) = \sum_{i=1}^{m} \Big[ &\boldsymbol{y}_{i}^{T} \left(X_{i}\beta + Z_{i} \mu_{i} \right) - \boldsymbol{1}_{i}^{T} B\left(X_{i}\beta + Z_{i}\mu_{i}, \operatorname{dg}\left(Z_{i}\Lambda_{i}Z_{i}^{T} \right) \right) + \boldsymbol{1}_{i}^{T} c(\boldsymbol{y}_{i})\\ &+\frac{1}{2}\left\{\log \left|\Sigma^{-1}\Lambda_{i} \right| - \mu_{i}^{T}\Sigma^{-1}\mu_{i} - \operatorname{tr}\left(\Sigma^{-1}\Lambda_{i} \right) \right\} + \frac{K}{2} \Big] \end{aligned} \end{equation*}
where, for $1 \le i \le m$:
$\mu_{i}$: $K \times 1$ vectors
$\Lambda_{i}$: $K \times K$ matrices
$\boldsymbol{y}_{i} = \begin{bmatrix}y_{i1} & \ldots & y_{in_{i}} \end{bmatrix}^{T}$: $n_{i} \times 1$ vectors
$\boldsymbol{1}_{i} = \begin{bmatrix}1 & \ldots & 1 \end{bmatrix}^{T}$: $n_{i} \times 1$ vectors of ones
$\Sigma$: $K\times K$ positive definite matrix
$X_{i}, Z_{i}$: design matrices in the regression model
$B(\mu, \sigma^{2}) = \int_{-\infty}^{\infty} b(\sigma x + \mu)\phi(x) \, dx$, where $\phi(x) = \frac{1}{\sqrt{2\pi}}\exp \left(-x^{2}/2 \right)$ and $b(\cdot)$ is any differentiable function (applied elementwise when the arguments are vectors)
$\operatorname{dg}\left(A\right)$: for a square matrix $A$, returns a column vector containing the diagonal entries of $A$.
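(As a concrete example of $B$, not specific to the paper: taking $b(x) = e^{x}$, the Poisson case, gives $B(\mu, \sigma^{2}) = \int_{-\infty}^{\infty} e^{\sigma x + \mu}\phi(x)\,dx = \exp\left(\mu + \sigma^{2}/2\right)$.)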
The paper suggests the following. Let $D_{p}$ denote the duplication matrix of order $p$, defined through the relationship $\operatorname{vec}\left( A \right) = D_{p}\operatorname{vech}\left(A \right)$ for a symmetric $p \times p$ matrix $A$. Then $$ \frac{\partial \ell}{\partial \operatorname{vech}\left(\Sigma \right)} = \frac{1}{2} \sum_{i=1}^{m} \operatorname{vec} \left\{\Sigma^{-1} \left(\mu_{i}\mu_{i}^{T} + \Lambda_{i} \right)\Sigma^{-1} - \Sigma^{-1} \right\}^{T}D_{K} $$ and $$ \frac{\partial^{2} \ell}{\partial \operatorname{vech}\left( \Sigma \right) \partial \operatorname{vech}\left( \Lambda_{i}\right)} = \frac{1}{2} D_{K}^{T}\left(\Sigma^{-1}\otimes \Sigma^{-1} \right)D_{K}. $$
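For example, with $p = 2$: $$ \operatorname{vech}(A) = \begin{bmatrix} a_{11} \\ a_{21} \\ a_{22} \end{bmatrix}, \quad \operatorname{vec}(A) = \begin{bmatrix} a_{11} \\ a_{21} \\ a_{12} \\ a_{22} \end{bmatrix}, \quad D_{2} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, $$ since $a_{12} = a_{21}$ for symmetric $A$.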
What concepts should I know to get my head around these calculations? (I actually need these for a Newton-Raphson algorithm, optimizing over $\left(\beta, \operatorname{vech}\left( \Sigma\right), \mu_{1}, \operatorname{vech}\left(\Lambda_{1}\right), \ldots , \mu_{m}, \operatorname{vech}\left(\Lambda_{m}\right) \right)$.)
You have a very complicated function, but I'll show you how to find the first result.
Start by taking the differential of the function, treating all of the variables except $\Sigma$ as constant, so that the differentials of those terms are zero. Here $A:B = {\rm tr}(A^{T}B)$ denotes the Frobenius inner product; the steps below also use the rearrangement rules $A:BC = (B^{T}A):C = (AC^{T}):B$ and the identity $d\Sigma^{-1} = -\Sigma^{-1}\,d\Sigma\,\Sigma^{-1}$. $$\eqalign{ d\ell &= \frac{1}{2}\sum_i \Big[d(\log\det(\Sigma^{-1}\Lambda_i)) - d(\mu_i\mu_i^T:\Sigma^{-1}) - d(\Lambda_i:\Sigma^{-1})\Big] \cr &= \frac{1}{2}\sum_i \Big[d({\rm tr}\log(\Sigma^{-1}\Lambda_i)) - d(\mu_i\mu_i^T:\Sigma^{-1}) - d(\Lambda_i:\Sigma^{-1})\Big] \cr &= \frac{1}{2}\sum_i \Big[(\Sigma^{-1}\Lambda_i)^{-T}:(d\Sigma^{-1}\,\Lambda_i) - \mu_i\mu_i^T:d\Sigma^{-1} - \Lambda_i:d\Sigma^{-1}\Big] \cr &= \frac{1}{2}\sum_i \Big[(\Sigma^T\Lambda_i^{-T})\Lambda_i^T - \mu_i\mu_i^T - \Lambda_i\Big]:d\Sigma^{-1} \cr &= \frac{1}{2}\sum_i \Big[-\Sigma^T + \mu_i\mu_i^T + \Lambda_i\Big]:\Sigma^{-1}\,d\Sigma\,\Sigma^{-1} \cr &= \frac{1}{2}\sum_i \Big[\Sigma^{-T}(\mu_i\mu_i^T + \Lambda_i)\Sigma^{-T}-\Sigma^{-T} \Big]:d\Sigma \cr }$$ Now use some of the properties of vec/vech $$\eqalign{ A:B &= {\rm vec}(A)^T{\rm vec}(B) \cr {\rm vec}(X) &= D_p\,{\rm vech}(X) \cr }$$ to recast the last line as $$\eqalign{ d\ell &= \frac{1}{2}\sum_i {\rm vec}\Big[\Sigma^{-T}(\mu_i\mu_i^T + \Lambda_i)\Sigma^{-T}-\Sigma^{-T} \Big]^T\Big(D_K\,{\rm vech}(d\Sigma)\Big) \cr }$$ You can simplify this further using $\Sigma^T=\Sigma$ to obtain the first derivative (aka gradient) in your question.
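If you want a numerical sanity check of the gradient formula before wiring it into Newton-Raphson, here is a minimal sketch (my own, not from the paper): it compares the closed form against central finite differences for a single term of the sum, with illustrative random inputs and $K = 3$. The helpers `duplication_matrix`, `random_spd`, and `ell_sigma_part` are hand-rolled for this check, not library routines.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3

def duplication_matrix(p):
    """D_p satisfying vec(A) = D_p vech(A) for symmetric p x p A."""
    D = np.zeros((p * p, p * (p + 1) // 2))
    col = 0
    for j in range(p):                # vech stacks the lower triangle column by column
        for i in range(j, p):
            D[j * p + i, col] = 1.0   # position of A[i, j] in column-major vec
            D[i * p + j, col] = 1.0   # position of A[j, i] (same entry when i == j)
            col += 1
    return D

def random_spd(p):
    """Random symmetric positive definite matrix (illustrative test input)."""
    M = rng.standard_normal((p, p))
    return M @ M.T + p * np.eye(p)

Sigma, Lam = random_spd(K), random_spd(K)
mu = rng.standard_normal(K)

def ell_sigma_part(S):
    """Only the terms of ell (for a single i) that depend on Sigma."""
    Si = np.linalg.inv(S)
    return 0.5 * (np.linalg.slogdet(Si @ Lam)[1]
                  - mu @ Si @ mu - np.trace(Si @ Lam))

# Closed form: (1/2) vec(Si (mu mu^T + Lam) Si - Si)^T D_K, written as a column.
Si = np.linalg.inv(Sigma)
G = 0.5 * (Si @ (np.outer(mu, mu) + Lam) @ Si - Si)
grad_closed = duplication_matrix(K).T @ G.reshape(-1, order="F")  # column-major vec

# Central finite differences along symmetric perturbations of vech(Sigma).
eps, grad_fd, idx = 1e-6, np.zeros(K * (K + 1) // 2), 0
for j in range(K):
    for i in range(j, K):
        E = np.zeros((K, K))
        E[i, j] = E[j, i] = 1.0
        grad_fd[idx] = (ell_sigma_part(Sigma + eps * E)
                        - ell_sigma_part(Sigma - eps * E)) / (2 * eps)
        idx += 1

print(np.max(np.abs(grad_closed - grad_fd)))  # expect something tiny, ~1e-8 or below
```

Note that the off-diagonal components of $\partial \ell/\partial\,{\rm vech}(\Sigma)$ come out doubled relative to the corresponding entries of the symmetric matrix inside the vec; that doubling is exactly what the $D_K$ factor encodes.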
To get the mixed second derivative, take the differential of the gradient. This time assume that all variables except for the $\Lambda_i$ are constant.
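In case it helps, here is how that step plays out (same conventions as above). Holding everything except $\Lambda_i$ fixed, $$\eqalign{ d\left(\frac{\partial \ell}{\partial\,{\rm vech}(\Sigma)}\right) &= \frac{1}{2}\,{\rm vec}\big(\Sigma^{-1}\,d\Lambda_i\,\Sigma^{-1}\big)^T D_K \cr &= \frac{1}{2}\,\big[(\Sigma^{-1}\otimes\Sigma^{-1})\,{\rm vec}(d\Lambda_i)\big]^T D_K \cr &= \frac{1}{2}\,{\rm vech}(d\Lambda_i)^T\, D_K^T \left(\Sigma^{-1}\otimes\Sigma^{-1}\right) D_K \cr }$$ using ${\rm vec}(ABC) = (C^T\otimes A)\,{\rm vec}(B)$, the symmetry of $\Sigma^{-1}\otimes\Sigma^{-1}$, and ${\rm vec}(d\Lambda_i) = D_K\,{\rm vech}(d\Lambda_i)$, which yields the stated mixed second derivative.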