I am working with matrix-variate data and fitting the model with the EM algorithm.
Deriving the E-step and M-step is straightforward under this setting. Now, I am interested in deriving a restricted mean matrix in the M-step.
Assume the entries in the first column of the mean matrix are all fixed at the same value, while the other elements vary.
Under this restriction, how can one derive $\frac{\partial \mathbf{Q}(\boldsymbol{\theta})}{\partial \mathbf{M}^{\star}}$ and $\frac{\partial \mathbf{Q}(\boldsymbol{\theta})}{\partial a}$?
$ \def\c#1{\color{red}{#1}} \def\o{{\tt1}} \def\bbR#1{{\mathbb R}^{#1}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\size#1{\op{size}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\m#1{\left[\begin{array}{c}#1\end{array}\right]} \def\Sj{\sum_{j=\o}^m} \def\Sk{\sum_{k=\o}^n} \def\Skk{\sum_{\c{k=2}}^n} $Consider the following variables and their dimensions $$\eqalign{ p,q &= \size{a} \;=\; \size{b} \;=\; \size{M_{jk}} \\ m,\o &= \size{u_j} \qquad \{ {\rm standard\;basis\;vector} \} \\ n,\o &= \size{v_k} \qquad \{ {\rm standard\;basis\;vector} \} \\ p,p &= \size{I_p} \qquad \{ {\rm identity\;matrix} \} \\ mp,nq &= \size{M} \\ }$$ Matrix analogs of the standard basis vectors can be constructed as $$\eqalign{ U_j &= \LR{u_j\otimes I_p} \in\bbR{mp\times p} \;&\implies\; U_j^TU_k = \delta_{jk}I_p \\ V_k &= \LR{v_k\otimes I_q} \in\bbR{nq\times q} \;&\implies\; V_j^TV_k = \delta_{jk}I_q \\ }$$ and can be used to partition the $M$ matrix $$\eqalign{ &M_{jk} = U_j^TMV_k \\ &M = \Sj\Sk\: U_jM_{jk}V_k^T = \m{ M_{\o\o}&M_{\o 2}&\ldots&M_{\o n} \\ M_{2\o}&M_{22}&\ldots&M_{2n} \\ \vdots&\vdots&\ddots&\vdots \\ M_{m\o}&M_{m2}&\ldots&M_{mn} \\ } \\ }$$ The Frobenius product $(:)$ is extremely useful in Matrix Calculus $$\eqalign{ A:B &= \Sj\Sk A_{jk}B_{jk} \;=\; \trace{A^TB} \\ A:A &= \frob{A}^2 \qquad \{ {\rm Frobenius\;norm} \} \\ A:B &= B:A \qquad \{ {\rm commutes} \} \\ A:B &= A^T:B^T \;\;\; \{ {\rm transposes} \} \\ C:\LR{AB} &= \LR{CB^T}:A \;=\, \LR{A^TC}:B \\ }$$ Finally, the $Q$-function can be written as $$\eqalign{ X &= \LR{M-Y},\quad P = P^T = \Psi^{-1},\quad S = S^T = \Sigma^{-1} \\ Q &= -\tfrac12S:\LR{XPX^T} \\ }$$ Its differential with respect to $M$ can be calculated as $$\eqalign{ dQ &= -\tfrac12S:\LR{dX\:PX^T+XP\:dX^T} \\ &= -S:\LR{dX\:PX^T} \\ &= -\LR{SXP}:dX \\ &= -\LR{SXP}:dM \\ }$$ and with respect to a single partition $M_{jk}$ as 
$$\eqalign{ dQ &= -\LR{SXP}:\LR{U_j\,dM_{jk}\,V_k^T} \\ &= -\LR{U_j^TSXPV_k}:dM_{jk} \\ }$$ From this result, the requested gradient is seen to be $$\eqalign{ \grad{Q}{M_{jk}} &= -{U_j^TSXPV_k} \qquad \qquad \qquad \\ }$$ The other gradient merely omits the first block column from consideration $$\eqalign{ \grad{Q}{M^*} &= -\Sj\Skk {U_j^TSXPV_k} \qquad \qquad \\ }$$
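
If you want to sanity-check the block gradient numerically, here is a minimal NumPy sketch. The dimensions, the random test matrices, and the helper names (`spd`, `U`, `V`, `numerical_grad`) are my own assumptions for illustration and are not part of the derivation. It compares $-SXP$ against a central finite difference of $Q$ and confirms that $U_j^T(\cdot)V_k$ picks out the $(j,k)$ block.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p, q = 2, 3, 2, 2  # illustrative block counts (m, n) and block sizes (p, q)

def spd(d):
    """Random symmetric positive-definite d x d matrix (test data only)."""
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

S = np.linalg.inv(spd(m * p))   # S = Sigma^{-1}
S = 0.5 * (S + S.T)             # enforce exact symmetry
P = np.linalg.inv(spd(n * q))   # P = Psi^{-1}
P = 0.5 * (P + P.T)
M = rng.standard_normal((m * p, n * q))
Y = rng.standard_normal((m * p, n * q))

def Q(M):
    """Q = -1/2 * S : (X P X^T) with X = M - Y."""
    X = M - Y
    return -0.5 * np.trace(S @ X @ P @ X.T)

def U(j):
    """U_j = u_j (x) I_p, shape (m*p, p); j is zero-based here."""
    return np.kron(np.eye(m)[:, [j]], np.eye(p))

def V(k):
    """V_k = v_k (x) I_q, shape (n*q, q); k is zero-based here."""
    return np.kron(np.eye(n)[:, [k]], np.eye(q))

def numerical_grad(f, M, h=1e-6):
    """Entry-by-entry central finite difference of a scalar function f."""
    G = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        E = np.zeros_like(M)
        E[idx] = h
        G[idx] = (f(M + E) - f(M - E)) / (2 * h)
    return G

G = -S @ (M - Y) @ P            # analytic gradient dQ/dM
G_num = numerical_grad(Q, M)

print("full gradient matches:", np.allclose(G, G_num, atol=1e-5))

# Each block gradient dQ/dM_jk = -U_j^T S X P V_k is the (j,k) block of G.
blocks_ok = all(
    np.allclose(U(j).T @ G @ V(k),
                G_num[j * p:(j + 1) * p, k * q:(k + 1) * q], atol=1e-5)
    for j in range(m) for k in range(n)
)
print("block gradients match:", blocks_ok)
```

Because $Q$ is quadratic in $M$, the central difference is exact up to rounding, so a loose tolerance is enough. Note that the code uses zero-based indices, so the "first block column" in the text corresponds to `k = 0`.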