What is the trick for converting summation notation to matrix notation?


Let $C \in \mathbb R^{m \times n}, X \in \mathbb R^{m \times n}, W \in \mathbb R^{m \times d}, H \in \mathbb R^{n \times d}$, where $W_k.$ is the $k$th row of $W$, $H_j.$ is the $j$th row of $H$, and $\lambda$ is a scalar.

I obtained the following derivative of $f$ wrt each entry of $W$ ($W_{kl}$):

$$\frac{\partial f}{\partial W_{kl}}=\sum_j C_{kj}(W_k. H_j.^T - X_{kj})H_{jl} + \lambda(\sum_j C_{kj}) W_{kl} \quad (1)$$

Now, I would like to get the derivative of $f$ wrt a row vector of $W$ ($W_k.$), that is:

$$\frac{\partial f}{\partial W_k.}=(\frac{\partial f}{\partial W_{k1}}, \frac{\partial f}{\partial W_{k2}}, \frac{\partial f}{\partial W_{k3}},...,\frac{\partial f}{\partial W_{kd}}) \quad (2)$$

My question is, according to $(1)$, how can I get the solution of $(2)$, which should be expressed in matrix form (matrix notation)?

Actually, I know the answer from a reference:

$$\frac{\partial f}{\partial W_k.}=W_k.(H^TDiag(C_k.)H + \lambda(\sum_j C_{kj})I) - X_k. Diag(C_k.) H \quad (3)$$ where $C_k.$ is the $k$th row of $C$, $X_k.$ is the $k$th row of $X$, $Diag(C_k.)$ is the diagonal matrix with the elements of $C_k.$ on its diagonal, and $I \in \mathbb R^{d \times d}$ is the identity matrix.
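As a sanity check that $(1)$ and $(3)$ describe the same vector, here is a minimal NumPy sketch. The sizes `m, n, d`, the value of `lam`, the row index `k`, and the random data are all assumptions made only for illustration; nothing here comes from the reference. It evaluates $(1)$ entry by entry with explicit loops over $j$ and $l$, evaluates $(3)$ as a matrix expression, and compares the two.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 4, 5, 3   # arbitrary small sizes (assumption, for illustration only)
lam = 0.1           # arbitrary value for the scalar lambda (assumption)
C = rng.random((m, n))
X = rng.random((m, n))
W = rng.random((m, d))
H = rng.random((n, d))
k = 2               # row of W under test (0-based index, assumption)

# Equation (1): one entry at a time, with the sum over j written out.
grad_sum = np.zeros(d)
for l in range(d):
    for j in range(n):
        grad_sum[l] += C[k, j] * (W[k] @ H[j] - X[k, j]) * H[j, l]
    grad_sum[l] += lam * C[k].sum() * W[k, l]

# Equation (3): the same row vector in matrix form.
Dk = np.diag(C[k])  # Diag(C_k.), an n-by-n diagonal matrix
grad_mat = W[k] @ (H.T @ Dk @ H + lam * C[k].sum() * np.eye(d)) - X[k] @ Dk @ H

# Reports whether the two forms agree up to floating-point tolerance.
print(np.allclose(grad_sum, grad_mat))
```

The loop over `j` weighted by `C[k, j]` is what `Diag(C_k.)` encodes in $(3)$: a weighted sum of the form $\sum_j a_j c_j b_{jl}$ equals the $l$th entry of $a\,Diag(c)\,B$.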

My question is how to obtain this answer. What is the trick for converting the summation notation to matrix notation? In other words, how does one get $(3)$ from $(1)$ and $(2)$?

Thanks.