Consider the objective for Matrix Factorization: $$f(W,H) = \frac 12 \|X - WH\|_F^2 \to \min,$$ where $X \in \mathbb R^{n \times m}$ is given and $W \in \mathbb R^{n \times k}$, $H \in \mathbb R^{k \times m}$ are required to find.
Two questions:
- Is there a simple representation of $\nabla^2 f$? It should be a matrix of size $(nk + km) \times (nk + km)$, which stores $\frac {\partial^2 f} {\partial W_{ij} \partial W_{kl}}$, $\frac {\partial^2 f} {\partial W_{ij} \partial H_{kl}}$, $\frac {\partial^2 f} {\partial H_{ij} \partial H_{kl}}$ for all $i,j,k,l$.
For gradients, there are simple formulae: \begin{align*} \nabla f_W &= -X H^\top + W H H^\top \\ \nabla f_H &= -W^\top X + W^\top W H, \end{align*} which are not hard to obtain by expanding the Frobenius norm using the inner product. For Hessian, I don't know one: I tried to do this, it's doable, but intermediate results are a complete mess.
- (What I really need) Is there a simple way to compute the smallest eigenvalue of $\nabla^2 f$? In the first place, I don't want to build $\nabla^2 f$ if possible, since it's a bit too large, and I'm not sure I'll be able to find the smallest eigenvalue quickly. At the very least, is there a way to work with a smaller matrix instead? It's also guaranteed that all elements of $X,W,H$ are nonnegative.
From your formula, we can see that $$ \frac{\partial f}{\partial W_{ij}} = -X^i (H^j)^T + W^i H (H^j)^T,\\ \frac{\partial f}{\partial H_{ij}} = -(W_i)^TX_j + W_i^T W (H_j), $$ where $M^i$ denotes the $i$th row of $M$ and $M_i$ the $i$th column. Thus, we have $$ \frac{\partial^2 f}{\partial W_{kl}\partial W_{ij}} = \delta_{ik} H^l (H^j)^T. $$ The mixed partial is a bit trickier. We find that $$ \frac{\partial^2 f}{\partial H_{kl}\partial W_{ij}} = -\delta_{jl}X_{ij} + W_{ik}H_{lj} + \delta_{jk}W^iH_l, $$ Finally, we have $$ \frac{\partial^2 f}{\partial H_{kl}\partial H_{ij}} = \delta_{jl} W_i^TW_k. $$
With that, we can build $M = \nabla_W^2 f$: we have $$ M = \sum_{i,k = 1}^n\sum_{j,l = 1}^k (e_j \otimes e_i)(e_l \otimes e_k)^T\delta_{ik} H^l (H^j)^T \\ = \sum_{i = 1}^n\sum_{j,l = 1}^k (e_j \otimes e_i)(e_l^T \otimes e_i^T) H^l (H^j)^T \\ = \sum_{i = 1}^n\sum_{j,l = 1}^k (e_j \otimes e_i)(e_l^T \otimes e_i^T) e_l^TH H^T e_j \\ = \sum_{i = 1}^n\sum_{j,l = 1}^k (e_je_l^TH H^T e_je_l^T) \otimes (e_ie_i^T) \\ = \sum_{j,l = 1}^k ([H H^T]_{lj}e_je_l^T) \otimes I_n \\ = (HH^T) \otimes I_n. $$ I suspect that similar computations can be made for the remaining Hessian blocks.