Consider the following optimization problem
\begin{align} &\min_{\phi \in \mathbb{R}^d} \|X- \phi \phi^T X\|^2_F\\ &\text{s.t. } \, \, \, \phi^T\phi=1 \end{align}
where $X$ is a $d \times n$ matrix and $\|\cdot\|_F$ denotes the Frobenius norm. I want to show that this is just an eigenvalue problem.
In a few steps I can write $\|X- \phi \phi^T X\|^2_F = \operatorname{tr}(X^TX - XX^T\phi\phi^T)$ (using the constraint $\phi^T\phi=1$), which I now take as the objective function. Writing the Lagrangian (and using linearity of the trace) I get
\begin{equation} \mathcal{L} = \operatorname{tr}(X^TX) - \operatorname{tr}(XX^T\phi\phi^T) -\lambda(\phi^T\phi-1). \end{equation}
Then,
\begin{equation} \mathbb{R}^d \ni\frac{\partial\mathcal{L}}{\partial\phi} = -\operatorname{Tr}(2XX^T\phi) - 2\lambda\phi. \end{equation}
where I've again used linearity of the trace and then its (supposed) commutation with the derivative:
\begin{equation} \frac{d\operatorname{tr}(f(X))}{dX} = \operatorname{Tr}\left(\frac{df(X)}{dX}\right). \end{equation}
What puzzles me is that the object I now have inside the trace has dimensions $d \times 1$ (it's a vector), so the trace isn't defined. Or is it just equal to its argument? And if so, why? Thanks!
EDIT:
Thank you. A more rigorous justification that works for me is the following:
\begin{equation} \frac{\partial \operatorname{tr}(XX^T\phi\phi^T)}{\partial \phi_l} = \frac{\partial}{\partial\phi_l}\sum_{j,k}(XX^T)_{jk}\,\phi_j\phi_k = 2\sum_{k}(XX^T)_{lk}\,\phi_k = \left(2XX^T\phi\right)_l, \end{equation}
where the last-but-one equality uses the symmetry of $XX^T$.
The key point here is that the trace of a product of matrices can be seen as the sum of the entry-wise products of their elements, i.e. \begin{equation} \operatorname{tr}(A^TB) = \sum_{i,j}(A\circ B)_{i,j} \end{equation}
where $\circ$ denotes the Hadamard product.
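As a sanity check (not part of the original post), both the gradient formula $\partial \operatorname{tr}(XX^T\phi\phi^T)/\partial\phi = 2XX^T\phi$ and the Hadamard identity can be verified numerically with NumPy; the dimensions and variable names below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 6
X = rng.standard_normal((d, n))
phi = rng.standard_normal(d)

# f(phi) = tr(X X^T phi phi^T) = phi^T X X^T phi
f = lambda p: np.trace(X @ X.T @ np.outer(p, p))

# Claimed analytic gradient: 2 X X^T phi
grad = 2 * X @ X.T @ phi

# Central finite-difference gradient, one coordinate at a time
eps = 1e-6
fd = np.array([(f(phi + eps * e) - f(phi - eps * e)) / (2 * eps)
               for e in np.eye(d)])
print(np.allclose(grad, fd, atol=1e-4))  # True

# Hadamard identity: tr(A^T B) equals the sum of entry-wise products
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
print(np.isclose(np.trace(A.T @ B), np.sum(A * B)))  # True
```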
Since $\phi$ is just a column vector, you can circumvent the headache of assigning a data type to a matrix-by-matrix derivative by rewriting the original expression as $$ \begin{align} \|X - \phi\phi^T X\|_F^2 &= \operatorname{tr}[(X - \phi\phi^TX)^T(X - \phi\phi^TX)] \\ & = \operatorname{tr}(X^TX) - 2\operatorname{tr}(X^T\phi\phi^TX) + \operatorname{tr}(X^T\phi\phi^T\phi\phi^TX) \\ & = \operatorname{tr}(X^TX) - \operatorname{tr}(XX^T\phi\phi^T) \qquad (\text{using } \phi^T\phi = 1)\\ & = \operatorname{tr}(X^TX) - \operatorname{tr}(\phi^TXX^T\phi) = \operatorname{tr}(X^TX) - \phi^TXX^T\phi. \end{align} $$ In other words, it is equivalent to consider the optimization problem $$ \max_{\phi \in \Bbb R^d} \phi^T XX^T\phi \quad \text{s.t.} \quad \phi^T\phi = 1. $$ The fact that this maximum is the largest eigenvalue of $XX^T$ (attained when $\phi$ is the corresponding eigenvector) is known as the "Rayleigh-Ritz theorem", but if you wanted you could derive this result using Lagrange multipliers.
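The equivalence above can also be checked numerically: take the top eigenvector of $XX^T$, confirm its Rayleigh quotient is the largest eigenvalue, and confirm no random unit vector achieves a smaller reconstruction error. This is a sketch with arbitrary dimensions, not part of the original answer:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 8
X = rng.standard_normal((d, n))

# Eigendecomposition of the symmetric matrix X X^T (eigh sorts ascending)
w, V = np.linalg.eigh(X @ X.T)
lam_max, phi_star = w[-1], V[:, -1]

# Original objective ||X - phi phi^T X||_F^2
obj = lambda p: np.linalg.norm(X - np.outer(p, p) @ X, 'fro')**2

# Rayleigh quotient at the top eigenvector equals the largest eigenvalue
print(np.isclose(phi_star @ X @ X.T @ phi_star, lam_max))  # True

# On the unit sphere, obj(phi) = tr(X^T X) - phi^T X X^T phi
print(np.isclose(obj(phi_star), np.trace(X.T @ X) - lam_max))  # True

# No random unit vector beats the top eigenvector
for _ in range(100):
    p = rng.standard_normal(d)
    p /= np.linalg.norm(p)
    assert obj(p) >= obj(phi_star) - 1e-9
print("ok")
```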