In class, in the context of matrix completion (recommender systems, collaborative filtering), we saw that the Eckart-Young theorem says the best rank-$k$ approximation to a matrix $A$, for $k < \mbox{rank}(A)$, is obtained by truncating the SVD to the $k$ largest singular values.
For a rating matrix such as the Netflix data, many entries are unobserved, and the task is to infer them. One solution approach we have seen is matrix factorization via Alternating Least Squares (ALS).
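To illustrate why ALS has no trouble with missing entries, here is a minimal sketch (the function name `als_complete`, the regularization weight `lam`, and the toy data are my own illustration, not from the lecture): each least-squares update is fit against the *observed* entries of a row or column only, so no imputation of the missing values is ever needed.

```python
import numpy as np

def als_complete(A, mask, k, iters=50, lam=0.1):
    """Sketch of ALS matrix completion: alternately solve regularized
    least-squares problems for the row factors U and column factors V,
    each time using only the entries where mask > 0."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((m, k))
    V = rng.standard_normal((n, k))
    for _ in range(iters):
        for i in range(m):                    # update row factor U[i]
            obs = mask[i] > 0                 # observed columns in row i
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + lam * np.eye(k),
                                   Vo.T @ A[i, obs])
        for j in range(n):                    # update column factor V[j]
            obs = mask[:, j] > 0              # observed rows in column j
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + lam * np.eye(k),
                                   Uo.T @ A[obs, j])
    return U @ V.T

# Example: a rank-1 matrix with one hidden entry is recovered from the rest.
A_true = np.full((6, 6), 4.0)
mask = np.ones((6, 6))
mask[0, 0] = 0.0
print(als_complete(A_true, mask, k=1)[0, 0])  # close to the true value 4
```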
Why can't we use SVD to solve this problem?
Since the Eckart-Young theorem guarantees the best rank-$k$ approximation of the rating matrix, it seems that the unobserved values are what cause the problem: we cannot simply set them to $0$ or to the mean of the data. But why is that?
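To make the problem with zero-filling concrete, here is a small numerical illustration (the toy matrix is my own hypothetical example): the truncated SVD treats the imputed $0$ as genuine data, so the reconstruction at the missing entry is pulled toward $0$ instead of toward the true value.

```python
import numpy as np

# A hypothetical rank-1 "rating" matrix: every user rates every item 4.
A_true = np.full((6, 6), 4.0)

# Hide one entry and naively impute it with 0.
A_filled = A_true.copy()
A_filled[0, 0] = 0.0

# Best rank-1 approximation of the *filled* matrix (Eckart-Young).
U, s, Vt = np.linalg.svd(A_filled)
A_hat = s[0] * np.outer(U[:, 0], Vt[0])

# The SVD treats the imputed 0 as real data: the reconstruction at the
# missing entry is dragged well below the true value 4.
print(round(A_hat[0, 0], 2))  # ≈ 2.98, far from the true value 4
```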
Does it have to do with the rank-k constraint in the minimization problem
$$ A_k = \arg \min_{\mbox{rank}(B)=k} \left\Vert A - B \right\Vert_F^2 $$
because with the rank constrained it is a non-convex problem and Eckart-Young theorem does not guarantee the best rank-$k$ approximation anymore if there are unobserved values in $A$?
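To state the distinction explicitly (the notation $\Omega$ for the set of observed indices is my own): the problem we actually want to solve is

$$ \hat{B} = \arg \min_{\mbox{rank}(B)=k} \sum_{(i,j) \in \Omega} \left( A_{ij} - B_{ij} \right)^2, $$

whereas the SVD of a zero-filled matrix minimizes the full Frobenius norm, which additionally contains the terms $\sum_{(i,j) \notin \Omega} (0 - B_{ij})^2$ and therefore biases the unobserved entries toward $0$.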
If you have started getting good at vectorization and Kronecker products, one approach is to minimize
$$\min_{U,\Sigma,V} \left\{ \left\Vert C_1 \odot (A - U\Sigma V^T) \right\Vert_F^2 + \left\Vert UU^T - I \right\Vert_F^2 + \left\Vert VV^T - I \right\Vert_F^2 + \left\Vert C_2 \odot \Sigma \right\Vert_F^2 \right\}$$
where $\odot$ denotes the elementwise (Hadamard) product.
Weight matrix $C_1$ should reflect which values of $A$ are unknown: zeros at the unobserved entries, so that they do not contribute to the fit.
Weight matrix $C_2$ should reflect the rank you seek: $k$ small values $\epsilon$ on the diagonal and large values everywhere else (on and off the diagonal), so that all but $k$ entries of $\Sigma$ are pushed toward zero. Should work neatly.
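A minimal numerical sketch of this idea (assuming squared Frobenius norms and elementwise weights; the matrix sizes, the weight values, and the use of `scipy.optimize.minimize` are my own choices, not part of the proposal):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m = n = 4
k = 1  # target rank

# Hypothetical rank-1 "rating" matrix with one hidden entry.
A = np.full((m, n), 4.0)
C1 = np.ones((m, n))
C1[0, 0] = 0.0                          # zero weight on the unknown entry

# C2: k small epsilons on the diagonal, large weights everywhere else.
C2 = np.full((m, n), 10.0)
C2[np.arange(k), np.arange(k)] = 1e-3

def unpack(x):
    U = x[:m * m].reshape(m, m)
    S = x[m * m:m * m + m * n].reshape(m, n)
    V = x[m * m + m * n:].reshape(n, n)
    return U, S, V

def objective(x):
    U, S, V = unpack(x)
    fit    = np.sum((C1 * (A - U @ S @ V.T))**2)  # weighted data fit
    orth_u = np.sum((U @ U.T - np.eye(m))**2)     # push U toward orthogonality
    orth_v = np.sum((V @ V.T - np.eye(n))**2)     # push V toward orthogonality
    rank_p = np.sum((C2 * S)**2)                  # penalize all but k entries
    return fit + orth_u + orth_v + rank_p

x0 = 0.1 * rng.standard_normal(m * m + m * n + n * n)
res = minimize(objective, x0, method="L-BFGS-B")
U, S, V = unpack(res.x)
print(res.fun)  # final objective value, far below objective(x0)
```

Note that the problem is non-convex, so a local optimizer started from a random point is only a heuristic; it is not guaranteed to find the global minimum.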
As it probably requires some maturity and skill, it might be a suitable main project for a PhD student.