Given $A \in \mathbb{R}^{m\times n}$ and a set of vectors $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}^n$, $i = 1, 2, \ldots, k$:
(a) Find a set of coefficients $a_i$ such that $\left\|A - a_1x_1y_1^T - a_2x_2y_2^T - \cdots - a_kx_ky_k^T\right\|_2^2$ is minimized.
(b) Which set of $x_i, y_i$ minimizes the minimum in (a) for $k < \min(m, n)$?
For convenience, define the variables $$\eqalign{ &v = {\rm diag}(X^TAY),\quad &B={\rm Diag}(a)\in{\mathbb R}^{k\times k} \\ &X = \big[\,x_1\;x_2\;\ldots\;x_k\big]&\in{\mathbb R}^{m\times k} \\ &Y = \big[\,y_1\;y_2\;\ldots\;y_k\big]&\in{\mathbb R}^{n\times k} \\ &M = XBY^T-A &\in{\mathbb R}^{m\times n} \\ &P = X^TX\odot Y^TY &\in{\mathbb R}^{k\times k} \\ }$$ where diag() creates a vector from the diagonal of the matrix argument, Diag() creates a diagonal matrix from its vector argument, and $\odot$ represents the elementwise/Hadamard product.
Write the objective function in terms of these new variables, taking the norm to be the Frobenius norm.
Then calculate its differential and gradient. $$\eqalign{ \phi &= \tfrac{1}{2}\|\,(-M)\,\|_F^2 \\ &= \tfrac{1}{2}M:M \\ d\phi &= M:dM \\ &= M:X\,dB\,Y^T \\ &= X^TMY:dB \\ &= X^T(XBY^T-A)Y:{\rm Diag}(da) \\ &= {\rm diag}(X^TXBY^TY-X^TAY):da \\ &= (Pa-v):da \\ \frac{\partial \phi}{\partial a} &= Pa-v \\ }$$ Set the gradient to zero and solve for the optimal $a$ vector. $$\eqalign{ Pa &= v \\ a &= P^+v \;=\; (X^TX\odot Y^TY)^+{\rm diag}(X^TAY) \\ }$$ where $P^+$ denotes the pseudoinverse of $P$.
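The closed form $a = P^+v$ can be checked numerically. The following is a minimal NumPy sketch, not part of the derivation itself: the function name `optimal_coefficients`, the random test data, and the cross-check against an ordinary least-squares solve on the vectorized rank-one terms are all assumptions made for illustration.

```python
import numpy as np

def optimal_coefficients(A, X, Y):
    """Return a = P^+ v with P = (X^T X) ∘ (Y^T Y) and v = diag(X^T A Y)."""
    P = (X.T @ X) * (Y.T @ Y)               # elementwise (Hadamard) product, k x k
    v = np.einsum("ik,ij,jk->k", X, A, Y)   # v_i = x_i^T A y_i
    return np.linalg.pinv(P) @ v

# Hypothetical test data (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
m, n, k = 6, 5, 3
A = rng.standard_normal((m, n))
X = rng.standard_normal((m, k))
Y = rng.standard_normal((n, k))

a = optimal_coefficients(A, X, Y)

# Cross-check: stack vec(x_i y_i^T) as columns and solve the same
# Frobenius-norm problem as a standard least-squares system.
K = np.stack([np.outer(X[:, i], Y[:, i]).ravel() for i in range(k)], axis=1)
a_ref, *_ = np.linalg.lstsq(K, A.ravel(), rcond=None)

# Gradient P a - v at the computed solution (should be close to zero).
grad = (X.T @ X) * (Y.T @ Y) @ a - np.einsum("ik,ij,jk->k", X, A, Y)
```

For generic data the columns of `K` are linearly independent, so `a` and `a_ref` should agree to rounding error. The agreement reflects the identities $K^TK = P$ and $K^T{\rm vec}(A) = v$.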
NB: In some of the steps above, a colon is used as a convenient product notation for the trace, i.e. $$A:B = {\rm Tr}(A^TB)$$ Properties of the trace allow terms in such products to be rearranged in a number of ways, e.g. $$\eqalign{ A:B &= B:A \\ A:BC &= B^TA:C \;=\; AC^T:B \\ }$$
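The rearrangement rules for the colon product are easy to spot-check on random matrices. This is a small sketch under assumed shapes; the helper name `colon` is hypothetical and simply evaluates ${\rm Tr}(A^TB)$.

```python
import numpy as np

def colon(A, B):
    """Frobenius inner product A:B = Tr(A^T B); A and B must share a shape."""
    return np.trace(A.T @ B)

# Hypothetical shapes, chosen so that every product below is defined.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 3))
D = rng.standard_normal((4, 3))

sym_lhs, sym_rhs = colon(A, D), colon(D, A)   # A:D = D:A
lhs = colon(A, B @ C)                         # A:BC
mid = colon(B.T @ A, C)                       # B^T A : C
rhs = colon(A @ C.T, B)                       # A C^T : B
```

All three of `lhs`, `mid`, and `rhs` should coincide up to floating-point error, as should `sym_lhs` and `sym_rhs`.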