Finding an analytical solution to $\underset{X}{\text{argmin}} \sum_{i=1}^{N} (y_i - \mathbf{a_{i}^TX^T b_i})^2$


I am trying to find an analytical solution to the following minimization problem:

$$\underset{X}{\text{argmin}} \sum_{i=1}^{N} (y_i - \mathbf{a_{i}^TX^T b_i})^2$$

where $\mathbf{a_i}$ and $\mathbf{b_i}$ are vectors and $\mathbf{X}$ is a matrix.

I have worked out the derivative:

$$\begin{align} \nabla_X L(\mathbf{X}) &= \nabla_X \sum_{i=1}^{N} (y_i - \mathbf{a_{i}^TX^T b_i})^2 \\ &= \sum_{i=1}^{N} 2(y_i - \mathbf{a_{i}^TX^T b_i})\nabla_X(y_i - \mathbf{a_{i}^T X^T b_i}) \\ &= \sum_{i=1}^{N} -2(y_i - \mathbf{a_{i}^TX^T b_i})\mathbf{b_i a_i^T} \end{align}$$
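As a sanity check, this gradient formula can be verified against central finite differences. The following numpy sketch uses random illustrative data; the shapes ($\mathbf{b_i} \in \mathbb{R}^m$, $\mathbf{a_i} \in \mathbb{R}^n$, $\mathbf{X}^T \in \mathbb{R}^{n \times m}$) and all variable names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, N = 3, 4, 5                       # assumed sizes: b_i in R^m, a_i in R^n
A = rng.standard_normal((N, n))         # row i is a_i^T
B = rng.standard_normal((N, m))         # row i is b_i^T
y = rng.standard_normal(N)
X = rng.standard_normal((m, n))         # a_i^T X^T b_i = b_i^T X a_i is scalar

def loss(X):
    r = np.einsum('im,mn,in->i', B, X, A) - y   # residuals b_i^T X a_i - y_i
    return np.sum(r**2)

# analytic gradient: sum_i -2 (y_i - b_i^T X a_i) b_i a_i^T
r = np.einsum('im,mn,in->i', B, X, A) - y
grad = 2 * np.einsum('i,im,in->mn', r, B, A)

# finite-difference gradient, one entry of X at a time
eps = 1e-6
fd = np.zeros_like(X)
for j in range(m):
    for k in range(n):
        E = np.zeros_like(X)
        E[j, k] = eps
        fd[j, k] = (loss(X + E) - loss(X - E)) / (2 * eps)

print(np.max(np.abs(grad - fd)))        # small: finite-difference noise only
```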

When I set the derivative to $\mathbf{0}$, I am not sure how to solve for $\mathbf{X}$:

$$\begin{align} \sum_{i=1}^{N} -2(y_i &- \mathbf{a_{i}^T X^T b_i})\mathbf{b_i a_i^T} = \mathbf{0} \\ \sum_{i=1}^{N} y_i\mathbf{b_i a_i^T} &= \sum_{i=1}^{N}(\mathbf{a_{i}^T X^T b_i})\mathbf{b_i a_i^T} \\ \mathbf{X} &= \;? \end{align}$$

There are 3 answers below.

BEST ANSWER

$$\sum_{i=1}^n \left( \mathrm b_i^{\top} \mathrm X \,\mathrm a_i - y_i \right)^2 = \sum_{i=1}^n \left( \mathrm b_i^{\top} \mathrm X \,\mathrm a_i \mathrm a_i^{\top} \mathrm X^{\top} \mathrm b_i - 2 y_i \mathrm b_i^{\top} \mathrm X \,\mathrm a_i + y_i^2\right)$$

Differentiating this cost function with respect to $\mathrm X$ and finding where the derivative vanishes, we obtain the following linear matrix equation

$$\sum_{i=1}^n \mathrm b_i \left( \mathrm b_i^{\top} \mathrm X \,\mathrm a_i - y_i \right) \mathrm a_i^{\top} = \mathrm O$$

which can be rewritten as follows

$$\sum_{i=1}^n \left( \mathrm b_i \mathrm b_i^{\top} \right) \mathrm X \left( \mathrm a_i \mathrm a_i^{\top} \right) = \sum_{i=1}^n y_i \mathrm b_i \mathrm a_i^{\top}$$

Vectorizing, we obtain the following linear system

$$\left( \sum_{i=1}^n \left(\mathrm a_i \mathrm a_i^{\top}\right) \otimes \left(\mathrm b_i \mathrm b_i^{\top}\right) \right) \, \mbox{vec} (\mathrm X) = \sum_{i=1}^n (\mathrm a_i \otimes \mathrm b_i) \, y_i$$
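The vectorized system above can be solved directly with numpy. A sketch on random illustrative data (sizes are assumptions; note that numpy must use column-major order, `order='F'`, to match the $\mathrm{vec}$ convention behind the Kronecker identity):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, N = 3, 4, 40                       # enough samples to make the system invertible
a = rng.standard_normal((N, n))          # row i is a_i^T
b = rng.standard_normal((N, m))          # row i is b_i^T
y = rng.standard_normal(N)

# M = sum_i (a_i a_i^T) ⊗ (b_i b_i^T),  rhs = sum_i y_i (a_i ⊗ b_i)
M = sum(np.kron(np.outer(a[i], a[i]), np.outer(b[i], b[i])) for i in range(N))
rhs = sum(y[i] * np.kron(a[i], b[i]) for i in range(N))

x = np.linalg.solve(M, rhs)
X = x.reshape(m, n, order='F')           # undo vec (column-major stacking)

# sanity check: the gradient sum_i 2 (b_i^T X a_i - y_i) b_i a_i^T vanishes at X
residuals = np.einsum('im,mn,in->i', b, X, a) - y
grad = 2 * np.einsum('i,im,in->mn', residuals, b, a)
print(np.max(np.abs(grad)))
```

This works because $(\mathrm a \mathrm a^{\top}) \otimes (\mathrm b \mathrm b^{\top}) = (\mathrm a \otimes \mathrm b)(\mathrm a \otimes \mathrm b)^{\top}$, so $M$ is a sum of rank-one terms and is generically invertible once $N \ge mn$.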

ANSWER

You're almost there. If you put things into block matrices, then you'll have: $$y^T B^TA = (A^TXB)(B^TA)$$ Now multiply by the pseudoinverses, and you have $$ (AA^T)^{-1}A \left[y^TB^TA\right]A^TBB^T(BB^TA A^TBB^T)^{-1} = (AA^T)^{-1}A \left[A^T X B B^TA \right]A^TBB^T(BB^TA A^TBB^T)^{-1} = X $$

ANSWER

You can use vectorization to rewrite the loss function as $$\eqalign { L &= \sum_k \Big(b_k^TXa_k -y_k\Big)^2 \cr &= \sum_k \Big((a_k\otimes b_k)^T{\rm vec}(X) -y_k\Big)^2 \cr &= \sum_k \Big(c_k^Tx -y_k\Big)^2 \cr }$$ where $x={\rm vec}(X)$ and $c_k=a_k\otimes b_k$ are used for convenience.

Then you can drop the explicit summation notation and write the function in terms of the matrix $C$ (whose columns are the $\{c_k\}$ vectors) and the Frobenius norm. Or better yet, use the Frobenius (:) product. $$\eqalign { L &= \|C^Tx-y\|_F^2 \cr &= (C^Tx-y):(C^Tx-y) \cr }$$ In this form, finding the gradient is easy $$\eqalign { dL &= 2\,(C^Tx-y):C^T\,dx \cr &= 2\,C(C^Tx-y):dx \cr\cr \frac{\partial L}{\partial x} &= 2\,C(C^Tx-y) \cr }$$ Setting the gradient to zero and solving $$\eqalign { CC^Tx &= Cy \cr x &= (CC^T)^{-1}Cy \cr &= \operatorname{pinv}(C^T)\,y \cr }$$
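This answer's solution $x = \operatorname{pinv}(C^T)\,y$ can be sketched numerically as follows (random illustrative data; sizes and names are assumptions). The columns of $C$ are $c_k = a_k \otimes b_k$, and numpy's column-major `order='F'` reshape recovers $X$ from $x = \mathrm{vec}(X)$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, N = 3, 4, 40
a = rng.standard_normal((N, n))          # row k is a_k^T
b = rng.standard_normal((N, m))          # row k is b_k^T
y = rng.standard_normal(N)

# C has columns c_k = a_k ⊗ b_k, so C is (m*n) x N and C^T x approximates y
C = np.column_stack([np.kron(a[k], b[k]) for k in range(N)])
x = np.linalg.pinv(C.T) @ y              # x = pinv(C^T) y
X = x.reshape(m, n, order='F')           # undo column-major vec

# cross-check against numpy's least-squares solver on C^T x = y
x_ls, *_ = np.linalg.lstsq(C.T, y, rcond=None)
print(np.max(np.abs(x - x_ls)))
```

Both routes compute the same least-squares solution; `lstsq` avoids forming the pseudoinverse explicitly and is the usual choice in practice.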