Problem on Matrix Calculus.

104 Views Asked by At

How do I find $$\arg\min_{\alpha \in\mathbb R^n} (K\alpha-y)^T(K\alpha-y)+\lambda \alpha^T K \alpha$$ using Matrix calculus ? Here $K$ is $n\times n$ matrix, $\alpha$ and $y$ are $n\times 1$ vectors, $\lambda$ is positive real number. I am completely new to Matrix Calculus, a solution might help me to relate with the Wikipedia's article on Matrix Calculus.

2

There are 2 best solutions below

1
On

Note that \begin{align} F(\alpha) =& (K\alpha-y)^T(K\alpha-y)+\lambda \alpha^T K \alpha \\ =& \big(\alpha^TK^T-y^T \big)(K\alpha-y) \\ =&\alpha^TK^TK\alpha -\alpha^TK^Ty-y^TK\alpha +y^T y \end{align} We have that $ DF (\alpha_0) = 0 $ implies $ \alpha_0 =\mathrm{argmin}_{\alpha\in\mathbb{R}^n}F(\alpha)$ since $ F $ is convex. \begin{align} F(\alpha+h) =& (\alpha+h)^TK^TK(\alpha+h)-(\alpha+h)^TK^Ty-y^TK(\alpha+h)+yTy \\ =&\alpha^T K^TK\alpha + h^T K^TK\alpha+\alpha^T K^TKh+h^T K^TKh \\ -&\alpha^TK^Ty-h^TK^Ty -y^TK\alpha-h^TK\alpha+y^TKy \\ =&F(\alpha)+ \underbrace{h^T K^TK\alpha+\alpha^T K^TKh-h^TK^Ty -h^TK\alpha}_{DF(\alpha)\cdot h} +{h^T K^TKh} \end{align} Recall that $\alpha^TK^TKh=h^T(\alpha^TK^TK)^T=h^TK^T(\alpha^TK^T)^T=h^TK^TK\alpha$. We have \begin{align} DF(\alpha)\cdot h =& h^T K^TK\alpha+\alpha^T K^TKh-h^TK^Ty -h^TK\alpha \\ =& h^T K^TK\alpha+h^TK^TK\alpha-h^TK^Ty -h^TK\alpha \\ =&h^T(2K^TK\alpha-Ky-K\alpha) \end{align} Then $F(\alpha)=0$ if, only if, $\alpha$ is solution of linear system $$ (2K^TK-K)\alpha=ky $$ If there is $(2K^TK-K)^{-1}$ then $$ \alpha=(2K^TK-K)^{-1}Ky. $$

0
On

Let's define a new variable $x=(Ka-y)$ and use a colon to denote the trace/Frobenius product $$A:B = {\rm tr\,}(A^TB)$$ Write the function in terms of this new variable. Then find its differential and gradient. $$\eqalign{ \phi &= x:x + \lambda K:aa^T \cr d\phi &= 2x:dx + \lambda K:(da\,a^T + a\,da^T) \cr &= 2x:K\,da + \lambda(K+K^T):da\,a^T \cr &= \big(2K^Tx + \lambda(K+K^T)\,a\big):da \cr &= \big(2K^T(Ka-y) + \lambda(K+K^T)\,a\big):da \cr \frac{\partial\phi}{\partial a} &= 2K^TKa-2K^Ty + \lambda(K+K^T)\,a \cr }$$ Set the gradient to zero and solve $$\eqalign{ &K^TKa + \tfrac{\lambda}{2}(K+K^T)\,a = K^Ty \cr &a = \big(K^TK + \tfrac{\lambda}{2}(K+K^T)\big)^{-1}K^Ty \cr }$$ If $K$ is symmetric, this can be reduced to $$a = (K^2 + \lambda K)^{-1}Ky$$ If $K^{-1}$ exists, it can be further reduced to $$a = (K + \lambda I)^{-1}y$$