Matrix differentiation of $-a^T X^T y$ with respect to $a$


In short: what is the derivative of $$S(a)=-a^TX^Ty$$ with respect to $a$, that is, what belongs on the right-hand side of $$0=\frac{\partial S}{\partial a}= \;?$$ The longer story: I know that $$J(a)=\underbrace{\:\:\:a^TX^TXa\:\:\:}_u\:\underbrace{\:\:-y^TXa\:\:}_v \:\underbrace{\:\:-a^TX^Ty\:\:}_{w}+y^Ty$$ and that its gradient with respect to $a$, set to zero, is $$0=\frac{\partial J}{\partial a}=\underbrace{\:\:\:a^TX^TX+(X^TXa)^T\:\:}_u \:\underbrace{\:\:\:-\:(y^TX)}_v\:\underbrace{\:\:\:-(X^Ty)^T}_w $$

What is the matrix differentiation rule for term $w$?

I know the rule for term $u$. Term $v$ is linear in $a$, so it differentiates as in ordinary calculus. The equations are taken from Constrained Least Squares - Gavin 2015. I could not find a rule for term $w$ in the Matrix Cookbook or in other university material from introductory courses on matrix calculus or matrix differentiation.


There are 2 best solutions below

Best answer

$$S(a)=-a^TX^Ty$$

First note the following: $$dF(a_1,...,a_n)=\dfrac{\partial F}{\partial a_1}da_1+...+\dfrac{\partial F}{\partial a_n}da_n.$$

This is just the definition of the total differential of a multivariable function. We can rewrite it as a dot product: $$dF(a_1,...,a_n)=[da_1,...,da_n]\left[\dfrac{\partial F}{\partial a_1},...,\dfrac{\partial F}{\partial a_n}\right]^T.$$

The second bracket is the gradient of $F$ written as a column vector, which in the row-vector convention used here is $\nabla F^T$. Hence, $dF=da^T\cdot\nabla F^T$.

Now take the differential of $S(a)$, treating $X^Ty$ as a constant because it does not depend on $a$: $dS=-da^T\cdot X^Ty$.

Hence, by comparison, we conclude that the gradient is $\nabla S^T=-X^Ty$, or equivalently $\nabla S=-y^TX$.
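If you want to convince yourself numerically, the following is a minimal sketch that compares $-X^Ty$ with a central finite-difference gradient of $S(a)=-a^TX^Ty$. The matrix `X`, the vectors `y` and `a`, and the helper names `S`, `grad_S` and `numeric_grad` are made up for illustration; plain Python lists are used so that no third-party package is assumed.

```python
def S(a, X, y):
    """S(a) = -a^T X^T y, computed as -(X a) . y with X given as a list of rows."""
    Xa = [sum(x_ij * a_j for x_ij, a_j in zip(row, a)) for row in X]
    return -sum(p * q for p, q in zip(Xa, y))


def grad_S(X, y):
    """Analytic gradient -X^T y, returned as a flat list (layout is ignored here)."""
    rows, cols = len(X), len(X[0])
    return [-sum(X[i][j] * y[i] for i in range(rows)) for j in range(cols)]


def numeric_grad(f, a, h=1e-6):
    """Central finite differences of a scalar function f at the point a."""
    g = []
    for k in range(len(a)):
        up, dn = a[:], a[:]
        up[k] += h
        dn[k] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g


# Illustrative data only (not from the question).
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [1.0, 0.0, -1.0]
a = [0.5, -2.0]

analytic = grad_S(X, y)
numeric = numeric_grad(lambda v: S(v, X, y), a)
print("analytic:", analytic)
print("numeric: ", numeric)
```

Because $S$ is linear in $a$, the gradient does not depend on the point `a` at which it is evaluated, and the two printed vectors should agree up to rounding error.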

Second answer

For convenience, define a new variable $$s=Xa-y$$

Then use the Frobenius inner product (denoted by a colon, $A:B=\operatorname{tr}(A^TB)$) and this new variable to write the function more concisely, so that finding the differential and gradient becomes almost trivial
$$\eqalign{ J &= s:s \cr\cr dJ &= 2s:ds \cr &= 2s:X\,da \cr &= 2X^Ts:da \cr\cr \frac{\partial J}{\partial a} &= 2X^Ts \cr &= 2X^T(Xa-y) } $$

Depending on which layout convention they follow, some people will define the gradient as the transpose of this result.
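As a rough check of the result $2X^T(Xa-y)$, here is a small sketch that evaluates it next to a finite-difference gradient of $J(a)=(Xa-y)^T(Xa-y)$. The data and the helper names `residual`, `J`, `grad_J` and `numeric_grad` are assumptions chosen for illustration, and `a_star` was picked by hand so that $Xa=y$ holds exactly for this particular data.

```python
def residual(a, X, y):
    """s = X a - y for X given as a list of rows."""
    return [sum(x * v for x, v in zip(row, a)) - y_i for row, y_i in zip(X, y)]


def J(a, X, y):
    """J(a) = s : s, the squared norm of the residual."""
    return sum(s_i * s_i for s_i in residual(a, X, y))


def grad_J(a, X, y):
    """Analytic gradient 2 X^T (X a - y), returned as a flat list."""
    s = residual(a, X, y)
    return [2 * sum(X[i][j] * s[i] for i in range(len(X))) for j in range(len(a))]


def numeric_grad(f, a, h=1e-6):
    """Central finite differences of a scalar function f at the point a."""
    g = []
    for k in range(len(a)):
        up, dn = a[:], a[:]
        up[k] += h
        dn[k] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g


# Illustrative data only (not from the question).
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [1.0, 0.0, -1.0]
a = [0.5, -2.0]

analytic = grad_J(a, X, y)
numeric = numeric_grad(lambda v: J(v, X, y), a)
print("analytic:", analytic)
print("numeric: ", numeric)

# Hand-picked point with X a_star = y, so the gradient should vanish there.
a_star = [-2.0, 1.5]
print("gradient at a_star:", grad_J(a_star, X, y))
```

Setting the gradient to zero gives the normal equations $X^TXa=X^Ty$, which is why the gradient vanishes at `a_star` in this example.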