Deriving a linear algebra equality in a gradient-based meta-learning update


Context:

I am reading the paper "Gradient-based meta-learning with learned layerwise metric and subspace" (arXiv), and I am having trouble with one of the equalities on page 5.

The authors state the following (where bold font denotes a matrix and non-bold font denotes a vector):

\begin{equation} y = \boldsymbol{TW}x = \boldsymbol{A}x \end{equation}

They then go on to say that their new learning rule update is given as:

\begin{equation} y_{new} = \boldsymbol{T}(\boldsymbol{T}^{-1}\boldsymbol{A} - \alpha \nabla_{\boldsymbol{T}^{-1}\boldsymbol{A}}\mathcal{L_T})x \end{equation}

which they say is equal to the following:

\begin{equation} y_{new} = y - \alpha (\boldsymbol{TT}^{\intercal})\nabla_{\boldsymbol{A}}\mathcal{L_T}x \end{equation}

Question:

However, I don't quite understand how this is derived; in particular, I am unsure where the term $(\boldsymbol{TT}^{\intercal})\nabla_{\boldsymbol{A}}\mathcal{L_T}$ comes from. Can someone please give me some guidance on how to resolve this?

My attempt:

Here is my attempt, but I am unsure how to derive the RHS.

\begin{align}
y_{new} &= \boldsymbol{T}(\boldsymbol{T}^{-1}\boldsymbol{A} - \alpha \nabla_{\boldsymbol{T}^{-1}\boldsymbol{A}}\mathcal{L_T})x \\
&= \boldsymbol{T}\boldsymbol{T}^{-1}\boldsymbol{A}x - \alpha \boldsymbol{T}\nabla_{\boldsymbol{T}^{-1}\boldsymbol{A}}\mathcal{L_T}x \\
&= \boldsymbol{I}y - \alpha \boldsymbol{T}\nabla_{\boldsymbol{T}^{-1}\boldsymbol{A}}\mathcal{L_T}x \\
&= y - \alpha \boldsymbol{T}\nabla_{\boldsymbol{T}^{-1}\boldsymbol{A}}\mathcal{L_T}x \\
&= \dots \\
\end{align}
\end{align}
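As a sanity check (not a derivation), I also verified the claimed identity numerically. Here is a small sketch assuming $\mathcal{L_T}$ is a simple quadratic loss $\frac{1}{2}\|\boldsymbol{A}x - t\|^2$ with a made-up target $t$ (the paper does not specify the loss; this is my assumption). I compute both gradients by central finite differences and compare my last line above, $\boldsymbol{T}\nabla_{\boldsymbol{T}^{-1}\boldsymbol{A}}\mathcal{L_T}$, against the paper's term $(\boldsymbol{TT}^{\intercal})\nabla_{\boldsymbol{A}}\mathcal{L_T}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
T = rng.normal(size=(n, n))
W = rng.normal(size=(n, n))
x = rng.normal(size=n)
t = rng.normal(size=n)  # target for my assumed quadratic loss


def loss_W(M):
    """Loss as a function of W = T^{-1} A, i.e. L(M) = 0.5 * ||T M x - t||^2."""
    return 0.5 * np.sum((T @ M @ x - t) ** 2)


def loss_A(A):
    """The same loss written as a function of A, i.e. L(A) = 0.5 * ||A x - t||^2."""
    return 0.5 * np.sum((A @ x - t) ** 2)


def num_grad(f, M, eps=1e-6):
    """Entrywise central finite-difference gradient of f at the matrix M."""
    g = np.zeros_like(M)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            E = np.zeros_like(M)
            E[i, j] = eps
            g[i, j] = (f(M + E) - f(M - E)) / (2 * eps)
    return g


A = T @ W
gW = num_grad(loss_W, W)  # numeric grad w.r.t. T^{-1} A
gA = num_grad(loss_A, A)  # numeric grad w.r.t. A

# LHS: the T * grad_{T^{-1}A} L term from my last line above.
# RHS: the (T T^T) * grad_A L term the paper states.
print(np.allclose(T @ gW, T @ T.T @ gA, atol=1e-5))
```

For this loss the two sides agree to finite-difference precision, so the missing step seems to be purely about how $\nabla_{\boldsymbol{T}^{-1}\boldsymbol{A}}\mathcal{L_T}$ relates to $\nabla_{\boldsymbol{A}}\mathcal{L_T}$, but I still don't see how to derive it in general.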

Thank you very much for your time :)