I have a problem of the following form: let $X\in \mathbb{R}^{N\times M}$ denote the feature matrix for $N$ data points and $M$ features, $Y\in \mathbb{R}^{N\times T}$ the response matrix over $T$ variables, and $W\in \mathbb{R}^{M\times T}$ the coefficient matrix. I want to minimize $$f(W) + h(W)$$ where $f(W)=||Y-XW||_F^2$. I solve this with proximal gradient descent, where I have the proximal operator of $h(W)$:
Step I: gradient step: $$W^{t+1}=W^t-s\nabla f(W^t)$$ Step II: proximal step: $$\operatorname{prox}_h(W^{t+1})$$ where $s$ is the step size. I want to use backtracking line search to select the step size. I know how to do it if $Y$ and $W$ were vectors, but it seems to get more complicated when I extend it to matrix form.
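For concreteness, here is a NumPy sketch of the two-step update above with a fixed step size. The choice $h(W)=\lambda\|W\|_1$ is purely hypothetical, used only so that the prox has a closed form (soft-thresholding); substitute your own prox:

```python
import numpy as np

def grad_f(X, Y, W):
    # Gradient of f(W) = ||Y - X W||_F^2, namely 2 X^T (X W - Y)
    return 2.0 * X.T @ (X @ W - Y)

def prox_l1(V, s, lam):
    # Example prox: soft-thresholding, valid for the hypothetical
    # choice h(W) = lam * ||W||_1; replace with your own prox_h
    return np.sign(V) * np.maximum(np.abs(V) - s * lam, 0.0)

def prox_grad_step(X, Y, W, s, lam):
    # Step I: gradient step; Step II: prox of h
    return prox_l1(W - s * grad_f(X, Y, W), s, lam)
```

With a fixed step size $s \le 1/L$, where $L = 2\sigma_{\max}(X)^2$ is the Lipschitz constant of $\nabla f$, each step decreases the composite objective; backtracking removes the need to compute $L$ up front.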
Here is how people do it when $Y$ and $W$ are vectors:
\begin{equation} G_s(W)= \frac{W-prox_s(W - s\nabla f(W))}{s} \end{equation}
Step I
Fix a parameter $0<\beta<1$
Step II
At each iteration we start with $s=1$, and while
$$f\big(W-s\,G_s(W)\big)>f(W)-s\,\nabla f(W)^T G_s(W) + \frac{s}{2}||G_s(W)||^2_2$$ shrink $s=\beta s$; otherwise perform the prox-gradient update.
When $W$ and $Y$ are vectors, this is easy, since $$s\,\nabla f(W)^T G_s(W)$$ is a scalar (the inner product of two vectors), but when $W$ is a matrix, this term becomes a matrix and the inequality above cannot be evaluated directly. Does anybody have a better solution for backtracking line search in this case?
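Here is a sketch of the backtracking loop under one common reading of the condition for matrices: the term $\nabla f(W)^T G_s(W)$ is replaced by the Frobenius (trace) inner product $\operatorname{tr}\!\big(\nabla f(W)^T G_s(W)\big)$, which is a scalar and reduces to the ordinary dot product in the vector case. This is an assumption about the intended generalization, not the only possible one:

```python
import numpy as np

def f(X, Y, W):
    # Smooth part: f(W) = ||Y - X W||_F^2
    return np.linalg.norm(Y - X @ W, 'fro') ** 2

def grad_f(X, Y, W):
    # Gradient of the smooth part: 2 X^T (X W - Y)
    return 2.0 * X.T @ (X @ W - Y)

def backtracking_prox_grad(X, Y, W, prox, beta=0.5):
    """One proximal-gradient step with backtracking line search.

    `prox(V, s)` is assumed to evaluate the prox operator of h at V
    with parameter s. The inner product <grad f(W), G_s(W)> is taken
    in the Frobenius sense, i.e. trace(grad_f^T G_s), so the descent
    condition stays a scalar inequality in the matrix case.
    """
    s = 1.0
    g = grad_f(X, Y, W)
    while True:
        G = (W - prox(W - s * g, s)) / s        # generalized gradient G_s(W)
        inner = np.sum(g * G)                   # trace(g^T G), a scalar
        if f(X, Y, W - s * G) <= (f(X, Y, W) - s * inner
                                  + (s / 2.0) * np.linalg.norm(G, 'fro') ** 2):
            break
        s *= beta                               # shrink step size
    return W - s * G, s                         # note: W - s G = prox(W - s g, s)
```

When $W$ is a vector, `np.sum(g * G)` is exactly $\nabla f(W)^T G_s(W)$, so this reduces to the vector-case recipe above.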