Given vectors $\pmb{y}_0, \pmb{y}_1, \dots, \pmb{y}_t \in \mathbb{R}^n$, let $F : \mathbb{R}^{n \times n} \to \mathbb{R}_0^+$ be defined by
$$F(W) := \left\| \pmb{y}_1 - W \pmb{y}_0 \right\|^2 + \left\| \pmb{y}_2 - W^2 \pmb{y}_0 \right\|^2 + \cdots + \left\| \pmb{y}_t - W^t \pmb{y}_0 \right\|^2 = \sum_{i=1}^{t} \left\| \pmb{y}_i - W^i \pmb{y}_0 \right\|^2 $$
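For concreteness, here is a minimal NumPy sketch that evaluates $F$ directly. It assumes the data are stored as a list `ys = [y0, y1, ..., yt]` of 1-D arrays (a hypothetical layout, not part of the question). It never forms the matrix powers $W^i$: each $W^i\pmb{y}_0$ is obtained from the previous one by a single matrix-vector product.

```python
import numpy as np


def F(W, ys):
    """Evaluate F(W) = sum_{i=1}^t ||y_i - W^i y_0||^2.

    Assumes ys = [y_0, y_1, ..., y_t] is a list of length-n arrays
    (a hypothetical storage layout chosen for this sketch).
    """
    x = np.asarray(ys[0], dtype=float)   # x holds W^i y_0, starting at i = 0
    total = 0.0
    for y in ys[1:]:
        x = W @ x                        # W^{i-1} y_0  ->  W^i y_0
        r = np.asarray(y, dtype=float) - x
        total += float(r @ r)            # squared Euclidean norm of the residual
    return total


# Usage: data generated exactly by some W_true gives F(W_true) == 0.
rng = np.random.default_rng(0)
W_true = rng.standard_normal((3, 3)) / 3
ys = [rng.standard_normal(3)]
for _ in range(4):
    ys.append(W_true @ ys[-1])
print(F(W_true, ys))  # prints 0.0
```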
I need to minimize the function $F$ numerically. I tried to calculate the gradient $\nabla_W F$, but the gradient of a power of an (almost) arbitrary matrix seems hard to calculate. I would much appreciate it if you could show me how to calculate $\nabla_W F$, or suggest another way to solve this problem.
$ \def\a{\phi} \def\o{{\tt1}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vecc#1{\op{vec}\LR{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\sym#1{\op{Sym}\LR{#1}} \def\skew#1{\op{Skew}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} \def\Sj{\sum_{j=\o}^k\:} $
Calculate the differential of the $k^{th}$ power of the matrix $$\eqalign{ dW^k &= \Sj W^{j-\o}\:dW\:W^{k-j} \\ }$$ Then calculate the gradient of the $k^{th}$ term of the sum $$\eqalign{ z_k &\equiv \LR{W^ky_0-y_k} \qquad \{{\rm for\;convenience}\} \\ \a_k &= \frob{z_k}^2 \\ d\a_k &= 2z_k:dz_k \\ &= 2z_k:\LR{dW^k\;y_0} \\ &= 2z_ky_0^T:\c{dW^k} \\ &= 2\LR{y_0z_k^T}^T:\LR{\Sj W^{j-\o}\:dW\:W^{k-j}} \\ &= 2\LR{\Sj W^{j-\o}\:y_0\c{z_k^T}W^{k-j}}^T:dW \\ &= 2\LR{\Sj W^{j-\o}\:y_0\CLR{W^ky_0-y_k}^\c{T}W^{k-j}}^T:dW \\ \grad{\a_k}{W} &= 2\LR{\Sj W^{j-\o}\:y_0\LR{W^ky_0-y_k}^TW^{k-j}}^T \\ }$$ where a colon denotes the matrix inner product, which has these properties $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \frob{A}^2 \qquad \{ {\rm Frobenius\;norm} \}\\ A:B &= B:A \;=\; B^T:A^T \\ \LR{AB}:C &= A:\LR{CB^T} \;=\; B:\LR{A^TC} \\ \\ }$$

The step that pulls $dW$ out of the sum applies the last property to each term in the form $A:\LR{B\,dW\,C} = \LR{B^TAC^T}:dW$, followed by the reindexing $j\to k+1-j$, which leaves a sum over $j=1,\dots,k$ unchanged.
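The result can be sanity-checked numerically. For $k=2$, for example, the sum has two terms and the formula reduces to $\frac{\partial\phi_2}{\partial W} = 2\left(W^Tz_2y_0^T + z_2y_0^TW^T\right)$. The NumPy sketch below evaluates the general expression literally, using `np.linalg.matrix_power` for the powers, and compares it with a central finite-difference approximation. The function names, step size `h` and test sizes are illustrative choices, not part of the answer above.

```python
import numpy as np

mp = np.linalg.matrix_power


def phi_k(W, y0, yk, k):
    """phi_k(W) = ||W^k y0 - yk||^2."""
    z = mp(W, k) @ y0 - yk
    return float(z @ z)


def grad_phi_k(W, y0, yk, k):
    """Literal transcription of
    d phi_k / dW = 2 (sum_{j=1}^k W^{j-1} y0 z_k^T W^{k-j})^T,  z_k = W^k y0 - yk."""
    z = mp(W, k) @ y0 - yk
    S = sum(mp(W, j - 1) @ np.outer(y0, z) @ mp(W, k - j) for j in range(1, k + 1))
    return 2.0 * S.T


def fd_grad(f, W, h=1e-6):
    """Central finite-difference gradient of a scalar function f of a matrix W."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W)
        E[idx] = h
        G[idx] = (f(W + E) - f(W - E)) / (2 * h)
    return G


rng = np.random.default_rng(1)
n, k = 4, 3
W = rng.standard_normal((n, n)) / np.sqrt(n)
y0, yk = rng.standard_normal(n), rng.standard_normal(n)

G_formula = grad_phi_k(W, y0, yk, k)
G_numeric = fd_grad(lambda V: phi_k(V, y0, yk, k), W)
# Should be small: the two agree up to finite-difference error.
print(np.max(np.abs(G_formula - G_numeric)))
```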
Summing over all $k$ recovers your function and its gradient $$\eqalign{ F &= \sum_{k=\o}^{\large t} \a_k \\ \grad FW &= \sum_{k=\o}^{\large t} \grad{\a_k}{W} \\ }$$
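For the numerical minimization asked about in the question, note that each summand $W^{j-1}y_0z_k^TW^{k-j}$ is an outer product, so after transposing, $\frac{\partial\phi_k}{\partial W} = 2\sum_{j=1}^k \left((W^T)^{k-j}z_k\right)\left(W^{j-1}y_0\right)^T$ and only matrix-vector products are needed. The sketch below (NumPy only) returns $F$ and its gradient together and runs plain gradient descent with Armijo backtracking. The names `F_and_grad` and `fit_W`, the list layout of `ys`, the zero starting point and the choice of optimizer are all illustrative assumptions, not part of the answer above. Since $F$ is a polynomial in the entries of $W$ and is in general not convex, a local method like this can only be expected to find a local minimizer that depends on the starting point.

```python
import numpy as np


def F_and_grad(W, ys):
    """Return F(W) and dF/dW for ys = [y_0, ..., y_t] (hypothetical list layout)."""
    t = len(ys) - 1
    u = [ys[0]]                      # u[i] = W^i y_0 for i = 0..t
    for _ in range(t):
        u.append(W @ u[-1])
    f = 0.0
    G = np.zeros_like(W)
    for k in range(1, t + 1):
        z = u[k] - ys[k]             # z_k = W^k y_0 - y_k
        f += float(z @ z)
        v = z                        # v = (W^T)^{k-j} z_k, starting at j = k
        for j in range(k, 0, -1):
            G += np.outer(v, u[j - 1])
            v = W.T @ v
    return f, 2.0 * G


def fit_W(ys, W0, iters=500):
    """Plain gradient descent with Armijo backtracking (one simple choice among many)."""
    W = W0.copy()
    f, G = F_and_grad(W, ys)
    for _ in range(iters):
        g2 = float(np.sum(G * G))
        if g2 < 1e-24:               # (near-)stationary point reached
            break
        step = 1.0
        while True:
            W_new = W - step * G
            f_new, G_new = F_and_grad(W_new, ys)
            if f_new <= f - 1e-4 * step * g2 or step < 1e-12:
                break
            step *= 0.5              # step too long: halve it
        if not f_new <= f:           # no decrease even for a tiny step (also catches NaN)
            break
        W, f, G = W_new, f_new, G_new
    return W, f


# Usage on synthetic data generated by a random W_true.
rng = np.random.default_rng(2)
n, t = 3, 5
W_true = rng.standard_normal((n, n)) / np.sqrt(n)
ys = [rng.standard_normal(n)]
for _ in range(t):
    ys.append(W_true @ ys[-1])
W_hat, f_hat = fit_W(ys, np.zeros((n, n)))
print(f_hat, F_and_grad(np.zeros((n, n)), ys)[0])
```

The double loop costs $O(t^2n^2)$ per evaluation. For larger problems it is probably better to hand `F_and_grad` to a quasi-Newton routine, e.g. `scipy.optimize.minimize(fun, x0, jac=True, method="L-BFGS-B")`, with `fun` reshaping the flat parameter vector into $W$ and flattening the gradient back.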