Can anyone help to solve this problem?
$$ y = \left \| XWZ - X \right \|_{F}^{2} \qquad X \in \mathbb{R}^{a \times b}, \quad W \in \mathbb{R}^{b \times c}, \quad Z \in \mathbb{R}^{c \times b} $$
Gradient of $y$ w.r.t. $W$:
$$ \nabla_W y = ? $$
Gradient of $y$ w.r.t. $Z$:
$$ \nabla_Z y = ? $$
Let's view $y$ as the function $$ \varphi: \mathbb{R}^{a \times b} \times \mathbb{R}^{b \times c} \times \mathbb{R}^{c \times b} \longrightarrow \mathbb{R} $$ defined by
$$ \varphi(X,W,Z) = \left \| XWZ - X \right \|_{F}^{2}= \text{trace}\Big((XWZ - X ) (XWZ - X )^{T} \Big) $$ By cyclic invariance this equals $\text{trace}\big((XWZ - X)^{T}(XWZ - X)\big)$. Expanding the product and using $\text{trace}(A) = \text{trace}(A^T)$ to merge the two cross terms, you get
$$ \varphi(X, W, Z) = \text{trace}\Big(Z^TW^T(X^TX)WZ\Big) - 2\text{trace}\Big(Z^TW^T(X^TX)\Big) + \text{trace}\Big(X^TX\Big) $$
Note the transposes on $Z$: since $Z \in \mathbb{R}^{c \times b}$ and $W^T \in \mathbb{R}^{c \times b}$, the product $ZW^T$ is not even defined unless $b = c$, while $Z^TW^T = (WZ)^T$ always is.
To compute the gradient of $\varphi$ with respect to the matrix $W$, differentiate each term with respect to $W$.
For the first term, cyclic invariance gives $\text{trace}\big(Z^TW^T(X^TX)WZ\big) = \text{trace}\big(W^T(X^TX)W(ZZ^T)\big)$. Applying $\frac{\partial}{\partial W}\text{trace}(W^TAWB) = AWB + A^TWB^T$ with the symmetric matrices $A = X^TX$ and $B = ZZ^T$:
$$ \frac{\partial}{\partial W}\text{trace}\Big(Z^TW^T(X^TX)WZ\Big) = 2(X^TX)WZZ^T $$
For the second term, $\text{trace}\big(Z^TW^T(X^TX)\big) = \text{trace}\big(W^T(X^TX)Z^T\big)$ and $\frac{\partial}{\partial W}\text{trace}(W^TM) = M$:
$$ \frac{\partial}{\partial W}\text{trace}\Big(Z^TW^T(X^TX)\Big) = (X^TX)Z^T $$
The third term does not depend on $W$:
$$ \frac{\partial}{\partial W}\text{trace}\Big(X^TX\Big) = 0 $$
Now, combining the three derivatives with their coefficients $1$, $-2$ and $1$, we get the gradient of $\varphi$ with respect to $W$:
$$ \nabla_W \varphi(X, W, Z) = 2(X^TX)WZZ^T - 2(X^TX)Z^T = 2X^T(XWZ - X)Z^T $$
Thus, the gradient of $\varphi$ with respect to the matrix $W$ is $2X^T(XWZ - X)Z^T$. As a sanity check, it has size $b \times c$, the same as $W$. The derivation uses only the cyclic invariance of the trace, $\text{trace}(A) = \text{trace}(A^T)$, and the two standard trace-derivative identities quoted above.
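If you want to convince yourself numerically, here is a minimal sketch (not part of the derivation itself) that compares the closed form $2(X^TX)WZZ^T - 2(X^TX)Z^T$ against central finite differences of $\left\| XWZ - X \right\|_F^2$. The function names, the sizes $a, b, c = 5, 4, 3$, the random seed and the step size are all arbitrary choices made for illustration.

```python
import numpy as np


def loss(X, W, Z):
    """Squared Frobenius norm ||XWZ - X||_F^2."""
    R = X @ W @ Z - X
    return float(np.sum(R * R))


def grad_W(X, W, Z):
    """Candidate closed form: 2 (X^T X) W Z Z^T - 2 (X^T X) Z^T."""
    G = X.T @ X
    return 2 * G @ W @ Z @ Z.T - 2 * G @ Z.T


def numerical_grad_W(X, W, Z, eps=1e-5):
    """Central finite differences of the loss, one entry of W at a time."""
    num = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            step = np.zeros_like(W)
            step[i, j] = eps
            num[i, j] = (loss(X, W + step, Z) - loss(X, W - step, Z)) / (2 * eps)
    return num


# Illustrative sizes and random data (assumptions, not from the question).
rng = np.random.default_rng(0)
a, b, c = 5, 4, 3
X = rng.standard_normal((a, b))
W = rng.standard_normal((b, c))
Z = rng.standard_normal((c, b))

# Largest entrywise gap between the closed form and the numerical estimate.
max_abs_err = float(np.max(np.abs(grad_W(X, W, Z) - numerical_grad_W(X, W, Z))))
print(grad_W(X, W, Z).shape, max_abs_err)
```

Because $\varphi$ is quadratic in $W$, central differences carry no truncation error here, so the printed gap should be at the level of floating-point round-off.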
The computation of $\nabla_{Z}\varphi(X,W,Z)$ is entirely analogous, so you can carry it out yourself.
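To check your own result against, here is a sketch of how that computation would go. It assumes the expansion $\varphi = \text{trace}\big(Z^TW^T(X^TX)WZ\big) - 2\text{trace}\big(Z^TW^T(X^TX)\big) + \text{trace}\big(X^TX\big)$ and the standard identities $\frac{\partial}{\partial Z}\text{trace}(Z^TCZ) = (C + C^T)Z$ and $\frac{\partial}{\partial Z}\text{trace}(Z^TM) = M$. The symbol $C = W^T(X^TX)W$ is shorthand introduced only for this sketch; it is symmetric, so $(C + C^T)Z = 2CZ$.

$$ \frac{\partial}{\partial Z}\text{trace}\Big(Z^TW^T(X^TX)WZ\Big) = 2W^T(X^TX)WZ $$
$$ \frac{\partial}{\partial Z}\text{trace}\Big(Z^TW^T(X^TX)\Big) = W^T(X^TX) $$
$$ \nabla_Z \varphi(X, W, Z) = 2W^T(X^TX)WZ - 2W^T(X^TX) = 2W^TX^T(XWZ - X) $$

The result has size $c \times b$, the same as $Z$, which is a quick way to catch a misplaced transpose.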