Suppose I'm given a set of M matrix pairs $(X_i, Y_i)$, where each matrix is square, $n \times n$. Can gradient descent be applied to minimize the following MSE: $\sum_{i=1}^{M}(Y_i - C \cdot X_i)^2$, that is, to find the matrix $C$ that minimizes it? I saw this question, but I want to understand whether this is conceptually possible for matrices, and especially in my setting above, where I need to find one specific matrix $C$. Put simply: suppose I want to use the linear regression framework, but with a matrix outcome instead of a scalar one. Does that make sense, or is it complete nonsense?
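To make the setup concrete, here is a minimal sketch of what I have in mind. It rests on assumptions that are mine rather than established: the square is read as the squared Frobenius norm (sum of squared entries of $Y_i - C X_i$), the product is ordinary matrix multiplication, and the gradient with respect to $C$ is then $-2\sum_{i}(Y_i - C X_i)X_i^T$. The helper names (`matmul`, `loss`, `gradient`, `fit`), the synthetic data, the learning rate and the step count are all hypothetical choices for illustration.

```python
import random

def matmul(A, B):
    # Ordinary matrix product of two list-of-lists matrices.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sub(A, B):
    # Entrywise difference A - B.
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def loss(C, pairs):
    # Sum over all pairs of the squared entries of Y - C X
    # (assumption: "squared" means squared Frobenius norm).
    return sum(v * v for X, Y in pairs for row in sub(Y, matmul(C, X)) for v in row)

def gradient(C, pairs):
    # Gradient of the loss with respect to C: -2 * sum_i (Y_i - C X_i) X_i^T.
    n = len(C)
    G = [[0.0] * n for _ in range(n)]
    for X, Y in pairs:
        RXt = matmul(sub(Y, matmul(C, X)), transpose(X))
        for r in range(n):
            for c in range(n):
                G[r][c] -= 2.0 * RXt[r][c]
    return G

def fit(pairs, lr=0.005, steps=2000):
    # Plain gradient descent starting from the zero matrix.
    # lr and steps are arbitrary values picked for this toy example.
    n = len(pairs[0][0])
    C = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        G = gradient(C, pairs)
        C = [[c - lr * g for c, g in zip(rc, rg)] for rc, rg in zip(C, G)]
    return C

# Synthetic, noise-free data: Y_i = C_true X_i for a known C_true.
random.seed(0)
n, M = 3, 20
C_true = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
pairs = []
for _ in range(M):
    X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    pairs.append((X, matmul(C_true, X)))

C_hat = fit(pairs)
print("final loss:", loss(C_hat, pairs))
```

Under these assumptions the loss is a convex quadratic in the entries of $C$, so this looks like ordinary least squares with $n^2$ unknowns, and in the toy run the recovered `C_hat` can be compared entry by entry with `C_true`.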
Thanks in advance.