I would like to compute the following derivative:
$\nabla_A \|A^T A - I\|_F^2$
I've gotten one step, which is to apply the chain rule:
$\nabla_A \|A^T A - I\|_F^2 = (\nabla_A (A^T A - I)) \times 2 (A^T A - I)$
Then things start to get more fuzzy. I try to apply the product rule:
$\nabla_A \|A^T A - I\|_F^2 = ((\nabla_A A^T)A + A^T(\nabla_A A)) \times 2 (A^T A - I)$
Now I'm not sure how to easily compute things like $(\nabla_A A^T)A$. It would appear, looking at the Matrix Cookbook, that the first term is a rank-4 tensor which is a sort of generalization of the identity. I'm pretty fuzzy on how I'd reason about this tensor -- e.g., what are its dimensions? What are the dimensions after right multiplication by $A$? What's the value of that product?
Can anyone offer any pointers about how to approach this problem? I've looked online for resources on tensor calculus but it involves some unfamiliar notation (which may be inescapable). Or is there a simpler way than what I'm doing?
Writing the function in terms of the Frobenius product, $X:Y = \sum_{ij} X_{ij}Y_{ij}$, and taking the differential yields
$$ \eqalign { f &= (A^TA-I):(A^TA-I) \cr df &= 2\,(A^TA-I):d(A^TA-I) \cr &= 2\,(A^TA-I):(dA^T\,A + A^T\,dA) \cr &= 2\,(A^TA-I):2\,{\rm sym}(A^T\,dA) \cr &= 4\,{\rm sym}(A^TA-I):A^T\,dA \cr &= 4\,(A^TA-I):A^T\,dA \cr &= 4\,A(A^TA-I):dA \cr } $$
Here ${\rm sym}(X) = \tfrac12(X + X^T)$. The fourth line uses $dA^T\,A = (A^T\,dA)^T$, the fifth uses ${\rm sym}(X):Y = X:{\rm sym}(Y)$, the sixth uses the symmetry of $A^TA-I$, and the last uses $X:A^TY = AX:Y$.
Since $df = \frac {\partial f} {\partial A}:dA\,\,$ you can identify the derivative from the last line as
$$ \eqalign { \frac {\partial f} {\partial A} &= 4\,A(A^TA-I) \cr &= 4\,(AA^T-I)A \cr } $$
In most cases like this, it is better to work with differentials than to try to use the chain rule. The reason, as you noted in your question, is the appearance of higher-order tensors, which are difficult to work with in matrix notation, whereas the differential of a matrix is just another matrix.
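If you want to convince yourself of the result numerically, a quick finite-difference check is one way to do it. The following is only a sketch using NumPy; the helper names `f`, `grad_f`, and `numerical_grad`, the matrix shape, and the step size `eps` are illustrative choices, not anything prescribed by the derivation.

```python
import numpy as np

def f(A):
    """Squared Frobenius norm of A^T A - I."""
    R = A.T @ A - np.eye(A.shape[1])
    return np.sum(R * R)

def grad_f(A):
    """Closed-form gradient from the derivation: 4 A (A^T A - I)."""
    return 4.0 * A @ (A.T @ A - np.eye(A.shape[1]))

def numerical_grad(func, A, eps=1e-6):
    """Central finite differences, perturbing one entry of A at a time."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (func(A + E) - func(A - E)) / (2 * eps)
    return G

# Arbitrary non-square test matrix (assumed shape, for illustration only).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Largest entrywise gap between the closed form and the numerical estimate.
max_err = np.max(np.abs(grad_f(A) - numerical_grad(f, A)))
print("max abs difference:", max_err)

# The two closed forms 4 A (A^T A - I) and 4 (A A^T - I) A should agree.
alt = 4.0 * (A @ A.T - np.eye(A.shape[0])) @ A
print("forms agree:", np.allclose(grad_f(A), alt))
```

The difference should be small compared with the size of the gradient entries, limited mainly by the finite-difference step.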
If you dislike the Frobenius product, you can rework the derivation in terms of traces, using $A:\!B = {\rm tr}(A^TB)$.
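As a sketch of how the trace version might go, write $S = A^TA - I$ (a shorthand introduced here, not used above), note that $S$ is symmetric, and use the cyclic property of the trace together with ${\rm tr}(X) = {\rm tr}(X^T)$:
$$ \eqalign { f &= {\rm tr}(S^TS) = {\rm tr}(S^2) \cr df &= 2\,{\rm tr}(S\,dS) \cr &= 2\,{\rm tr}\big(S\,(dA^T\,A + A^T\,dA)\big) \cr &= 2\,{\rm tr}(S\,dA^T\,A) + 2\,{\rm tr}(S\,A^T\,dA) \cr &= 4\,{\rm tr}(S\,A^T\,dA) \cr &= 4\,{\rm tr}\big((AS)^T\,dA\big) \cr } $$
The fifth line follows because ${\rm tr}(S\,dA^T\,A) = {\rm tr}\big((S\,dA^T\,A)^T\big) = {\rm tr}(A^T\,dA\,S) = {\rm tr}(S\,A^T\,dA)$. Comparing with $df = {\rm tr}\big((\partial f/\partial A)^T\,dA\big)$ gives the same gradient, $4\,A(A^TA-I)$.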