I would like to know how to calculate the gradient of nll_loss(log_softmax(A*A*X*W0*W1)) w.r.t. A.
All A, X, W0, W1 are 2D matrices.
Even just showing how to calculate the gradient of A*A*X*W0*W1 would also be helpful.
I'm trying to implement a function in PyTorch, so if you can show how to do it in PyTorch, that would be awesome.
Thanks!
nll_loss(log_softmax()) is the same as cross_entropy, and the description of cross_entropy can be found here: pytorch.org/docs/master/generated/… But I'm still not sure how to get the gradient w.r.t. A.
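Here is one way this could be sketched in PyTorch. The shapes (n, d, h, c) and the target labels below are made-up placeholders just for illustration. Autograd computes the gradient for you via `backward()`; the manual computation uses the chain rule with B = X·W0·W1 and Y = A·A·B, where dL/dY = (softmax(Y) − onehot(target))/n for the default mean reduction, and the product rule over the two A factors gives dL/dA = G·Bᵀ·Aᵀ + Aᵀ·G·Bᵀ:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical small shapes for illustration: n rows, d features, c classes.
n, d, h, c = 4, 5, 8, 3
A = torch.randn(n, n, requires_grad=True)  # we want dloss/dA
X = torch.randn(n, d)
W0 = torch.randn(d, h)
W1 = torch.randn(h, c)
target = torch.randint(0, c, (n,))  # made-up labels

# Autograd: build the expression and call backward().
logits = A @ A @ X @ W0 @ W1
loss = F.nll_loss(F.log_softmax(logits, dim=1), target)  # == F.cross_entropy(logits, target)
loss.backward()  # A.grad now holds dloss/dA, shape (n, n)

# Manual check via the chain rule. With B = X W0 W1 and Y = A A B:
#   G = dL/dY = (softmax(Y) - onehot(target)) / n   (mean reduction)
#   dL/dA = G B^T A^T + A^T G B^T                   (product rule over the two A factors)
with torch.no_grad():
    B = X @ W0 @ W1
    G = (F.softmax(logits, dim=1) - F.one_hot(target, c).float()) / n
    grad_A = G @ B.T @ A.T + A.T @ G @ B.T

print(torch.allclose(A.grad, grad_A, atol=1e-5))  # True
```

The autograd route is usually all you need; the closed-form expression is mainly useful as a sanity check or if the gradient has to be computed outside of PyTorch.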