Refer to http://www.deeplearningbook.org/contents/mlp.html, page 207:
> Usually we do not apply the back-propagation algorithm merely to vectors, but rather to tensors of arbitrary dimensionality. Conceptually, this is exactly the same as back-propagation with vectors. The only difference is how the numbers are arranged in a grid to form a tensor. We could imagine flattening each tensor into a vector before we run back-propagation, computing a vector-valued gradient, and then reshaping the gradient back into a tensor. In this rearranged view, back-propagation is still just multiplying Jacobians by gradients.
I don't understand the passage above from the book.
- What does it mean to flatten each tensor into a vector, and why would you do that? What is a tensor here: a vector, a matrix, or an array with more than two dimensions? And how does the number of dimensions affect the flattening process?
- What does the last sentence ("back-propagation is still just multiplying Jacobians by gradients") mean?
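To make the question concrete, here is a small NumPy sketch of my current understanding; the function, the matrix `A`, and the shapes are my own invented toy example, not from the book. It flattens a rank-3 tensor into a vector, does the chain rule ("Jacobian transposed times upstream gradient") on the flattened vector, reshapes the result back to the tensor's shape, and checks one entry against a numerical gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

# A rank-3 "tensor" of parameters (toy example, not from the book).
T = rng.standard_normal((2, 3, 4))

# Loss built in two steps on the flattened view:
#   v = flatten(T), y = A @ v, L = sum(y**2) / 2.
A = rng.standard_normal((5, T.size))
v = T.reshape(-1)            # "flattening": 2*3*4 = 24 numbers in one vector
y = A @ v
L = 0.5 * np.sum(y ** 2)

# Back-propagation on the flattened view is just the vector chain rule:
# dL/dy = y, and dL/dv = J^T (dL/dy), where J = dy/dv = A.
grad_v = A.T @ y             # "multiplying Jacobians by gradients"

# Reshape the vector-valued gradient back into the tensor's shape.
grad_T = grad_v.reshape(T.shape)

# Sanity check: finite-difference gradient for one entry of T.
eps = 1e-6
T2 = T.copy()
T2[1, 2, 3] += eps
L2 = 0.5 * np.sum((A @ T2.reshape(-1)) ** 2)
print(np.isclose((L2 - L) / eps, grad_T[1, 2, 3], atol=1e-4))  # True
```

Is this reshaping what the authors mean, i.e. that nothing about the math changes, only the bookkeeping of how the 24 numbers are laid out?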