I am trying to understand how equation (14) is derived in the paper, from the section "Introduction to analytic results".
If
$\dot{W} = EA^{T}$ -- (12)
$\dot{A} = BE $ -- (13)
Show that $ BW + W^{T}B^{T} = AA^{T} + C$ -- (14)
Here $W, A, E, B$ are matrices representing the weights of a neural network. They are not assumed to be square.
Multiplying equation 12 by $B$ on the left and equation 13 by $A^{T}$ on the right, both right-hand sides become $BEA^{T}$, so equating them gives
$\dot{A}A^{T} = B\dot{W}$ -- (15)
Integrating equation 15 with respect to $t$,
$\int \dot{A}A^{T} dt = \int B\dot{W}dt + C_{1}$ -- (16)
Transposing equation 16
$\int A\dot{A}^{T} dt = \int \dot{W}^{T}B^{T} dt + C^{T}_{1}$ -- (17)
Then the author goes to say
$\int \dot{A}A^{T} dt = \int A\dot{A}^{T} dt = \frac{1}{2}AA^{T} + C$ -- (18)
I understand how the author derived equations 15, 16, and 17, but I'm not sure how equation 18 holds. I know that
$\frac{d}{dt}(AA^{T}) = \dot{A}A^{T} + A\dot{A}^{T}$.
Can this relation be used to prove the equation 18?
Denote the symmetrization operator as
$$\eqalign{ {\rm sym}(X) &= \tfrac{1}{2}(X+X^T) \\ }$$
Use the two known relations (yours and equation 15) to expand the following differential.
$$\eqalign{ d(AA^T) &= dA\,A^T + A\,dA^T \\ &= 2\,{\rm sym}(dA\,A^T) \\ &= 2\,{\rm sym}(B\,dW) \\ }$$
The second step uses $A\,dA^T = (dA\,A^T)^T$, and the third substitutes equation 15. Integrating both sides yields equation 14 (since $B$ is constant it can be pulled out of the integral).
$$\eqalign{ \int d(AA^T) &= 2\,{\rm sym}\Big(\int B\,dW\Big) = 2\,{\rm sym}\Big(B\int dW\Big) \\ AA^T &= 2\,{\rm sym}(BW) + C \;=\; BW + W^TB^T + C \\ }$$
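As a sanity check (not from the paper), you can integrate the two ODEs numerically and watch the quantity $AA^T - BW - W^TB^T$ stay constant along the trajectory, which is exactly equation 14 with $C$ as that conserved matrix. The shapes and the assumed time course of $E$ below are illustrative choices, not taken from the paper:

```python
import numpy as np

# Forward-Euler integration of
#   dW/dt = E A^T   (equation 12)
#   dA/dt = B E     (equation 13)
# checking that A A^T - (B W + W^T B^T) is conserved (equation 14).
rng = np.random.default_rng(0)

a, b, e = 3, 4, 2                      # deliberately non-square sizes (assumed)
A = rng.standard_normal((a, b))        # A : a x b
W = rng.standard_normal((e, a))        # W : e x a, so B W is a x a
B = rng.standard_normal((a, e))        # B : constant, as in (12)-(13)
E0 = rng.standard_normal((e, b))       # E(t) = cos(t) * E0 (assumed form)

def invariant(A, W):
    # Difference of the two sides of (14), up to the constant C
    return A @ A.T - (B @ W + W.T @ B.T)

C = invariant(A, W)                    # the conserved matrix C
dt, steps = 1e-4, 2000
for k in range(steps):
    E = np.cos(k * dt) * E0
    dW = E @ A.T                       # equation (12)
    dA = B @ E                         # equation (13)
    W = W + dt * dW
    A = A + dt * dA

drift = np.abs(invariant(A, W) - C).max()
print(f"max drift of the invariant after t = {steps * dt:.1f}: {drift:.2e}")
```

The first-order Euler terms cancel exactly because $\dot{A}A^T = B\dot{W}$, so the only drift is the $O(dt^2)$ term $dA\,dA^T$ per step; with a small step size the printed drift is tiny. Note also that $C$ comes out symmetric, since both $AA^T$ and $BW + W^TB^T$ are symmetric.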