The following is taken from Leon Simon's *Geometric Measure Theory*: Let $f: M \to \mathbb{R}^P$ with $P \geq n$, where $M$ is an $n$-dimensional smooth submanifold of $\mathbb{R}^{n+l}$ and $f$ is locally Lipschitz.
For a tangent vector $\tau \in T_y M$, we define the directional derivative $D_\tau f \in \mathbb{R}^P$ by
$$ D_\tau f = \frac{d}{dt} f(\gamma(t))|_{t=0}$$
where $\gamma: (-1,1) \to M$ is a $C^1$ curve with $\gamma(0) = y$ and $\dot{\gamma}(0) = \tau$.
We pass to local coordinates. Fix $y \in M$ and let $$\varphi : U \cap M \to V$$ be a chart, where $U \subset \mathbb{R}^{n+l}$ and $V \subset \mathbb{R}^n$ are open. Let $$\psi : V \to M \cap U$$ denote its inverse.
Apply Rademacher's Theorem (which says a Lipschitz function is differentiable almost everywhere) to $f \circ \psi: V \subset \mathbb{R}^n \to \mathbb{R}^P$, so that there is $E_0 \subset V$ with $H^n(E_0) = 0$ such that $f \circ \psi$ is differentiable on $V \setminus E_0$. So for any $\eta \in \mathbb{R}^n$ and $x \in V \setminus E_0$, $$ D_\eta(f \circ \psi)(x) = \frac{d}{dt} f ( \psi(x + t \eta))\Big|_{t=0} $$
exists and is linear in $\eta$. But $t \mapsto \psi(x + t \eta)$ is a curve as in the definition above, so by the chain rule, setting $\tau = \frac{d}{dt}\psi(x + t\eta)\big|_{t=0} = \sum_{j=1}^n \eta_j D_j \psi(x)$, we have that
$$ \tag{*} D_\tau f(\psi(x)) = D_\eta(f \circ \psi )(x) $$
exists for all $\psi(x) \in (U\cap M) \setminus \psi(E_0)$. Moreover $H^n(\psi(E_0)) = 0$ because $\psi$ is locally Lipschitz on $V$.
Letting $\eta = e_i$, so that the curve in $(*)$ is $t \mapsto \psi(x + te_i)$, and taking $\tau_1, \ldots, \tau_n$ an orthonormal basis for $T_y M$ (with $y = \psi(x)$), we have $\tau = D_i \psi(x)$, and
$$ D_i \psi(x) = \sum_{l=1}^n (D_i \psi(x) \cdot \tau_l)\, \tau_l $$
$$ D_i(f \circ \psi) (x) = \sum_{k=1}^n D_{\tau_k} f(y)( D_i \psi(x) \cdot \tau_k ) $$
$$\tag{**} D_i(f \circ \psi) \cdot D_j(f \circ \psi) = \sum_{k,m = 1}^n (D_i \psi \cdot \tau_k)(D_j \psi \cdot \tau_m) D_{\tau_k}f(y) \cdot D_{\tau_m} f(y) $$
I understand everything so far, but I don't see how the next few lines follow from the previous computations:
Since $\det AB = \det A \det B$ for square matrices $A,B$ and
$$ \sum_{k=1}^n (D_i \psi(x) \cdot \tau_k)(D_j \psi(x) \cdot \tau_k) = D_i \psi(x) \cdot D_j \psi(x) $$
this implies
$$ J_{f \circ \psi} (x) = J_\psi(x) J_f(y) .$$
where $J_{f \circ \psi}(x)= \sqrt{\det(D_i (f \circ \psi)(x) \cdot D_j (f \circ \psi)(x))}$, $J_\psi(x) = \sqrt{\det(D_i\psi(x) \cdot D_j\psi(x))}$, and $J_f(y) = \sqrt{\det G(y)}$, where $G(y)$ is the $n \times n$ matrix with $(D_{\tau_k} f(y) \cdot D_{\tau_m} f(y))$ in the $k$th row and $m$th column. END SIMON
I understand that equation $(**)$ is supposed to lead us to matrix multiplication somehow, after which we can take determinants, etc. But I really don't understand how it all fits together or how these statements lead to the final conclusion. Where does the double sum in $k,m$ go? And I don't see how the matrix multiplication comes about. Can someone spell out the details?
Let $A$ be the $P\times n$ matrix whose $j$th column is $D_j(f\circ\psi)$; let $B$ be the $P\times n$ matrix whose $j$th column is $D_{\tau_j}f$; and let $C$ be the $n\times n$ matrix whose $ij$-entry is $D_j\psi\cdot\tau_i$. Then the chain rule (the equation preceding ($**$)) tells us that $A=BC$. Equation ($**$) is computing the $ij$-entry of $A^\top A$, and the equation preceding "this implies" tells us that $\det(C^\top C) = J_\psi^2$.
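To see concretely where the double sum in $k,m$ goes, here is a numerical sanity check (with NumPy, using random matrices as stand-ins for $B$ and $C$): the right-hand side of $(**)$, summed over $k$ and $m$, is exactly the $ij$-entry of $C^\top(B^\top B)C = A^\top A$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, P = 3, 5

# Random stand-ins: B plays the role of the P x n matrix whose columns are
# D_{tau_j} f, and C the n x n matrix whose ij-entry is D_j psi . tau_i.
B = rng.standard_normal((P, n))
C = rng.standard_normal((n, n))
A = B @ C  # chain rule: the columns of A are D_j(f o psi)

# Left side of (**): the ij-entry of A^T A, i.e. D_i(f o psi) . D_j(f o psi).
lhs = A.T @ A

# Right side of (**): the double sum over k, m of
#   (D_i psi . tau_k)(D_j psi . tau_m) * G_{km},  with G = B^T B.
G = B.T @ B
rhs = np.einsum('ki,km,mj->ij', C, G, C)  # = C^T G C

assert np.allclose(lhs, rhs)
print("(**) is the ij-entry of A^T A:", np.allclose(lhs, rhs))
```

The `einsum` subscript string spells out the double sum literally: sum over $k$ and $m$ of $C_{ki}\,G_{km}\,C_{mj}$, which is the matrix product $C^\top G C$.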
Finally, we have $$J_{f\circ\psi}^2 = \det(A^\top A) = \det\big(C^\top(B^\top B)C\big) = \det(C^\top C)\det(B^\top B) = J_\psi^2 J_f^2,$$ as desired. At the penultimate step we use the fact that for $n\times n$ matrices $C$ and $D$, we have $\det(C^\top DC) = \det(DCC^\top)=\det D\det(C^\top C)$ because (several times) the determinant of a product is the product of the determinants, as Simon warned you would be necessary.
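The determinant step can likewise be checked numerically: for any random $P\times n$ matrix $B$ and $n\times n$ matrix $C$ (stand-ins for the matrices above), $\det(A^\top A) = \det(C^\top C)\det(B^\top B)$ with $A = BC$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, P = 3, 5

# Random stand-ins for the matrices in the argument above.
B = rng.standard_normal((P, n))  # columns ~ D_{tau_j} f
C = rng.standard_normal((n, n))  # ij-entry ~ D_j psi . tau_i
A = B @ C                        # columns ~ D_j(f o psi)

J_fpsi_sq = np.linalg.det(A.T @ A)  # J_{f o psi}^2
J_psi_sq = np.linalg.det(C.T @ C)   # J_psi^2
J_f_sq = np.linalg.det(B.T @ B)     # J_f^2

assert np.isclose(J_fpsi_sq, J_psi_sq * J_f_sq)
print("J_{f o psi}^2 = J_psi^2 * J_f^2:", np.isclose(J_fpsi_sq, J_psi_sq * J_f_sq))
```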