I'm doing an essay on ICA (independent component analysis), and I could use some help.
In essence, ICA is an algorithm that minimizes the (marginal) entropies of $n$ one-dimensional random variables, but to show this I need to prove a lemma which I can't seem to prove:
if $\displaystyle H(y_1,y_2,\ldots,y_n)=-\int P(y_1,y_2,\ldots,y_n)\log(P(y_1,y_2,\ldots,y_n)) \, dy$
(where $P(y_1,y_2,\ldots,y_n)$ is the joint density of the random vector $y=(y_1,y_2,\ldots,y_n)$)
and if $W$ is some invertible matrix, then $H(Wx)=H(x)+\log|\det W|$.
I got this exercise from this YouTube lecture on the topic, https://www.youtube.com/watch?v=smibJH-0YGc, at around 36:50.
I would greatly appreciate help proving this lemma. My main source of confusion is that the limits of integration are unspecified.
You can find this classical result, e.g., in Elements of Information Theory by Cover and Thomas (Corollary of Theorem 8.6.4).
Consider a random variable $X \in \mathbb{R}^n$ with density $p_X(x)$, and define another random variable $Y := WX \in \mathbb{R}^n$. By the change-of-variables formula, their densities are related by $p_Y(y) = \frac{p_X(x)}{|\det J(x)|}$ with $x = W^{-1}y$, where $J(x)$ is the Jacobian of the transformation $x \mapsto y = Wx$; in this case $J(x)=W$ is constant. Therefore the entropy of $Y$ can be written as
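As a quick sanity check of this density relation (an illustrative sketch, not part of the proof): for $X \sim N(0, I_2)$ and $Y = WX \sim N(0, WW^\top)$, the identity $p_Y(y) = p_X(W^{-1}y)/|\det W|$ can be verified numerically at an arbitrary test point. The matrix `W` and point `y` below are made-up examples.

```python
import numpy as np

def gauss_pdf(x, cov):
    """Density of N(0, cov) evaluated at the point x."""
    n = len(x)
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(cov))
    return np.exp(-0.5 * x @ np.linalg.solve(cov, x)) / norm

W = np.array([[2.0, 1.0],
              [0.0, 3.0]])      # an arbitrary invertible matrix
y = np.array([0.7, -1.2])      # an arbitrary test point

# Left side: p_Y(y) computed directly, since cov(WX) = W W^T.
lhs = gauss_pdf(y, W @ W.T)
# Right side: p_X(W^{-1} y) / |det W|.
rhs = gauss_pdf(np.linalg.solve(W, y), np.eye(2)) / abs(np.linalg.det(W))

print(np.isclose(lhs, rhs))  # True
```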
$$ \begin{align*} h(Y) &= -\int p_Y(y) \log p_Y(y)~\mathrm{d}y\\ &= -\int \frac{p_X(W^{-1}y)}{|\det W|} \log \frac{p_X(W^{-1}y)}{|\det W|}~\mathrm{d}y\\ &= -\int |\det W|\, \frac{p_X(x)}{|\det W|} \log\frac{p_X(x)}{|\det W|} ~\mathrm{d}x\\ &= - \int p_X(x) \left(\log p_X(x) - \log|\det W|\right)~\mathrm{d}x\\ &= - \int p_X(x) \log p_X(x)~\mathrm{d}x + \int p_X(x) \log|\det W|~\mathrm{d}x\\ &= h(X) + \log|\det W|, \end{align*} $$
where the third line substitutes $x = W^{-1}y$ (so $\mathrm{d}y = |\det W|\,\mathrm{d}x$). As for the unspecified limits: all integrals run over the whole space $\mathbb{R}^n$, which is why the substitution leaves the domain of integration unchanged.
By the way, this notion of entropy, which applies to continuous random variables, is called differential entropy and is usually denoted $h$ to distinguish it from the entropy of discrete random variables, which has slightly different properties.
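The lemma itself can also be checked numerically, since Gaussians have a closed-form differential entropy: $h(N(0, \Sigma)) = \tfrac{1}{2}\log\left((2\pi e)^n \det \Sigma\right)$. A minimal sketch, using a randomly generated covariance and transform (both are made-up examples):

```python
import numpy as np

def gauss_entropy(cov):
    """Differential entropy of N(0, cov): 0.5 * log((2*pi*e)^n * det(cov))."""
    n = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(cov))

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)      # an arbitrary positive-definite covariance
W = rng.standard_normal((n, n))  # a random matrix, almost surely invertible

h_X = gauss_entropy(S)
h_WX = gauss_entropy(W @ S @ W.T)  # cov(WX) = W S W^T

# h(WX) = h(X) + log|det W|
print(np.isclose(h_WX, h_X + np.log(abs(np.linalg.det(W)))))  # True
```

This works because $\det(W\Sigma W^\top) = (\det W)^2 \det\Sigma$, so the closed form directly reproduces the $\log|\det W|$ shift.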