How to get the formula for entropy loss in linear filters in frequency domain?


I am trying to understand where the expression for entropy loss in linear filters comes from.

According to theorem 14 from Shannon's work "A Mathematical Theory of Communication",

If an ensemble having an entropy $H_{1}$ per degree of freedom in band $W$ is passed through a filter with characteristic $Y(f)$ the output ensemble has an entropy $$H_{2} = H_{1} + \frac{1}{W}\int\limits_{W} \log_{2} |Y(f)|^2 df$$

Let's assume that we have a linear transformation so that

$$y_{i} = \sum_{j} a_{ij} \cdot x_{j} $$

Then it can be shown that $H(y) = H(x) + \log_{2}|\det(a_{ij})|$ (or $H_{2} = H_{1} + \log_{2}|\det(a_{ij})|$, in the notation of the theorem).

How can I go from that spatial representation $H_{2} = H_{1} + \log_{2}|\det(a_{ij})|$ to the frequency domain and obtain the formula $H_{2} = H_{1} + \frac{1}{W}\int\limits_{W} \log_{2} |Y(f)|^2 df$?
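For what it's worth, I can sanity-check the spatial-domain identity numerically (my own sketch, assuming a Gaussian input, for which the differential entropy has a closed form $\frac{1}{2}\log_2((2\pi e)^n \det \Sigma)$):

```python
import numpy as np

def h_gauss_bits(S):
    """Differential entropy of N(0, S) in bits: 0.5 * log2((2*pi*e)^n * det S)."""
    n = S.shape[0]
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet) / np.log(2)

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))      # a generic (almost surely invertible) linear map
Sigma_x = np.eye(n)              # X ~ N(0, I)
Sigma_y = A @ Sigma_x @ A.T      # Y = A X  =>  Y ~ N(0, A A^T)

gain = h_gauss_bits(Sigma_y) - h_gauss_bits(Sigma_x)
# gain should equal log2|det A| up to numerical error
print(gain, np.log2(abs(np.linalg.det(A))))
```

So the entropy gain $H(Y) - H(X)$ indeed matches $\log_2|\det(\mathbf{A})|$, at least in this case.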

Thanks!

1 Answer

The formula
\begin{equation} y_{i} = \sum_{j} a_{ij} x_{j} \end{equation}
represents a bijective linear transformation from the coordinate system $x_1,\dots,x_n$ to the coordinate system $y_1,\dots,y_n$. From an electrical engineering point of view this can be seen as a linear time-invariant (LTI) filter. We can use the more compact representation
\begin{equation} \mathbf{y} = \mathbf{A} \mathbf{x} \end{equation}
where $\mathbf{A}$ is a nonsingular square matrix. From this point of view $\mathbf{x}$ and $\mathbf{y}$ represent the same vector but with respect to two different bases; an affine transformation, for instance, translates the origin of the coordinate frame, rotates the axes, and changes the scale. From multivariate statistics, this change of coordinates implies
\begin{equation} d\mathbf{y} = |\det(\mathbf{A})| \, d\mathbf{x} \end{equation}
and
\begin{equation} f_Y(\mathbf{y}) = \frac{1}{|\det(\mathbf{A})|} f_X(\mathbf{x}) = \frac{1}{|\det(\mathbf{A})|} f_X(\mathbf{A}^{-1} \mathbf{y}) \end{equation}
where $\mathbf{x}$ and $\mathbf{y}$ are realizations of the continuous random vectors representing the state of the system, with probability density functions $f_X$ and $f_Y$, respectively.

The (differential) entropy of a continuous distribution with probability density function $f_Y(\mathbf{y})$ is defined as
\begin{equation} H(Y) = -\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_Y(\mathbf{y}) \log_2(f_Y(\mathbf{y})) \, d\mathbf{y}. \end{equation}
Substituting the two relations above ($f_Y = f_X / |\det(\mathbf{A})|$ and $d\mathbf{y} = |\det(\mathbf{A})| \, d\mathbf{x}$) gives
\begin{equation} H(Y) = -\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{1}{|\det(\mathbf{A})|} f_X(\mathbf{x}) \log_2\!\left(\frac{f_X(\mathbf{x})}{|\det(\mathbf{A})|}\right) |\det(\mathbf{A})| \, d\mathbf{x} = -\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_X(\mathbf{x}) \left( \log_2(f_X(\mathbf{x})) - \log_2(|\det(\mathbf{A})|) \right) d\mathbf{x}. \end{equation}
Splitting the integral and using
\begin{equation} \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_X(\mathbf{x}) \, d\mathbf{x} = 1, \end{equation}
we can write
\begin{equation} H(Y) = -\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_X(\mathbf{x}) \log_2(f_X(\mathbf{x})) \, d\mathbf{x} + \log_2(|\det(\mathbf{A})|) = H(X) + \log_2(|\det(\mathbf{A})|). \end{equation}
The term $\log_2(|\det(\mathbf{A})|)$ represents the amount of entropy added (or removed, if negative) by your system, i.e. the amount of information we lack about it. Note that $|\det(\mathbf{A})|$ is the Jacobian of the coordinate transformation.

Now, for $n$ equally spaced frequency components within the bandwidth $W$, Shannon tells us that the Jacobian of the transformation is
\begin{equation} \det(J) = \prod_{i=1}^{n} |Y(f_i)|^2. \end{equation}
Taking the logarithm, dividing by $n$ (the entropy in the theorem is per degree of freedom), and letting $n \to \infty$ turns the product into a Riemann sum over frequencies spaced $W/n$ apart:
\begin{equation} \frac{1}{n} \log_2 \prod_{i=1}^{n} |Y(f_i)|^2 = \frac{1}{W} \sum_{i=1}^{n} \log_2 |Y(f_i)|^2 \, \frac{W}{n} \;\longrightarrow\; \frac{1}{W} \int_{W} \log_2 |Y(f)|^2 \, df, \end{equation}
which is exactly the frequency-domain term in the theorem. As explained on page 2 of the paper "The Entropy Gain of Linear Time-Invariant Filters and Some of its Implications" by Milan S. Derpich et al.: "Shannon proved this result by arguing that an LTI filter can be seen as a linear operator that selectively scales its input signal along infinitely many frequencies, each of them representing an orthogonal component of the source. The result is then obtained by writing down the determinant of the Jacobian of this operator as the product of the frequency response of the filter over n frequency bands, applying logarithm and then taking the limit as the number of frequency components tends to infinity."
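The spatial/frequency correspondence can also be checked numerically. A sketch, under the assumption that the LTI filter is represented by a circulant (periodic convolution) matrix, which is not exactly Shannon's band-limited setting but has the same key property: the eigenvalues of a circulant matrix are the DFT of its impulse response, so $\frac{1}{n}\log_2|\det(\mathbf{A})|$ coincides with the discrete version of $\frac{1}{W}\int_W \log_2|Y(f)|^2 \, df$.

```python
import numpy as np

n = 256
rng = np.random.default_rng(1)
h = np.zeros(n)
h[:8] = rng.normal(size=8)       # an 8-tap FIR impulse response (arbitrary example)

# Circulant convolution matrix: column k is h cyclically shifted by k
A = np.stack([np.roll(h, k) for k in range(n)], axis=1)

# Spatial domain: entropy gain per degree of freedom, (1/n) * log2|det A|
_, logabsdet = np.linalg.slogdet(A)
gain_spatial = logabsdet / (n * np.log(2))

# Frequency domain: eigenvalues of a circulant are the DFT of its first column,
# so log2|det A| = sum_k log2|Y(f_k)|, i.e. half of sum_k log2|Y(f_k)|^2
Yf = np.fft.fft(h)
gain_freq = np.sum(np.log2(np.abs(Yf) ** 2)) / (2 * n)

print(gain_spatial, gain_freq)   # these should agree up to numerical error
```

The factor $\frac{1}{2}$ appears because for a real impulse response the DFT magnitudes come in conjugate pairs, $|Y(f_k)| = |Y(f_{n-k})|$, which is how the $|Y(f)|^2$ (rather than $|Y(f)|$) ends up inside the logarithm when the sum is folded onto the band $W$ of positive frequencies.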