Deriving the Power Spectral Density of a Maximum Entropy Process


In Elements of Information Theory, Chapter 12, Section 6, Burg's theorem is derived: presented with the first $p$ values of the autocovariance function $R(k) = E[X_i X_{i+k}]$, a stochastic process maximizing the entropy rate is constructed. In (12.55) the power spectral density of that process is given:

$$S(\lambda) = \frac{\sigma^2}{\lvert 1 + \sum_{k=1}^p \alpha_k \exp(-i k \lambda) \rvert^2}$$

where the $\alpha_k$ may be derived via the Yule-Walker equations. Since no derivation is presented and I couldn't find a suitable way to tackle this problem: can someone either point me to a reference where this power spectral density is derived, or give me a hint on how to get started?
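For concreteness, the expression in (12.55) is easy to evaluate numerically. Here is a minimal sketch with illustrative values $p=1$, $\alpha_1=-0.5$, $\sigma^2=0.75$ (my own choice, not from the book):

```python
import numpy as np

# Illustrative values (not from the book): an AR(1) model with
# alpha_1 = -0.5 and innovation variance sigma^2 = 0.75.
alpha = np.array([-0.5])
sigma2 = 0.75
p = len(alpha)

def S(lam):
    """Power spectral density, evaluated as in (12.55)."""
    A = 1 + np.sum(alpha * np.exp(-1j * lam * np.arange(1, p + 1)))
    return sigma2 / np.abs(A) ** 2

print(S(0.0))  # A(0) = 1 - 0.5 = 0.5, so S(0) = 0.75 / 0.25 = 3.0
```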

The derivation is rather lengthy and can be found in many textbooks. Here is a sketch:

$$S(\omega )=\sum_{k=-\infty}^\infty R_k \, e^{-i\omega k } \tag{1}$$

relates the spectral density of a stationary signal to its second-order statistics. We assume $R(k)$ is fixed for $|k|\le m$, and we want to choose $R(k)$ for $|k|> m$ so that the entropy $H$ is maximized.
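To make (1) concrete, here is a small numerical check. The choice $R(k)=\rho^{|k|}$ (the autocovariance of an AR(1) process) is mine, for illustration only; for it the sum has a known closed form:

```python
import numpy as np

# Example choice (not from the answer): R(k) = rho^|k|, the autocovariance
# of an AR(1) process, for which S(omega) has a closed form.
rho = 0.5
ks = np.arange(-60, 61)          # truncate the infinite sum in (1)
R = rho ** np.abs(ks)

def S(omega):
    return np.sum(R * np.exp(-1j * omega * ks)).real

# Closed form: sum_k rho^|k| e^{-i omega k} = (1 - rho^2) / |1 - rho e^{-i omega}|^2
omega = 1.0
closed = (1 - rho ** 2) / np.abs(1 - rho * np.exp(-1j * omega)) ** 2
print(abs(S(omega) - closed) < 1e-9)
```

The truncation at $|k|=60$ is harmless here because $R(k)$ decays geometrically.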

Now, it's known that for given (full) second-order statistics, a Gaussian process maximizes the entropy. And for such a process the entropy can be expressed (up to additive constants and a positive factor, which don't affect the maximization) as:

$$ H = \int_{-\pi}^{\pi} \log S(\omega )\, d\omega \tag{2}$$

Because we want to maximize with respect to the free parameters, $R(k)$ for $|k|> m$, we differentiate with respect to them and set the derivatives to zero:

$$\frac{d H}{d R(k)}= \int_{-\pi}^{\pi} \frac{1}{S(\omega )} \, e^{-i\omega k}\,d\omega =0 \hspace{1cm} |k|> m\tag{3}$$

Equation (3) says that the Fourier coefficients of $S^{-1}$ vanish for $|k|>m$, so $S^{-1}$ must be a trigonometric polynomial of degree $m$: $$S^{-1}(\omega )=\sum_{k=-m}^m C(k) e^{-i\omega k} \tag{4}$$ for some symmetric $C(k)$, or $$ S(\omega )=\frac{1}{\sum_{k=-m}^m C(k) e^{-i\omega k}}$$

This denominator can be factorized (Fejér-Riesz lemma):

$$\sum_{k=-m}^m C(k) z^{-k}=\frac{A(z)A(z^{-1})}{\sigma^2} \tag{5}$$

where $A(z)=1+\sum_{k=1}^m \alpha_k z^{-k}$ has real coefficients. On the unit circle $z=e^{i\omega}$ we have $A(z^{-1})=\overline{A(z)}$, hence $$S(\omega)=\frac{\sigma^2}{\lvert A(e^{i\omega})\rvert^2},$$ which implies that $S(\omega )$ is the spectral density of an AR process, and is exactly the form (12.55) you quoted.

In Cover's book an alternative derivation, similar to this one, is presented. There, one first shows that a Gaussian process maximizes the entropy for given second-order statistics (as here). One then notes that, viewing the process as a Gauss-Markov process, among all such processes (again, restricted to the subset that fulfill the constraints from the known $R(k)$) the one of smallest order maximizes the entropy, essentially because it imposes the fewest "conditions" on the past. But a finite-order AR process is a Markov process, and for given $R(k)$ we can find the corresponding AR coefficients (and hence $A(z)$) via the Yule-Walker equations, and hence the resulting spectrum.
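As a sanity check of that last step, here is a sketch (my own, with an AR(1)-style autocovariance chosen purely for illustration): solve the Yule-Walker system for the AR coefficients, build the maximum-entropy spectrum $S(\omega)=\sigma^2/\lvert A(e^{i\omega})\rvert^2$, and verify that its inverse Fourier transform reproduces the prescribed $R(k)$ for $|k|\le m$.

```python
import numpy as np

# Illustrative data (my choice): R(k) = 0.5^|k| prescribed for |k| <= m.
m = 2
rho = 0.5
R = rho ** np.arange(m + 1)                      # R(0), ..., R(m)

# Yule-Walker: Toeplitz(R) a = (R(1), ..., R(m)) for the regression
# X_t = sum_k a_k X_{t-k} + Z_t; with alpha_k = -a_k in A(z) = 1 + sum alpha_k z^{-k}.
T = np.array([[R[abs(i - j)] for j in range(m)] for i in range(m)])
a = np.linalg.solve(T, R[1:])
sigma2 = R[0] - a @ R[1:]                        # innovation variance
alpha = -a

def S(omega):
    A = 1 + np.sum(alpha * np.exp(-1j * omega * np.arange(1, m + 1)))
    return sigma2 / np.abs(A) ** 2

# Inverse Fourier transform of S should give back the prescribed R(k).
w = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
Sw = np.array([S(x) for x in w])
R_rec = np.array([(Sw * np.exp(1j * w * k)).mean().real for k in range(m + 1)])
print(np.allclose(R_rec, R, atol=1e-9))
```

For this data the solver returns $a = (0.5, 0)$ and $\sigma^2 = 0.75$: the prescribed covariances are already those of an AR(1) process, so the maximum-entropy extension adds no extra memory.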