Let $X$ be a continuous random variable with probability density function $f(x)$. Differential entropy is usually motivated as a limit of discretized entropies, $$-\sum_{i=-\infty}^{+\infty}f(x_i)\Delta x\ln \bigl(f(x_i)\Delta x\bigr)+\ln(\Delta x)\xrightarrow{\Delta x\to 0}-\int_{-\infty}^{+\infty}f(x)\ln f(x)\,\mathrm dx,\quad x_i\in\bigl[i\Delta x, (i+1)\Delta x\bigr]\text{,}$$ but the two sides are not always equal: $f(x)$ may be so ill-behaved that no choice of the $x_i$ within the intervals makes the left-hand side converge as $\Delta x\to 0$, even when the right-hand side converges absolutely. This is an unpleasant state of affairs, because it means we would have to check convergence every time we meet a continuous distribution, and in information theory we obviously do not do that.
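For a well-behaved density the discretization does converge, and it can be instructive to watch it happen numerically. The following is a minimal sketch (pure Python; the function names are my own, illustrative choices), using the standard normal density, whose differential entropy has the closed form $\frac12\ln(2\pi e)\approx 1.4189$ under the standard sign convention $h(X)=-\int f\ln f\,\mathrm dx$; the grid is truncated to $[-10,10]$, where the neglected tail mass is negligible:

```python
import math

def discretized_entropy(pdf, a, b, dx):
    """-sum f(x_i)*dx*ln(f(x_i)*dx) + ln(dx), on a left-endpoint grid over [a, b]."""
    s = 0.0
    x = a
    while x < b:
        p = pdf(x) * dx  # probability assigned to the cell [x, x + dx)
        if p > 0:
            s -= p * math.log(p)
        x += dx
    return s + math.log(dx)

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

exact = 0.5 * math.log(2 * math.pi * math.e)  # closed form, ≈ 1.4189

for dx in (0.5, 0.1, 0.01):
    print(dx, discretized_entropy(phi, -10, 10, dx))  # approaches `exact`
```

Of course this only illustrates the benign case; the question below is precisely about densities for which no such convergence occurs.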
So I tried to introduce the concept in a more general way. Below are some possibilities:
- Directly define the entropy as $$H(X)=-E\bigl(\ln f(X)\bigr)=-\int_{-\infty}^{+\infty}f(x)\ln f(x)\,\mathrm dx\text,$$ so that there is no concern about convergence, since $E(\ln f(X))$ is defined via the Lebesgue–Stieltjes integral; but its practical significance is doubtful. Is it meaningful in cases where the initial left-hand side diverges? In addition, can we define entropy in an axiomatic way, or can we develop an 'entropy measure'?
- Replace $f(x_i)\Delta x$ with $\displaystyle\int_{x_i}^{x_{i+1}}f(x)\,\mathrm dx$, hence
$$H(X)=-\lim_{\max\limits_{i\in \mathbb Z}|x_{i+1}-x_{i}|\to 0}\sum_{i=-\infty}^{+\infty}\int_{x_i}^{x_{i+1}}f(x)\,\mathrm dx \,\ln\Biggl(\frac{\int_{x_i}^{x_{i+1}}f(x)\,\mathrm dx}{\Delta x_i}\Biggr),\qquad \Delta x_i=x_{i+1}-x_i,$$
$$\text{for every partition}\quad -\infty<\cdots<x_{-n}<\cdots<x_0<x_1<x_2<\cdots<x_n<\cdots<+\infty\text{.}$$
This definition guarantees $$H(X)=-\int_{\mathbb R}f\ln f\,\mathrm dx$$ whenever the sum $\displaystyle\sum_i\int f\,\ln\biggl(\frac{\int f}{\Delta}\biggr)$ converges for some particular partition of $\mathbb R$.
This definition behaves well in most cases, but there are still counterexamples, for which every such sum diverges to $\infty$ while $\displaystyle\int_{\mathbb R}f\ln f\,\mathrm dx$ converges absolutely.
- With a partition defined as in 2., perform $N$ independent repeated trials and let $n_i$ be the number of observations of $X$ falling in the interval $(x_i, x_{i+1}]$; since $N$ is finite, there exists $M$ such that $n_i=0$ whenever $|i|\ge M$. I conjecture that $$\lim_{N\to\infty}\Biggl(-\sum_{i=-M}^{M}\frac{n_i}{N}\ln\Biggl(\frac{\int_{x_i}^{x_{i+1}}f(x)\,\mathrm dx}{\Delta x_i}\Biggr)\Biggr)=H_\Delta(X)\quad \text{a.s.},$$ and that $\displaystyle\lim_{\|\Delta\|\to 0}H_\Delta(X)=-\int_{\mathbb R}f\ln f\,\mathrm dx$ even when $H(X)$ as defined in 2. fails to converge, but I cannot tell whether this is true.
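The frequency-based idea in the last item is easy to probe numerically in the benign case. Below is a small simulation sketch (pure Python; all names are my own, and I use the standard sign convention $h=-\int f\ln f$): for a standard normal and a fixed partition of mesh $0.1$ on $[-10,10]$, it compares $H_\Delta(X)=-\sum_i p_i\ln(p_i/\Delta x_i)$ computed from the exact cell probabilities $p_i$ with the same sum computed from observed frequencies $n_i/N$:

```python
import math
import random

random.seed(0)  # reproducible sketch

def normal_cdf(x):
    """CDF of the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

dx = 0.1
n_bins = 200  # partition of [-10, 10] with mesh dx
edges = [-10 + k * dx for k in range(n_bins + 1)]

# Exact cell probabilities p_i = ∫_{x_i}^{x_{i+1}} f(x) dx.
p = [normal_cdf(edges[k + 1]) - normal_cdf(edges[k]) for k in range(n_bins)]

# H_Δ(X) = -Σ p_i ln(p_i / Δx_i), the partition-level quantity.
H_delta = -sum(pi * math.log(pi / dx) for pi in p if pi > 0)

# Empirical version: frequencies n_i / N in place of p_i.
N = 200_000
counts = [0] * n_bins
for _ in range(N):
    x = random.gauss(0.0, 1.0)
    k = int((x + 10) / dx)  # index of the cell containing x
    if 0 <= k < n_bins:
        counts[k] += 1

H_emp = -sum(
    (n / N) * math.log(p[k] / dx)
    for k, n in enumerate(counts)
    if n > 0 and p[k] > 0
)

print(H_delta, H_emp)  # both close to 0.5 * ln(2πe) ≈ 1.4189
```

For a fixed finite partition the agreement follows from the strong law of large numbers ($n_i/N\to p_i$ a.s.), so a benign example like this cannot distinguish the conjecture from definition 2.; the interesting question is how the empirical sum behaves for the ill-behaved densities.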
So, is there a more reasonable way to define differential entropy?