I am reading *Science and Information Theory* by Brillouin. He states that the information, defined as $$ I = - K \sum_{j=1}^m p_j \ln p_j, $$ is maximum when all the probabilities are equal, i.e.: $$ p_1=p_2=\dots=p_m, \quad \text{where}\quad \sum_j p_j =1.$$
He states two conditions. The first one is:
First-order partial derivatives are zero.
Which I completely understand.
However, I don't know why he establishes the second condition:
That $I_{11}$ and the determinants of orders $2, 3, \dots, m-1$ (each obtained by adding the next row and column) have alternating signs, where the matrix of second partial derivatives is:
$$ I = \begin{pmatrix} I_{11} & I_{12} & \cdots & I_{1m} \\ I_{21} & \ddots & & I_{2m} \\ \vdots & & \ddots & \vdots \\ I_{m1} & I_{m2} & \cdots & I_{mm} \end{pmatrix}$$
with $I_{ij}=\partial^2 I/\partial p_i\, \partial p_j$.
Why is this last condition needed? Isn't the first condition enough?
This has nothing to do with entropies, it's a general property of multivariate functions.
Consider a scalar real twice-differentiable function $f(x)$: it has a local maximum at $x_0$ if $f'(x_0)=0$ and $f''(x_0)<0$. And no, the first condition alone is not enough: a zero derivative only means $x_0$ is a critical point, which could just as well be a minimum or an inflection point.
This generalizes to functions of several variables. Now we must have a null gradient (all first-order partial derivatives are zero) and a negative definite Hessian (the matrix of second partial derivatives).
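As a concrete sketch (my own illustration, not from the book): for $I = -K\sum_j p_j \ln p_j$ the second partial derivatives are $\partial^2 I/\partial p_i \partial p_j = -K\,\delta_{ij}/p_j$, so, ignoring the normalization constraint, the Hessian is diagonal with strictly negative entries, and its eigenvalues confirm negative definiteness numerically:

```python
import numpy as np

K = 1.0
m = 4
p = np.full(m, 1.0 / m)   # uniform distribution p_j = 1/m
# Second partials of I = -K * sum_j p_j ln p_j (constraint ignored):
# d^2 I / dp_i dp_j = -K/p_j if i == j, else 0  ->  diagonal Hessian.
H = np.diag(-K / p)

eigvals = np.linalg.eigvalsh(H)
print(eigvals)            # every eigenvalue is negative: negative definite
assert np.all(eigvals < 0)
```

At the uniform point every eigenvalue equals $-Km$, so the quadratic form is negative in every direction.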
One criterion for a matrix to be negative definite is the one given in your textbook: the leading principal minors must alternate in sign, starting negative - see for example here, theorem 5.
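The minor criterion is easy to check numerically. Here is a small sketch (the helper `leading_minors` is my own) verifying the alternating signs $-, +, -, +, \dots$ for the diagonal Hessian of $I$ at an arbitrary probability vector:

```python
import numpy as np

def leading_minors(H):
    """Determinants of the leading principal submatrices H[:k, :k]."""
    return [np.linalg.det(H[:k, :k]) for k in range(1, H.shape[0] + 1)]

rng = np.random.default_rng(0)
K, m = 1.0, 5
p = rng.dirichlet(np.ones(m))   # an arbitrary probability vector
H = np.diag(-K / p)             # Hessian of I = -K sum_j p_j ln p_j (diagonal)

minors = leading_minors(H)
signs = [np.sign(d) for d in minors]
# Negative definite <=> the minors alternate in sign: -, +, -, +, ...
assert all(signs[k] == (-1) ** (k + 1) for k in range(m))
```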
All this said, it seems a clumsy way to prove the desired property. For one thing, you need to show the maximum is global, not merely local. For another, you have to take the constraint $\sum_i p_i = 1$ into account. The standard proof using Jensen's inequality looks much more elegant to me.
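For what it's worth, the global statement is easy to probe numerically: among randomly sampled probability vectors, none attains an information larger than the uniform one, whose value is $K\ln m$. A minimal sketch (the sampling scheme is my own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
K, m = 1.0, 6

def info(p):
    """Brillouin's information I = -K * sum_j p_j ln p_j."""
    return -K * np.sum(p * np.log(p))

uniform_info = info(np.full(m, 1.0 / m))          # equals K * ln(m)
samples = rng.dirichlet(np.ones(m), size=10_000)  # random distributions
assert np.all([info(p) <= uniform_info + 1e-12 for p in samples])
```

Of course this is only evidence, not a proof; Jensen's inequality delivers the inequality $I \le K \ln m$ in one line.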