What's the maximum entropy subject to linear constraints?

I'm trying to maximize the function
$$ H(\mathbf{p}) = \sum_j p_j \lg \left(\frac{1}{p_j}\right)\qquad(\lg = \log_2) $$
subject to
$$ \mathbf{Ap=b}\quad\text{and}\quad\mathbf{p\geq 0}\quad\text{and}\quad\sum_j p_j = 1 $$
where $\mathbf{A}$ usually has a non-trivial nullspace. If we absorb the last constraint into the linear system $\mathbf{Ap=b}$ (by appending a row of ones to $\mathbf{A}$ and an entry $1$ to $\mathbf{b}$) and set $\mathbf{p\geq 0}$ aside for now, Lagrange multipliers give
$$ L(\mathbf{p},\boldsymbol{\lambda}) = H(\mathbf{p}) + \lambda_1\left(\mathbf{a}_1^T\mathbf{p}-b_1\right) + \lambda_2\left(\mathbf{a}_2^T\mathbf{p}-b_2\right)+\dots $$
where $\mathbf{a}_i$ is the $i$th row of $\mathbf{A}$. Since $\frac{\partial}{\partial p_j}\left(p_j\lg\frac{1}{p_j}\right) = \lg\frac{1}{p_j} - \lg e = \lg\frac{1}{e\cdot p_j}$, the partial derivatives are
$$ \begin{split} L_{p_j} & = \lg\left(\frac{1}{e\cdot p_j}\right) + \lambda_1 a_{1,j} + \lambda_2 a_{2,j} + \dots\\ & = \lg\left(\frac{1}{e\cdot p_j}\right) + \mathbf{a}_j^T\boldsymbol{\lambda}\qquad\text{(here $\mathbf{a}_j$ is the $j$th column of $\mathbf{A}$)}\\ L_{\lambda_i} & = \mathbf{a}_i^T\mathbf{p}-b_i \end{split} $$
Setting all the $L_{p_j}$ to zero, we arrive at
$$ p_j = \frac{2^{\mathbf{a}_j^T\boldsymbol{\lambda}}}{e} \quad\text{or}\quad \mathbf{A}^T\boldsymbol{\lambda} = \lg(e\cdot \mathbf{p}) $$
where $\lg$ is applied elementwise. Because $2^{\mathbf{a}_j^T\boldsymbol{\lambda}}/e > 0$, any such stationary point satisfies $\mathbf{p\geq 0}$ automatically. Setting the $L_{\lambda_i}$ to zero gives
$$ \mathbf{Ap=b}\,. $$
But at this point I don't know whether there is any way to continue without additional information about $\mathbf{A}$ or $\mathbf{b}$. Is this as good as it gets?
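For concreteness, here is a minimal numerical sketch (not part of the derivation above) of how $\boldsymbol{\lambda}$ can be found once $\mathbf{A}$ and $\mathbf{b}$ are given as numbers. Substituting $p_j = 2^{\mathbf{a}_j^T\boldsymbol{\lambda}}/e$ into $\mathbf{Ap=b}$ leaves a nonlinear system in $\boldsymbol{\lambda}$. That system is exactly the stationarity condition of the convex dual function $g(\boldsymbol{\lambda}) = \frac{\lg e}{e}\sum_j 2^{\mathbf{a}_j^T\boldsymbol{\lambda}} - \boldsymbol{\lambda}^T\mathbf{b}$, whose gradient is $\mathbf{A}\mathbf{p}(\boldsymbol{\lambda})-\mathbf{b}$ and whose Hessian is $\ln 2\,\mathbf{A}\,\mathrm{diag}(\mathbf{p})\,\mathbf{A}^T$. The sketch minimizes $g$ with damped Newton steps. It assumes NumPy, that the row of ones is already included in `A`, that `A` has full row rank, and that a strictly positive feasible $\mathbf{p}$ exists. The function names `dual_objective` and `maxent` are hypothetical.

```python
import numpy as np


def dual_objective(A, b, lam):
    """Dual function g(lam) = (lg e / e) * sum_j 2**(a_j^T lam) - lam^T b."""
    return np.log2(np.e) / np.e * np.exp2(A.T @ lam).sum() - lam @ b


def maxent(A, b, tol=1e-10, max_iter=100):
    """Sketch: maximize H(p) = -sum_j p_j lg p_j subject to A p = b.

    Assumes (hypothetically) that the normalization row of ones is already
    part of A, that A has full row rank, and that a strictly positive
    feasible p exists, so the dual minimum is attained.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    lam = np.zeros(A.shape[0])
    for _ in range(max_iter):
        p = np.exp2(A.T @ lam) / np.e       # stationarity: p_j = 2**(a_j^T lam) / e
        grad = A @ p - b                    # gradient of g is the residual A p - b
        if np.linalg.norm(grad) < tol:
            break
        hess = (A * (p * np.log(2))) @ A.T  # A diag(p ln 2) A^T, positive definite
        step = np.linalg.solve(hess, grad)  # Newton direction
        # Backtracking (Armijo) line search so each step decreases g.
        t, g0, slope = 1.0, dual_objective(A, b, lam), grad @ step
        while t > 1e-10 and dual_objective(A, b, lam - t * step) > g0 - 0.25 * t * slope:
            t *= 0.5
        lam = lam - t * step
    p = np.exp2(A.T @ lam) / np.e
    return p, lam
```

With `A = [[1, 1, 1]]` and `b = [1]` this returns the uniform distribution. With the two rows $(1,\dots,1)$ and $(1,\dots,6)$ and $\mathbf{b} = (1, 4.5)$ (a die with a prescribed mean), the returned $p_j$ form a geometric sequence, as the closed form $p_j = 2^{\lambda_1 + j\lambda_2}/e$ predicts. In general, though, $\boldsymbol{\lambda}$ has to be found numerically like this rather than in closed form.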