Convex Optimisation: Effectively Computing the Minimum


We seek to effectively compute the maximum-entropy probability function within a given polytope of probability functions. We strongly believe that this is possible and that an argument for its feasibility is well known. All help is much appreciated.

Let $\mathbb P$ be the set of probability functions: \begin{align*} \mathbb P:=\Big\{\vec x=(x_1,\dots,x_n)\;:\;\sum_{i=1}^nx_i=1\text{ and }x_i\geq 0\text{ for all }1\leq i\leq n\Big\} \enspace. \end{align*}

The given feasible region $\mathbb E\neq\emptyset$ is a compact and convex subset of $\mathbb P$. It is given by $k$ constraints of the form $\sum_{i=1}^na_{ri}x_i\leq c_r$, where the $a_{ri}\in\{0,1\}$, the $c_r\in[0,1]$ are rational numbers, and $1\leq r\leq k$. So, \begin{align*} \mathbb E=\{\vec x\in\mathbb P\;:\;A\vec x\leq \vec c\}\neq\emptyset \enspace, \end{align*} where $A$ is the matrix of the $a_{ri}$. ($\mathbb E$ is a polytope.)

We aim to minimise the negative entropy on $\mathbb E$, where the negative entropy is defined by \begin{align*} H(\vec x):=\sum_{i=1}^n x_i\cdot\log(x_i) \enspace, \end{align*} with the usual convention $0\cdot\log(0)=0$. The convex optimisation problem is thus \begin{align*} \textbf{minimise }&H(\vec x)\\ \textbf{subject to }&\vec x\in\mathbb E \enspace. \end{align*} Since $H$ is strictly convex and the feasible region $\mathbb E$ is compact and convex, this problem has a unique minimiser, which we call $\vec x^*$. We are looking for an effectively computable algorithm $\mathcal A$ that finds probability functions arbitrarily close to $\vec x^*$. That is, given any $\epsilon>0$, we want $\mathcal A$ to terminate after outputting a probability function $\vec y$ such that $||\vec y-\vec x^*||\leq \epsilon$.
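To make the problem concrete, here is a minimal numerical sketch of the optimisation problem on a toy instance (the instance with $n=3$ and the single constraint $x_1+x_2\leq 1/2$ is our assumption, chosen for illustration). It uses SciPy's SLSQP solver, so it is exactly the kind of floating-point method discussed below, not the effective algorithm we are asking for:

```python
import numpy as np
from scipy.optimize import minimize

def H(x):
    # negative entropy, with the convention 0 * log(0) = 0
    x = np.asarray(x)
    return float(np.sum(np.where(x > 0, x * np.log(np.clip(x, 1e-300, None)), 0.0)))

n = 3
constraints = [
    {"type": "eq",   "fun": lambda x: np.sum(x) - 1.0},      # x in P
    {"type": "ineq", "fun": lambda x: 0.5 - (x[0] + x[1])},  # A x <= c
]
res = minimize(H, x0=np.full(n, 1.0 / n), bounds=[(0.0, 1.0)] * n,
               constraints=constraints, method="SLSQP")
x_star = res.x  # numerically close to (1/4, 1/4, 1/2)
```

For this instance the constraint is active at the optimum, and by symmetry the minimiser is $(1/4,1/4,1/2)$; the solver recovers it to floating-point accuracy, but without the computability guarantees we are after.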

Our problem with standard optimisation approaches is that we want the algorithm $\mathcal A$ to be effectively computable; i.e., implementable on a Turing machine. There are a number of standard optimisation approaches (Newton's method, hill climbing, gradient descent, [relative] entropy programs, interior-point methods, iterative scaling, KKT, ...) which are guaranteed to produce sequences of probability functions $(\vec x_l)_{l\in\mathbb N}$ converging to $\vec x^*$ (some also come with a guarantee on the rate of convergence). However, all the algorithms we found employ exact values of the objective function $H$ (or of its derivatives), and neither $H$ nor its derivatives can be evaluated exactly. [The KKT conditions require solving systems of equations involving exponentials.]

What we are looking for is an algorithm $\mathcal B$ which produces a sequence of probability functions $(\vec x_l)_{l\in\mathbb N}$ such that

  1. the sequence $(\vec x_l)_{l\in\mathbb N}$ converges to $\vec x^*$,
  2. for all $\epsilon>0$ there exists some $L$, depending on $\epsilon$, such that $||\vec x_L-\vec x^*||\leq \epsilon$,
  3. $L$ is effectively computable from the inputs $\mathbb E$ and $H$ and
  4. $\mathcal B$ only employs terms that are effectively computable from the inputs $\mathbb E$ and $H$.
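
Concerning item 4, here is a sketch (our own construction, not a known algorithm) of how the terms employed by $\mathcal B$ could be made effectively computable: $H$ at a rational point is approximated using only exact rational arithmetic, via $\log(x)=2\,\mathrm{artanh}\big(\frac{x-1}{x+1}\big)$ and a truncated power series. The function names and the stopping rule are assumptions; the rule is heuristic in this sketch and would need a rigorous tail bound in a real proof:

```python
from fractions import Fraction

def log_approx(x: Fraction, prec: int) -> Fraction:
    """Rational approximation of log(x) for 0 < x <= 1.

    Uses log(x) = 2 * artanh(z) with z = (x-1)/(x+1), |z| < 1,
    summing terms z^k / k over odd k until they drop below 2^-(prec+2).
    """
    z = (x - 1) / (x + 1)
    total, term, k = Fraction(0), z, 1
    while abs(term) / k > Fraction(1, 2 ** (prec + 2)):
        total += term / k
        term *= z * z
        k += 2
    return 2 * total

def H_approx(x, prec: int) -> Fraction:
    """Rational approximation of the negative entropy at a rational point x."""
    return sum((xi * log_approx(xi, prec + len(x)) for xi in x if xi > 0),
               Fraction(0))
```

Since every intermediate value is a `Fraction`, the computation is exact integer arithmetic throughout and hence implementable on a Turing machine; only the truncation of the series introduces (controllable) error.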

We expect, but do not know(!), that a suitable modification of a convex optimisation algorithm with a known rate of convergence will do the trick. Our guess is that replacing the objective function and its derivatives by finite approximations, whose accuracy improves suitably at every step, will work.

We know that if the uniform probability function is in $\mathbb E$, then it is the solution of our optimisation problem. Whether this function is in $\mathbb E$ is decidable, so all is good in this case.

We tried to replace $\mathbb E$ by a multi-dimensional grid, call it $\mathbb E^+$. Since $\mathbb E$ is compact, this grid has finitely many points. We can effectively calculate the negative entropy of every point in $\mathbb E^+$ up to any given degree of precision. Given the convexity of $H$, we can determine a small set of grid points (the exact number depends on $n$) which all share the same leading digits of entropy and whose entropy is minimal in the grid $\mathbb E^+$. We did not manage to show that these points are sufficiently close to $\vec x^*$. To be sure, their entropies are close to $H(\vec x^*)$; but why are the points themselves close to $\vec x^*$?
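The grid approach can be sketched as follows: enumerate all points of $\mathbb P$ whose coordinates have common denominator $m$, keep those satisfying $A\vec x\leq\vec c$, and return a grid point of minimal negative entropy. The function name `grid_argmin`, the toy instance, and the floating-point evaluation of $H$ (standing in for the arbitrary-precision evaluation described above) are assumptions for illustration:

```python
from fractions import Fraction
from itertools import product
from math import log

def grid_argmin(n, m, A, c):
    """Grid point x (all x_i = k_i/m, sum x_i = 1) in E with minimal H."""
    best, best_H = None, float("inf")
    for ks in product(range(m + 1), repeat=n - 1):
        if sum(ks) > m:
            continue
        x = [Fraction(k, m) for k in ks] + [Fraction(m - sum(ks), m)]
        # discard grid points violating A x <= c
        if any(sum(a * xi for a, xi in zip(row, x)) > cr
               for row, cr in zip(A, c)):
            continue
        h = sum(float(xi) * log(xi) for xi in x if xi > 0)
        if h < best_H:
            best, best_H = x, h
    return best

# toy instance: n = 3, single constraint x_1 + x_2 <= 1/2, grid step 1/8
x = grid_argmin(3, 8, A=[[1, 1, 0]], c=[Fraction(1, 2)])
# the grid minimiser here is (1/4, 1/4, 1/2)
```

The number of grid points grows like $m^{\,n-1}$, so this brute-force enumeration is only meant to make the construction of $\mathbb E^+$ precise; it does not answer the question above of why the returned point is close to $\vec x^*$.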

If no such algorithm exists, then there can be no guarantee that a computer algorithm addressing this convex minimisation problem produces a probability function close to the optimum. In wider terms, a whole class of convex optimisation problems could not be solved by computer algorithms. We cannot believe that this is the case.

Many thanks for all your help. It is much appreciated.