Confused about a proof involving concave functions


This proof is from a book about linear programming; it is about interior point methods, but I think that might not be important for my question.

We have:

$$f_{\mu}(\mathbf{x}) = \mathbf{c}^T\mathbf{x} + \mu\sum_{i=1}^{m}\log(b_{i} - \mathbf{a}_{i}\cdot\mathbf{x}) $$

For vectors $\mathbf{c},\mathbf{x},\mathbf{a}_i$ for $i = 1,2,3,...,m$ and $\mu > 0$. The book claims that $f_{\mu}$ has a unique maximum inside the polytope defined by $A\mathbf{x} < \mathbf{b}$ where the $\mathbf{a_i}$'s are the rows (not the columns) of $A$; we also assume that this polytope is bounded. $A$ is an $m \times n$ real matrix of rank $m$; so the polytope is in $\mathbb{R}^n$.

I don't understand a line in the proof of the uniqueness claim. The book states that if there were two different maxima $\mathbf{x}$ and $\mathbf{y}$, then $f_{\mu}$ would have to be constant along the line segment $\mathbf{xy}$: the polytope is convex, so the whole segment lies inside it, and $f_{\mu}$ is concave.

The book then states that "since the logarithm is strictly concave, this can happen only if $A\mathbf{x} = A\mathbf{y}$." This is the part I don't understand; why does the logarithm being strictly concave imply that $A\mathbf{x} = A\mathbf{y}$? I don't have much experience with convexity.
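The key fact behind the book's claim can be checked numerically. A minimal Python sketch (not from the book; the function name and sample values are made up for illustration) showing that the log of a convex combination strictly exceeds the convex combination of the logs, with equality only when the two arguments coincide:

```python
import math

# Strict concavity of log: for a != b, the log of the mixture strictly
# exceeds the mixture of the logs; for a == b they agree.
def check_strict_concavity(a, b, p=0.5):
    """Return (log of the mixture, mixture of the logs) for weights p, 1-p."""
    lhs = math.log(p * a + (1 - p) * b)            # log evaluated at the mixture
    rhs = p * math.log(a) + (1 - p) * math.log(b)  # mixture of the log values
    return lhs, rhs

lhs, rhs = check_strict_concavity(1.0, 4.0)
assert lhs > rhs                 # strict, since 1.0 != 4.0
lhs, rhs = check_strict_concavity(3.0, 3.0)
assert abs(lhs - rhs) < 1e-12    # equality when the arguments coincide
```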

BEST ANSWER

Suppose $a_i \cdot x \neq a_i \cdot y$ for some $i$; then you get a contradiction with $x$ and $y$ being global maxima: \begin{align} f_\mu(0.5x + 0.5y) &= c^T(0.5x+0.5y)+\mu \sum_{i=1}^m \log(b_i-a_i\cdot(0.5x+0.5y)) \\ &= c^T(0.5x+0.5y)+\mu \sum_{i=1}^m \log(0.5(b_i-a_i\cdot x) + 0.5(b_i - a_i\cdot y)) \\ &> 0.5c^Tx+0.5c^Ty+0.5\mu\sum_{i=1}^m \log(b_i-a_i\cdot x)+0.5\mu\sum_{i=1}^m \log(b_i-a_i\cdot y) \\ & = 0.5f_\mu(x) + 0.5 f_\mu(y) \\ & = \max_x f_\mu(x). \end{align} The strict inequality comes from the strict concavity of the logarithm: if $b_i-a_i\cdot x \neq b_i-a_i\cdot y$ for some $i$, then the logarithm evaluated at the midpoint of those two arguments is strictly greater than the average of the two logarithm values.
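This midpoint argument can be spot-checked numerically on a hypothetical 1-D instance (not from the book): the polytope $-1 < x < 1$, i.e. rows $a_1 = 1$, $a_2 = -1$ with $b = (1,1)$, taking $c = 0$ and $\mu = 1$:

```python
import math

# Hypothetical 1-D example: polytope -1 < x < 1, c = 0, mu = 1, so
#   f(x) = log(1 - x) + log(1 + x).
def f(x, mu=1.0, c=0.0):
    a = [1.0, -1.0]
    b = [1.0, 1.0]
    return c * x + mu * sum(math.log(bi - ai * x) for ai, bi in zip(a, b))

x, y = 0.5, -0.5          # two interior points with A x != A y
mid = 0.5 * x + 0.5 * y   # their midpoint, also interior by convexity
assert f(mid) > 0.5 * f(x) + 0.5 * f(y)  # strict Jensen gap, as in the answer
```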

ANOTHER ANSWER

$A:= \begin{bmatrix} \mathbf a_1^T\\ \vdots \\ \mathbf a_m^T\\ \end{bmatrix}$

Assuming a maximum exists, I'll give the uniqueness proof. It is prudent to exploit linearity and consider a simpler function via a change of variables:

in essence, define $g$ by $g(A\mathbf x) = f(\mathbf x) - \mathbf c^T\mathbf x$, assume $\mu := 1$ for now, and simplify notation with
$\mathbf z:=A\mathbf x $ and
$ \mathbf z':=A\mathbf x'$

$g\big(\mathbf z\big) := \sum_{i=1}^m \log\big(b_i - z_i\big)$
$g$ is a strictly concave function (you can show this e.g. by differentiating twice: the Hessian is a diagonal matrix with entries $-1/(b_i - z_i)^2 < 0$ on the diagonal). By the definition of strict concavity,

for any $p\in (0,1)$
$p\cdot g\big(\mathbf z\big)+(1-p) \cdot g\big(\mathbf z'\big)\leq g\big(p\mathbf z+(1-p)\mathbf z'\big)$
with equality iff $\mathbf z = \mathbf z'$
in other words, if two distinct points $\mathbf z$ and $\mathbf z'$ both attained the maximum of $g$, then $g$ would take a strictly larger value at any point strictly between them -- a contradiction. (The segment between two points is exactly their convex hull, so considering convex combinations is quite natural here.) This is essentially the proof. Everything that follows is just book-keeping to show that the above implies the result for the original problem statement.
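The displayed inequality for $g(\mathbf z) = \sum_i \log(b_i - z_i)$, including the equality case $\mathbf z = \mathbf z'$, can be verified numerically; a small sketch with made-up values of $\mathbf b$, $p$, $\mathbf z$, $\mathbf z'$:

```python
import math

# Check  p*g(z) + (1-p)*g(z') <= g(p*z + (1-p)*z')  for
#   g(z) = sum_i log(b_i - z_i),
# with strict inequality when z != z' and equality when z == z'.
def g(z, b):
    return sum(math.log(bi - zi) for bi, zi in zip(b, z))

b = [2.0, 3.0]
p = 0.3
z, zp = [0.5, 1.0], [1.0, 0.0]          # distinct points with z_i < b_i
mix = [p * u + (1 - p) * v for u, v in zip(z, zp)]
assert p * g(z, b) + (1 - p) * g(zp, b) < g(mix, b)   # strict when z != z'

zp = list(z)                             # now z' = z
mix = [p * u + (1 - p) * v for u, v in zip(z, zp)]
assert abs(p * g(z, b) + (1 - p) * g(zp, b) - g(mix, b)) < 1e-12  # equality
```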

This remains true for any $\mu \gt 0$: rescaling both sides by $\mu$ preserves the inequality,
$\mu\cdot p\cdot g\big(\mathbf z\big)+\mu\cdot (1-p)\cdot g\big(\mathbf z'\big)\leq \mu\cdot g\big(p\mathbf z+(1-p)\mathbf z'\big)$

this implies that the function
$h_\mu\big(\mathbf x\big) := f_\mu\big(\mathbf x\big) - \mathbf c^T\mathbf x$
cannot have two distinct maximizers $\mathbf x$ and $\mathbf x'$ unless $\mathbf z=\mathbf z'$, i.e. unless $\mathbf x$ and $\mathbf x'$ have the same image under $A$. Putting it all together,

$p\cdot f_\mu\big(\mathbf x\big) +(1-p) \cdot f_\mu\big(\mathbf x'\big) -\mathbf c^T \big(p\mathbf x+(1-p)\mathbf x'\big) $
$=p\cdot \big\{f_\mu\big(\mathbf x\big) - \mathbf c^T (\mathbf x)\big\}+(1-p)\big\{f_\mu\big(\mathbf x'\big) - \mathbf c^T (\mathbf x')\big\}$
$=p\cdot h_\mu\big(\mathbf x\big)+(1-p)\cdot h_\mu\big(\mathbf x'\big)$
$\leq h_\mu\big(p\mathbf x+(1-p)\mathbf x'\big)$
$= f_\mu\big(p\mathbf x+(1-p)\mathbf x'\big) - \mathbf c^T \big(p\mathbf x+(1-p)\mathbf x'\big) $
with equality iff $A\mathbf x = A\mathbf x'$

this of course simplifies to
$p\cdot f_\mu\big(\mathbf x\big) +(1-p) \cdot f_\mu\big(\mathbf x'\big) \leq f_\mu\big(p\mathbf x+(1-p)\mathbf x'\big) $
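This simplified inequality can be spot-checked on a hypothetical 1-D instance with a nonzero linear term (polytope $-1 < x < 1$, $c = 0.3$, $\mu = 1$; all values made up for illustration):

```python
import math

# Hypothetical 1-D instance: polytope -1 < x < 1, c = 0.3, mu = 1, so
#   f_mu(x) = 0.3*x + log(1 - x) + log(1 + x).
def f_mu(x, c=0.3, mu=1.0):
    return c * x + mu * (math.log(1.0 - x) + math.log(1.0 + x))

p = 0.4
x, xp = 0.6, -0.2                     # distinct interior points, so A x != A x'
mix = p * x + (1 - p) * xp
assert p * f_mu(x) + (1 - p) * f_mu(xp) < f_mu(mix)   # strict, as claimed
```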

one awkward point:
$A \in \mathbb R^{m\times n}$ with rank $m$ forces $m \le n$. If $m < n$, then $A$ is short and fat and hence surjective, but it has a nontrivial kernel, and the argument above only yields $A\mathbf x = A\mathbf x'$, which implies $\mathbf x = \mathbf x'$ only when $\ker A = \{0\}$. In fact a nontrivial kernel contradicts the boundedness assumption: for $\mathbf v \in \ker A$ with $\mathbf v \neq 0$ and any feasible $\mathbf x$, we have $A(\mathbf x + t\mathbf v) = A\mathbf x < \mathbf b$ for every $t$, so the polytope would be unbounded. Boundedness therefore forces $\ker A = \{0\}$, i.e. $\operatorname{rank} A = n$ with $m \ge n$; presumably the rank condition in the book should read rank $n$.
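One way to see why this point matters: if $A$ had a nontrivial kernel, the polytope $\{ \mathbf x : A\mathbf x < \mathbf b \}$ could not be bounded, since moving along a kernel direction never changes $A\mathbf x$. A small sketch with hypothetical $2\times 2$ data:

```python
# Hypothetical illustration: A has rows (1, 0) and (-1, 0), so rank 1 in R^2
# and kernel spanned by v = (0, 1). Points x + t*v stay feasible for every t,
# so the polytope {x : A x < b} is unbounded.
def Ax(x):
    return [x[0], -x[0]]              # A applied to x, for the rows above

b = [1.0, 1.0]
x0 = [0.0, 0.0]                       # interior point: A x0 = (0, 0) < b
for t in [1.0, 100.0, 1e6]:
    x = [x0[0], x0[1] + t]            # march along the kernel direction (0, 1)
    assert all(ai < bi for ai, bi in zip(Ax(x), b))  # still strictly feasible
```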