Show the function is concave function but not strictly concave

546 Views Asked by At

For an optimisation problem, it is required to prove: is the utility function $$f(\lambda X)=−\frac{1}{\alpha}\log\left(\frac{1}{n} \sum_{i=1}^n \exp\{−\alpha \lambda ^{()}\}\right),$$ strictly concave (strictly convex-up) function of the $\lambda$ on an open set?

Here $0\leq \alpha \leq 1$, $n$ is a sample size, $^{()}$'s are random samples from leaner space $X$. The notations are borrowed from (Miyahara, 2010), $g_X(\lambda):=f(\lambda X)$ is a scaled value measure and $\lambda$ is a scale parameter.

From first principles: A twice differentiable function of one variable is convex-up (strictly convex-up) on an interval if and only if its second derivative is non-negative (negative) there. In my case I should test: $f''_{\lambda \lambda}(\lambda X)<0$.

My solution is:

$$f'(\lambda X)=−\frac{1}{\alpha}\left(\log\left(\frac{1}{n}(\exp\{−\alpha \lambda ^{(1)}\}+\exp\{−\alpha \lambda ^{(2)}\}+\ldots + \exp\{−\alpha \lambda ^{(n)}\})\right)\right)'=\\ =−\frac{n}{\alpha}\cdot \frac{\left(\exp\{−\alpha \lambda ^{(1)}\}+\exp\{−\alpha \lambda ^{(2)}\}+\ldots + \exp\{−\alpha \lambda ^{(n)}\}\right)'}{\exp\{−\alpha \lambda ^{(1)}\}+\exp\{−\alpha \lambda ^{(2)}\}+\ldots + \exp\{−\alpha \lambda ^{(n)}\}}=\\ =−\frac{n}{\alpha}\cdot \frac{{\sum_{i=1}^n (\exp\{−\alpha \lambda ^{(i)}\}})'}{\sum_{i=1}^n\exp\{−\alpha \lambda ^{(i)}\}}=\\ =−\frac{n}{\alpha}\cdot \frac{-\alpha{\sum_{i=1}^n X^{(i)}\exp\{−\alpha \lambda ^{(i)}\}}}{\sum_{i=1}^n\exp\{−\alpha \lambda ^{(i)}\}}=\\ =n\cdot\frac{{\sum_{i=1}^n X^{(i)}\exp\{−\alpha \lambda ^{(i)}\}}}{\sum_{i=1}^n\exp\{−\alpha \lambda ^{(i)}\}}.$$

$$f''(\lambda X) = \left(n \cdot \frac{{\sum_{i=1}^n X^{(i)}\exp\{−\alpha \lambda ^{(i)}\}}}{\sum_{i=1}^n\exp\{−\alpha \lambda ^{(i)}\}}\right)'= \left(\frac{u(\lambda)}{v(\lambda)}\right)'=\\ =n \cdot\frac{({\sum_{i=1}^n X^{(i)}\exp\{−\alpha \lambda ^{(i)}\}})'\cdot \sum_{i=1}^n\exp\{−\alpha \lambda ^{(i)}\} - \sum_{i=1}^n X^{(i)}\exp\{−\alpha \lambda ^{(i)}\}\cdot(\sum_{i=1}^n\exp\{−\alpha \lambda ^{(i)}\})'}{(\sum_{i=1}^n\exp\{−\alpha \lambda ^{(i)}\})^2}=\\ =n \cdot \frac{{\sum_{i=1}^n (X^{(i)}\exp\{−\alpha \lambda ^{(i)}\}})'\cdot \exp\{−\alpha \lambda ^{(i)}\} - \sum_{i=1}^n X^{(i)}\exp\{−\alpha \lambda ^{(i)}\}\cdot(\exp\{−\alpha \lambda ^{(i)}\})'}{(\sum_{i=1}^n\exp\{−\alpha \lambda ^{(i)}\})^2}=\\ =n \cdot \frac{{\sum_{i=1}^n \exp\{−\alpha \lambda ^{(i)}\}((X^{(i)}\exp\{−\alpha \lambda ^{(i)}\}})' - X^{(i)}\cdot(\exp\{−\alpha \lambda ^{(i)}\})')}{(\sum_{i=1}^n\exp\{−\alpha \lambda ^{(i)}\})^2}=\\ =n \cdot \frac{{\sum_{i=1}^n \exp\{−\alpha \lambda ^{(i)}\}((X^{(i)}\exp\{−\alpha \lambda ^{(i)}\}})' - X^{(i)}\cdot(\exp\{−\alpha \lambda ^{(i)}\})')}{(\sum_{i=1}^n\exp\{−\alpha \lambda ^{(i)}\})^2}=\\ =n \cdot \frac{{\sum_{i=1}^n X^{(i)} \cdot \exp\{−\alpha \lambda ^{(i)}\}\cdot (\color{red}{(\exp\{−\alpha \lambda X^{(i)}\})'-(\exp\{−\alpha \lambda X^{(i)}\})'})}}{(\sum_{i=1}^n\exp\{−\alpha \lambda ^{(i)}\})^2}=0. $$ I have found that $f''_{\lambda \lambda}(\lambda X)=0$ because in the numerator we have $((\exp\{−\alpha \lambda ^{(i)}\})' - (\exp\{−\alpha \lambda ^{(i)}\})')=0$

and therefore $f(\lambda X)$ is not strictly concave, it is the concave function only.

Question. I need the solution verification or critical comments.

My be this Q&A is usefull for verification.

Literature

Miyahara Y. (2010). “Risk-Sensitive Value Measure Method for Projects Evaluation,” J. Option and Strategy, 2, 185-204.

1

There are 1 best solutions below

2
On

tl; dr As noted by River Li in the comments, the claim A twice differentiable function of one variable is strictly convex on an interval if and only if its second derivative is positive there (emphasis added) is not correct. The stated condition is sufficient, but not necessary: If $f$ is twice differentiable on some interval $[a, b]$, if $f''(t) \geq 0$ for all $t$ in $[a, b]$, and if the set $Z = \{t \in [a, b] : f'(t) = 0\}$ contains no open interval, such as a finite set of points (as in the question), then $f$ is strictly convex on $[a, b]$.

(This is analogous to the fact that a differentiable function with non-negative first derivative is strictly increasing if its derivative does not vanish identically on any open interval. An example such as $f(x) = x - \sin x$ has infinitely many critical points; an example where the critical points accumulate is left as an exercise.)


The usual proof is to fix points $x < y$ in $[a, b]$ arbitrarily, and consider the function $$ g(t) = g_{x,y}(t) = f\bigl((1 - t)x + ty\bigr) - [(1 - t)f(x) + tf(y)],\quad 0 \leq t \leq 1, $$ which measures the difference between height of the graph of $f$ and the height of the secant line through $(x, f(x))$ and $(y, f(y))$. The chain rule gives $$ g''(t) = (y - x)^{2} f''\bigl((1 - t)x + ty\bigr). $$ Particularly, $g$ is twice differentiable on $[0, 1]$, $g''$ is non-negative, and $g''$ does not vanish identically on any open interval.

Lemma: The maximum of $g$ on $[0, 1]$ is $0$, and the minimum is strictly negative.

Proof: Assume contrapositively that $g$ achieves a positive value at some point $t_{0}$. The mean value theorem applied to $g$ on $[0, t_{0}]$ implies there is a $z_{1}$ with $0 < z_{1} < t_{0}$ such that $g'(z_{1}) = g(t_{0})/t_{0} > 0$; and the mean value theorem applied to $g$ on $[t_{0}, 1]$ implies there is a $z_{2}$ with $t_{0} < z_{1} < 1$ such that $g'(z_{2}) = -g(t_{0})/(1 - t_{0}) < 0$.

The mean value theorem applied to the derivative $g'$ on $[z_{1}, z_{2}]$ now implies there is a point $z$ where $g'(z) = (g'(z_{2}) - g'(z_{1}))/(z_{2} - z_{1}) < 0$, contrary to hypothesis. This completes the proof that $g$ is non-positive on $[0, 1]$.

If the minimum of $g$ on $[0, 1]$ were $0$, then $g$ would be identically $0$, and consequently $f$ would be affine, contrary to the hypothesis that $f''$ does not vanish on any open interval. This completes the proof of the lemma.

Corollary If $[s, t] \subseteq [0, 1]$ and $g(s) = g(t) = 0$, then the minimum of $g$ on $[s, t]$ is strictly negative.

The proof of the lemma goes through with obvious changes of endpoint.

Our goal is to show $g(t) < 0$ (strict inequality) for $0 < t < 1$.

Suppose contrapositively that $g(t_{0}) = 0$ for some $t_{0}$ with $0 < t_{0} < 1$. By the corollary, there exist points $s_{1}$ and $s_{2}$ such that

  • $0 < s_{1} < t_{0} < s_{2} < 1$;
  • $g(s_{1}) < 0$ and $g(s_{2}) < 0$.

Connecting the corresponding five points on the graph of $g$ gives a "w"; a mean value theorem argument similar to those above shows $g''$ is negative at some point of $(s_{1}, s_{2})$, contrary to the hypothesis on $f$.

We have shown that $g$ is strictly negative on $(0, 1)$, i.e., that the graph of $f$ lies strictly below its secant line over $(x, y)$. Since $[x, y] \subseteq [a, b]$ was arbitrary, we have established strict convexity of $f$.