Is the function (sum-of-squares) / sum convex on nonnegative input?


Let $$f \colon \mathbb{R}_{\ge 0}^n \to \mathbb R$$ be defined by $$f(x_1,\dotsc,x_n) = \begin{cases} 0 &\text{if }x_1 = \dotsb = x_n = 0\text{,}\\ \frac{\sum_i x_i^2}{\sum_i x_i} &\text{otherwise.} \end{cases}$$ Is this function convex?


Motivation: The function I have described is equivalent to taking a weighted average of the $x_i$ where the weight of $x_i$ is proportional to $x_i$. If the $x_i$ represent the sizes of the various classes at a college, then this function $f$ computes the average class size experienced by students. (A large class will be experienced by more students and will consequently receive greater weight.)
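The weighted-average interpretation above is easy to verify numerically. Below is a minimal NumPy sketch (the helper name `f` is mine, not from the question) checking that $f(x)$ equals $\sum_i w_i x_i$ with $w_i = x_i/\sum_j x_j$:

```python
import numpy as np

def f(x):
    """(sum of squares) / sum, defined to be 0 at the origin."""
    s = x.sum()
    return x @ x / s if s > 0 else 0.0

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=5)   # hypothetical class sizes
w = x / x.sum()                  # weight of x_i proportional to x_i
assert np.isclose(f(x), w @ x)   # f is the self-weighted average
assert f(np.zeros(3)) == 0.0     # the boundary case in the definition
```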

This function gives a smooth approximation of the maximum function whose partial derivatives are relatively inexpensive to compute (only a single division is required for all the partial derivatives put together). As such, I imagine it might be useful as a pooling function for convolutional neural nets. But it would be helpful to know more about its optimization properties when running gradient descent and similar algorithms.
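One way to sanity-check the cheap-gradient claim: for $f = \sum_i x_i^2/\sum_i x_i$ the partials are $\partial f/\partial x_j = (2x_j - f(x))/\sum_i x_i$, so once $f(x)$ is known a single division by $\sum_i x_i$ serves the entire gradient. A minimal NumPy sketch (the names `f` and `grad_f` are mine) with a finite-difference check:

```python
import numpy as np

def f(x):
    s = x.sum()
    return x @ x / s if s > 0 else 0.0

def grad_f(x):
    """Gradient of f at x != 0: df/dx_j = (2*x_j - f(x)) / sum(x).
    One division (by sum(x)) covers every partial derivative."""
    s = x.sum()
    return (2.0 * x - f(x)) / s

# Central finite-difference check at a random positive point.
rng = np.random.default_rng(1)
x = rng.uniform(1, 5, size=4)
eps = 1e-6
num = np.array([(f(x + eps * np.eye(4)[j]) - f(x - eps * np.eye(4)[j])) / (2 * eps)
                for j in range(4)])
assert np.allclose(grad_f(x), num, atol=1e-6)
```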


On BEST ANSWER

Yes, it is convex. We must show that

$$ f(\lambda x+(1-\lambda) y)\le \lambda f(x) + (1-\lambda) f(y) $$

for all $x,y\in \mathbb{R}_{\ge 0}^n$ and $\lambda \in [0,1]$. Written out, this is

$$ \frac{\sum(\lambda x_i+(1-\lambda)y_i)^2}{\sum(\lambda x_i+(1-\lambda)y_i)}\le\lambda \frac{\sum x_i^2}{\sum x_i}+(1-\lambda) \frac{\sum y_i^2}{\sum y_i} $$ Multiplying both sides by the positive quantity $\sum(\lambda x_i+(1-\lambda)y_i)=\lambda\sum x_i+(1-\lambda)\sum y_i$ gives $$ \lambda^2\sum x_i^2+(1-\lambda)^2\sum y_i^2+2\lambda(1-\lambda)\sum x_iy_i\le \lambda\sum x_i^2\Big(\lambda+(1-\lambda)\frac{\sum y_i}{\sum x_i} \Big)+ (1-\lambda)\sum y_i^2\Big((1-\lambda)+\lambda\frac{\sum x_i}{\sum y_i} \Big) $$ Cancelling the terms $\lambda^2\sum x_i^2$ and $(1-\lambda)^2\sum y_i^2$ common to both sides leaves

$$ 2\lambda(1-\lambda)\sum x_iy_i\le \lambda(1-\lambda)\sum x_i^2\frac{\sum y_i}{\sum x_i} +\lambda(1-\lambda)\sum y_i^2\frac{\sum x_i}{\sum y_i} $$ Dividing by $\lambda(1-\lambda)$ (the cases $\lambda\in\{0,1\}$ are trivial) and multiplying through by $\sum x_i\sum y_i$, you obtain

$$ 2\sum x_i\sum y_i\sum x_iy_i\le\sum x_i^2\big(\sum y_i\big)^2+\sum y_i^2\big(\sum x_i\big)^2 $$

You can prove this by squaring both sides (both are nonnegative) and then applying first the Cauchy-Schwarz inequality $\big(\sum x_iy_i\big)^2\le\sum x_i^2\sum y_i^2$, and then AM-GM (arithmetic mean-geometric mean) in the form $4ab\le(a+b)^2$:

$$ 4\big(\sum x_i\big)^2\big(\sum y_i\big)^2\big(\sum x_iy_i\big)^2\le4\big(\sum x_i\big)^2\big(\sum y_i\big)^2\sum x_i^2\sum y_i^2 = 4\sum x_i^2\big(\sum y_i\big)^2\,\sum y_i^2\big(\sum x_i\big)^2\le\big[\sum x_i^2\big(\sum y_i\big)^2+\sum y_i^2\big(\sum x_i\big)^2\big]^2 $$
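The key inequality can also be spot-checked numerically (a sanity check, not a proof). This NumPy sketch tests $2\sum x_i\sum y_i\sum x_iy_i\le\sum x_i^2\big(\sum y_i\big)^2+\sum y_i^2\big(\sum x_i\big)^2$ on random nonnegative vectors:

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(1000):
    n = rng.integers(1, 10)
    x = rng.uniform(0, 1, size=n)
    y = rng.uniform(0, 1, size=n)
    lhs = 2 * x.sum() * y.sum() * (x * y).sum()
    rhs = (x @ x) * y.sum() ** 2 + (y @ y) * x.sum() ** 2
    # Small tolerance guards against floating-point round-off.
    assert lhs <= rhs + 1e-12
```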

Taking square roots recovers the previous inequality, thus proving that $f$ is convex.


A marginally different approach: show that the epigraph is convex.

Let $e=(1,...,1)$ and let $x \ge 0$ throughout. Then $f(x) = {\|x\|^2 \over e^T x}$. Since $x \ge 0$, we have $e^Tx = \sum_i x_i \ge \sqrt{\sum_i x_i^2} = \|x\|$, and so $f(x) \le \|x\|$; in particular $f$ is continuous at $x=0$. Consequently, to show that $f$ is convex we may assume $x \neq 0$ (convexity on all of $x \ge 0$ then follows by applying continuity to the defining inequality).

As an aside, if $a,b \ge 0$ and $t > 0$, it is straightforward to show (AM-GM) that $t a+{1 \over t} b \ge 2\sqrt{ab}$.

Suppose $(x_k, \alpha_k)$, $k=1,2$, are in the epigraph of $f$, i.e. $\alpha_k \ge f(x_k)$, and $\lambda \in [0,1]$; we want to show that $\lambda \alpha_1 + (1-\lambda) \alpha_2 \ge f(\lambda x_1+(1-\lambda)x_2)$. Using $\alpha_k e^T x_k \ge \|x_k\|^2$, the aside above with $t = {e^T x_2 \over e^T x_1}$, and the Cauchy-Schwarz inequality, we have \begin{eqnarray} (\lambda &\alpha_1& + (1-\lambda) \alpha_2) e^T(\lambda x_1+(1-\lambda)x_2) \\ &=& \lambda^2\alpha_1e^T x_1+\lambda(1-\lambda)(\alpha_1 e^T x_2+\alpha_2 e^T x_1)+(1-\lambda)^2 \alpha_2e^T x_2 \\ &\ge& \lambda^2 \|x_1\|^2+\lambda(1-\lambda) (\alpha_1 e^T x_1 {e^T x_2 \over e^T x_1}+\alpha_2 e^T x_2 {e^T x_1 \over e^T x_2})+(1-\lambda)^2\|x_2\|^2 \\ &\ge& \lambda^2 \|x_1\|^2+2\lambda(1-\lambda) \sqrt{\alpha_1 e^T x_1 \alpha_2 e^T x_2} +(1-\lambda)^2\|x_2\|^2 \\ &\ge& \lambda^2 \|x_1\|^2+2\lambda(1-\lambda) \|x_1\| \|x_2\| +(1-\lambda)^2\|x_2\|^2 \\ &\ge& \lambda^2 \|x_1\|^2+2\lambda(1-\lambda) x_1^T x_2 +(1-\lambda)^2\|x_2\|^2 \\ &=& \| \lambda x_1+(1-\lambda)x_2 \|^2 \end{eqnarray} Dividing through by $e^T(\lambda x_1+(1-\lambda)x_2)$ (positive, since we may assume $\lambda x_1+(1-\lambda)x_2 \neq 0$) gives the desired result.
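Either argument can be spot-checked numerically; here is a minimal NumPy sketch (the helper `f` is mine) testing the convexity inequality $f(\lambda x+(1-\lambda)y)\le\lambda f(x)+(1-\lambda)f(y)$ at random nonnegative points:

```python
import numpy as np

def f(x):
    """(sum of squares) / sum, defined to be 0 at the origin."""
    s = x.sum()
    return x @ x / s if s > 0 else 0.0

rng = np.random.default_rng(3)
for _ in range(1000):
    n = rng.integers(1, 8)
    x = rng.uniform(0, 1, size=n)
    y = rng.uniform(0, 1, size=n)
    lam = rng.uniform()
    # Convexity inequality, with a small round-off tolerance.
    assert f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-12
```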