How to minimize the $2$-norm?

2.3k Views Asked by At

I encountered the following question today: Given $n$ real numbers $r_1\le ...\le r_n.$ Can we find $k_1,...,k_n \ge 0$ real with $\sum_{i=1}^n k_i=1$ such that $$(r_1-k_1)^2+...+(r_n-k_n)^2$$ is minimal?

Is there a way to do this analytically or are there good optimization algorithms that can do this? (The problem has apparently a unique solution (compactness) and a unique-solution by convexity of the euclidean norm).

2

There are 2 best solutions below

0
On BEST ANSWER

You are projecting the point $(r_1,\ldots,r_n)$ onto the probability simplex. This projection is needed often as a substep in convex optimization algorithms, and the solution can be found for example in this paper:

"Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application" by Wang and Carreira-Perpinan.

This paper provides the following Matlab code which projects each row vector in the $N × D$ matrix $Y$ onto the probability simplex in $D$ dimensions.

function X = SimplexProj(Y)
    [N,D] = size(Y);
    X = sort(Y,2,’descend’);
    Xtmp = (cumsum(X,2)-1)*diag(sparse(1./(1:D)));
    X = max(bsxfun(@minus,Y,Xtmp(sub2ind([N,D],(1:N)’,sum(X>Xtmp,2)))),0);

See also slide 8-12 of Vandenberghe's 236c notes:

http://www.seas.ucla.edu/~vandenbe/236C/lectures/proxop.pdf

5
On

Use Lagrange multipliers: $$F(\mathbf{k},\lambda) = \|\mathbf{r}-\mathbf{k}\|_2^2 - \lambda(\sum_{i}k_i-1).$$ Then \begin{align} \frac{\partial}{\partial k_i}F(\mathbf{k},\lambda) = & 2(k_i - r_i) - \lambda,\\ \frac{\partial}{\partial \lambda}F(\mathbf{k},\lambda) = & \sum_{i}k_i - 1. \end{align} The first equation is zero if, and only if $$k_i = r_i + \frac{\lambda}{2}.$$ Then the other condition gives $$1 = \sum_{i}k_i = \sum_{i}r_i + \frac{n}{2}\lambda,$$ so that $\lambda = \frac{2\left(1-\sum_{i}r_i\right)}{n}$. Therefore, $$\mathbf{k} = \mathbf{r} + \frac{2\left(1-\sum_{i}r_i\right)}{n}\pmatrix{1\\\vdots\\1}.$$ This finds the solutions in the interior of the region where $\mathbf{k}$ is allowed to live. If you don't find any solution in the interior, then re-do all the process above on the boundary (where one or more $k_i$'s are zero).