Let $w$ be some vector. I want to solve $$ w=\arg\min_w \sum_i f_i^2(w) \quad \text{s.t.} \quad w\geq 0,\ \sum_i w_i = 1.$$ This can be formulated as $$ w=\arg\min_w ||{\bf f}(w)||^2 \quad \text{s.t.} \quad H(w)\geq 0,\ G(w)= 0, $$ where $H(w)=w$ and $G(w)={\bf 1}^T w-1=\sum_i w_i-1$.
This seems like a standard problem. The unconstrained version can be solved using the Gauss-Newton (GN) method, but I can't find how to solve the constrained one. Which method should I use? Can GN be adapted to these constraints?
Thank you.
EDIT:
a. I will add a formula for $f_i(w)$: $$ f_i(w) = w^T A_i w - z_i$$ where each $A_i$ is a known symmetric matrix and each $z_i$ is a known scalar.
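For concreteness, here is a sketch in Python of the objective and its gradient for residuals of this form (using $\nabla f_i(w) = 2 A_i w$, valid since $A_i$ is symmetric). The random $A_i$ and $z_i$ are hypothetical stand-ins for the real problem data:

```python
import numpy as np

# Hypothetical problem data: n-dimensional w, m residuals f_i(w) = w^T A_i w - z_i.
rng = np.random.default_rng(0)
n, m = 4, 6
A = [(lambda M: (M + M.T) / 2)(rng.standard_normal((n, n))) for _ in range(m)]  # symmetric A_i
z = rng.standard_normal(m)

def residuals(w):
    """f_i(w) = w^T A_i w - z_i, stacked into a vector f(w)."""
    return np.array([w @ Ai @ w - zi for Ai, zi in zip(A, z)])

def jacobian(w):
    """Rows are grad f_i(w) = 2 A_i w (A_i symmetric)."""
    return np.array([2 * Ai @ w for Ai in A])

def F(w):
    """Objective F(w) = sum_i f_i(w)^2 = ||f(w)||^2."""
    return np.sum(residuals(w) ** 2)

def gradF(w):
    """grad F(w) = 2 J(w)^T f(w)."""
    return 2 * jacobian(w).T @ residuals(w)
```

Both GN-type methods and the gradient-based methods discussed below only need `residuals`, `jacobian`, and `gradF`.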
b. While trying to find an answer, I encountered the Levenberg–Marquardt algorithm, which appears almost suitable here, except that my data are matrices $A_i$ rather than scalars varying with $i$.
c. I thought of doing the following: since each Gauss-Newton iteration solves a linear least-squares problem, I could instead solve a constrained linear least-squares problem with my constraints at each iteration. Is that a reasonable way to go?
Thanks again.
Your optimization problem has the form \begin{align} \tag{$\spadesuit$}\operatorname{minimize}_w &\quad F(w) \\ \text{subject to} & \quad w \in C \end{align} where $$ F(w) = \sum_i f_i(w)^2 $$ and $C$ is the probability simplex (which is a closed, convex set).
You can solve problems of the form ($\spadesuit$) using the projected gradient method: $$ w^{k+1} = \Pi_C(w^k - t\nabla F(w^k)). $$ Here $\Pi_C(y)$ is the projection of $y$ onto $C$. Projecting onto the probability simplex is a very well studied operation and can be performed very efficiently with existing algorithms. (It takes only a few lines of Matlab. See algorithm 1 here, for example: https://arxiv.org/abs/1309.1541 .)
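As a sketch, the sort-based projection of Algorithm 1 in the paper linked above takes only a few lines of Python/NumPy, and the projected gradient iteration is one more line on top of it. The fixed step size `t` and iteration count below are assumptions to be tuned for the actual problem:

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection of y onto the probability simplex
    {w : w >= 0, sum(w) = 1}, via the sort-based algorithm."""
    u = np.sort(y)[::-1]                      # sort in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(y) + 1)
    rho = np.nonzero(u + (1.0 - css) / k > 0)[0][-1]
    lam = (1.0 - css[rho]) / (rho + 1)        # shift so the result sums to 1
    return np.maximum(y + lam, 0.0)

def projected_gradient(gradF, w0, t=1e-2, iters=500):
    """Projected gradient method: w <- Pi_C(w - t * gradF(w)).
    gradF is supplied by the caller; t is a fixed (assumed) step size."""
    w = project_simplex(np.asarray(w0, dtype=float))
    for _ in range(iters):
        w = project_simplex(w - t * gradF(w))
    return w
```

Points already in the simplex are fixed points of `project_simplex`, and every output is nonnegative and sums to one by construction.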
The projected gradient method has a step size restriction, so be sure to select the step size $t$ small enough that the restriction is satisfied. There is also a line search procedure you can use to select step sizes (which I think is probably a good idea).
You could also try an accelerated version of the projected gradient method, such as FISTA, although I'm not sure how it will perform on nonconvex problems.
Edit: The projected gradient method is a special case of the proximal gradient method, and a line search version of the proximal gradient method is described in equations 1.26 and 1.27 (p. 18) of the paper "Gradient-Based Algorithms with Applications to Signal Recovery Problems" by Beck and Teboulle. This line search version of the proximal gradient method is also described, with slightly different notation, on slide 6-20 of Vandenberghe's 236c notes: http://www.seas.ucla.edu/~vandenbe/236C/lectures/proxgrad.pdf.
For convenience, I'll write the proximal gradient method with line search in detail, using notation that I think is a little more clear than the notation in the above references. This algorithm minimizes $f(x) + g(x)$, where $f$ is assumed to be differentiable (with a Lipschitz continuous gradient) and $g$ is a convex, lower semicontinuous function. The proximal operator of $g$ with parameter $t > 0$ is denoted by $\operatorname{prox}_{tg}$. If you don't know what the proximal operator is, that's fine, just replace $\operatorname{prox}_{tg}$ with $\Pi_C$, and you have the projected gradient method. Here is the algorithm: \begin{align} &\text{Initialize } x_0 \text{ and select } t_0 > 0, 0 < \alpha < 1, \beta > 1\\ &\textbf{for } k = 1,2,\ldots \textbf{do:}\\ & \quad t := \beta \, t_{k-1} \\ &\quad \textbf{repeat:} \\ &\qquad \quad x := \operatorname{prox}_{tg}(x_{k-1} - t \nabla f(x_{k-1})) \\ &\qquad \quad \textbf{break if } f(x) \leq f(x_{k-1}) + \langle \nabla f(x_{k-1}),x - x_{k-1} \rangle + \frac{1}{2t} \| x - x_{k-1} \|_2^2 \\ &\qquad \quad t := \alpha t \\ & \quad t_k := t \\ & \quad x_k := x\\ &\textbf{end for} \end{align}
I usually select $\alpha = .5$ and $\beta = 1.25$. I think $t_0 = 1$ is usually not a bad choice. You could initialize $x_0$ to be your best available guess of the optimal value of $x$, or perhaps just initialize $x_0$ randomly or to all zeros.
Here is some Python code for the proximal gradient method with line search:
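This is a minimal sketch following the pseudocode above; `f`, `gradf`, and `prox(x, t)` are supplied by the caller (for the simplex-constrained problem, `prox` would simply be the simplex projection, ignoring `t`):

```python
import numpy as np

def prox_grad_linesearch(f, gradf, prox, x0, t0=1.0, alpha=0.5, beta=1.25, iters=100):
    """Proximal gradient method with backtracking line search.

    Minimizes f(x) + g(x), where prox(x, t) evaluates prox_{tg}(x).
    For a projection Pi_C, pass prox = lambda x, t: Pi_C(x).
    """
    x, t = np.asarray(x0, dtype=float), t0
    for _ in range(iters):
        fx, g = f(x), gradf(x)
        t *= beta                                  # t := beta * t_{k-1}
        while True:                                # backtracking loop
            x_new = prox(x - t * g, t)
            d = x_new - x
            # sufficient decrease test from the pseudocode
            if f(x_new) <= fx + g @ d + (d @ d) / (2.0 * t):
                break
            t *= alpha
        x = x_new
    return x
```

Note that when the prox step returns `x` itself (i.e. `d = 0`), the test holds with equality, so the inner loop always terminates.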
Here's a little more info about the relationship between the proximal gradient method and the projected gradient method. The projected gradient method for minimizing $f(x)$ subject to the constraint that $x \in C$, is just the special case of the proximal gradient method where $g$ is the convex indicator function of $C$: $$ g(x) = \begin{cases} 0 & \quad \text{if } x \in C \\ \infty & \quad \text{otherwise.} \end{cases} $$ With this choice of $g$, the proximal operator of $g$ is given by the formula $$ \operatorname{prox}_{tg}(x) = \Pi_C(x) = \text{projection of $x$ onto $C$}. $$ (I'm assuming that $C$ is a closed convex set.)
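As a quick numerical sanity check of this identity (a sketch using the simple closed convex set $C = [0, 5]$ in one dimension), one can minimize $\frac{1}{2t}(u - x)^2 + g(u)$ by brute force over a grid of feasible points and observe that the minimizer is the projection of $x$ onto $C$, independent of $t$:

```python
import numpy as np

def prox_indicator_bruteforce(x, t, grid):
    """Approximate prox_{tg}(x) for g = indicator of C, by minimizing
    (u - x)^2 / (2t) over grid points u that lie in C."""
    vals = (grid - x) ** 2 / (2.0 * t)
    return grid[np.argmin(vals)]

# Feasible set C = [0, 5], sampled on a fine grid.
grid = np.linspace(0.0, 5.0, 501)
```

For any $t > 0$, `prox_indicator_bruteforce(-3.0, t, grid)` returns `0.0` and `prox_indicator_bruteforce(7.0, t, grid)` returns `5.0`, i.e. the projections onto $C$.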