I have the following nonlinear optimization problem: $$ \begin{align*} \text{Find } x \text{ that maximizes } & \frac{1}{\|Ax\|} (Ax)^{\top} y \\ \text{Subject to } & \sum_{i=1}^n x_i = 1 \\ & x_i \geq 0 \; \forall \: i \in \{1\dots n\} \\ \text{Where } & A \in \mathbb{R}^{m \, \times \, n} \\ & x \in \mathbb{R}^n \\ & y \in \mathbb{R}^m \\ & A_{i,j} \geq 0 \, \forall \, i \in \{1\dots m\}, j \in \{1\dots n\} \\ & y_i \geq 0 \, \forall \, i \in \{1\dots m\} \\ & \|y\| = 1 \\ & \|a_i\| = 1 \, \forall \text{ column vectors } a_i \text{ in } A \end{align*} $$
Motivation for this optimization problem (feel free to skip): I have $n+1$ normalized, nonnegative vectors of length $m$, $\{a_1, \dots, a_n, y\}$, which represent $n+1$ different topics. In my case, the similarity between two topics is their dot product. I want to create a graph with the $n$ topics $a_1, \dots, a_n$ arranged as the vertices of a regular $n$-gon, showing where the topic represented by the vector $y$ lies in between them. One measure of this is simply to take the dot product of $y$ with every vector $a_1, \dots, a_n$ to create a vector $z = A^{\top}y$, then divide that vector by its sum and use the result as weights to interpolate between the vertices of the $n$-gon. But I don't think this approach is very meaningful: it doesn't tell you the point in the span of $A$ which, when normalized, is most similar to $y$, and in practice most points just end up pretty close to the center. That's why I restated the problem as finding the linear combination of the vectors $a_1, \dots, a_n$ which, when normalized, is most similar to $y$.
A sort of guesstimate answer would be to find the non-negative least squares (NNLS) solution of $Ax=y$; that works reasonably well in practice, but I can't prove that it's optimal.
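For reference, here is what that NNLS heuristic looks like with SciPy. Since the objective is invariant to positive scaling of $x$, the NNLS solution can be rescaled onto the simplex afterwards without changing the cosine score. This is just a sketch; the data $A$, $y$ below are made up to match the problem's assumptions (nonnegative entries, unit-norm columns of $A$, unit-norm $y$):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Made-up data: nonnegative A with unit-norm columns, nonnegative unit-norm y.
m, n = 6, 4
A = rng.random((m, n))
A /= np.linalg.norm(A, axis=0)   # normalize each column
y = rng.random(m)
y /= np.linalg.norm(y)

# Nonnegative least squares: minimize ||Ax - y|| subject to x >= 0.
x, residual = nnls(A, y)

# The cosine objective is scale-invariant, so rescale x onto the simplex.
if x.sum() > 0:
    x /= x.sum()

# Value of the original objective at the heuristic solution.
cosine = (A @ x) @ y / np.linalg.norm(A @ x)
print(x, cosine)
```

With nonnegative data the resulting cosine lies in $(0, 1]$, but nothing here certifies that it is the maximum.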
I think I could rewrite it in terms of KKT conditions, and solve it using calculus. However, I'm not really familiar with KKT conditions, so I wouldn't know how to do that.
What algorithm or technique could I use to solve this optimization problem?
I'm willing to accept an answer that just shows how to set this up as a quadratic program.
EDIT: I believe I've reduced this question to solving a system of equations by using the Karush-Kuhn-Tucker conditions that force $x$ to be non-negative. (The multiplier for the equality constraint $\sum_i x_i = 1$ drops out: the objective is invariant to positive scaling of $x$, so $\nabla f(x)^{\top} x = 0$, and taking the dot product of the stationarity condition with $x$ forces that multiplier to zero.) So, any answer that just solves this system of equations should work:
\begin{align*} \text{Solve for } x, \mu \text{ satisfying } & \\ & \frac{1}{\|Ax\|}A^{\top}\Big(y - \frac{(x^{\top}A^{\top} y)\, A x}{x^{\top}A^{\top}A x} \Big) + \mu = 0 \\ & \mu_i x_i = 0 \, \forall \, i \in \{1 \dots n\} \\ & \mu_i \geq 0 \, \forall \, i \in \{1 \dots n \} \\ \text{Where } & \\ & A \in \mathbb{R}^{m \, \times \, n} \\ & x \in \mathbb{R}^n \\ & \mu \in \mathbb{R}^n \\ & y \in \mathbb{R}^m \end{align*}
Sadly, I think this is just another quadratic program except that the objective function is $0$. The only quadratic program solver I've seen for Python (which is my work environment), CVXOPT, is quite slow.
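In the meantime, the stationarity and complementarity conditions above can at least be checked numerically at a candidate point: recover $\mu$ from stationarity as $\mu = -\nabla f(x)$, then test $\mu \geq 0$ and $\mu_i x_i = 0$. A sketch with made-up data (in practice the candidate $x$ would come from a solver, not from a random draw, so the check below will usually report failure):

```python
import numpy as np

def grad_cosine(A, y, x):
    """Gradient of f(x) = (Ax)^T y / ||Ax||, matching the stationarity condition."""
    Ax = A @ x
    nAx = np.linalg.norm(Ax)
    return (A.T @ (y - (Ax @ y) / nAx**2 * Ax)) / nAx

# Made-up data and candidate point.
rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.random((m, n)); A /= np.linalg.norm(A, axis=0)
y = rng.random(m); y /= np.linalg.norm(y)
x = rng.random(n); x /= x.sum()

g = grad_cosine(A, y, x)
mu = -g   # stationarity: grad f + mu = 0

# KKT holds at x iff mu >= 0 and mu_i * x_i = 0 for all i (up to tolerance).
kkt_satisfied = bool(np.all(mu >= -1e-8) and np.all(np.abs(mu * x) < 1e-8))
print(kkt_satisfied)
```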
I am not sure it's possible to cast your problem as a quadratic program, but here is a formulation with a linear objective and quadratic constraints. Define $z := -Ax / \|Ax\|$. You can rewrite your problem as \begin{align*} \min_{x,z} \quad & y^T z \\ \text{s.t.} \quad & Ax + \|Ax\| z = 0 \\ & \sum_i x_i = 1, \ x \geq 0, \end{align*} where $z$ is a vector of additional optimization variables. The first equality constraint doesn't look good: it is nonsmooth whenever $x$ lies in the nullspace of $A$ (your description doesn't say whether $m > n$ or $m \leq n$, or what the rank of $A$ is).
Note that $\|z\| = 1$ by construction, so we could introduce a new (scalar) optimization variable $\lambda$ that will equal $\|Ax\|$ at optimality, like so: \begin{align*} \min_{x,z,\lambda} \quad & y^T z \\ \text{s.t.} \quad & Ax + \lambda z = 0 \\ & \|z\|^2 = 1 \\ & \sum_i x_i = 1, \ x \geq 0, \ \lambda \geq 0. \end{align*} This is a larger problem, but a friendlier one: it has only linear and quadratic constraints.
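As a sanity check, this reformulation can be handed directly to a general NLP solver. A minimal sketch using SciPy's SLSQP with made-up data, packing the decision variables as $v = (x, z, \lambda)$ and starting from a feasible point built from the uniform $x$ (a dedicated interior-point solver would be preferable at scale):

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data: nonnegative A with unit-norm columns, nonnegative unit-norm y.
rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.random((m, n)); A /= np.linalg.norm(A, axis=0)
y = rng.random(m); y /= np.linalg.norm(y)

def unpack(v):
    return v[:n], v[n:n + m], v[n + m]   # x, z, lambda

objective = lambda v: y @ unpack(v)[1]   # min y^T z

constraints = [
    # Ax + lambda * z = 0
    {"type": "eq", "fun": lambda v: A @ unpack(v)[0] + unpack(v)[2] * unpack(v)[1]},
    # ||z||^2 = 1
    {"type": "eq", "fun": lambda v: unpack(v)[1] @ unpack(v)[1] - 1.0},
    # sum(x) = 1
    {"type": "eq", "fun": lambda v: unpack(v)[0].sum() - 1.0},
]
bounds = [(0, None)] * n + [(None, None)] * m + [(0, None)]  # x >= 0, lambda >= 0

# Feasible start: uniform x, z = -Ax/||Ax||, lambda = ||Ax||.
x0 = np.full(n, 1.0 / n)
Ax0 = A @ x0
v0 = np.concatenate([x0, -Ax0 / np.linalg.norm(Ax0), [np.linalg.norm(Ax0)]])

res = minimize(objective, v0, method="SLSQP", bounds=bounds, constraints=constraints)
x_opt, z_opt, lam_opt = unpack(res.x)
cosine = -res.fun   # maximized cosine similarity = -min y^T z
```

Being a local method on a nonconvex problem, SLSQP only certifies a stationary point, not a global maximum.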
Note, however, that this is a nonconvex problem; but so was your initial formulation. I think you can make the problem a bit "more convex" by relaxing $\|z\|^2 = 1$ to $\|z\|_2 \leq 1$ without changing the solution, but I didn't spend much time trying to verify this claim.
As suggested in one of the comments, I would go with an interior-point method to solve the problem (e.g., IPOPT if you can give $A$ explicitly). That assumes that you only have one such problem to solve, and not a long sequence of similar problems. Otherwise I would try an active-set method that could benefit from warm starts, such as SNOPT (which is a commercial product).