Let $\eta, \delta, v\in \mathbb{R}^N$. I am trying to optimize
$$\max_{\eta} \eta^\top \delta\quad\text{s.t.}\quad \eta_i\in [a_i, b_i] \,\,\wedge\,\,v^\top\eta = 0$$
quickly (ideally in under 1 s) in high dimensions ($N > 10^6$). One brute-force solution would be to use specialised interior-point optimisers (e.g. IPOPT or Gurobi), but such packages are hard to distribute to a wider user base as part of an open-source package (installation can be challenging). Alternatives such as CVXOPT or SciPy performed poorly in my initial experiments. Another option would be projected gradient descent, but that introduces additional hyperparameters and can be slow to converge.
I'd very much appreciate a more clever (and efficient) solution, if any exists.
Yes, you can solve this efficiently (in time linear in the number of variables). For simplicity I'll just outline an algorithm, and I'll switch notation to something more familiar to me.

First, note that we can shift and rescale the variables so that they're all constrained to the unit interval: $$ \begin{gather*} \min_x c^T x \\ 0 \le x \le 1 \\ a^T x = b \end{gather*} $$

The corresponding dual problem is: $$ \begin{gather*} \max_{\nu,\lambda_1,\lambda_2} -1^T \lambda_2 - b \nu \\ \lambda_1 \ge 0, \lambda_2 \ge 0 \\ \lambda_1 - \lambda_2 = c + \nu a \end{gather*} $$

The solution to the primal can be recovered from the dual, since the KKT conditions imply: $$ x^*_i = \begin{cases} 0 & c_i + \nu^* a_i > 0 \\ 1 & c_i + \nu^* a_i < 0 \end{cases} $$

The remaining coordinates, where $c_i + \nu^* a_i = 0$, can then be chosen to be any primal-feasible values.
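In case the dual above looks unfamiliar: it falls out of the standard Lagrangian computation, nothing specific to this problem. With multipliers $\lambda_1, \lambda_2 \ge 0$ for the bound constraints and $\nu$ for the equality constraint, $$ L(x, \nu, \lambda_1, \lambda_2) = c^T x + \nu (a^T x - b) - \lambda_1^T x + \lambda_2^T (x - 1) = (c + \nu a - \lambda_1 + \lambda_2)^T x - b \nu - 1^T \lambda_2. $$ Minimizing over unconstrained $x$ gives $-\infty$ unless the coefficient of $x$ vanishes, i.e. $\lambda_1 - \lambda_2 = c + \nu a$, in which case the dual function equals $-1^T \lambda_2 - b \nu$.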
To solve the dual problem, we can optimize out $\lambda_1$ and $\lambda_2$, which leaves a 1-dimensional problem: $$ \max_{\nu} g(\nu) := -b \nu + \sum_i \min(0, c_i + \nu a_i) $$

This function $g$ is concave and piecewise linear with breakpoints $-c_i/a_i$ (for $a_i \neq 0$; terms with $a_i = 0$ are constant in $\nu$). So, assuming the problem is primal feasible and the dual is bounded, the optimum must occur at one of the breakpoints, and the problem reduces to finding the maximum over a finite set of values: $$ j^* = {\mathop{\operatorname{arg max}}_j} g\left(-c_j/a_j\right) \Rightarrow \nu^* = -c_{j^*}/a_{j^*} $$

Naively evaluating $g$ at each of the breakpoints takes $O(n^2)$ time, since there are $n$ breakpoints and each evaluation of $g$ takes $O(n)$ time. It's better to sort the breakpoints and keep track of partial sums, which reduces the cost of evaluating $g$ to $O(1)$ per breakpoint. This yields an $O(n \log n)$ algorithm, dominated by the sort. Furthermore, if you want to get fancy, you can run a variation of quickselect on the breakpoints to get an $O(n)$ algorithm.
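To make the $O(n \log n)$ version concrete, here's a minimal NumPy sketch under my own simplifying assumptions (every $a_i \neq 0$, and the problem is feasible); the function name and the tie-filling loop at the end are just one way to do it:

```python
import numpy as np

def solve_box_lp(c, a, b):
    """min c@x  s.t.  0 <= x <= 1,  a@x = b.

    Sorted-breakpoint dual method. Assumes every a_i != 0 and a
    feasible problem (coordinates with a_i == 0 could be fixed up
    front by the sign of c_i).
    """
    t = -c / a                                   # breakpoints of g
    order = np.argsort(t)
    ts, cs, az = t[order], c[order], a[order]

    # A term min(0, c_i + nu*a_i) is on its negative branch ("active")
    # when  a_i > 0 and nu < t_i,  or  a_i < 0 and nu > t_i.
    pos = az > 0
    cpos, apos = np.where(pos, cs, 0.0), np.where(pos, az, 0.0)
    cneg, aneg = np.where(~pos, cs, 0.0), np.where(~pos, az, 0.0)

    # suffix sums: positive-a terms still active at breakpoint k
    suf_c = np.append(np.cumsum(cpos[::-1])[::-1], 0.0)
    suf_a = np.append(np.cumsum(apos[::-1])[::-1], 0.0)
    # prefix sums: negative-a terms already active at breakpoint k
    pre_c = np.insert(np.cumsum(cneg), 0, 0.0)
    pre_a = np.insert(np.cumsum(aneg), 0, 0.0)

    # g at every breakpoint, O(1) each after the sort
    g = (suf_c[1:] + pre_c[:-1]) + ts * (suf_a[1:] + pre_a[:-1] - b)
    nu = ts[np.argmax(g)]

    # KKT recovery: fix the decided coordinates, then fill the ties
    s = c + nu * a
    x = np.where(s > 0, 0.0, 1.0)
    tie = np.isclose(s, 0.0)
    x[tie] = 0.0
    r = b - a @ x                    # residual to absorb on tie coords
    for j in np.flatnonzero(tie):
        x[j] = np.clip(r / a[j], 0.0, 1.0)
        r -= a[j] * x[j]
    return x, nu
```

For the original maximization problem, apply this after negating the objective and mapping each $[a_i, b_i]$ onto $[0, 1]$.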
Addendum: at the end of the day you should get an algorithm like this (using your notation). Initially set $\eta_i = \begin{cases} a_i & \delta_i < 0 \\ b_i & \delta_i > 0 \end{cases}$ (any value in $[a_i, b_i]$ works when $\delta_i = 0$). This is the solution to the problem without the equality constraint. Then: