I want an efficient algorithm for solving the following minimization:
$\operatorname*{argmin}_{u} \sum_i \bigl(\max(0,\, u \cdot x_i) - r_i\bigr)^2$
Here $u$ and $x_i$ are vectors and $r_i$ is a scalar. The dimensionality of the vectors is modest, typically around 10. The $i$ index runs over a set which could be in the millions.
I can solve this by using the convex-concave procedure and reducing it to another subproblem
$\operatorname*{argmin}_{u} \sum_i \theta(u \cdot x_i)\,\bigl[(u \cdot x_i)^2 - r_i\,(u \cdot x_i)\bigr]$
where $\theta(y)$ is the Heaviside step function: $1$ if $y$ is positive and zero otherwise.
Note that if the $\theta$ function weren't there, this would just be a sum of quadratics, which has a well-known analytic solution. The terms here are like quadratics, except that once they hit zero they stay zero: each is nonzero only on one side of the hyperplane $u \cdot x_i = 0$. Is there a similar analytic solution for this case?
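For reference, here is the closed-form solution of the unrestricted problem, i.e. the sum of quadratics with the $\theta$ removed (a minimal sketch, assuming the $x_i$ are stacked as rows of a matrix `X`):

```python
import numpy as np

def least_squares(X, R):
    # argmin_u sum_i (u·x_i - r_i)^2 via the normal equations (X^T X) u = X^T R;
    # lstsq also handles the rank-deficient case gracefully
    return np.linalg.lstsq(X, R, rcond=None)[0]
```

The question is whether something comparably cheap exists once each term is clipped at zero on one side of its hyperplane.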
Note that the function is convex, so any general-purpose convex solver would work, but I'm interested in something more efficient that exploits the structure of these functions, even if it requires some iteration. My intuition says there is some fixed-point algorithm or divide-and-conquer approach that would work.
Here is one possibility that I thought of but I'd be interested if there is one even better.
You can use the Alternating Direction Method of Multipliers (ADMM) in consensus form. That is, you treat the $u$ in each term of the sum as an independent variable but add the constraint that they are all equal. In each ADMM iteration you solve one subproblem per term, minimizing that term plus an isotropic quadratic penalty, which has a simple analytic solution. Then you average the results and update a dual variable. The algorithmic complexity is $O(N_x N_p N_A)$, where these are, respectively, the dimensionality of the vector space, the number of points, and the number of ADMM iterations required to converge within tolerance, which is often around 100 or so.
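A sketch of this consensus-ADMM scheme (names like `prox_term` and the choice `rho=1.0` are my own; the per-term subproblem reduces to a 1-D piecewise-quadratic minimization along $x_i$, since the loss depends on $u$ only through $t = u \cdot x_i$):

```python
import numpy as np

def prox_term(x, r, v, rho):
    # minimize (max(0, u·x) - r)^2 + (rho/2)||u - v||^2 over u.
    # The loss depends on u only through t = u·x, so for fixed t the closest
    # point to v is u = v + ((t - t0)/||x||^2) x with t0 = v·x, and the
    # penalty becomes (c/2)(t - t0)^2 where c = rho/||x||^2.
    n2 = x @ x
    c = rho / n2
    t0 = v @ x
    # branch t <= 0: loss is the constant r^2, so best t is min(t0, 0)
    t_neg = min(t0, 0.0)
    f_neg = r**2 + 0.5 * c * (t_neg - t0) ** 2
    # branch t >= 0: (t - r)^2 + (c/2)(t - t0)^2, minimizer (2r + c t0)/(2 + c)
    t_pos = max((2 * r + c * t0) / (2 + c), 0.0)
    f_pos = (t_pos - r) ** 2 + 0.5 * c * (t_pos - t0) ** 2
    t = t_neg if f_neg < f_pos else t_pos
    return v + ((t - t0) / n2) * x

def admm_consensus(X, R, rho=1.0, iters=200):
    # consensus ADMM in scaled form: local variables U[i], consensus z,
    # scaled duals lam[i]
    N, d = X.shape
    z = np.zeros(d)
    lam = np.zeros((N, d))
    for _ in range(iters):
        U = np.array([prox_term(X[i], R[i], z - lam[i], rho) for i in range(N)])
        z = (U + lam).mean(axis=0)   # averaging (consensus) step
        lam += U - z                 # dual update
    return z
```

In practice the inner loop over $i$ would be vectorized, and one would stop on primal/dual residuals rather than a fixed iteration count.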
You could accelerate it toward the end by guessing which quadratics will be activated (the activation pattern eventually stops changing) and then just using the quadratic minimization formula. If the activations are unchanged at the resulting solution, you're done; if they have changed, you continue with ADMM iterations.
I also wondered whether just making assumptions about the activations and iterating would itself be a convergent algorithm. It probably is, but I'm not sure, and I don't know its rate of convergence even if it is.
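The activation-iteration idea in the last two paragraphs can be sketched as follows (a hypothetical scheme with no convergence proof; it assumes the active rows of `X` have full column rank):

```python
import numpy as np

def iterate_activations(X, R, u0, max_iter=100):
    # Freeze the activation pattern theta(u·x_i) at the current iterate,
    # minimize the induced sum of quadratics in closed form, and repeat
    # until the pattern reproduces itself at its own minimizer.
    u = np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        active = X @ u > 0
        if not active.any():
            return u, False  # nothing active; the frozen objective is constant
        # with the pattern frozen, inactive terms contribute the constant
        # r_i^2, so the minimizer is least squares over the active rows
        u_new = np.linalg.lstsq(X[active], R[active], rcond=None)[0]
        if np.array_equal(X @ u_new > 0, active):
            return u_new, True  # pattern is self-consistent: candidate solution
        u = u_new
    return u, False
```

A single pass of this loop is exactly the acceleration check described above; whether the full iteration always converges (and how fast) is the open question.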