(MATLAB/Python) Euclidean projection on the simplex: why is my code wrong?

1k Views Asked by At

I'm trying to implement the algorithm for Euclidean projection onto the probability simplex in,

https://eng.ucmerced.edu/people/wwang5/papers/SimplexProj.pdf

enter image description here which is widely cited and I presume to be correct.

However, my MATLAB code (which is a direct implementation from the pseudo-code) appears to be wrong and I have no idea how or where this happened after checking for a while.

    %preamble
    y = rand(3,1)' %input
    y_sorted = sort(y, 'descend') %sort in descending order
    x = zeros(1, length(y))'    % the projected vector
    L = -1*ones(1, length(y))' % a list of all -1s
    
    %compute largest value in the set to find rho
    G_1 = 0;
    for j = 1:1:length(y)
        G_1 = G_1 + y_sorted(j) 
        if  y_sorted(j)+1/j*(1-G_1) > 0
            L(j) = y_sorted(j)+1/j * (1 - G_1)
        end
    end
    [argvalue_L, argmax_L] = max(L);
    rho = argmax_L
    
    %calculate lambda
    G_2 = 0;
    for i = 1:1:rho
        G_2 = G_2 + y_sorted(i)
    end
    lambda = 1/rho*(1 - G_2)
    
    %compute the projection
    for i = 1:1:length(y)
        x(i) = max(y(i) + lambda, 0)
    end
    sum(x)

However, the sum is never $1$, which must mean there is an error in the code.

I found another code for the same implementation in Python

    import numpy as np

    def projection_simplex_sort(v, z=1):
        n_features = v.shape[0]
        u = np.sort(v)[::-1]
        cssv = np.cumsum(u) - z
        ind = np.arange(n_features) + 1
        cond = u - cssv / ind > 0
        rho = ind[cond][-1]
        theta = cssv[cond][-1] / float(rho)
        w = np.maximum(v - theta, 0)
        return w
    
    v = np.array([1,2,3])
    z = np.sum(v) * 0.5
    w = projection_simplex_sort(v, z)
    print(np.sum(w))

Again, the sum is not $1$. Since I didn't write it, therefore I am not confident that it is correct, but the overall structure is there and it matches. Can someone please help?

*Another strange thing I found is that none of the projection algorithm in a Github repository I found returns a vector in the simplex. The vector elements never sum up to 1.

2

There are 2 best solutions below

1
On BEST ANSWER

You're misinterpreting the maximization problem for $\rho$. Specifically, we want the highest possible $j$ for which $u_{j} + \frac{1}{j}(1-\sum_{i=1}^{j}u_{i})$ is still positive. (Instead, you found the $j$ that maximizes it, which always give $\rho=1$ when you're generating positive numbers.)

2
On

Have a look on my code at Orthogonal Projection onto the Unit Simplex.
You will find a code which implement the method above and even faster codes.