I am working on a binary optimization problem, but I use soft labels to find an approximate solution. The optimization problem with soft labels is the following:
$$\begin{aligned}& \operatorname*{argmin}_{\mathbf{y}_u \in [0,1]^{U}} & \frac{1}{L} \left\| \mathbf{X} \left( \mathbf{X}_{e}^{T} \mathbf{X}_{e} \right)^{-1} \mathbf{X}_{e}^{T} \begin{bmatrix} \mathbf{y} \\ \mathbf{y}_u \end{bmatrix} - \mathbf{y} \right\|_2^2 \end{aligned}$$ where $\mathbf{X} \in \mathbb{R}^{L \times d}$, $\mathbf{X_e} \in \mathbb{R}^{(L+U) \times d}$, $\mathbf{y} \in \mathbb{R}^{L \times 1}$ and $\mathbf{y}_u \in \mathbb{R}^{U \times 1}$.
This problem is a convex optimization problem that can be written in standard quadratic programming form and solved with a gradient descent procedure.
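A minimal sketch of such a procedure, using projected gradient descent in NumPy (matrix names follow the problem statement; the step size is set from the gradient's Lipschitz constant, and the projection onto $[0,1]^U$ is a simple clip):

```python
import numpy as np

def solve_soft_labels(X, X_e, y, steps=2000, lr=None):
    """Projected gradient descent for
    argmin_{y_u in [0,1]^U} (1/L) || X (X_e^T X_e)^{-1} X_e^T [y; y_u] - y ||^2.
    """
    L, U = X.shape[0], X_e.shape[0] - X.shape[0]
    # A = X (X_e^T X_e)^{-1} X_e^T, split into labeled / unlabeled columns
    A = X @ np.linalg.solve(X_e.T @ X_e, X_e.T)
    A_l, A_u = A[:, :L], A[:, L:]
    r0 = A_l @ y - y                      # constant part of the residual
    if lr is None:                        # safe step: 1 / Lipschitz constant
        lr = L / (2 * np.linalg.norm(A_u, 2) ** 2 + 1e-12)
    y_u = np.full(U, 0.5)
    for _ in range(steps):
        grad = (2.0 / L) * A_u.T @ (r0 + A_u @ y_u)
        y_u = np.clip(y_u - lr * grad, 0.0, 1.0)   # projection onto [0,1]^U
    return y_u
```

The binarization step below is then just `(y_u >= 0.5).astype(int)`.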
Once I have the soft-label solution $\mathbf{y}_u = \left[ y_1, \dots, y_U \right]$, I obtain the binary solution by labeling as $0$ every $y_i < 0.5$ and as $1$ every $y_i \geq 0.5$, for $i=1,\dots,U$.
Now I am working on two new variants of this problem with additional constraints, and I don't know how to keep them convex. In particular, in both situations I have been forced to introduce binary variables, which I would like to avoid if possible.
First case
In this case, we have $b$ independent bags and we know the proportion of zeros and ones in each bag. For example, consider the simplest case where there is only one bag and the proportions of zeros and ones are $1/3$ and $2/3$ respectively. The only way I have found to add this restriction is to introduce binary variables $w_i$ and $z_i$ and add the following constraints:
$$\sum_{i=1}^{U}w_i=\frac{2}{3}U$$
$$\sum_{i=1}^{U}z_i=\frac{1}{3}U$$
$$w_i+z_i=1$$
$$w_i-2y_i \leq 0$$
$$z_i-2(1-y_i) \leq 0$$
$$w_i,z_i \in \lbrace0,1\rbrace$$
Since $U$ is assumed to be large, I can accept fulfilling the proportions only up to some tolerance, but I want the problem to remain a convex optimization problem.
Second case
In this other case, we again have $b$ independent bags, and we know that within each bag all the $y_i$ take the same value ($0$ or $1$), but we don't know which. To illustrate, suppose again the simplest case with a single bag. I have tried a penalty method that leads to this optimization problem:
$$\begin{aligned}& \operatorname*{argmin}_{\mathbf{y}_u \in [0,1]^{U},\; \epsilon,\; \xi} & \frac{1}{L} \left\| \mathbf{X} \left( \mathbf{X}_{e}^{T} \mathbf{X}_{e} \right)^{-1} \mathbf{X}_{e}^{T} \begin{bmatrix} \mathbf{y} \\ \mathbf{y}_u \end{bmatrix} - \mathbf{y} \right\|_2^2 + \epsilon \, \xi \end{aligned}$$
$$y_i \leq \left( 0.5 +\epsilon \right) \text{ for all } i=1,\dots,U$$
$$y_i \geq \left( 0.5 - \xi \right) \text{ for all } i=1,\dots,U$$
$$\epsilon + \xi =0.5$$
$$\epsilon, \xi \geq 0$$
In this case I think the problem is solved, because the penalty drives $\epsilon \, \xi \rightarrow 0$: either $\epsilon \rightarrow 0$ (so every $y_i \leq 0.5$) while $\xi \rightarrow 0.5$, or $\xi \rightarrow 0$ (so every $y_i \geq 0.5$) while $\epsilon \rightarrow 0.5$.
The constraints in both cases are inherently non-convex, so it's unlikely that there's a convex formulation. In fact, though I believe your formulation in the second case to be conceptually correct, it is no longer convex. For an indication of why, you can investigate the Hessian of $f(x,\epsilon,\xi) = x^2 +\epsilon\xi$, which is a simpler objective in the same form as yours. The Hessian is not positive semi-definite.
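The Hessian claim can be verified numerically; for $f(x,\epsilon,\xi) = x^2 + \epsilon\xi$ the Hessian is constant, and its eigenvalues are immediate to compute:

```python
import numpy as np

# Hessian of f(x, eps, xi) = x^2 + eps * xi
H = np.array([[2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
eigvals = np.linalg.eigvalsh(H)  # ascending eigenvalues
print(eigvals)  # the smallest eigenvalue is negative, so H is not PSD
```

A negative eigenvalue means $f$ is not convex, so the penalized objective cannot be convex either.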
To see the non-convexity of the constraints, take the simple case of $U = 2$ and, for the first case, a proportion of $1/2$. The region of the plane satisfying the proportion constraint is the union of the sets $[0,1/2]\times [1/2,1]$ and $[1/2,1]\times [0,1/2]$, which is clearly non-convex. For the second type of constraints and $U=2$, the feasible region is a similar non-convex union.
If you have to resort to binary variables, the cleanest way may be to make the $y_i$ themselves binary. Then the first constraint can be written as
$$ \sum_i y_i \leq \frac{2}{3}U +\epsilon $$
$$ \sum_i y_i \geq \frac{2}{3}U -\epsilon $$
for some small $\epsilon$. The second case can be modeled with
$$ y_1 = y_i $$ for all $i$.
CPLEX, Gurobi, and others are getting quite good at convex quadratic mixed integer optimization, though if you're in the big data regime the size of your problem may still be prohibitive.
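For the second case, when the number of bags $b$ is small, there is also a solver-free alternative: enumerate the $2^b$ possible bag labelings and evaluate the (closed-form) objective for each. A sketch, where `bags` is an assumed list of index lists partitioning the unlabeled points:

```python
import itertools
import numpy as np

def best_bag_labels(X, X_e, y, bags):
    """Second case: all y_i in a bag share one binary value.
    Enumerate the 2^b bag assignments and keep the best objective."""
    L = X.shape[0]
    U = X_e.shape[0] - L
    A = X @ np.linalg.solve(X_e.T @ X_e, X_e.T)
    best_val, best_yu = np.inf, None
    for labels in itertools.product([0.0, 1.0], repeat=len(bags)):
        y_u = np.empty(U)
        for bag, lab in zip(bags, labels):
            y_u[bag] = lab               # every index in the bag gets one value
        resid = A @ np.concatenate([y, y_u]) - y
        val = resid @ resid / L
        if val < best_val:
            best_val, best_yu = val, y_u
    return best_yu, best_val
```

This is exact for that case, but exponential in $b$, so it only complements (rather than replaces) the mixed-integer route above.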