Given matrix $\mathbf{M} \in \mathbb{R}^{p \times p}$ and vector $\boldsymbol{\mu} \in \mathbb{R}^{p}$, I want to solve
$$\begin{array}{ll} \underset{\boldsymbol{\beta} \in \mathbb{R}^p}{\text{minimize}} & \boldsymbol{\beta}^T \mathbf{M} \, \boldsymbol{\beta}\\ \text{subject to} & \boldsymbol{\mu}^T \boldsymbol{\beta} \geq \sigma\\ & \boldsymbol{1}^T \boldsymbol{\beta} \leq 1\\ & \beta_i \geq 0, \quad \forall i \in \{ 1,\, 2,\, \dots, \, p\} \end{array}$$
where $\boldsymbol{1} \in\mathbb{R}^p$ is the vector of ones.
If we only have the first two inequality constraints, a closed-form solution for $\boldsymbol{\beta}$ can easily be calculated with the Lagrangian method (treating both constraints as active), e.g.,
$$L = \boldsymbol{\beta}^T \, \mathbf{M}\boldsymbol{\beta} + \lambda_1 (\boldsymbol{\mu}^T \boldsymbol{\beta} - \sigma) + \lambda_2 (1 - \boldsymbol{1}^T \boldsymbol{\beta})$$
and then we set $\frac{\partial L}{\partial \boldsymbol{\beta}} = 0$, $\frac{\partial L}{\partial \lambda_1} = 0$, and $\frac{\partial L}{\partial \lambda_2} = 0$.
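Concretely, if $\mathbf{M}$ is symmetric and invertible and both constraints hold with equality, stationarity gives

$$\frac{\partial L}{\partial \boldsymbol{\beta}} = 2\mathbf{M}\boldsymbol{\beta} + \lambda_1 \boldsymbol{\mu} - \lambda_2 \boldsymbol{1} = \boldsymbol{0} \quad\Longrightarrow\quad \boldsymbol{\beta} = \frac{1}{2}\mathbf{M}^{-1}\left(\lambda_2 \boldsymbol{1} - \lambda_1 \boldsymbol{\mu}\right),$$

and $\lambda_1$, $\lambda_2$ follow from substituting this back into $\boldsymbol{\mu}^T \boldsymbol{\beta} = \sigma$ and $\boldsymbol{1}^T \boldsymbol{\beta} = 1$.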
But now how can we deal with the non-negativity constraints, and how should the Lagrangian be written? Do you think I have to solve the optimization problem with only the first two inequality constraints, and then, after finding
$$\boldsymbol{\beta}^* = \begin{bmatrix} \beta_1^* & \beta_2^* & \cdots & \beta^*_p \end{bmatrix}^T$$
calculate $\max(\beta^*_i, \, 0)$ for $i \in \{1, \dots, p\}$? Any help would be much appreciated!
The approach you suggested, using only constraints (1) and (2) to find a candidate solution and then projecting onto the nonnegative orthant, will not generally work on its own. However, an iterative method built on the same idea is projected gradient descent (a special case of the proximal gradient method). Given a function $f(x)$ and the constraint that $x \in A$, the set of feasible solutions, as long as you can compute the projection operator $P_A$, you can iterate: take a small step in the direction of the negative gradient, then project back onto $A$:
$$ x \leftarrow P_A(x - \alpha\nabla f(x)) $$
where $\alpha$ is a step size that can be chosen with a line search or heuristically. If it is too large, you risk moving too far from the feasible set and getting slow convergence or even divergence.
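As a toy illustration of the update with the nonnegativity constraint alone (where the projection is just elementwise clipping), a minimal numpy sketch might look like this; the matrix and starting point here are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
A = rng.standard_normal((p, p))
M = A.T @ A + np.eye(p)                # symmetric positive definite for the example
beta = rng.standard_normal(p)          # arbitrary starting point

# A step size below 1/L, where L = 2*lambda_max(M) is the gradient's
# Lipschitz constant, guarantees convergence.
alpha = 0.9 / (2.0 * np.linalg.eigvalsh(M).max())

for _ in range(500):
    grad = 2.0 * M @ beta                          # gradient of beta^T M beta
    beta = np.maximum(beta - alpha * grad, 0.0)    # gradient step, then project
```

With only $\beta \geq 0$ and a positive definite $\mathbf{M}$, the minimizer is the zero vector, so the iterates should shrink toward it; the harder part for your problem is the projection onto the intersection of all three constraint sets.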
You have multiple, more complex constraints though, and it may be difficult to find a simple expression for the projection onto their intersection. A potential solution is Dykstra's algorithm for projecting onto an intersection of convex sets (not to be confused with Dijkstra's shortest-path algorithm). Naively cycling through the constraints and projecting onto them one at a time will not generally give the projection onto the intersection: it yields a point in the intersection, but not necessarily the one closest to the input. Dykstra's algorithm does, with just a small amount of extra bookkeeping.
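For two of your sets, the halfspace $\{\boldsymbol{\beta} : \boldsymbol{\mu}^T\boldsymbol{\beta} \geq \sigma\}$ and the nonnegative orthant, both projections are closed-form, and Dykstra's algorithm is a short loop. A hypothetical sketch (the function names are mine):

```python
import numpy as np

def proj_halfspace(x, mu, sigma):
    """Projection onto {x : mu^T x >= sigma}."""
    slack = mu @ x - sigma
    if slack >= 0:
        return x
    return x - slack / (mu @ mu) * mu      # move along mu back to the boundary

def proj_orthant(x):
    """Projection onto the nonnegative orthant."""
    return np.maximum(x, 0.0)

def dykstra(x0, projections, iters=500):
    """Dykstra's alternating projections with per-set correction terms."""
    x = x0.astype(float).copy()
    corrections = [np.zeros_like(x) for _ in projections]
    for _ in range(iters):
        for i, proj in enumerate(projections):
            y = proj(x + corrections[i])
            corrections[i] = x + corrections[i] - y
            x = y
    return x
```

For example, projecting $(-1, 0.2)$ onto $\{x \geq 0\} \cap \{x_1 + x_2 \geq 1\}$: naive cyclic projection stops at $(0, 1.1)$, a feasible point, while Dykstra's correction terms pull the iterates to the true projection $(0, 1)$.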
Another option is to simplify the problem: choose two elements of the current solution that violate their constraints, and solve for their optimal values holding everything else constant. Each subproblem is much simpler, and this can be very efficient for some problems. It ends up looking very much like the sequential minimal optimization (SMO) method used to train support vector machines, so you may be able to adapt that almost directly. It has the benefit of only needing to operate on the variables violating the constraints after each step, so if the solution path stays close to the interior of the feasible set, performance will be close to that of unconstrained gradient descent.
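To make the subproblem concrete: for this objective, exactly minimizing over a pair $(\beta_i, \beta_j)$ with everything else fixed is a tiny two-variable QP, solvable by checking the interior stationary point and the two edges. A hypothetical sketch (the function name is mine) that handles only the nonnegativity constraints, with $\mathbf{M}$ assumed symmetric positive definite; the $\sigma$ and sum constraints would need the same kind of case analysis:

```python
import numpy as np

def pairwise_step(M, beta, i, j):
    """Exactly minimize beta^T M beta over (beta_i, beta_j) >= 0,
    holding every other coordinate fixed.  M assumed symmetric PD."""
    idx = [i, j]
    rest = [k for k in range(len(beta)) if k not in idx]
    Q = M[np.ix_(idx, idx)]                   # 2x2 block of the quadratic
    c = M[np.ix_(idx, rest)] @ beta[rest]     # linear term from fixed coordinates
    candidates = []
    v = np.linalg.solve(Q, -c)                # interior stationary point
    if v[0] >= 0 and v[1] >= 0:
        candidates.append(v)
    # edges: pin one variable at zero, minimize the remaining 1-D quadratic
    candidates.append(np.array([max(0.0, -c[0] / Q[0, 0]), 0.0]))
    candidates.append(np.array([0.0, max(0.0, -c[1] / Q[1, 1])]))
    def obj(v):                               # restricted objective, constant dropped
        return v @ Q @ v + 2 * c @ v
    best = min(candidates, key=obj)
    out = beta.copy()
    out[i], out[j] = best
    return out
```

Sweeping such steps over pairs of violating coordinates gives the SMO-flavored scheme described above.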
That said, if what you need is a solution and it's not important that you write it yourself, you will almost certainly get better performance from a general quadratic programming library. Setting up this problem in Python's cvxopt package takes only a few lines of code, and an immense amount of work has gone into these solvers that you don't have to recreate.