I have the following set of linear constraints:
$$\begin {align}\textbf{y}^T\textbf {x} &= 0 \\ \textbf {0} &\leq\textbf {x} \leq C\cdot\textbf {1},\end {align}$$
where $\textbf{y} \in \{-1, 1\}^n, \textbf{x} \in \mathbb{R}^n, \; C \in \mathbb{R}$. I want to project the gradient of a quadratic function $f(\textbf{x})$ onto the subspace defined by these constraints, so that movement along the projected gradient does not violate them.
How can I get this projected gradient vector?
P.S. here is the reference I've been using. Scroll to page 367, 12.4 The Gradient Projection Method. The gradient projection method assumes that you start at a feasible point $\textbf{x}_0$. Then you create the matrix $A_q$ which consists of the rows of the active constraints, that is, the constraints for which $\textbf{a}_i^T\textbf{x}_0=b_i$. In my case the active constraints for the initial solution $\textbf{x}_0=\textbf{0}$ (which is feasible) are:
$$\textbf{y}^T\textbf{x}_0=0$$
and
$$\textbf{x}_0\geq\textbf{0}$$
so my matrix $A_q$ is an $(n+1)\times n$ matrix:
$$A_q=\begin{bmatrix} y_1 & y_2 & y_3 & \dots & y_n \\ 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix}$$
at feasible solution $\textbf{x}_0=\textbf{0}$. Now in my reference (page 367, last two lines) it is stated:
To compute this projection let $A_q$ be defined as composed of the rows of working constraints. Assuming regularity of the constraints, as we shall always assume, $A_q$ will be a $q\times n$ matrix of rank $q < n$.
Well, my matrix $A_q$ is of rank $q=n \not< n$. How to proceed?
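To make the problem concrete, here is a small numerical check (in NumPy, with $n=3$ and the labels chosen arbitrarily): the usual projection matrix $P = I - A_q^T(A_qA_q^T)^+A_q$ collapses to the zero matrix when $\operatorname{rank}(A_q) = n$, so at $\textbf{x}_0=\textbf{0}$ the projected gradient is always $\textbf{0}$ no matter what the gradient is.

```python
import numpy as np

n = 3
y = np.array([1.0, -1.0, 1.0])   # arbitrary labels in {-1, +1}

# Working-constraint matrix at x0 = 0: the equality y^T x = 0
# plus the n active bound constraints x_i >= 0.
A_q = np.vstack([y, np.eye(n)])  # shape (n+1, n), rank n

# Projection onto the nullspace of A_q; the pseudoinverse handles
# the rank-deficient Gram matrix A_q A_q^T.
P = np.eye(n) - A_q.T @ np.linalg.pinv(A_q @ A_q.T) @ A_q

grad = np.array([2.0, -3.0, 0.5])  # some gradient of f
print(np.allclose(P, 0))           # True: P is the zero matrix
print(P @ grad)                    # [0. 0. 0.]
```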
This doesn't directly answer your question, but here is a different algorithm you could possibly use to solve the optimization problem \begin{align} \text{minimize} & \quad \frac12 x^T A x + b^T x \\ \text{subject to} & \quad y^T x = 0 \\ & \quad 0 \leq x \leq c \end{align} where the matrix $A$ is symmetric positive semidefinite.
Let $U = \{x \mid y^T x = 0\}$ and let $V = \{x \mid 0 \leq x \leq c \}$. Also let $I_U$ be the indicator function of $U$, defined by: \begin{equation} I_U(x) = \begin{cases} 0 & \quad \text{if } x \in U \\ \infty &\quad \text{otherwise.} \end{cases} \end{equation} The indicator function of $V$, denoted $I_V$, is defined similarly.
The optimization problem can be reformulated as \begin{equation} \text{minimize} \quad \underbrace{\frac12 x^T Ax + b^T x + I_U(x)}_{g(x)} + \underbrace{I_V(x)}_{h(x)}. \end{equation} There is a very popular optimization algorithm called the Douglas-Rachford method that will allow you to minimize a sum of two simple functions like this. You could implement the Douglas-Rachford method for this problem in a page of Matlab.
At each iteration of the Douglas-Rachford method, you will have to evaluate the "proximal operators" of $g$ and $h$. The proximal operator of $g$ (with parameter $t > 0$) is defined by \begin{equation} \text{prox}_{tg}(\hat x) = \arg \min_x g(x) + \frac{1}{2t} \|x - \hat x\|_2^2, \end{equation} and the proximal operator of $h$ is defined similarly. You can work out simple closed form expressions for the proximal operators of $g$ and $h$. To evaluate the proximal operator of $h$, you can see that we need only project onto $V$. And to evaluate the proximal operator of $g$, we must minimize a quadratic function subject to linear constraints.
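For reference, here is one way the two proximal operators could look (in Python/NumPy rather than the Matlab mentioned above). The KKT derivation behind `prox_g` is a sketch: stationarity gives $(A + I/t)x + \lambda y = \hat x/t - b$ together with $y^Tx = 0$, and the system $(A + I/t)x = \cdots$ is solvable because $A + I/t$ is positive definite for any PSD $A$ and $t > 0$.

```python
import numpy as np

def prox_h(x_hat, c):
    """Proximal operator of h = indicator of V: project onto the box
    {x : 0 <= x <= c} by componentwise clipping."""
    return np.clip(x_hat, 0.0, c)

def prox_g(x_hat, A, b, y, t):
    """Proximal operator of g: minimize
        (1/2) x^T A x + b^T x + (1/(2t)) ||x - x_hat||^2
    subject to y^T x = 0. Solved via the KKT system
        (A + I/t) x + lam * y = x_hat/t - b,   y^T x = 0."""
    n = len(x_hat)
    M = A + np.eye(n) / t          # positive definite since A is PSD
    r = x_hat / t - b
    Minv_r = np.linalg.solve(M, r)
    Minv_y = np.linalg.solve(M, y)
    # Eliminate x to get the multiplier, then back-substitute.
    lam = (y @ Minv_r) / (y @ Minv_y)
    return Minv_r - lam * Minv_y
```

Note that `prox_g(x_hat, ...)` always returns a point satisfying $y^Tx = 0$ exactly (up to the accuracy of the linear solve), while `prox_h` returns a point in the box; neither iterate alone is feasible for both constraint sets until the method converges.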
One issue with this method is that evaluating the proximal operator of $g$ will require solving a linear system of equations involving $A$, which might be difficult if your problem is very large scale. If $A$ is sparse or has some other structure we can exploit then that might help a lot.
The Douglas-Rachford iteration is: \begin{align*} x^k &= \text{prox}_{tg}( z^{k-1}) \\ y^k &= \text{prox}_{th}(2 x^k - z^{k-1}) \\ z^k &= z^{k-1} + y^k - x^k. \end{align*} You pick the parameter $t > 0$ and also the initial values $x^0,y^0,z^0$ however you like.
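Putting the pieces together, a minimal NumPy sketch of the full iteration could look as follows (the fixed iteration count, the test problem at the bottom, and the variable name `w` for the answer's $y^k$, renamed to avoid clashing with the label vector $y$, are my own choices):

```python
import numpy as np

def douglas_rachford(A, b, y, c, t=1.0, iters=500):
    """Douglas-Rachford iteration for
        minimize (1/2) x^T A x + b^T x
        over {x : y^T x = 0} intersect {x : 0 <= x <= c}."""
    n = len(b)
    M = A + np.eye(n) / t           # factor once, reuse every iteration
    Minv_y = np.linalg.solve(M, y)
    z = np.zeros(n)
    w = z
    for _ in range(iters):
        # x^k = prox_{tg}(z^{k-1}): equality-constrained quadratic (KKT solve)
        r = z / t - b
        Minv_r = np.linalg.solve(M, r)
        lam = (y @ Minv_r) / (y @ Minv_y)
        x = Minv_r - lam * Minv_y
        # w^k = prox_{th}(2 x^k - z^{k-1}): projection onto the box [0, c]^n
        w = np.clip(2 * x - z, 0.0, c)
        # z^k = z^{k-1} + w^k - x^k
        z = z + w - x
    return w  # the box-feasible iterate

# Tiny sanity check (my own example): with A = I, b = (-1/2, -1/2),
# y = (1, -1), c = 1, the equality constraint forces x1 = x2 = s and
# the objective becomes s^2 - s, minimized at s = 1/2.
x_star = douglas_rachford(np.eye(2), np.array([-0.5, -0.5]),
                          np.array([1.0, -1.0]), c=1.0)
print(x_star)   # approximately [0.5, 0.5]
```

Factoring (or caching solves with) $A + I/t$ outside the loop is what makes the per-iteration cost cheap; this is also where the large-scale difficulty mentioned above would show up.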