Since I want each tree's feature selection to be as independent as possible, I've formulated a discrete optimization problem: $$A\in \lbrace 0,1 \rbrace^{x\times y}$$ $$A\mathbf{1}_y = C\,\mathbf{1}_x$$ $$A^{\star}=\operatorname*{arg\,min}_{A}\|AA^T - CI\|_2^2$$
The features are selected by the binary matrix $A$, where the number of rows is the number of trees in the random forest. Each row has exactly $C$ non-zero entries (the sum of each row of $A$ is the constant $C$), which indicate the features that the corresponding tree will use. Finally, $\|AA^T - CI\|_2^2$ measures how much the rows overlap with one another: the diagonal of $AA^T$ is always $C$, and the off-diagonal entry $(i,j)$ counts the features shared by trees $i$ and $j$. Could I use the projected gradient algorithm to solve this problem?
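To make the question concrete, here is a minimal sketch of what a projected gradient heuristic for this problem might look like. It rests on several assumptions that are not part of the formulation above: the norm is taken to be the Frobenius norm, the gradient is computed as if $A$ were real-valued (giving $4(AA^T - CI)A$), and the projection keeps the $C$ largest entries in each row. The function names, step size, and iteration count are hypothetical. Because the feasible set is not convex, this is only a heuristic with no convergence guarantee.

```python
import numpy as np


def objective(A, C):
    """Squared Frobenius norm of A A^T - C I (assumed reading of the norm)."""
    R = A @ A.T - C * np.eye(A.shape[0])
    return float(np.sum(R ** 2))


def gradient(A, C):
    """Gradient of the objective, treating A as a real-valued matrix."""
    R = A @ A.T - C * np.eye(A.shape[0])
    return 4.0 * R @ A


def project_rows(V, C):
    """Euclidean projection of each row onto binary vectors with exactly C ones.

    For a fixed row, the closest such vector puts ones on the C largest entries.
    """
    idx = np.argpartition(-V, C - 1, axis=1)[:, :C]
    A = np.zeros(V.shape, dtype=float)
    np.put_along_axis(A, idx, 1.0, axis=1)
    return A


def projected_gradient(n_trees, n_features, C, step=0.1, iters=200, seed=0):
    """Heuristic projected gradient loop; returns the best feasible A seen."""
    rng = np.random.default_rng(seed)
    A = project_rows(rng.random((n_trees, n_features)), C)
    best_A, best_val = A.copy(), objective(A, C)
    for _ in range(iters):
        # Gradient step in the relaxed space, then snap back to the feasible set.
        A = project_rows(A - step * gradient(A, C), C)
        val = objective(A, C)
        if val < best_val:
            best_A, best_val = A.copy(), val
    return best_A, best_val


if __name__ == "__main__":
    A_best, val = projected_gradient(n_trees=5, n_features=20, C=4)
    print(A_best.astype(int))
    print("objective:", val)
```

One thing this sketch makes visible: since the entries of $A$ are 0 or 1, a small step may not change which entries are the $C$ largest in a row, in which case the projection returns the same matrix and the iteration stalls.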