I am reading this nice document about the subgradient method, which defines the iteration as follows:
$$x^{k+1}=x^k-\alpha_k g^k,$$
where $g^k$ is a subgradient of $f$ at $x^k$, i.e. a $g$ satisfying, for all $y$,
$$f(y) \geq f(x)+g^T(y-x).$$
If $f$ is differentiable then $g$ is its gradient. This seems to suggest that for any valid $g$, we are ensured to decrease (not strictly) the value of $f$. However, the same document states the following:
> the subgradient method is not a descent method; the function value can (and often does) increase
and proposes to record the value of $f$ at each iteration, in order to keep track of the best one found so far.
This seems to contradict the first statement about the choice of $g$: if we are choosing a $g$ such that $f$ decreases, how can the next $f$ not be the best so far?
I believe your confusion comes from misreading the sign of the inequality. If $g^k$ is a subgradient at $x^k$, then taking $y = x^{k+1}$ and $x = x^k$ in the subgradient inequality gives: $$ f(x^{k+1}) = f(x^k - \alpha_k g^k) \geq f(x^k) + (g^k)^T(-\alpha_k g^k) = f(x^k) - \alpha_k \| g^k \|_2^2. $$ This means that $f(x^{k+1})$ can be any value above $f(x^k) - \alpha_k \| g^k \|_2^2$, and in particular any value above $f(x^k)$. The subgradient inequality only gives a lower bound on the next function value, so it does not ensure that this is a descent method.
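A minimal numerical sketch (my own, not from the document) makes this concrete: for $f(x) = |x|$, a perfectly valid subgradient step can overshoot the minimizer and increase $f$, which is exactly why one tracks the best value seen so far.

```python
def f(x):
    return abs(x)

def subgrad(x):
    # sign(x) is a valid subgradient of |x|; at x = 0 any value in [-1, 1] works
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

def subgradient_method(x0, steps):
    x = x0
    history = [f(x)]
    for alpha in steps:
        x = x - alpha * subgrad(x)
        history.append(f(x))
    # report the best value seen so far, not the last one
    return min(history), history

# With x0 = 0.4 and diminishing steps, the first step overshoots 0
# (f goes from 0.4 up to 0.6), and f increases again later (0.1 -> 0.15),
# yet the best value keeps improving.
f_best, history = subgradient_method(x0=0.4, steps=[1.0, 0.5, 0.25, 0.125])
print(history)
print(f_best)
```

The non-monotone `history` is the "can (and often does) increase" behavior the document warns about; `min(history)` is the bookkeeping it proposes.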
In contrast to the subgradient method, when $f$ is differentiable and $\nabla f$ is Lipschitz continuous with constant $L$, you have the Descent Lemma: $$ f(y) \leq f(x) + \nabla f(x)^T (y - x) + \frac{L}{2} \|y - x\|_2^2. $$ The Descent Lemma, together with a suitably small step size, is what ensures descent, and it does not necessarily hold when you replace the gradient with a subgradient.
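To spell out the last step (a standard derivation, not from the document): plugging the gradient step $y = x - \alpha \nabla f(x)$ into the Descent Lemma gives $$ f(x - \alpha \nabla f(x)) \leq f(x) - \alpha \|\nabla f(x)\|_2^2 + \frac{L\alpha^2}{2}\|\nabla f(x)\|_2^2 = f(x) - \alpha\left(1 - \frac{L\alpha}{2}\right)\|\nabla f(x)\|_2^2. $$ For any step size $0 < \alpha < 2/L$ the coefficient $\alpha(1 - L\alpha/2)$ is positive, so the function value cannot increase, and it strictly decreases whenever $\nabla f(x) \neq 0$. The subgradient inequality alone provides no comparable upper bound.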