I've seen this notation in the SHAP paper, which extends Shapley values to Machine Learning models to give a form of local explanation.
In the paper, on page 5, the author uses the following notation:
$$z_{\bar S} \mid z_S$$
where $z$ is a vector of features for a model, $S$ is the set of features included in the model, $\bar S$ is the complement of $S$ (the features not included in the model), and $z_S$ is the feature vector that has values for the features in $S$ only, with the remaining features missing. Likewise, $z_{\bar S}$ is the feature vector that has values only for the features not in $S$, with the rest missing.
This style of notation is used to change a conditional expectation value into a different form:
$$\begin{align} E[f(z) \mid z_S] &= E_{z_{\bar S} \mid z_S}[f(z)] \\ &\approx E_{z_{\bar S}}[f(z)] \end{align}$$
where $f$ is the model, and $f(z)$ is the model's prediction for input vector $z$. The author states you can get to the second line from the first by assuming independence between the features.
What does this notation mean? To me it reads as $z_{\bar S}$ given $z_S$, but wouldn't that make the notation superfluous? How can there be a $z_{\bar S}$ without a $z_S$?
Also, I don't see how the notation allows me to make changes to the conditional probability equation.
Page 5 of the paper explains the purpose of $E[f(z) \mid z_S]$: because models generally cannot handle missing input features, the prediction using only the features in $S$ is defined as the expected value of $f(z)$ conditional on the observed feature values $z_S$.
Using this, let's look at the first line of your second equation:
this means that the conditional expectation is equal to the expected value of $f(z)$ over the distribution of the features $z_{\bar S}$ conditional on the features $z_S$.
In other words, we want the average value of $f(z)$ given our chosen feature values $z_S$ and the conditional distribution of the remaining features $z_{\bar S}$. We say that $z_{\bar S}$ is conditional on $z_S$ because its distribution can (and generally will) change depending on the chosen feature values $z_S$. The changes are due to correlations or dependencies between the features, which make some feature value combinations more likely than others. That is the meaning of the notation:
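To see why the conditioning matters, here is a small numerical sketch (my own illustration, not code from the paper). With two correlated Gaussian features and the toy model $f(z) = z_1 z_2$, the expectation of $f$ with $z_1$ fixed differs sharply depending on whether $z_2$ is drawn from its conditional or its marginal distribution:

```python
import numpy as np

# Hypothetical illustration: z = (z1, z2) bivariate normal with correlation
# rho, toy model f(z) = z1 * z2, and we condition on z_S = {z1 = 2}.
rng = np.random.default_rng(0)
n = 200_000
rho = 0.8

cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
z2_marginal = z[:, 1]

def f(z1, z2):
    return z1 * z2  # a simple model with an interaction

z1_fixed = 2.0

# E_{z2 | z1}[f]: draw z2 from its *conditional* distribution given z1 = 2.
# For a bivariate normal, z2 | z1 ~ N(rho * z1, 1 - rho^2).
z2_cond = rng.normal(rho * z1_fixed, np.sqrt(1 - rho**2), size=n)
e_cond = f(z1_fixed, z2_cond).mean()       # ≈ rho * z1_fixed**2 = 3.2

# E_{z2}[f]: draw z2 from its *marginal* distribution, ignoring z1.
e_marg = f(z1_fixed, z2_marginal).mean()   # ≈ 0, since E[z2] = 0

print(e_cond, e_marg)
```

The two answers (roughly 3.2 versus 0) differ precisely because knowing $z_S$ changes the distribution of $z_{\bar S}$.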
$$z_{\bar S} \mid z_S$$
To explain the move from the first line to the second: if we assume the features are independent, we no longer need to condition $z_{\bar S}$ on the chosen features $z_S$, because the distribution of $z_{\bar S}$ does not change with $z_S$. The $\mid z_S$ in the subscript therefore becomes unnecessary.
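The practical consequence of the independence assumption can be sketched as follows (my own illustration, with a made-up linear model and a hypothetical helper `expected_f_given_subset`): $E_{z_{\bar S}}[f(z)]$ can be estimated by fixing the features in $S$ at their chosen values inside background samples drawn from the data, and averaging the predictions:

```python
import numpy as np

# Sketch under the independence assumption: estimate E[f(z) | z_S = x_S]
# by overwriting only the features in S with their fixed values and
# averaging f over background samples for the remaining features.
rng = np.random.default_rng(1)

def f(z):
    return 3.0 * z[:, 0] + 2.0 * z[:, 1] + z[:, 2]  # toy linear model

background = rng.normal(size=(100_000, 3))  # independent N(0, 1) features

def expected_f_given_subset(x, S, background):
    """Average f over the background data with the features in S fixed."""
    z = background.copy()
    z[:, S] = x[S]
    return f(z).mean()

x = np.array([1.0, -2.0, 0.5])
est = expected_f_given_subset(x, [0], background)  # fix only feature 0
# With independent N(0,1) features, E[f | z_0 = 1] = 3*1 + 0 + 0 = 3.
print(est)
```

If the features were in fact dependent, this marginal averaging would ignore how fixing $z_S$ shifts $z_{\bar S}$, which is exactly the error the approximation accepts.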
In practice, this assumption is unlikely to hold exactly, but presumably the authors found it good enough to calculate the Shapley values with reasonable accuracy.