In FAST LEARNING RATES FOR PLUG-IN CLASSIFIERS by Audibert and Tsybakov, the following notation is used in the context of binary classification:
Let $(X, Y)$ be a random couple taking values in $Z = R^d\times \{0, 1\}$ with joint distribution P. We regard $X \in R^d$ as a vector of features corresponding to an object and $Y \in \{0, 1\}$ as a label indicating that the object belongs to one of two classes. Consider the sample $(X_1, Y_1),\ldots, (X_n, Y_n)$, where $(X_i, Y_i)$ are independent copies of $(X, Y)$. We denote by $P^{\otimes n}$ the product probability measure according to which the sample is distributed, and by $P_X$ the marginal distribution of $X$.
Later on, for the regression function $\eta$ and an estimator $\hat \eta_n$, they use the term
$$P^{\otimes n}(|\hat\eta_n(X)-\eta(X)|\ge\delta).$$
I don't quite understand this term. If $(X,Y)\sim P$, why do we measure this term in the product probability measure? Do they only emphasise that $\hat\eta_n$ is estimated from the data? Could we equivalently write $P$ instead of $P^{\otimes n}$?
Are you referring to equation (3.1)? Note that (3.1) is stated pointwise: it holds for ($P_X$-almost) every fixed point $x$, written there with a lowercase $x$, rather than for the random variable $X \sim P_X$. For a fixed $x$, the value $\hat\eta_n(x)$ is a function of the sample $(X_1, Y_1),\ldots,(X_n, Y_n)$ alone, and that sample is distributed according to $P^{\otimes n}$. So the only source of randomness in the event $\{|\hat\eta_n(x)-\eta(x)|\ge\delta\}$ is the sample, and $P^{\otimes n}$ is the natural (and correct) measure to put there. Writing $P$ instead would not be equivalent: $P$ is the distribution of a single pair $(X, Y)$, not of the $n$-sample on which $\hat\eta_n$ is based.
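To make the role of $P^{\otimes n}$ concrete, here is a small simulation sketch (my own toy setup, not from the paper): take $X \sim U[0,1]$, $\eta(x) = x$, and a simple local-average estimator $\hat\eta_n$. Fixing a point $x_0$, we approximate $P^{\otimes n}(|\hat\eta_n(x_0)-\eta(x_0)|\ge\delta)$ by redrawing the whole $n$-sample many times — which is exactly the sense in which the probability is over the product measure.

```python
import numpy as np

rng = np.random.default_rng(0)

def eta(x):
    # true regression function eta(x) = P(Y = 1 | X = x); toy choice
    return x

def eta_hat(x, X, Y, h=0.1):
    # local-average (boxcar kernel) estimator of eta at a fixed point x;
    # a stand-in for the plug-in estimators considered in the paper
    mask = np.abs(X - x) <= h
    return Y[mask].mean() if mask.any() else 0.5

n, m = 500, 2000       # sample size n, number of Monte Carlo replications
x0, delta = 0.5, 0.1   # fixed point x and deviation threshold delta

errors = np.empty(m)
for j in range(m):
    # draw a fresh n-sample from P^{⊗n}: X_i ~ P_X, Y_i | X_i ~ Bernoulli(eta(X_i))
    X = rng.uniform(0.0, 1.0, n)
    Y = (rng.uniform(0.0, 1.0, n) < eta(X)).astype(float)
    errors[j] = abs(eta_hat(x0, X, Y) - eta(x0))

# Monte Carlo estimate of P^{⊗n}( |eta_hat_n(x0) - eta(x0)| >= delta ):
prob = (errors >= delta).mean()
print(prob)
```

Each loop iteration is one draw from $P^{\otimes n}$; the point $x_0$ never varies, so the frequency `prob` estimates a probability purely over the sample, matching the notation in (3.1).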