I am trying to understand the meaning of $\mathbb{E}$ in this formula, which is taken from page 3 of https://arxiv.org/pdf/1710.09412.pdf. The paper is about minimizing vicinal risk. $\lambda$ is a value drawn from a beta distribution.
\begin{equation} \mu\left(\tilde{x}, \tilde{y} \mid x_{i}, y_{i}\right)=\frac{1}{n} \sum_{j}^{n} \underset{\lambda}{\mathbb{E}}\left[\delta\left(\tilde{x}=\lambda \cdot x_{i}+(1-\lambda) \cdot x_{j}, \tilde{y}=\lambda \cdot y_{i}+(1-\lambda) \cdot y_{j}\right)\right] \end{equation}
Does $\mathbb{E}$ represent the vicinal distribution, or is it an expectation?
It is an expectation, taken with respect to $\lambda \sim \text{Beta}(\alpha, \alpha)$. If it helps, think of $\lambda$ as a random variable $Z$. Written out explicitly, it is:
$$\mu=\frac 1 n\sum_{j=1}^n \int_0^1 \delta(...)\, \frac{\lambda^{\alpha-1}(1-\lambda)^{\alpha-1}}{B(\alpha, \alpha)}\, d\lambda$$
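Since the expectation is just an average over draws of $\lambda \sim \text{Beta}(\alpha, \alpha)$, it can be checked with a quick Monte Carlo sketch (the value $\alpha = 0.4$ below is an arbitrary illustrative choice, not prescribed by the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.4  # illustrative hyperparameter, not from the paper

# Draw many lambdas from Beta(alpha, alpha)
lam = rng.beta(alpha, alpha, size=1_000_000)

# Mix a class-1 label with a class-0 label: y_tilde = lam*1 + (1-lam)*0
y_tilde = lam * 1.0 + (1 - lam) * 0.0

# Beta(alpha, alpha) is symmetric about 1/2, so E[lambda] = 1/2
# and the mixed label averages to 0.5
print(round(y_tilde.mean(), 2))  # → 0.5
```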
The resulting vicinal distribution $\mu(\tilde x, \tilde y|x_i,y_i)$ interpolates linearly between classes: if two data points belong to different classes, the probability of class 0 versus class 1 changes linearly along the segment between them. Points can be simulated from this distribution by randomly picking $m$ points $(x_i, y_i)$, $i=1,2,...,m$, from the observed data and then generating from the conditional distribution $\mu$. The simulated points are no longer exactly $(x_i, 0)$ or $(x_i, 1)$ but lie in a vicinity with respect to both $x_i$ and $y_i$; this differs from the Chapelle et al. article, which augmented the data with Gaussian noise in $x$ only.
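The simulation described above can be sketched as follows (the toy 1-D dataset and the choice $\alpha = 0.4$ are my own illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy 1-D dataset: class 0 clustered near -1, class 1 near +1 (illustrative only)
x = np.concatenate([rng.normal(-1, 0.1, 50), rng.normal(1, 0.1, 50)])
y = np.concatenate([np.zeros(50), np.ones(50)])

def sample_mixup(x, y, alpha=0.4, m=5):
    """Draw m points from the mixup vicinal distribution."""
    n = len(x)
    i = rng.integers(0, n, size=m)   # conditioning point (x_i, y_i)
    j = rng.integers(0, n, size=m)   # partner index, uniform as in the (1/n)-sum
    lam = rng.beta(alpha, alpha, size=m)
    x_tilde = lam * x[i] + (1 - lam) * x[j]
    y_tilde = lam * y[i] + (1 - lam) * y[j]
    return x_tilde, y_tilde

x_tilde, y_tilde = sample_mixup(x, y)
print(x_tilde, y_tilde)
```

Whenever $i$ and $j$ land in different classes, the generated label $\tilde y$ falls strictly between 0 and 1, which is exactly the "soft" vicinity in $y$ described above.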
Their motivation was stated at the beginning:
In the figure at the start of the paper, orange points are class 1 and green points are class 0; "mixup" is the method they propose. The intensity of blue indicates the predicted probability of being class 1.