I find the common potential outcomes notation used in causal inference somewhat confusing.
Given a binary exposure $X$ and an outcome $Y$, the expression $Y_i(1)$ (sometimes also denoted $Y_{1,i}$, $Y^1_i$ or similar) is informally defined as "the outcome for unit $i$ if $X$ were set to $1$" (or something similar), and the average causal effect is written $E[Y_i(1)-Y_i(0)]$ or $E[Y(1)-Y(0)]$, depending on the source.
I understand the concept of intervention, but I am confused about the sources of randomness here.
Which parts are random variables (and if so, on what probability spaces), which are deterministic functions (and if so, between which sets) and which are constants (and if so, in what set)?
Is the potential outcome $Y_i(x)$ for a fixed unit $i$ and a fixed value $x$ a constant or a random variable? If it is a constant (so the potential outcome is a deterministic function on the set of possible values of $(i,x)$), does it become a random variable because we consider $X$ to be a random variable, because we consider $i$ to be a random variable (a random sample from some population), or both? I have searched several sources that use the potential outcomes framework, but unfortunately none has been precise about this.
I suppose we have some fixed probability space $(\Omega,\mathcal{F},P)$ and that $X:\Omega\to\{0,1\}$ and $Y:\Omega\to\mathbb{R}$ are random variables on this space. I also suppose that the potential outcomes are somehow $\mathbb{R}$-valued random variables on this same probability space.
Is $i$ considered to be an element of $\Omega$, with $Y(1):\Omega\to\mathbb{R}$ a random variable such that for each fixed $i\in\Omega$ we have $Y_i(1):=Y(1)(i)\in\mathbb{R}$? Or is $Y_i(1):\Omega\to\mathbb{R}$ itself a random variable for each fixed unit $i$ taken from some other, external set? Or is $i:\Omega\to S$ itself a random variable into some measurable space $S$ (so that any deterministic function of it is random)?
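For concreteness, here is one way the first reading could be written out. It assumes, purely for illustration, a finite population of $N$ units drawn uniformly at random and a treatment assigned by an independent coin flip with probability $p$; neither assumption is part of the notation itself.

$$
\begin{aligned}
\Omega &= \{1,\dots,N\}\times\{0,1\}, \qquad P(\{(i,t)\}) = \tfrac{1}{N}\,p^{t}(1-p)^{1-t},\\
Y(x)(i,t) &= Y_i(x) \in \mathbb{R} \quad (x\in\{0,1\}), \qquad X(i,t) = t,\\
Y &= X\,Y(1) + (1-X)\,Y(0),\\
E[Y(1)-Y(0)] &= \frac{1}{N}\sum_{i=1}^{N}\bigl(Y_i(1)-Y_i(0)\bigr).
\end{aligned}
$$

In this construction $Y_i(x)$ is a constant for each fixed $i$ and $x$; $Y(x)$ is random only because the unit is random, and $X$ is independent of $(Y(0),Y(1))$ because $P$ is a product measure.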
I hope someone can help me understand this notation and these objects mathematically, or point me to a good source that treats them formally with probability theory.
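Basically, the stable unit treatment value assumption (SUTVA) states that $Y_i(x)$ is a fixed number that depends only on unit $i$ and its own treatment value $x$, not on the treatments assigned to any other unit $j \neq i$. However, in general $Y_i(x) \neq Y_j(x)$: different units have different potential outcomes. Thus, if units are drawn at random from some super-population, the values $\{Y_i(x)\}_{i\ge1}$ form a distribution, so $Y(x)$ can be treated as a random variable (the randomness comes from which unit is drawn), and you can compute the ATE, $E[Y(1) - Y(0)]$, or any other contrast.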
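A small simulation can make the two sources of randomness visible. The sketch below is illustrative only: the population size, the Gaussian outcome model and the function names (`make_population`, `population_ate`, `simulate_experiment`) are hypothetical choices, not part of the framework. Each unit's pair $(Y_i(0), Y_i(1))$ is fixed once the population is built; randomness enters only when a unit is drawn and a treatment is assigned.

```python
import random
import statistics


def make_population(n_units, seed=0):
    """Hypothetical finite population: unit i carries two fixed numbers
    (Y_i(0), Y_i(1)). Once built, nothing about a given unit is random."""
    rng = random.Random(seed)
    population = []
    for _ in range(n_units):
        y0 = rng.gauss(10.0, 2.0)
        y1 = y0 + rng.gauss(1.5, 1.0)  # unit-level effects differ across units
        population.append((y0, y1))
    return population


def population_ate(population):
    """ATE as the plain average of the fixed differences Y_i(1) - Y_i(0)."""
    return statistics.fmean(y1 - y0 for y0, y1 in population)


def simulate_experiment(population, n_draws, p_treat=0.5, seed=1):
    """Draw (i, t): unit i uniformly at random, treatment t ~ Bernoulli(p_treat).
    Only the observed outcome Y = Y_i(t) is recorded, then the
    difference in means between treated and control draws is returned."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_draws):
        i = rng.randrange(len(population))        # randomness from sampling the unit
        t = 1 if rng.random() < p_treat else 0    # randomness from assigning X
        y0, y1 = population[i]
        if t == 1:
            treated.append(y1)  # Y_i(0) is never seen for this draw
        else:
            control.append(y0)  # Y_i(1) is never seen for this draw
    return statistics.fmean(treated) - statistics.fmean(control)


if __name__ == "__main__":
    pop = make_population(1000)
    print("population ATE:", population_ate(pop))
    print("difference in means:", simulate_experiment(pop, 200_000))
```

Because the unit and the treatment are drawn independently, the difference in means should land close to the population average of $Y_i(1)-Y_i(0)$, even though each individual unit only ever reveals one of its two potential outcomes.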