Formal definition of potential outcomes in causal inference


I find the common potential outcomes notation used in causal inference somewhat confusing.

Given a binary exposure $X$ and an outcome $Y$, the expression $Y_i(1)$ (sometimes also denoted $Y_{1,i}$ or $Y^1_i$ or a similar notation) is somehow defined as "the outcome for unit $i$ if $X$ was set to $1$" (or something similar) and the average causal effect is $E[Y_i(1)-Y_i(0)]$ or $E[Y(1)-Y(0)]$ depending on source.

I understand the concept of intervention, but I am confused about the sources of randomness here.

Which parts are random variables (and if so, on what probability spaces), which are deterministic functions (and if so, between which sets) and which are constants (and if so, in what set)?

Is the potential outcome $Y_i(x)$ for a fixed unit $i$ and a fixed value $x$ a constant or a random variable? If it is a constant (so that the potential outcome is a deterministic function on the set of possible values of $(i,x)$), does it become a random variable because we consider $X$ to be a random variable, because we consider $i$ to be a random variable (a random sample from some population), or both? I have searched several sources that use the potential outcomes framework, but unfortunately none is precise about this.

I suppose we have some fixed probability space $(\Omega,\mathcal{F},P)$ and that $X:\Omega\to\{0,1\}$ and $Y:\Omega\to\mathbb{R}$ are random variables on this space. I also suppose that the potential outcomes are somehow $\mathbb{R}$-valued random variables on this same probability space.

Which of the following is intended: (a) $i$ is an element of $\Omega$ and $Y(1):\Omega\to\mathbb{R}$ is a random variable, so that for each fixed $i\in\Omega$ we have the number $Y_i(1):=Y(1)(i)\in\mathbb{R}$; (b) $Y_i(1):\Omega\to\mathbb{R}$ is itself a random variable for each fixed unit $i$ from some other, external index set; or (c) $i:\Omega\to$ "some measurable space" is itself a random variable (hence any deterministic function of it is random)?

I hope someone can help me understand this notation and these objects mathematically, or can point me to a good source that treats them formally with probability theory.


3 Answers

Answer 1

Basically, the stable unit treatment value assumption (SUTVA) states that $Y_i(x)$ is a fixed constant that does not depend on the treatments assigned to the other units $j \neq i$. However, in general $Y_i(x) \neq Y_j(x)$. Thus, over some super-population, the collection $\{Y_i(x)\}_{i\ge1}$ induces a distribution, so $Y(x)$ can be treated as a random variable, and you can compute the ATE, $E[Y(1) - Y(0)]$, or any other contrast.
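To make the super-population picture concrete, here is a small simulation (a sketch with made-up numbers: the population size, the effect size of 2, and the noise scales are all arbitrary choices). Each $Y_i(x)$ is a fixed number; the random variable $Y(x)$ arises only from drawing the index $i$ at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical finite super-population: every unit i has FIXED potential
# outcomes y0[i] and y1[i] -- plain numbers, not random variables.
n = 100_000
y0 = rng.normal(size=n)                        # Y_i(0), i = 1..n
y1 = y0 + 2.0 + rng.normal(scale=0.5, size=n)  # Y_i(1), heterogeneous effects

# Randomness enters by drawing the unit index i at random, which turns the
# deterministic map i -> Y_i(x) into the random variable Y(x).
i = rng.integers(0, n, size=50_000)
ate_sampled = np.mean(y1[i] - y0[i])  # estimates E[Y(1) - Y(0)]
ate_population = np.mean(y1 - y0)     # finite-population ATE, approx. 2

print(ate_population, ate_sampled)
```

The sampled estimate concentrates around the finite-population average effect as the number of sampled indices grows.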

Answer 2

Have a look at "Structural Causal Models"

There are essentially two perspectives on causal models: the potential outcomes framework and the structural causal model (SCM) framework. Both can be used to estimate treatment effects like the one you have defined. I will answer from the perspective of structural causal models, which have a very well-defined mathematical foundation. For an introduction I would recommend Elements of Causal Inference. For a more formal treatment, the book Causality by Judea Pearl is very comprehensive, though it can be quite dense.

In short, the main idea is that every variable in the model is a function of its causes and an independent noise component. The causal relationships can be represented by a directed acyclic graph (DAG). Just to give a simple linear example, given random variables $N_X, N_Y$, the structural equations could be given as

$X := N_X$

$Y := \beta X + N_Y$.

Thus, $X$ and $Y$ are both random variables, and the observations can be understood as samples of them. If $X$ is binary, then in this particular model $X$ is exogenous (its structural equation involves no other variable), so there is no confounding and $E[Y(1) - Y(0)]$ equals the difference between the conditional means of $Y$ given $X=1$ and $X=0$, namely $\beta$. In general, when $X$ and $Y$ share common causes, the interventional and conditional means differ.
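A short simulation of this SCM (an illustrative sketch: the value $\beta = 1.5$ and the Bernoulli(1/2) choice of $N_X$ are arbitrary assumptions made to keep $X$ binary):

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 1.5
n = 200_000

# Structural equations: each variable is a function of its parents and an
# independent noise term. X has no parents, so X := N_X.
n_x = rng.integers(0, 2, size=n)  # N_X ~ Bernoulli(1/2), makes X binary
n_y = rng.normal(size=n)          # N_Y ~ N(0, 1), independent of N_X
x = n_x
y = beta * x + n_y

# With X exogenous (no confounding), the interventional contrast
# E[Y(1) - Y(0)] coincides with the difference of conditional means.
diff = y[x == 1].mean() - y[x == 0].mean()
print(diff)  # approximately beta
```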

Answer 3

The following is a possible way to view the potential outcome framework from a probability perspective.

Let $(\Omega, \mathcal{F}, \mathbb{P})$ denote a fixed background probability space. The potential outcome framework (for binary treatment) considers the following data-generating process.

  • Generate covariates $Z: \Omega \to \mathbb{R}^d$ and unobserved noise variables $N: \Omega \to \mathbb{R}^k$ such that $Z \perp N$ under $\mathbb{P}$.

  • Generate unobserved potential outcomes, i.e., two random variables $Y_0 = f_0(Z, N)$, $Y_1 = f_1(Z, N)$.

  • Generate treatment variable (i.e., assign treatment), $X = g(Z, N) \in \{0, 1\}$.

  • Generate observed outcome $Y = X Y_1 + (1 - X) Y_0$.
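The four steps above can be sketched in a simulation. The framework leaves $Z$, $N$, $f_0$, $f_1$ and $g$ unspecified, so everything concrete below (the linear outcome functions, the constant effect of 2, the logistic treatment probability) is an assumed toy instantiation:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

z = rng.normal(size=n)        # covariates Z
noise = rng.normal(size=n)    # unobserved noise N, independent of Z

y0 = z + noise                # Y_0 = f_0(Z, N)
y1 = z + noise + 2.0          # Y_1 = f_1(Z, N), constant effect of 2

# Treatment probability depends on Z only (not on N).
p = 1.0 / (1.0 + np.exp(-z))
x = (rng.uniform(size=n) < p).astype(int)  # X = g(Z, ...)

y = x * y1 + (1 - x) * y0     # observed outcome
ate = np.mean(y1 - y0)        # equals 2 by construction
print(ate)
```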

Some comments:

  • In practice, the potential outcomes framework does not explicitly model $N$ and makes no parametric assumptions about $f_0$ and $f_1$.

  • $Y_0$ ($Y_1$) is a random variable that describes the "causal mechanism" if you assign treatment $X = 0$ ($X = 1$) to the whole population.

  • At an individual level, $Y_0$ and $Y_1$ are generally treated as deterministic. However, it is also possible to model them as random at the individual level (see Technical Point 1.2 of the book "Causal Inference: What If" by Hernán and Robins, available as an open-access PDF).

  • The strong ignorability assumption requires that $X \perp (Y_0, Y_1) \mid Z$ (together with overlap, $0 < \mathbb{P}(X = 1 \mid Z) < 1$). The conditional independence is satisfied, for example, if $g$ or $f_0, f_1$ do not depend on the unobserved $N$.

  • The goal is to understand whether the treatment $X$ has a causal effect on the outcome variable. This can be done by comparing $Y_0$ and $Y_1$. Since they are random, a common target of inference is the average treatment effect defined by ATE $=\mathbb{E}[Y_1 - Y_0]$.

  • In general, $\mathbb{E}[Y_1 - Y_0] \neq \mathbb{E}[Y \mid X = 1] - \mathbb{E}[Y \mid X = 0]$.
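This last point can be checked numerically. In the following sketch (an assumed toy model with confounding: the coefficients and the logistic assignment rule are arbitrary choices), $Z$ drives both treatment assignment and the potential outcomes, so the naive difference of conditional means is badly biased relative to the ATE:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Confounded model: Z affects both treatment and the potential outcomes.
z = rng.normal(size=n)
y0 = 3.0 * z + rng.normal(size=n)
y1 = y0 + 1.0                                # true ATE = 1

p = 1.0 / (1.0 + np.exp(-2.0 * z))           # treated units tend to have large Z
x = (rng.uniform(size=n) < p).astype(int)
y = x * y1 + (1 - x) * y0

ate = np.mean(y1 - y0)                       # E[Y_1 - Y_0] = 1
naive = y[x == 1].mean() - y[x == 0].mean()  # E[Y|X=1] - E[Y|X=0], biased up
print(ate, naive)
```

Here `naive` is far above the true ATE of 1, because treated units had systematically larger $Z$ and hence larger outcomes regardless of treatment.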