The following collection of questions concerns the design of a randomized experiment where the $N$ units to be randomized to drug $A$ or drug $B$ are people, for whom we have a large number of background covariates, collectively labelled $X$ (e.g., age, sex, blood pressure, height, weight, occupational status, history of heart disease, family history of heart disease). The objective is to assign approximately half to drug $A$ and half to drug $B$ where the means of each of the $X$ variables (and means of non-linear functions of them, such as squares or products) are close to equal in the two groups. Instead of using classical methods of design, such as blocking or stratification, the plan is to use modern computers to try many random allocations and discard those allocations that are considered unacceptable according to a pre-determined criterion for balanced $X$ means, in particular an affinely invariant measure such as the Mahalanobis distance between the means of $X$ in the two groups. After an acceptable allocation is found, outcome variables will be measured, and their means will be compared in group $A$ and group $B$ to estimate a treatment effect.
Problem 1. Prove that if the two groups are of the same size (i.e., $N/2$ for even $N$), this plan will result in unbiased estimates of the $A$ versus $B$ causal effect based on the sample means of $Y$ in groups $A$ and $B$, where $Y$ is any linear function of $X$.
Problem 2. Provide a counter-example to the assertion that Problem 1 is true in small samples with odd $N$.
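
For concreteness, here is a minimal Python sketch of the plan as I understand it from the problem statement: draw a random equal-split allocation, compute the Mahalanobis distance between the covariate means of the two groups, and keep redrawing until the distance falls below a pre-chosen threshold. The function names (`mahalanobis_balance`, `rerandomize`), the threshold value, the `max_tries` cap, the scaling factor $n_A n_B / N$, and the synthetic data are all my own assumptions for illustration; none of them come from the problem.

```python
import numpy as np

def mahalanobis_balance(X, z):
    """Scaled Mahalanobis distance between the covariate means of
    group A (z == 1) and group B (z == 0). The n_a * n_b / N scaling
    is an assumed convention; the problem does not specify one."""
    n_a, n_b = int(z.sum()), int((1 - z).sum())
    diff = X[z == 1].mean(axis=0) - X[z == 0].mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))  # sample covariance of X
    return (n_a * n_b / len(z)) * diff @ np.linalg.solve(cov, diff)

def rerandomize(X, threshold, rng, max_tries=10_000):
    """Redraw equal-split allocations until the balance criterion is met."""
    n = X.shape[0]
    z = np.zeros(n, dtype=int)
    z[: n // 2] = 1                      # half of the units go to drug A
    for _ in range(max_tries):
        z = rng.permutation(z)           # a fresh random allocation
        if mahalanobis_balance(X, z) <= threshold:
            return z                     # acceptable allocation found
    raise RuntimeError("no acceptable allocation found")

# Synthetic example: 20 units, 3 covariates (made-up data).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
z = rerandomize(X, threshold=1.0, rng=rng)
```

The distance is unchanged if $X$ is replaced by an invertible affine transformation of itself, which is what "affinely invariant" refers to in the problem statement.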
The two problems above are about causal inference. I have studied statistical inference before, but I still don't understand some of the notation in the solutions (e.g., what is $\phi(\mathbf{X},\pmb{z})$? I have never seen this notation before). My question is: are there any books on causal inference that you would recommend for learning to solve problems like these?
Thanks!


