Mixing conditional random variables by sampling


I am struggling to express my data transformation in mathematical terms. My goal is to define a mapping that transforms the original data into a deliberately mixed version.

In my simulation study, I have a data frame with two columns $(X,Z)$, where $Z$ is binary and $X$ may be continuous or discrete. I split the data frame into two parts, $X|Z=0$ and $X|Z=1$. Then, with a small probability $\delta=0.1$, I mixed the two parts in both directions: once by sampling a fraction 0.1 from $X|Z=0$ and 0.9 from $X|Z=1$, and once by sampling a fraction 0.1 from $X|Z=1$ and 0.9 from $X|Z=0$. I denote the resulting data frames by $df(1)$ and $df(0)$, respectively. Next, I set every value of $Z$ in $df(1)$ to $1$ and every value of $Z$ in $df(0)$ to $0$. Finally, I concatenated the two data frames.

How do I express this transformation mathematically? This is what I have figured out so far:

$$ \begin{aligned} (X,Z)|Z=0 &\sim F_0 \\ (X,Z)|Z=1 &\sim F_1 \end{aligned} $$

and let $S \sim Ber(1-\delta)$, where $S\perp \!\!\! \perp (X,Z)|Z$ in Dawid's notation. Then

$$ \begin{aligned} T_0&=\{S(X,Z)|Z=0\} + \{(1-S)(X,Z)|Z=1\} \\ T_1&=\{S(X,Z)|Z=1\} + \{(1-S)(X,Z)|Z=0\} \end{aligned} $$

I'm not even sure that the random variables above make sense, and I have no idea how to proceed from here. I can define some real-valued functions that would return $Z$ as 0 or 1 for each of the random variables above, but I don't have the slightest idea how to express the concatenation as an operation on random variables. For example, once I transform each variable into something like $g(T_0)=(X_0,0)$ and $h(T_1)=(X_1,1)$ and stack them together, I get a matrix with one column of just 1s and 0s, which does not look like a valid random variable $(X^*,Z)$.
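To make the procedure concrete, here is a minimal sketch of the simulation step in Python. It is only an illustration under stated assumptions, not the exact code from the study: plain lists of `(x, z)` pairs stand in for the data frame, sampling is done with replacement, and each mixed frame is assumed to keep the size of its own group. The function name `mix_groups` and the toy Gaussian data are hypothetical.

```python
import random


def mix_groups(rows, delta=0.1, seed=0):
    """Mix the two groups of (x, z) pairs and relabel z.

    rows  : list of (x, z) pairs with z in {0, 1}
    delta : share of each mixed frame drawn from the opposite group
    """
    rng = random.Random(seed)
    # Split the data into the two conditional samples X|Z=0 and X|Z=1.
    x_by_z = {
        0: [x for x, z in rows if z == 0],
        1: [x for x, z in rows if z == 1],
    }
    mixed = []
    for label in (0, 1):
        n = len(x_by_z[label])      # assumption: df(label) keeps its group's size
        n_other = round(delta * n)  # number of rows taken from the opposite group
        # Assumption: sampling with replacement.
        own = rng.choices(x_by_z[label], k=n - n_other)
        other = rng.choices(x_by_z[1 - label], k=n_other)
        # Overwrite Z with the label of the mixed frame, then concatenate.
        mixed.extend((x, label) for x in own + other)
    return mixed


# Toy data (hypothetical): X|Z=0 ~ N(0, 1) and X|Z=1 ~ N(3, 1).
data_rng = random.Random(1)
data = [(data_rng.gauss(0.0, 1.0), 0) for _ in range(500)]
data += [(data_rng.gauss(3.0, 1.0), 1) for _ in range(500)]

result = mix_groups(data, delta=0.1)
```

In this sketch, `result` has as many rows as `data`, and within each label about a share $\delta$ of the $X$ values comes from the opposite group.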