I'd like to express the splitting of input vectors and their corresponding labels in set-builder notation. My question consists of a few smaller parts.
Consider a set of input vectors $X = \{\hat{x}_n \mid n \in \mathbb{Z}, 0 \leq n \lt C \}$, where $C$ is the total number of available input vectors. It is often the case in AI problems that you wish to split the input set into two disjoint parts of unequal size, $X_{\text{train}}$ and $X_{\text{test}}$. Let's assume an $80\%$ train, $20\%$ test split. My first question is how best to represent this. I have considered binomial coefficient notation followed by set subtraction:
$$ X_\text{test} \in \binom{X}{\lfloor C/5 \rfloor} \qquad X_\text{train} = X \setminus X_\text{test} $$
Is the above valid? This leads to the next question: for labels, let's say we have $y = \{\lambda_n \in \{0, 1\} \mid n \in \mathbb{Z}, 0 \leq n \lt C \}$. Is it enough to say:
$$ y_\text{test} = \{\lambda_n \mid \hat{x}_n \in X_\text{test}\} \qquad y_\text{train} = \{\lambda_n \mid \hat{x}_n \in X_\text{train}\} $$
or do I need to qualify the possible values of $n$ again, or include a $\forall$ somewhere, or am I just generally way off in notation? If I am, would something like the union of sets notation work?
$$ y_\text{test} = \bigcup_{\hat{x}_n \in X_{\text{test}}} \{\lambda_n\} \qquad y_\text{train} = \bigcup_{\hat{x}_n \in X_{\text{train}}} \{\lambda_n\} $$
Or does this just present the same issues as above?
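
For concreteness, here is a rough Python sketch of the operation I am trying to describe, in case it helps pin down the intended meaning. It is only an illustration under my own assumptions: the function name `split_indices`, the placeholder vectors, and the fixed seed are all made up for this example, and it works on the indices $n$ rather than on the vectors themselves.

```python
import random


def split_indices(C, seed=0):
    """Pick floor(C / 5) of the indices 0..C-1 as the test set; the rest are train.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)                # fixed seed so the sketch is repeatable
    test_size = C // 5                       # floor(C / 5), the 20% test share
    test_idx = set(rng.sample(range(C), test_size))
    train_idx = set(range(C)) - test_idx     # set subtraction, as in X \ X_test
    return train_idx, test_idx


# Placeholder data: C input vectors and one binary label per index n.
C = 10
X = [(float(n), float(n) ** 2) for n in range(C)]
labels = [n % 2 for n in range(C)]

train_idx, test_idx = split_indices(C)

# The labels follow the same indices as the vectors they belong to.
X_test = [X[n] for n in sorted(test_idx)]
y_test = [labels[n] for n in sorted(test_idx)]
X_train = [X[n] for n in sorted(train_idx)]
y_train = [labels[n] for n in sorted(train_idx)]
```

In the code the labels are kept in lists indexed by $n$, so each label stays paired with its vector; what I am after is notation that expresses this same pairing.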