AI Train/Test Split in Set Builder Notation

175 Views Asked by Bumbble Comm At 08 Apr 2026 - 7:14

I'd like to express the splitting of input vectors and their corresponding labels in set builder notation. My question consists of a few smaller parts.

Consider a set of input vectors $X = \{\hat{x}_n \mid n \in \mathbb{Z}, 0 \leq n \lt C \}$, where $C$ is the total number of available input vectors. It is often the case in AI problems that you wish to split the input set into two uneven parts $X_{\text{train}}$ and $X_{\text{test}}$. Let's assume an $80\%$ train, $20\%$ test scenario. So, my first question is how to best represent this. I have considered binomial coefficient notation followed by set subtraction:

$$ X_\text{test} \in {X \choose {\lfloor \frac{C}{5} \rfloor}} \qquad X_\text{train} = X \setminus X_\text{test} $$
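For concreteness, the operation I am trying to describe can be sketched in Python roughly as follows. This is only a minimal sketch: the function name `split_indices`, the `denominator` parameter, and the fixed seed are illustrative choices of mine, and it works on the indices $n$ rather than on the vectors $\hat{x}_n$ themselves.

```python
import random

def split_indices(C, denominator=5, seed=0):
    """Choose floor(C / denominator) indices for the test set; the rest are train."""
    rng = random.Random(seed)          # fixed seed only so the sketch is repeatable
    all_indices = range(C)             # n in Z with 0 <= n < C
    test = set(rng.sample(all_indices, C // denominator))  # one element of "X choose floor(C/5)"
    train = set(all_indices) - test    # set subtraction: X \ X_test
    return train, test

train_idx, test_idx = split_indices(10)
```

With $C = 10$ this picks 2 test indices and leaves 8 for training, and the two index sets are disjoint and together cover every $n$.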

Is the binomial coefficient notation above valid? This leads to the next question: for labels, let's say we have $y = \{\lambda_n \in \{0, 1\} \mid n \in \mathbb{Z}, 0 \leq n \lt C \}$. Is it enough to say:

$$ y_\text{test} = \{\lambda_n \mid \hat{x}_n \in X_\text{test}\} \qquad y_\text{train} = \{\lambda_n \mid \hat{x}_n \in X_\text{train}\} $$

or do I need to qualify the possible values of $n$ again, or include a $\forall$ somewhere, or am I just generally way off in notation? If I am, would something like the union of sets notation work?

$$ y_\text{test} = \bigcup_{\hat{x}_n \in X_{\text{test}}} \{\lambda_n\} \qquad y_\text{train} = \bigcup_{\hat{x}_n \in X_{\text{train}}} \{\lambda_n\} $$

Or does this just present the same issues as above?
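To make the intent behind the label formulas concrete, here is a small Python sketch. The label values and the test indices are made up purely for illustration, and the variable names are hypothetical. It contrasts keeping each label paired with its index $n$ against collecting the label values into a plain set.

```python
# Made-up labels lambda_n for C = 10 inputs (illustrative values only).
labels = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]

# Assumed test indices, with |test| = floor(10 / 5) = 2.
test_idx = {2, 7}
train_idx = set(range(len(labels))) - test_idx

# Index-preserving version: each label stays paired with its n.
y_test = {n: labels[n] for n in test_idx}
y_train = {n: labels[n] for n in train_idx}

# Literal set-of-values reading of {lambda_n | x_n in X_test}.
y_test_as_set = {labels[n] for n in test_idx}   # {1}: equal labels collapse
```

Here the index-preserving version keeps two test labels, while the plain set keeps only the distinct value, since every $\lambda_n$ is either 0 or 1.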
