Is this notation, $\{(x_i, y_i) \mid (x_i, y_i) \in \mathcal{D}, x_i^{(j)} = c\}$, legal and clear from a set theory perspective?


In machine learning, an ordered pair $(x_i, y_i)$ denotes the $i^{\text{th}}$ example in the training set $\mathcal{D}$, where $\mathcal{D} = \{(x_1,y_1),(x_2,y_2), \dots, (x_m,y_m)\}$.

Each example has $n$ features, denoted $x_i = (x_i^{(1)}, \dots, x_i^{(n)})$, where $x^{(j)}_i$ denotes the $j^{\text{th}}$ feature of the $i^{\text{th}}$ example.

Some learning methods, such as decision trees, learn models by partitioning the training data.

I would like to use this notation to denote the partitioned data:

$\{(x_i, y_i) \mid (x_i, y_i) \in \mathcal{D}, x_i^{(j)} = c\}$, where $c$ is the value used to partition the training data and $j$ is the index of the feature being considered for the partition.
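
To make the intended set concrete, here is a minimal Python sketch of the same selection. The toy data `D`, the helper name `partition`, and the chosen values of `j` and `c` are all made up for illustration, and the code indexes features from 0 whereas the notation above counts from 1.

```python
# Toy training set D: each example is a pair (x_i, y_i), where x_i is a
# tuple of n = 2 features. All values are hypothetical.
D = [
    (("sunny", "hot"), 0),
    (("rainy", "mild"), 1),
    (("sunny", "mild"), 1),
]


def partition(D, j, c):
    """Return the examples (x, y) in D whose j-th feature (0-indexed) equals c."""
    return [(x, y) for (x, y) in D if x[j] == c]


# Examples whose first feature equals "sunny".
subset = partition(D, j=0, c="sunny")
print(subset)
```

Note that the comprehension binds the pair `(x, y)` directly and never needs the index $i$; the index only matters if the position of an example in $\mathcal{D}$ is itself of interest.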

Is this notation legal and clear from a set theory perspective?

PS: Sorry for the previous misleading version.


There are 2 best solutions below


It is confusing. The use of $i$ is not set-builder syntax.
It is understandable only because of your explanation.
Clear ways of describing that set:
$\{(x, y) : (x, y) \in \mathcal{D},\ x = c\}$,
$\{x : x \in \mathcal{D},\ \pi_1(x) = c\}$.
Clearer but not recommended:
$\{(x_i, y_i) : i \in \text{index set of } \mathcal{D},\ x_i = c\}$.
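
As a quick sanity check, the three forms can be compared on a small example. This is only a sketch under assumptions: the data `D_list`, the value `c`, and the helper `pi_1` (standing in for the first-coordinate projection $\pi_1$) are hypothetical, and $x$ is taken to be a single value rather than a feature vector.

```python
# Hypothetical data: a list of (x, y) pairs, so that an index set exists.
D_list = [(1, "a"), (2, "b"), (1, "c")]
D = set(D_list)
c = 1


def pi_1(pair):
    """Projection onto the first coordinate of an ordered pair."""
    return pair[0]


# Form 1: bind the pair (x, y) directly.
form_1 = {(x, y) for (x, y) in D if x == c}

# Form 2: bind a whole element of D and test its first projection.
form_2 = {p for p in D if pi_1(p) == c}

# Form 3: range over the index set of D.
form_3 = {D_list[i] for i in range(len(D_list)) if D_list[i][0] == c}

print(form_1 == form_2 == form_3)
```

All three comprehensions select the same pairs; they differ only in which variable is bound, which is the point about set-builder syntax.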


Since $x_i$ is constant, I would write it as $\{(c, y_i) \mid (c, y_i) \in \mathcal{D}\} $.