Variable transformation for training a machine learning model

Suppose I have a training set $\mathbf{T}$ and I want to train some machine learning models. Each row of $\mathbf{T}$ consists of a vector of attributes (variables) $\mathbf{x} = (x_1, x_2, \dots)$, where each variable $x_i$ can be of a different type (numeric, string, date, ...). Suppose I also have a set of functions, each of which is a transformation of one or more variables of $\mathbf{x}$. Normally each function is an indicator function over a single variable, $\mathbf{I}_{rule}(x_i)$, or, put more simply, an if...else rule applied to a particular variable $x_i$. Each such function $f$ therefore produces a binary outcome ($1$ or $0$, True or False), e.g. $f(x_1) = \mathbf{I}_{x_1 > k}(x_1)$ for some threshold $k$.
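To make the kind of function I mean concrete, here is a minimal Python sketch of the threshold indicator $\mathbf{I}_{x_1 > k}$. The helper name `make_indicator`, the threshold `k = 10` and the sample values are assumptions made only for illustration.

```python
from typing import Callable


def make_indicator(k: float) -> Callable[[float], int]:
    """Return a function f with f(x) = 1 if x > k, else 0."""
    def f(x: float) -> int:
        return 1 if x > k else 0
    return f


# k = 10 is an arbitrary threshold chosen only for this illustration.
f = make_indicator(10)

# Hypothetical values of the variable x_1 in three rows.
values = [3, 10, 42]
flags = [f(x) for x in values]
```

The comparison is strict, so a value equal to the threshold maps to $0$.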

Applying these functions to the variables, we have:

$x_1: f(x_1), g(x_1)...$ # functions f and g applied over $x_1$

$x_2: h(x_2), l(x_2)...$ # functions h and l applied over $x_2$

...

$(x_1, x_2, x_5): r(x_1, x_2, x_5)$ # function r applied over $(x_1, x_2, x_5)$

...

Let us define $\mathbf{T}_{fun}$ as the data set in which each row consists of the transformed values, so each row looks like:

$\mathbf{x}_{fun} = (f(x_1), g(x_1), \dots, h(x_2), l(x_2), \dots, r(x_1, x_2, x_5))$

Let us define $\mathbf{T}_{aug}$ as the augmented training set, formed by the union of the columns of both data sets ($\mathbf{T}$ and $\mathbf{T}_{fun}$), so each row looks like:

$\mathbf{x}_{aug} = \mathbf{x} \cup \mathbf{x}_{fun} = (x_1, x_2, \dots, f(x_1), g(x_1), \dots, h(x_2), l(x_2), \dots, r(x_1, x_2, x_5))$
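As a rough sketch of how the rows of $\mathbf{T}_{fun}$ and $\mathbf{T}_{aug}$ could be built, assuming each row is stored as a plain Python dictionary: the column values, the concrete rules and the helper names `to_fun` and `to_aug` are all hypothetical and only illustrate the construction.

```python
from datetime import date

# Hypothetical rows of T: x_1 is numeric, x_2 is a string, x_5 is a date.
T = [
    {"x_1": 12.0, "x_2": "red", "x_5": date(2020, 1, 15)},
    {"x_1": 3.5, "x_2": "blue", "x_5": date(2021, 7, 1)},
]

# Hypothetical rules; each one maps a row to a binary outcome (0 or 1).
transforms = {
    "f(x_1)": lambda row: int(row["x_1"] > 10),
    "g(x_1)": lambda row: int(row["x_1"] < 5),
    "h(x_2)": lambda row: int(row["x_2"] == "red"),
    "r(x_1,x_2,x_5)": lambda row: int(
        row["x_1"] > 10 and row["x_2"] == "red" and row["x_5"].year == 2020
    ),
}


def to_fun(row):
    """Build x_fun: only the transformed (binary) values of one row."""
    return {name: fn(row) for name, fn in transforms.items()}


def to_aug(row):
    """Build x_aug: the original variables followed by the transformed ones."""
    return {**row, **to_fun(row)}


T_fun = [to_fun(row) for row in T]
T_aug = [to_aug(row) for row in T]
```

Here `T_aug` keeps every original column and appends one binary column per rule, so a model trained on it sees both the raw variables and the hand-written rules.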

My question is:

Does it make sense to train these machine learning models on the augmented training set, or should I train them on the original training set and expect the model itself to find the best combination of transformations?