I'm writing a paper and would like to describe my approach with formulas too. However, I have a problem with writing down the following mapping step (just a tiny step of my algorithm).
Imagine that I have a vector of numbers (they are computer port numbers):
80, 22, 22, 443, 80, 443, ... , 80
Now I map them to "categories" like this:
1, 2, 2, 3, 1, 3, ..., 1
where 80 is the first category found, so it is always mapped to 1; 22 is the second category found, so it is always mapped to 2; and so on.
Another example, with a vector of usernames: I map
bob, alice, bob, carol, bob, bob
to
1, 2, 1, 3, 1, 1
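If you know the R programming language, all I do is essentially:
as.numeric(factor(v))
(Strictly speaking, factor() numbers the categories in sorted order; as.numeric(factor(v, levels = unique(v))) numbers them in order of first appearance, as in the examples above.)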
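For readers who do not use R, here is a minimal Python sketch of the same mapping, numbering categories in order of first appearance. The function name `to_categories` is made up for illustration, and the input vectors are shortened versions of the examples above (the elided middle of the port vector is simply left out).

```python
def to_categories(values):
    """Map each value to the 1-based rank of its first appearance."""
    codes = {}   # value -> category number, assigned on first sight
    result = []
    for v in values:
        if v not in codes:
            # A value not seen before gets the next free category number.
            codes[v] = len(codes) + 1
        result.append(codes[v])
    return result


print(to_categories([80, 22, 22, 443, 80, 443, 80]))
# [1, 2, 2, 3, 1, 3, 1]
print(to_categories(["bob", "alice", "bob", "carol", "bob", "bob"]))
# [1, 2, 1, 3, 1, 1]
```

Equal values always receive the same number, and distinct values always receive different numbers, which is the property the formal description needs to capture.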
So now I want to describe it in the paper formally with something like:
"I have a dataset $D = \{c_1,c_2,...,c_n\}$, where each column is a vector of values:
$c_j = \{x_1,x_2,...,x_m\}$
Now I map each column $c_j$ to
$c'_j = \{f_j(x_1),f_j(x_2),...,f_j(x_m)\}$, where $f_j(x)$ = ???"
So my question is what to write instead of ??? (In words I can describe it as "mapping to category", but how do I write it down in mathematical notation?)
If you really want to intimidate your readers, do the following (for each column vector):
Let $N$ be the number of items in the vector. Let $I = \{1, \dots, N\}$ be the index set of the objects in the vector. Let your vector be $(x_i)_{i \in I}$.
Introduce a binary relation on $I$ by $i \sim j \iff x_i = x_j$. It is easy to prove that this is an equivalence relation, because equality of the values $x_i$ is itself reflexive, symmetric and transitive.
Let $\hat I$ be the quotient set $I / \sim$ and for each $i \in I$ let $\hat i \in \hat I$ denote its equivalence class under the relation $\sim$.
Finally, create a new vector $(\hat i)_{i \in I}$ by putting at position $i$ its equivalence class $\hat i$. In your own notation, $f(x_i) = \hat i$.
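Note that $\hat i$ is a set of indices, not a number. If you want the labels $1, 2, 3, \dots$ in order of first appearance, one possible way to get them (a sketch, with the helper symbol $m$ introduced here only for illustration) is to represent each class by its smallest index and count how many distinct representatives occur up to it:

```latex
\[
  m(i) = \min \hat{i} = \min \{\, k \in I : x_k = x_i \,\},
  \qquad
  f(x_i) = \bigl|\{\, m(k) : k \in I,\ m(k) \le m(i) \,\}\bigr| .
\]
```

For the vector (bob, alice, bob, carol) this gives $m = (1, 2, 1, 4)$ and hence $f = (1, 2, 1, 3)$, which agrees with the mapping in the question.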
This will make your readers collapse in awe without understanding anything, and your bosses humbly beg you to accept a raise (or have you shot, for fear that you would outsmart and then overthrow them).