To be concrete, if I have an inner product space over some field $\mathbb{F}$, say of dimension 3, whose orthonormal basis vectors are $\{(1,0,0),(0,1,0),(0,0,1)\}$, what kind of algebraic structure is this set of basis vectors? It's not a vector space or module, since if you add $(1,0,0) + (0,1,0)$ the result $(1,1,0)$ is not part of the set. I suppose you could multiply by the underlying field of scalars, e.g. $9\cdot(1,0,0)=(9,0,0)$, but that's it. So it's an algebraic structure $G = (\mathbb{F}, V, \cdot)$ where for all $a \in \mathbb{F}$ and $v \in V$ we have $a\cdot v \in G$, together with the usual rules for associativity, etc.
I'm interested in what the domain is for a function $f : W \rightarrow V$ that maps a set $W$ of orthonormal basis vectors to the vector space $V$ whose orthonormal basis is given by $W$. I'm also interested in what $f$ is called. Category-theoretically, is there a forgetful functor $f : V \rightarrow W$ from a vector space $V$ to its set of orthonormal basis vectors, and then an adjoint functor that "goes back" from this set of basis vectors to the vector space?
The reason I ask is that in the machine learning field, there is something called Word2Vec that encodes and embeds natural language words as vectors in a normed vector space. BUT, we initially encode words using what's called "one-hot" or "1-of-K" encoding, which essentially looks like a set of orthonormal basis vectors.
For example, consider a language with a vocabulary of 100 words $W = \{\text{cat}, \text{eats}, \text{mouse}, \ldots\}$. We want to embed these words into a normed vector space so that the distance between the word-vectors corresponds to the similarity in meaning between these words.
So we arbitrarily encode each word as an orthonormal basis vector, e.g. $((1,0,0, \ldots),(0,1,0, \ldots),(0,0,1, \ldots)) = (\text{cat}, \text{eats}, \text{mouse}, \ldots)$. To learn the meanings of these words, we take example sentences (contexts) from some corpus in the language and train a machine learning algorithm, i.e. a function $f : W \rightarrow V$, where $V$ is a vector space with $\dim(V) = |W|$ (the number of words in the vocabulary). We normalize the output of $f$ so that it sums to 1, which lets us interpret the output vector as a probability distribution over words. Basically, we input a word vector into $f$ and use gradient descent to train $f$ to predict the probability distribution of the next word likely to follow it in a phrase/sentence.
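The one-hot ("1-of-K") encoding described above can be sketched in a few lines; the toy three-word vocabulary and the function name `one_hot` are illustrative assumptions, not part of any particular library:

```python
# Minimal sketch of one-hot ("1-of-K") encoding for a toy vocabulary.
# The vocabulary and its ordering are arbitrary illustrative choices.
vocab = ["cat", "eats", "mouse"]

def one_hot(word, vocab):
    """Return the standard basis vector corresponding to `word`."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(word)] = 1.0
    return vec

print(one_hot("eats", vocab))  # -> [0.0, 1.0, 0.0]
```

Each word maps to a distinct standard basis vector, so the encoded vocabulary is exactly the kind of "set of orthonormal basis vectors" discussed above.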
The function $f$ is actually just a matrix multiplication followed by a normalization operation on the resulting vector. Once we have optimized $f$ so that it accurately predicts the word likely to follow some input word, we can interpret the rows of the matrix as new word vectors in a vector space equipped with, say, the Euclidean metric. If the matrix has 3 columns, then each word vector is 3-dimensional, so we can visualize the word vectors in 3D space and see which words cluster together (similar meaning), etc. We can also do algebra on words, like $\text{king} - \text{man} + \text{woman} = x$, and the closest word vector that solves this equation will be $\text{queen}$.
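As a minimal sketch of "matrix multiplication followed by normalization": a one-hot input simply selects a row of the weight matrix, and a softmax turns the result into a probability distribution. The matrix values below are made-up toy numbers, not trained weights, and real Word2Vec implementations use two weight matrices; this only illustrates the shape of the computation described above:

```python
import numpy as np

vocab = ["cat", "eats", "mouse"]
# Toy 3x3 weight matrix; values are illustrative, not trained.
M = np.array([[0.1, 2.0, 0.3],
              [1.5, 0.2, 0.1],
              [0.2, 0.1, 1.8]])

def softmax(z):
    e = np.exp(z - z.max())      # shift by max for numerical stability
    return e / e.sum()

def f(one_hot_x):
    """Matrix multiplication followed by normalization."""
    return softmax(one_hot_x @ M)

x = np.eye(3)[0]                 # one-hot vector for "cat"
p = f(x)                         # x @ M just picks out row 0 of M
print(p, p.sum())                # a probability distribution: entries sum to 1
```

Note that because the input is one-hot, $x M$ is exactly one row of $M$, which is why the rows can later be reinterpreted as the learned word vectors.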
OK. So my issue is that in this Word2Vec model, the input to the machine learning algorithm (the function $f$) is always supposed to be one-hot encoded, i.e. $(1,0,0,\ldots)$ etc., as if the inputs formed an orthonormal basis. BUT we don't want our function $f$ to accept "mixed words" like $(1,1,0,\ldots)$. So the codomain of $f$ is definitely a vector space, but what is a mathematically rigorous definition of the domain of $f$ that matches the Word2Vec model's expectations while remaining computationally meaningful? It's as if $f$ is taking a bare set and building an inner product space out of it.

For any finite set $S$, we can make an inner product space (over $\mathbb Q$, say) by considering formal $\mathbb Q$-linear combinations of the elements of $S$, e.g. expressions like $3s_1 + \frac{2}{3}s_4$ where $s_1, s_4 \in S$. We define the inner product structure by saying that $\langle s_i,s_j \rangle = \begin{cases}1, &\text{if }i=j\\0, & \text{if }i\neq j\end{cases}$ and extending to arbitrary formal $\mathbb Q$-linear combinations by bilinearity. A different way of representing the same thing is simply to use the function space $S\to\mathbb Q$ with addition and scaling defined pointwise, e.g. $(3f)(s)=3f(s)$. The inner product can then be defined as $\langle u,v \rangle = \sum_{s\in S}u(s)v(s)$. This reveals the tuples, e.g. $(1,0,0)$, as representations of functions $S\to\mathbb Q$ where $|S|=3$: given $u : S \to \mathbb Q$ and some arbitrary ordering on $S$, we get the tuple $(u(s_1),u(s_2),u(s_3))$.
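This function-space representation is directly computable: a formal $\mathbb Q$-linear combination is just a dictionary from elements of $S$ to rational coefficients, with pointwise operations and the inner product $\langle u,v \rangle = \sum_{s\in S}u(s)v(s)$. A minimal sketch (the helper names `add`, `scale`, `inner` are illustrative):

```python
from fractions import Fraction

# Formal Q-linear combinations over a finite set S, represented as dicts
# mapping elements of S to rational coefficients (i.e. functions S -> Q,
# with absent keys meaning coefficient 0).
def add(u, v):
    return {s: u.get(s, Fraction(0)) + v.get(s, Fraction(0))
            for s in set(u) | set(v)}

def scale(a, u):
    return {s: a * c for s, c in u.items()}

def inner(u, v):
    # <u, v> = sum over s of u(s) * v(s)
    return sum((u.get(s, Fraction(0)) * v.get(s, Fraction(0))
                for s in set(u) | set(v)), Fraction(0))

# The elements of S themselves become an orthonormal basis:
s1 = {"s1": Fraction(1)}
s4 = {"s4": Fraction(1)}
w = add(scale(Fraction(3), s1), scale(Fraction(2, 3), s4))  # 3*s1 + (2/3)*s4
print(inner(s1, s1), inner(s1, s4), inner(w, s1))  # -> 1 0 3
```

The basis elements are orthonormal by construction, and `inner` extends bilinearly exactly as in the definition above.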
Ignoring the inner product aspect, this construction gives the free vector space on a finite set. This means that given any function $f : S \to UV$, where $UV$ is the underlying set of the vector space $V$, there is a unique ($\mathbb Q$-)linear transformation $FS\to V$ whose restriction to $S$ is $f$, where $FS$ is the construction described above. This correspondence between functions $S \to UV$ and linear maps $FS \to V$ is a bijection natural in $S$ and $V$, i.e. the free functor $F$ is left adjoint to the forgetful functor $U$.
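The universal property can also be sketched computationally: any function $f : S \to V$ extends uniquely to a linear map on formal combinations, by pushing each coefficient through $f$ and summing. Here $V = \mathbb Q^2$ with vectors as pairs, and the names `extend` and `f_bar` are illustrative:

```python
from fractions import Fraction

# Sketch of the universal property: a function f : S -> V extends uniquely
# to a linear map FS -> V.  Here V = Q^2, with vectors as pairs of Fractions.
def extend(f):
    """Return the linear extension of f to formal Q-linear combinations."""
    def f_bar(u):  # u : dict mapping elements of S to Q-coefficients
        out = (Fraction(0), Fraction(0))
        for s, c in u.items():
            vs = f(s)
            out = (out[0] + c * vs[0], out[1] + c * vs[1])
        return out
    return f_bar

# An arbitrary assignment of vectors in Q^2 to the elements "s1", "s4" of S:
f = lambda s: {"s1": (Fraction(1), Fraction(0)),
               "s4": (Fraction(1), Fraction(1))}[s]
f_bar = extend(f)

u = {"s1": Fraction(3), "s4": Fraction(2, 3)}  # 3*s1 + (2/3)*s4
print(f_bar(u))  # 3*(1,0) + (2/3)*(1,1) = (11/3, 2/3)
```

In Word2Vec terms, this is exactly the move the question asks about: the arbitrary assignment of words to embedding vectors is the function $f$ on the bare set, and the trained model's linear layer is its unique linear extension to the free vector space on the vocabulary.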
From this perspective, there is no structure on $\{(1,0,0),(0,1,0),(0,0,1)\}$. It's just a three element set whose elements happen to have suggestive names.