Category modeling in quantitative Linguistics

I'm currently studying a new quantitative unit in linguistics, the so-called motif. A motif is an ascending (or descending) sequence of values of a quantitative linguistic property. For a better intuition, here's an example:

  1. I did not inhale and never tried it again
  2. (1 3 3 6) (3 5 5) (2 5)

The example sentence is (1) and its representation in l-motifs is (2). An l-motif is a sequence of lengths, in this case word lengths in letters. There are also f-motifs, which are motifs of frequencies, e.g. the frequency of each word in a corpus. Motifs can also be taken from motifs: for example, one can count the length of each l-motif, so the first length in (2) would be 4. On these numbers it is again possible to find ascending sequences, which again form motifs.
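The segmentation described above can be sketched in a few lines of Python. This is only a minimal sketch: the function name `motifs` is hypothetical, and it assumes that "ascending" means non-decreasing (equal neighbours stay in the same motif), which is what the grouping (1 3 3 6) in (2) suggests.

```python
def motifs(values):
    """Split a sequence of numbers into maximal non-decreasing runs.

    Assumption: ties continue a motif, as in (1 3 3 6) from the example.
    """
    runs = []
    for v in values:
        if runs and runs[-1][-1] <= v:
            runs[-1].append(v)   # value continues the current motif
        else:
            runs.append([v])     # value drops, so a new motif starts
    return runs


sentence = "I did not inhale and never tried it again"

# First order: word lengths in letters, then l-motifs on those lengths.
word_lengths = [len(w) for w in sentence.split()]
l_motifs = motifs(word_lengths)
print(l_motifs)                  # [[1, 3, 3, 6], [3, 5, 5], [2, 5]]

# Second order: lengths of the l-motifs, then motifs on those lengths.
motif_lengths = [len(m) for m in l_motifs]
print(motif_lengths)             # [4, 3, 2]
print(motifs(motif_lengths))     # [[4], [3], [2]]
```

For this sentence the second-order lengths 4, 3, 2 are strictly descending, so under the non-decreasing reading each of them forms a motif of its own.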

I want to formalize this "unit" and its properties for further computations. I thought of modelling it as a "Motifs Category". If I understand the definition of a category correctly, there are four things that must be given:

  • there need to be objects (in this case the different motifs and, of course, the abstract object of a text, which is "transformed" into a motif)
  • there need to be morphisms (in this case the mappings from a text to a motif structure, and from a motif structure to another motif structure, either l- or f-)
  • there needs to be an identity morphism for each object (which is obvious, I think)
  • there needs to be a composition (I can build an f-motif structure from an l-motif structure from an f-motif structure, etc.)
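For reference, the usual textbook formulation also states the laws that composition and identities have to satisfy. For morphisms $f : A \to B$, $g : B \to C$ and $h : C \to D$ (generic symbols, not yet tied to motifs):

$$g \circ f : A \to C, \qquad h \circ (g \circ f) = (h \circ g) \circ f, \qquad f \circ \mathrm{id}_A = f = \mathrm{id}_B \circ f$$

In particular, $g \circ f$ is only defined when the codomain of $f$ equals the domain of $g$.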

My questions now are:

  • Are my assumptions about categories right?
  • Is this the right intuition for modelling a "Motifs Category"?
  • If the above are true: What would be the right way to write it?

Here's what I've got so far:

$T :$ texts

$L :$ l-motifs

$F :$ f-motifs

$\phi_{word}^{L} : T \to L$
which "transforms" or "maps" (I'm not sure of the right term here) a text to an l-motif structure by word length

$\phi_{word}^{F} : T \to F$
same as above, but yielding an f-motif structure

$\phi_{word}^{F} \circ \phi_{word}^{L} = \phi_{word}^{FL}$
which is the f-motif structure of the l-motif structure of a text (reading $\circ$ in the usual way, $\phi_{word}^{L}$ is applied first)
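To experiment with composition, here is a minimal Python sketch. It rests on assumptions that go beyond what is fixed above: every step is modelled as a map from a sequence of numbers to a sequence of numbers, so that any two steps can be composed; "ascending" is read as non-decreasing; and, since no corpus is given, the frequency step counts how often each motif occurs within the sequence itself, purely as a stand-in for a corpus frequency. All function names (`runs`, `l_step`, `f_step`, `compose`) are hypothetical.

```python
from collections import Counter
from functools import reduce


def runs(values):
    """Split numbers into maximal non-decreasing runs (assumed motif rule)."""
    out = []
    for v in values:
        if out and out[-1][-1] <= v:
            out[-1].append(v)
        else:
            out.append([v])
    return out


def l_step(seq):
    """Hypothetical l-step: replace the sequence by the lengths of its motifs."""
    return [len(m) for m in runs(seq)]


def f_step(seq):
    """Hypothetical f-step: replace the sequence by the frequency of each
    motif within the sequence itself (stand-in for a corpus frequency)."""
    ms = [tuple(m) for m in runs(seq)]
    counts = Counter(ms)
    return [counts[m] for m in ms]


def identity(seq):
    """Identity step: returns the sequence unchanged."""
    return list(seq)


def compose(*steps):
    """compose(g, f)(x) == g(f(x)): the rightmost step is applied first."""
    return lambda seq: reduce(lambda acc, step: step(acc), reversed(steps), seq)


text = "I did not inhale and never tried it again"
base = [len(w) for w in text.split()]      # word lengths in letters

fl = compose(f_step, l_step)               # l-step first, then f-step
lf = compose(l_step, f_step)               # f-step first, then l-step
print(fl(base))                            # [1, 1, 1]
print(lf(base))                            # [3]
```

On this sentence the two orders give different results, so under these assumptions it matters which step is written on the right of $\circ$. The identity and associativity laws can be checked on examples in the same way, e.g. `compose(identity, l_step)(base) == l_step(base)`.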