Background: The Burrows-Wheeler Transform (BWT) is an a mathematical transform traditionally defined on strings of letters and used in signal processing for example for data compression purposes.
Here is an engineering-language description of what it does:
- Produce all rotations of a string.
- Sort the rotations lexicographically.
- Pick the last column and store it.
What I seek is a mathematical formalization that will allow me to seek generalizations of this method.
Now to produce a mathematical question we will first need to mathematically define this.
Own work:
For step 2) to make sense we need to define larger than operator. Or impose an "order".
Assume we have two functions: $$t\to f(t) , t\to g(t)\\ f(t),g(t)\in \{0,\cdots, M\}\\t\in\{0,\cdots,N\}$$ Where $M$ typically can be $255$, as in the case of an ASCII string or $65535$ for unicode. $N$ is the length of the string. Now define an order: $$f > g \textrm{ iff } \cases{f(i) = g(i), \forall i < j\\\exists j :f(j) > g(j)}$$ $$f = g \textrm{ iff } \nexists j: f(j)\neq g(j)$$
Now as for the "rotation" operation described in 1) we consider the modulo operator. I suppose we can formally define this as some kind of a higher order function $k\to R(k) = t\to \mod(t+k,N+1)$
And then we can define the "rotations" by function concatenation:
$$f_k = (f \circ R(k))$$
Now what we want to sort is the set of functions $\{f_0,\cdots, f_{N-1}\}$
This sort can be viewed as an invertible function on the function indexes $i\to s(i)$
so that the sequence of functions $\{f_{s(0)}, f_{s(1)}, \cdots f_{s(N-1)}\}$ is sorted in the sense above so that $f_{s(0)} \leq f_{s(1)} \leq \cdots \leq f_{s(N-1)}$
Now, finally what we "store" is the values $f_{s(0)}(N), f_{s(1)}(N), \cdots, f_{s(N-1)}(N)$
Or perhaps we want to view it as a higher order function that takes $\{t \to f(t)\} \to \{t\to f_{s(t)}(N)\}$
First of all : Does this description make sense mathematically?
Secondly : Will it be fruitful in attempting to find steps that can be generalized?