Is there a canonical "algebra of strings"?

281 Views Asked by At

Recently, I've been toying with the idea of formalising arithmetic purely in terms of string manipulation - no logic or interpretation required. However, when trying to elaborate upon this idea in detail, I find myself struggling with overlong and cumbersome descriptions for what ought to be fairly simple operations.

This leads me to wonder if there isn't a canonical "algebra of strings" which I could use to effectively describe what I am talking about, with notations and conventions analogous to those employed in set theory, abstract algebra, and other areas of mathematics. In particular, I would find it extremely useful if there were commonly accepted terminology and notation for the following:

The operation which replaces every instance of a particular substring in a string with a specified string.

The operation which removes every instance of a particular substring within a string.

The operation which removes a particular instance of a particular substring from a string (e.g. dropping the last character, $abcb\to acb$, etc.)

The "product" of two sets of string $A\times B=\{ab:a\in A\land b\in B\}$

The "basis" of a set of strings (ex. $\{acca,aba,abbc,acabc\}\to\{a,b,c\}$)

Of course, I can always introduce new notation or terminology, but as the list of concepts lengthens I find it more and more desirable to have a familiar standard to work from.

1

There are 1 best solutions below

0
On

What you call basis is the "alphabet" of a formal language (name for sets of string).

What you call "product" is the concatenation of two languages (the product in the free monoid).

Operations like you describe are not very common in Formal Language Theory. All of your examples can be implemented by regular transducers, though. You might also find something under the name of gsm (generalized sequential machine). The simultaneous replacement is also captured by some kinds of Lindenmayer systems.

Here you already run into some problems, though. What is, for example, the result of replacing all occurrences of aba by c in ababa? There are two occurrences, but they overlap. So replacing one destroys the other. Thus your result could be cba and/or abc or cc depending on what exactly you mean by replacing.