What is the formal definition of 'string'?

372 Views Asked by At

I just found out about uncountable and infinitary languages and I'm not sure what "string" means now. Originally, I had conceived of a string being, at most, a countable sequence of symbols with a least (leftmost) member. Since this fails to account for several types of infinite-length string (for example, strings that have no first or last symbol, uncountable strings), I'm at a loss as to what "string" means in the first place.

At the very least a string consists of a set [or class?] of symbols$^0$ with some kind of order, but I'm not sure where the line between "string" and "digraph" is drawn. If there are no restrictions, then the notion of "string" seems sufficiently general to be indistinguishable from "binary relation."

Edit:

To give something of a starting point, I had originally thought to define a general "closure" operation that gives the set of all strings up to a certain (possibly infinite) length.

Let $\gamma$ be some fixed ordinal, then...

$$Cl_\gamma(X)=\bigcup_{i<\gamma}X^i$$

This readily generates the set of $\omega$-length strings when $\gamma=\omega+1$. However, the closure described above excludes strings which are not "well ordered." For example, strings with no leftmost symbol, which can be generated by infintary left-recursive grammars. Even extending the above to count reverse-order types for each ordinal$^1$ excludes the case of "cyclic" and "dense" strings (provided that such things are permitted).


$^0$ when referring to the characters of a string as a set [class?], assume that each character is identified with a pair $(k,s)$, where $s$ is the symbol and $k$ is a "tag" distinguishing different instances of $s$. If you take this one step further, then a string is a function from some set [class?] $X$ to a set $S$ of primitive symbols, plus a binary relation [order?] on $X$.

$^1$ $X^{\omega^*}$, appropriately defined, would account for $\omega$-length "left-recursive" strings

2

There are 2 best solutions below

2
On BEST ANSWER

I prefer to use the term word instead of string. Successive notions of words over an alphabet $A$ have been defined and studied: finite words, infinite words, bi-infinite words, words over ordinals and words over linear orderings. See [1] for a survey. Cyclic words were also considered: they are equivalence classes of cyclic permutations of finite words.

[1] Véronique Bruyère and Olivier Carton, Automata on linear orderings, J. Comput. System Sci. 73 (2007), no. 1, 1--24. (See also here).

0
On

$\require{enclose}$ $\require{cancel}$

The following mathematical definitions are consistent with how some mathematicians use strings today.

As an opening example, we can write the string $“\mathtt{normal distribution}”$ in many different ways.

The following is a poor approximation of normal distribution in German (Deutsch):

Normalverteilung

The Unicode Character Set $\, \mathbb{U}$

Although we wanted a definition for string, we must first define an alphabet or co-domain. This might seem like computer programming, but please bare with me.

We define $\mathbb{U}$ to be the set of all Unicode v 15.0 characters.

We have $149186 = \begin{vmatrix}\mathbb{U}\end{vmatrix}$ or there are $149,186$ characters.

$94$ of the Unicode characters known as the ASCII printable set are shown below:

!"#$%&'()*+,-./0123
456789:;<=>?@ABCDEFG
HIJKLMNOPQRSTUVWXYZ[
]^_`abcdefghijklmnop
qrstuvwxyz{|}~

Formal Definitions of Unicode String


Fuzzy North American Representation of the Definition of Unicode String

A string is function whose inputs are numbers and the outputs are Unicode characters such that indexing begins at $1$, and we have wrap-around indexing.

Detailed North American Representation of the Definition of Unicode String

For any mapping $\sigma$ from the zalen integers $\mathbb{Z}$ without zero to the set of all Unicode character $\mathbb{U}$ we say that $\sigma$ is a string

if and only if

There exists $n$ element of the zalen without zero such that:

if

the absolute value of $n$ is less than or equal to the absolute value of $m$
then
for any $k$ element of the set of all zalen without zero,

all of the following:

if

$-|n|$ is less than or equal to $k$
and
$k$ is less than or equal to the absolute value of $n$
then
$\sigma(k)$ not equal to $\mathtt{null}$

if

$k$ strictly less than $-|n|$
or
$k$ strictly greater than $+|n|$
then
$\sigma(k) = \mathtt{null}$

Additionally, $\forall k \in \mathbb{Z} \setminus \{0\} \enclose{horizontalstrike}{\sigma(-|k|) = \sigma(+|k|)}$


Edit

We should have written $\sigma(n-|k|) = \sigma(|k|)$ where $n$ is the minimum non-negative zalen integer from $\mathbb{Z}$ such that $\sigma(n-|k|) = \mathtt{null}$



Germanische Darstellung der Definition von Unicode-Strings

Für jede Abbildung $\sigma$ aus den Zahlen ohne Null auf die Menge aller Unicode-Zeichen $\mathbb{U}$ sagen wir, dass $\sigma$ ein String ist

wenn und nur wenn

Es $n$ Elementr der Zahlen ohne Null gibt, so dass:

wenn

der absolute Wert von $n$ kleiner oder gleich dem absoluten Wert von $m$ ist dann gilt für jedes $k$-Element der Menge aller Zahlen ohne Null,

alles Folgende:

wenn

$-|n|$ kleiner oder gleich $k$ ist und $k$ kleiner oder gleich dem absoluten Wert von $n$ ist dann ist $\sigma(k)$ ungleich $\mathtt{null}$

wenn

$k$ streng kleiner als $-|n|$ oder $k$ strikt größer als $+|n|$ ist dann ist $\sigma(k) = \mathtt{null}$


Mostly Symbolic Representation of the Definition of Unicode String

$\forall \sigma \subseteq (\mathbb{Z} \setminus \{0\}) \times (\mathbb{U}) \in $ the set of all finite-length strings

$\iff$

$\forall z, z^{\prime} \in \mathbb{Z} \setminus \{0\} \land \forall \mathtt{ch}, \mathtt{ch}^{\prime} \in \mathbb{U} (\sigma(z) = \sigma(z^{\prime})) \implies (\mathtt{ch} = \mathtt{ch}^{\prime})$

$\land$

$\exists n \in \mathbb{Z} \setminus \{0\} \backepsilon$

$|n| \leq |m|$
$\implies$
$\forall k \in \mathbb{Z} \setminus \{0\}$,

all of the following:

$-|n| \leq k \leq +|n|$
$\implies$
$\sigma(k) \neq \mathtt{null}$

$k < -|n|$ or $k > +|n|$
$\implies$
$\sigma(k) = \mathtt{null}$


Notation for Strings

In addition to the so-called ASCII printables shown above, do not forget the following:

Name of Character Interface Implementation
tabs $\mathtt{“\backslash t”}$ $9$
line-feeds $\mathtt{“\backslash n”}$ $10$
carriage returns $“\mathtt{\backslash r}”$ $13$
back-slash characters. $“\mathtt{\backslash\backslash}”$ $13$
null $\mathtt{null}$ $92$

In order to avoid trouble distiguishing between the number zero and the leftmost letter $\mathtt{“O”}$ in $\mathtt{“Oxford”}$, we write $\mathtt{null} = \mathtt{0}$.

Quotation marks have meaning. Consider the following set:

$$ A = \{ \mathtt{“\backslash t”}, \mathtt{“\backslash n”,} “\mathtt{\backslash r}”, “\mathtt{\backslash\backslash}”, \mathtt{null}, “\mathtt{null}” \} $$

  • The string $“\mathtt{null}”$ using quotation marks is

    1. the letter $“\mathtt{n}”$
    2. the letter $“\mathtt{u}”$
    3. the letter $“\mathtt{l}”$
    4. the letter $“\mathtt{l}”$
  • $\mathtt{null}$ without quotation marks is the number zero.

As an additional example, the following contains a tab-character, not a number 9

$(\mathtt{“normal”}, 9, \mathtt{“distribution”})$


North American Definition of Subsequence

For any two strings $\sigma$ and $\Sigma$, $\sigma$ is a sub-sequence of $\Sigma$ if and only if for every $z$ element of the zalen integers there exists $K$ element of the zalen integers such that $k$ is less than or equal to $K$ and $\sigma(k)$ is $\Sigma(K)$


Deutsche Definition einer Untersequenz

Für zwei beliebige Zeichenketten $\sigma$ und $\Sigma$ ist $\sigma$ genau dann eine Untersequenz von $\Sigma$, wenn für jedes $z$-Element der Menge der Ganzzahlen ein $K$-Element der Menge der Ganzzahlen existiert, so dass $k$ kleiner oder gleich $K$ ist und $\sigma(k)$ $\Sigma(K)$ ist


Mostly symbolic Definition of Subsequence

For any two strings $\sigma$ and $\Sigma$, $\sigma$ is a sub-sequence of $\Sigma \iff \forall z \in \mathbb{Z} \setminus \{0\} \exists K \in \mathbb{Z} \setminus \{0\}: k \leq K \land \sigma(k) = \Sigma(K)$