Unambiguous set abstraction (set-builder) notation with parameters

1k Views Asked by At

I know two common variants of set abstraction notation. An examples of the first variant is $$ \{\,(x, y)\in\mathbf{R}^2\,|\,x^2 + y^2 = 1\,\} $$ (it is reminiscent of the axiom schema of specification of ZF), and an example of the second is $$ \{\,(\cos t,\sin t)\,|\,t\in\mathbf{R}\,\} $$ (it is reminiscent of the axiom schema of replacement of ZF). I think that in common mathematical tradition, these two "set-builder" expressions unambiguously denote the same set.

The first form of the notation can always be used instead of the second, for example: $$ \{\,(\cos t,\sin t)\,|\,t\in\mathbf{R}\,\} = \{\,(x, y)\in\mathbf{R}^2\,|\,(\exists t\in\mathbf{R})(x =\cos t\wedge y =\sin t)\,\}. $$ However, the "translated" expression is noticeably more verbose.

Sometimes the two are combined: $$ \{\,(\cos t,\sin t)\,|\,t\in\mathbf{R}\wedge 0 < t <\pi\,\}, $$ or simply $$ \{\,(\cos t,\sin t)\,|\,0 < t <\pi\,\}. $$

The first form can be translated into the combined form without much "overhead": $$ \{\,(x, y)\in\mathbf{R}^2\,|\,x^2 + y^2 = 1\,\} = \{\,(x, y)\,|\,x^2 + y^2 = 1\wedge (x, y)\in\mathbf{R}^2\,\}. $$ The second form is just a particular case of this combined form.

In practice, the combined form of "set-builder" notation is quite convenient, but it appears to be ambiguous, unless some artificial syntactic conventions are made. For example, what set $S$ is defined in the following sentence?

Let $p = 1$ and $S = \{\,(p + q, p - q)\,|\,pq = 1\wedge p\in\mathbf{R}\wedge q\in\mathbf{R}\wedge 1 < 2\,\}$.

I see two equally legitimate values for $S$ so defined:

  1. $S =\{(2, 0)\}$ or

  2. $S =\{\,(x, y)\in\mathbf{R}^2\,|\,x^2 - y^2 = 4\,\}$.

Once you think of it, there is also a possibility for $S$ to be empty, if both $p$ and $q$ are viewed as parameters and if the value of $q$ happens to be different from $1$.

Does there exist an unambiguous form of the combined variant of "set-builder" notation?

In general, I do not see how to specify in an expression like $$ \{\,\{\,f(\bar a,\bar b,\bar c)\,|\,\Phi(\bar a,\bar b,\bar c)\,\}\,|\,\Psi(\bar a,\bar b,\bar c)\,\} $$ that, for example, $\bar c$ are parameters, and $\bar b$ are parameters in the inner "set-builder" but not in the outer one. (Note that though all the variables are listed in parentheses each time, it does not mean that all of them are "used": $f$, $\Phi$ and $\Psi$ may very well not depend on some of these "formal parameters.")

For reference, here is how some programming languages deal with a similar issue in list comprehension. More precisely, in programming languages there is no issue, because iterating over a collection and testing a predicate are two different operations in programming, while they are indistinguishable in mathematics.

In Haskell,

[ (m, n) | m <- [0..5], 0 <= n, n <= 5, m^2 + n^2 == 25 ]

is the list of all pairs $(m, n)$, where $m$ is an integer from $0$ to $5$, $0\le n\le 5$, and $m^2 + n^2 = 25$. Note that this very sentence in English is ambiguous! The above expression in Haskell is only valid if $n$ is defined. The value of this expression is [(3,4)] if the value of $n$ is 4, but it is [] (the empty list) if the value of $n$ is 2, for example.

A similar definition in Python will be:

[ (m, n) for m in range(0, 6) if 0 <= n and n <= 5 and m**2 + n**2 == 25 ]

However, the analogous "set-builder" notation $$ \{\,(m, n)\,|\,m\in\{0,\dotsc,5\}\wedge 0\le n\le 5 \wedge m^2 + n^2 = 25\,\} $$ is ambiguous (are "$m$" and "$n$" parameters or bound variables of this expression?).

Here is another example of an ambiguous situation: $$ \{\,\{\,x\,|\,x\in\mathbf{Z}\}\,|\,x\in\mathbf{R}\,\}. $$

For my personal needs, I start considering using my own notation $\{\ldots|\ldots|\ldots\}$ with bound variables listed in the middle, with the following formal "translation": $$ S =\{\,f(\bar x, \bar y)\,|\,\bar x\,|\,\Phi(\bar x, \bar y)\,\}\\ \equiv (\forall\xi)(\xi\in S\Leftrightarrow(\exists\bar x)(\xi = f(\bar x, \bar y)\wedge\Phi(\bar x, \bar y))), $$ where the new variable $\xi$ is distinct from all variables in $\bar x$ and $\bar y$. For example: $$ A =\{\,(\cos t,\sin t)\,|\,t\,|\,t\in\mathbf{R}\,\},\\ B =\{\,(x, y)\,|\,x, y\,|\,x^2 + y^2 = 1\wedge (x, y)\in\mathbf{R}^2\,\},\\ C =\{\,(m, n)\,|\,m\,|\,m\in\{0,\dotsc,5\}\wedge 0\le n\le 5 \wedge m^2 + n^2 = 25\,\},\\ D =\{\,\{\,f(\bar a,\bar b,\bar c)\,|\,\bar a\,|\,\Phi(\bar a,\bar b,\bar c)\,\}\,|\,\bar a,\bar b\,|\,\Psi(\bar a,\bar b,\bar c)\,\},\\ E =\{\,\{\,x\,|\,|\,x\in\mathbf{Z}\,\}\,|\,x\,|\,x\in\mathbf{R}\,\}\quad (=\{\,\{x\}\,|\,x\,|\,x\in\mathbf{Z}\,\}\cup\{\{\}\}), $$ or, alternatively, with words instead of symbols: $$ A =\{\,(\cos t,\sin t)\ \text{for}\ t\ \text{such that}\ t\in\mathbf{R}\,\}. $$ Isn't there some kind of a standard alternative?

It seems that the Z notation might be an example of what I am looking for, but at a first glance it looks incompatible with the usual mathematical "set-builder" syntax, and I wonder how standard it is.


Some comments and answers suggest to never use the same variable name for different variables, at least when their scopes are nested, thus avoiding variable shadowing altogether. IMO, such a restriction is unusual (surely uncommon in programming or formal languages), unnecessary, not easy to observe, and it would require explicit introduction of all parameters (consider omitting "Let $p = 1$" in my example above while still using "$p$" as a parameter).

Quoting the HoTT Book (page 23),

Of course, this should all be familiar to any mathematician: it is the same phenomenon as the fact that if $f(x):\equiv\int_1^2\frac{dt}{x - t}$, then $f(t)$ is not $\int_1^2\frac{dt}{t - t}$, but rather $\int_1^2\frac{ds}{t - s}$.

4

There are 4 best solutions below

11
On

As I tried to point out in the comments there is no other notation in use, because the notations we have aren't really particularly ambiguous. The example you make would be parsed by pretty much everyone as $S=\{(2,0)\}$ unless the rest of the paper makes this untenable.

There is a simple reason for that and that is that it's very bad form to rescope variables in math. By rescope here I mean we don't write $\forall x\exists y\forall x\cdots$. This is actually well defined as I tried to point out by the example of $\forall x(\exists y(\forall x (x\geq y)\wedge (xy=y)))$. This means the same as $\forall x\exists y\forall z (z\geq y)\wedge (xy=y)$ which is something that's true for natural numbers with 0.

It is very hard to read things like that though and very bad form to write them. Which means that in a given theorem, lemma, etc. the scope of variable and constant names is obvious and they are not reused in strange ways. The ambiguity thus doesn't really arise.

3
On

An interesting point about set-builder notation is that it is used as part of informal set theory. Formal set theories such as ZFC do not include set-builder notation, and instead use the comprehension scheme which defines a set using a formula of set theory. Set builder notation is, in some sense, just an abbreviation for the comprehension scheme. It's not meant to be a formal language or a programming language, just a way to convey the definition of a set to another mathematician.

This means that set-builder notation is more of a "natural language" than a formal language. Issues such a free vs. bound variables, or variables vs. parameters, are handled by convention, just like many other issues in ordinary mathematics. So, just as in other areas of ordinary mathematics, it is possible to write vague definitions, or definitions that can be read in multiple ways. Mathematicians avoid that hazard in natural language all the time, not only with set builder notation.

10
On

Here is my summary of the two notations at play here. \begin{array}{} \{x\in S\mid\psi(x,t_1,t_2,\dots,t_n)\} & & \text{The subset of $S$ satisfying property $\psi$}\\ \{f(x,t_1,\dots,t_n)\mid x\in S\} & & \text{The image of $S$ under $f$} \end{array} In both examples, the $t_i$ are variables which could be defined elsewhere, and every definition gives rise to a different set. However, the variable $x$ plays a special role; it is used as a placeholder to specify a generic element of $S$.

The above two notations are unambiguous. The ambiguity comes when we mix the two notations. $$ \{f(x,t_1,\dots,t_n)\mid x\in S,\psi(x,s_1,\dots,s_m)\}\qquad \text{The image of the subset of $S$ satisfying $\psi$ under $f$.} $$ The problem is that the statement $\psi$ might contain a clause like "$t_i\in T$," so it makes $t_i$ look like the "special" placeholder variable, and it is unclear whether $x$ or $t_i$ or both is supposed to fulfill this role. Your example was $$ \{f(p,q)\mid p\in \mathbb R,q\in \mathbb R,pq=1,1<2\} $$ where $f(p,q)=(p+q,p-q)$. Here, it is unclear whether $p$ and $q$ are both supposed to range freely over $\mathbb{R}$, or $p$ is supposed to be a fixed value while $q$ ranges over $\mathbb R$, or vice versa.

There is a simple way to avoid this ambiguity:

Assume that any variable $x$ which appears on both sides of the divider $\mid$, and which appears on the right side in the form of a clause $x\in S$ for some set $S$, is intended only to be interpreted of the context of this notation as a variable freely ranging over $S$.

Under this rule, the set $S$ you asked about would mean the second interpretation. I realize this is the opposite of what I strongly said in my comment, but I think there is enough precedent in math for having "locally" bound variables which do not interfere with other instances. For example, you could write "$i=5$ and $n=\sum_{i=1}^{5} i$", and most people would agree that $n=15$, that is, $i$ is not equal to $5$ at each point in the summation.

0
On

You can avoid the ambiguity in standard set-builder notation by falling back to "single-variable comprehension" whenever the LHS contains variables that are not bound in the set-builder expression. The downside is that the paraphrase in the presence of free variables makes the expression longer and requires some explicit existential quantifiers.

First, note that using a single variable as the term to the left of the bar can express every set comprehension. For example $\{ z \mathop| \exists x \exists y \mathop. z = (x, y) \land (x, y) \in \mathbb{R}^2 \land x^2 + y^2 = 1 \}$ instead of $\{ (x, y) \in \mathbb{R}^2 \mathop| x^2 + y^2 = 1 \}$. You can explicitly say that when you use this notation that variable appearing in $z$'s position is always bound and never free.

This fallback option is not convenient, but it lets you reserve the abbreviated version for cases where all variable symbols in the term to the left of the bar are bound.

For example, in $\{ (p+q, p-q) \mathop| pq=1 \land p \in \mathbb{R} \land q \in \mathbb{R} \}$, both $p$ and $q$ would unambiguously be bound.

To make $p$ free, you would use $\{ z \mathop| \exists q \mathop. z = (p, q) \land p \in \mathbb{R} \land q \in \mathbb{R}\} $.

I think this convention is simple and standard enough that it could be used in writing without explaining it, which would not be true for a variant of the comprehension notation with an explicit way of annotating bound variables.