Abusing mathematical notation, are these examples of abuse?


I have often seen notation like this:

Let $f:\mathbb{R}^2 \to \mathbb{R}$ be defined by $$f(x,y)=x^2+83xy+y^7$$

How does this make any sense? If the domain is $\mathbb{R}^2$ then $f$ should be mapping individual tuples. Therefore I expect a proper representation of the above function would be:

$$f(t)=\pi_1(t)^2+83\pi_1(t)\pi_2(t)+\pi_2(t)^7$$

Isn't this more accurate? If the domain is $\mathbb{R}\times \mathbb{R}=\mathbb{R}^2$ and $\mathbb{R}\times \mathbb{R}=\{(x,y):(x\in \mathbb{R})\land (y\in \mathbb{R})\}$, then every element in the domain is a two-tuple $(x,y)$, not some irregular expression composed of two variables separated by a comma like "$x,y$", right?

Also, when speaking of algebraic structures, why do people constantly interchange the carrier set with the algebraic structure itself? For example, you might see someone write this:

Given any field $\mathbb{F}$ take those elements in our field $a\in \mathbb{F}$ that satisfy the equation $a^8=a$.

How does this make any sense? If $\mathbb{F}$ is a field, then it is a tuple equipped with two binary operations and corresponding identity elements, all of which satisfy a variety of axioms. Thus we should have, for some set $S$, that $\mathbb{F}=(S,+,\times,0,1)$, so they would be writing $a\in (S,+,\times,0,1)$, which is gibberish; they should write $a\in S$. Again, I see the field $\mathbb{F}$ and its underlying set $S$ being interchanged. I see this across almost all areas of abstract algebra, with monoids, groups, rings, etc.


2
On

How we write the function is completely irrelevant to whether it's a function or not.

The real question: given any $(x,y)\in \Bbb R^2$ is there a unique $z\in \Bbb R$ such that $f(x,y)=z$?

If so, $f$ is a function. How we write $f$ is irrelevant.

Also why do people refer to the set of group elements as the group itself, isn't a group the set and its operator?

Laziness. Everyone knows what you mean so why write more than you have to?

10
On

The example you've given of a function is not an abuse. $x$ is instead shorthand for $\pi_1(t)$ and $y$ is shorthand for $\pi_2(t)$ and $(x,y)$ is shorthand for $t$.
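One way to see that the two spellings agree is to write both of them out in a programming language. The sketch below is only an illustration; the names `pi1`, `pi2`, `f_projections`, and `f_named` are invented for this example.

```python
def pi1(t):
    """First projection of a pair."""
    return t[0]


def pi2(t):
    """Second projection of a pair."""
    return t[1]


def f_projections(t):
    # Fully explicit form: f(t) = pi1(t)^2 + 83*pi1(t)*pi2(t) + pi2(t)^7
    return pi1(t) ** 2 + 83 * pi1(t) * pi2(t) + pi2(t) ** 7


def f_named(t):
    # Naming the components: (x, y) is just another name for t
    x, y = t
    return x ** 2 + 83 * x * y + y ** 7


# Both definitions take one argument (a pair) and agree on every sample point.
points = [(0, 0), (1, 2), (-3, 1), (2, -1)]
assert all(f_projections(p) == f_named(p) for p in points)
```

Both functions take a single tuple as input; the second one merely gives the components names before using them.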

$g \in G$ is a very minor abuse, yes. "A group $G$ is a set $G$ endowed with some operations" is a slight abuse, but one which will never be misinterpreted. It is done this way to avoid the proliferation of unnecessary and confusing symbols. For the same reason, we use the symbol $+$ to refer to the three different operations of addition of integers, rationals, and reals.

6
On

A function $f:\>X\to Y$ takes elements $x\in X$ as input and produces for each such $x$ an output value $y\in Y$. In this sense every function is "unary".

Now it so happens that in many cases the domain $X$ is a cartesian product $X=R^n$, such that we need $n$ pieces of $R$-data in order to specify a single point $x\in X$, i.e., we write $x=(x_1,x_2,\ldots, x_n)$. The function value $f(x)$ should thus be noted as $f((x_1,x_2,\ldots, x_n))$, but in daily practice one strips one set of parentheses off. Note that in the Mathematica language one writes $f\bigl[\{x_1,\ldots, x_n\}\bigr]$.

Note that in many cases such a "multivariable" function depends on $x$ not in a "cloudy" way. Instead this dependence is clearly structured along the factorization of $X$, as in $f(x,y,z):=(x^2+y^2)e^z$. The "variables" $x$, $y$, $z$ then obtain their own personality and are not just tools to address the intended domain point ${\bf p}$.

Unfortunately nobody could tell me in all those years what a "variable" is in analysis.

1
On

We usually do not distinguish between what we call a field (or a group) and what we call its underlying set, rather we associate the two. For example, I would never write $\mathbb{F} = (S, +, \times)$, if I needed to give this notation I would say I am dealing with a field $\mathbb{F} = (\mathbb{F}, +, \times)$. This fits with the notion that a field is a set equipped with these operations.

As for the first question, $f$ is a function that takes elements of $\mathbb{R}^{2}$ as inputs. Using $f(t)$ is awkward in giving the formula, since it needs to refer to the subcomponents. I would personally write $f((x,y))$ instead of $f(x,y)$, but I could see the argument that the second looks nicer. It's not an abuse of notation; in either case $f$ is taking a pair of elements from $\mathbb{R}$ as input.

9
On

Suppose that in the two situations you cited, the entire mathematical community were to agree overnight to switch notation. What would we gain, besides the smug satisfaction of pedantry? The new notation would not be any more effective at communicating the underlying ideas; if anything, it would distract from them and raise the barrier to entry for someone trying to learn the material for the first time.

I'd argue that notation should be judged by three standards: (1) its clarity, (2) its rigor, and (3) whether it somehow invites the reader to make a mistake. On metric (1), I strongly prefer the current "abuses" of notation, particularly when considering this from the perspective of someone who doesn't already understand it. On metric (2), I guess you could give a tiny edge to the alternatives you propose... but not really. If, for instance, you were trying to encode these concepts into a programming language, it would be trivial to fill in the details from the existing notation. And (3) is a wash; there is no meaningful opportunity for error created by the notation in either case.
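As a rough illustration of how little it takes to fill in those details in a programming language, here is a minimal sketch. The `Field` class, the `power` helper, and the choice of the integers modulo 7 are all assumptions made up for this example; the point is only that "$a \in \mathbb{F}$" can be read mechanically as "$a$ is in the carrier set of $\mathbb{F}$".

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Field:
    """A field written out as the tuple (S, +, x, 0, 1)."""
    carrier: FrozenSet[int]
    add: Callable[[int, int], int]
    mul: Callable[[int, int], int]
    zero: int
    one: int

    def __contains__(self, a):
        # "a in F" is read as "a in the carrier set of F"
        return a in self.carrier


def power(F, a, n):
    """Compute a^n (n >= 0) using the multiplication of the field F."""
    result = F.one
    for _ in range(n):
        result = F.mul(result, a)
    return result


# Hypothetical example field: the integers modulo 7
F7 = Field(
    carrier=frozenset(range(7)),
    add=lambda a, b: (a + b) % 7,
    mul=lambda a, b: (a * b) % 7,
    zero=0,
    one=1,
)

# "Take those elements a in F that satisfy a^8 = a"
solutions = sorted(a for a in F7.carrier if power(F7, a, 8) == a)
```

With this in place, `3 in F7` is a meaningful expression even though `F7` is the whole structure rather than a bare set.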

In short: Notation is purely a construct, and to worship at the altar of pedantry at the cost of clarity (and brevity, which is a legitimate facet of clarity) is misguided.

EDIT: I fear that this may have been more a rant than an answer to the question. So, to be sure that it's both: no, these examples should not be regarded as an abuse of notation. Rather, they're useful and harmless conventions, like all good notation is.

EDIT #2: After reflecting on GitGud's comment, I think they are right; it's hard for me to say with a straight face that these examples (particularly the second one in the original post) are not abuses of notation. Really, Patrick Stevens already gave the best answer to the original question, and I'm glad it has been upvoted as many times as it has. However, these abuses of notation should, if anything, be regarded as (1) very mild abuses, and (2) useful, perhaps even important, conventions.

5
On

The notation is there to help us explain and communicate what are abstract thoughts in our heads. As such, if people understand what we are thinking/saying, it is close enough. And, being exact would only make things longer and more confusing.

I agree that the actual example you give isn't an abuse: we all know that a mapping from $R^2$ to $R^1$ has two 'variables', which are the 'coordinates' in $R^2$ or, if you like, the elements of the tuple in $R^1\times R^1$. So, if $z=x^2+xy+y^2$, we all know what we mean.

I suspect part of your concern comes from computer science. There, tuples must be explicit and tracked for software to work: in Python, a tuple is declared like $a=(1,2,3)$. Even at modern AI sophistication, a compiler or interpreter can't infer what you meant. Technically this is because programming languages are usually specified by a formal grammar (an $LR(1)$ grammar is a typical example) that is designed to leave no ambiguity. As a consequence, computer languages often have exact but confusing syntax, because they slavishly follow grammar/notation rules with zero abuse: in Python, the following creates a tuple of length 1: $$a=1,$$ which looks, for all the world, wrong.
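The Python behaviour mentioned here is easy to check directly. In this small sketch the variable names are arbitrary; it shows that the comma, not the parentheses, is what makes a tuple.

```python
a = 1,         # trailing comma: a tuple of length 1
b = (1)        # parentheses alone only group: this is the int 1
c = (1,)       # the more common spelling of the same one-element tuple
d = 1, 2, 3    # parentheses are optional for longer tuples too

assert isinstance(a, tuple) and len(a) == 1
assert isinstance(b, int)
assert a == c
assert d == (1, 2, 3)
```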

I often think that you know you understand a concept when the notation is 'abused' (or perhaps confused) and symbols get repeated with different meanings, yet you see immediately what is meant. For example, if you were dealing with indexed basis vectors over multiple dimensions, and also exponentiating, an expression like this looks perfectly sensible: $$z=\sum e_i^{e^{\frac{ix^2}{\sigma^2}}}$$

Even if it isn't clearly sensible on its face, given the context and a proper understanding of what is going on, we might all know that the $e$ downstairs is a basis vector (though a hat over it would help), that the $e$ upstairs is $2.718\ldots$, and that the $i$ upstairs is likely $\sqrt{-1}$ and not the index $i$ from the $e$ at the bottom. There is ample possibility for confusion by someone who doesn't grasp the math, but those who get it can untangle the repeated symbols (the $e$'s and $i$'s in my example).

A sort of extreme version of this is the Einstein summation convention, where you drop the $\sum$ and just 'intuit', based on the logic, that the expression is to be summed over all repeated subscripts. So $x=a_i x_{ij} b_j$ has implied summation, but we don't need to say it.
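To make the implied summation concrete, here is a small sketch that spells out the sums a reader is expected to supply. The arrays `M`, `u`, `v` and their values are made up for illustration; the convention shown is the usual one, where a repeated index is summed over.

```python
M = [[1, 2],
     [3, 4]]
u = [1, -1]
v = [10, 20]

# Einstein notation: y_i = M_ij v_j  (the sum over the repeated index j is implied)
y = [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# Fully contracted: s = u_i M_ij v_j  (sums over both i and j are implied)
s = sum(u[i] * M[i][j] * v[j]
        for i in range(len(u))
        for j in range(len(v)))
```

Here `y` has one free index left, so it is a list, while `s` has none and is a single number.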

There are endless examples. We say that a space X is measurable, but really that requires a tuple like $(X, \mathbf X, \mu)$, which gives the underlying space, the $\sigma$-algebra, and the measure. But, if we know the $\sigma$-algebra then we could figure out the space; and, really, if we know the measure and topology, we can figure the $\sigma$-algebra. So it would be more accurate to talk about the measurable space $\mu$, but we all 'know what we mean'.

Abuse of notation? Sometimes, I suppose. But notation needn't be slavish; it should make things simple and clear, and we can 'fill in the blanks'. That is what still separates humans from machines.

0
On

How does this make any sense?

Is your issue that it's $f(x,y)$ rather than $f((x,y))$, or that it's expressed in terms of $x$ and $y$ instead of projections? For the first, it's just a simplification of notation. For the second, I don't see how you think it doesn't make sense. Clearly, $f(x,y) = \ldots$ is a claim that can be either true or false. That is, it's a meaningful claim. So for the question "Does this make sense?", the answer is clearly "yes". The only remotely reasonable question I see is "Is this a valid definition of the function?", and for that question, the answer is still "yes".

A function definition is any statement (implicitly quantified over all possible values of the dummy variables) that uniquely defines a function. So, for instance, given a normal subgroup $H$ and an element $g_0$, $f(gH) = gg_0H$ is a valid definition, because given any coset, we can express it as $gH$ for some $g$, and the end result will be the same regardless of which $g$ we take. If, instead, $H$ were a subgroup that is not normal and we had $f(gH) = g^2H$, that would in general not be a valid definition, because we could get different outputs depending on which $g$ we take.

Similarly, suppose we have a function $f: V \to V$ over a vector space $V$. If we say $f([a,b]) = [2a,2b]$, that's a well-defined function, because even though it's a coordinate-dependent definition, the end result is invariant under change of basis. On the other hand, $f([a,b]) = [a^2,0]$ would not be a valid definition without specifying a basis.

You seem to be hung up on thinking of functions in a CS-type manner, where a function has to be "passed" the parameters, then "operate" on them, and then "return" an output. In math, functions are a much more abstract concept. They are simply a relation between two sets: for every element in the domain, there is some element in the codomain that is somehow associated with that element. It's easier to think of that association as being the output of some operation, but it can be anything.
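Whether a rule stated on coset representatives really defines a function can be checked by brute force in a small group. The sketch below works in $S_3$ and is an illustration only: the helper names (`compose`, `coset`, `is_well_defined`) and the particular subgroups `A3` and `K` are choices made for this example. It suggests that translating by a fixed $g_0$ and squaring are both independent of the representative when the subgroup is normal, while squaring fails for a subgroup that is not normal.

```python
from itertools import permutations

# The symmetric group S_3; a permutation p is a tuple with p[i] the image of i.
G = list(permutations(range(3)))


def compose(p, q):
    """Return the permutation 'p after q'."""
    return tuple(p[q[i]] for i in range(3))


def coset(g, H):
    """Left coset gH as a frozenset, so that equal cosets compare equal."""
    return frozenset(compose(g, h) for h in H)


def is_well_defined(rule, H):
    """Check that the coset rule(g)H depends only on gH, not on the choice of g."""
    return all(
        coset(rule(g1), H) == coset(rule(g2), H)
        for g1 in G for g2 in G
        if coset(g1, H) == coset(g2, H)
    )


identity = (0, 1, 2)
A3 = [identity, (1, 2, 0), (2, 0, 1)]   # normal subgroup (the rotations)
K = [identity, (1, 0, 2)]               # subgroup that is not normal (one swap)
g0 = (0, 2, 1)                          # an arbitrary fixed element


def translate(g):
    return compose(g, g0)               # gH -> g g0 H


def square(g):
    return compose(g, g)                # gH -> g^2 H


translate_ok_normal = is_well_defined(translate, A3)
square_ok_normal = is_well_defined(square, A3)
square_ok_not_normal = is_well_defined(square, K)
```

The first two checks succeed and the last one fails, which is exactly the difference between a valid definition on cosets and an invalid one.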

Your second example is more valid. We do use space and space+structure interchangeably when it does not introduce ambiguity. We can speak of "the integers" as a set, or as a group under addition, or as a ring. If we want to specify how we're using the integers, we might say "the integers as a group under addition" (or sometimes just "the integers as a group"), rather than "the integers as the space over which a group is defined with addition as its binary operation". There are various canonical meanings of phrases, and further specification is generally not needed unless the phrases are being used in non-canonical ways that are not obvious from the context.

0
On

Some of your examples are bad examples. In notation, there are always some conventions, like:

  • $f(x,y) = f((x,y))$
  • Given an arbitrary field, unless specified, its additive identity is denoted $0$ and its multiplicative identity is denoted $1$; the multiplication is denoted $\times$ or simply nothing, the addition is denoted $+$, the subtraction and division by $-$ and $/$, respectively. Integer powers are defined uniquely through $a^1=a$, $a^{-1}=1/a$ and $a^{m+n}=a^ma^n$.
  • An element of an algebraic structure (field, group, ring, whatever) is an element of the underlying set.

Also, note that $\pi_1$ and $\pi_2$ are much less standard than $x$, $y$, for arguments to functions from $\mathbb{R}^2$. You would have to carefully define these two functions.

These are standard conventions. Of course, if you need to break them, you have to be very specific about it, and if confusion could arise, you should be specific as well (though many people won't).

As an example of how things break, look at the tropical semirings. Here one has to be very careful. If $R$ is the max-plus algebra, we have $0_R = -\infty$ and $1_R = 0_{\mathbb{R}}$ (you see the confusion?). Even more confusion could arise from writing $1$ without a subscript, as some (probably algebraic) people will think it's $1_R$ and others will consider it $1_{\mathbb{R}}$. But then the problem is that the underlying set for the semiring $R$ is the field $\mathbb{R}$ with one element added, but the operations are not the same.
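The clash of identities in the max-plus algebra can be seen in a few lines. This is a minimal sketch; the names `ZERO`, `ONE`, `t_add`, and `t_mul` are invented here.

```python
import math

ZERO = -math.inf   # additive identity 0_R of the max-plus semiring
ONE = 0.0          # multiplicative identity 1_R, which is the real number 0


def t_add(a, b):
    """Tropical 'addition' is the ordinary maximum."""
    return max(a, b)


def t_mul(a, b):
    """Tropical 'multiplication' is ordinary addition of reals."""
    return a + b


x = 3.5
assert t_add(x, ZERO) == x       # 0_R is neutral for tropical addition
assert t_mul(x, ONE) == x        # 1_R is neutral for tropical multiplication
assert t_mul(x, ZERO) == ZERO    # 0_R absorbs under tropical multiplication
assert t_mul(x, 1.0) != x        # the real number 1 is not the identity here
```

The last assertion is the source of the confusion: a bare `1` means different things depending on which structure the reader has in mind.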

1
On

Let's reverse the question.

Given a function $f(x,y)=x+y$, what is its domain?

Most people would say $\mathbb R^2$. What would you say?

1
On

If we are going to play the game as you want, your writing has serious issues. For starters, $\mathbb R$ is a set, but then you are using this symbol $+$ which you haven't defined. You also have the symbol $83xy$, whose meaning is not clear. It looks like you are abusing notation and writing multiplication of real numbers (that you never said you were using) by juxtaposition; so, is the expression $83xy$ equal to $8\times3\times x\times y$? Maybe not, but in that case you are giving two different meanings to juxtaposition of real numbers, without saying so.

Also, you have those symbols $\pi_1,\pi_2$ that you haven't defined. It looks by your notation that they are functions, but in that case you should express the domain, the codomain, and the rule.

1
On

I am struck by the implication in this question and in many of the answers that abuse of notation is something to be avoided. In fact it is not uncommon in mathematical writing for authors to say they are expressing something in a certain way "by abuse of notation," and the authors are not actually ashamed of what they are doing. I would agree that your second case is exactly an example of this sort of thing, but conflating the underlying set of an algebraic structure with the structure itself has been a part of mathematical notation since well before the expression "abuse of notation" was coined.

0
On

A function needs to assign a unique element in the range for every element in the domain. It doesn't need to be able to be written as a "formula" of the input. So even though x and y can be written as formulas of the tuple (x,y), this does not mean that doing so would be the correct way. The expression you wrote unambiguously assigns every element in the domain some unique element in the range, and so there is no abuse of notation.

0
On

Here is a POV from a telecommunication engineer (SW dev, research assistant, now teacher), and engineers are used to notation abuse :-).

I like to remind my students that the primary goal for math notation is correct, unambiguous communication between humans.

Humans are not perfect language parsers, and get confused easily. If notation obscures semantics, you risk not getting too far in your understanding.

As long as notation can be understood unambiguously in the context where you use it, that's OK.

Writing a computer program in some artificial language is another story, but the evolution of artificial languages has shown that going higher level (a language a human can write more easily, with less pedantry) pays off in terms of productivity (i.e., higher-quality code that is more robust, written in less time, and needs less maintenance).

Making a machine understand "abuses of notation", sometimes, makes the job easier and more robust for humans (as long as the semantics stays clear), even if it means that the machine (be it real hardware, or be it virtual, as an interpreter program) will work harder to get the job done.

3
On

Writing $f(x,y)$ for $f((x,y))$ is just a convention.

Note that in the usual set-theoretic construction of functions as relations, the notation $f(x)$ is also just a convention for the unique $y$ for which $(x,y)\in f$. I think you also accept the convention $Ax$ for $A(x)$ in the case of linear maps $A$, to make it look more like a (matrix) multiplication.
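The set-theoretic reading can be sketched directly: a function is stored as a set of pairs, and "$f(a)$" is looked up as the unique second component. The helper names `is_function` and `value_at`, and the sample relation, are made up for this illustration.

```python
# f as a set of (input, output) pairs: the graph of x -> x*x on {0, 1, 2, 3}
f = {(x, x * x) for x in range(4)}


def is_function(rel, domain):
    """True if every element of domain is related to exactly one output."""
    return all(len({y for (x, y) in rel if x == a}) == 1 for a in domain)


def value_at(rel, a):
    """Return the unique y with (a, y) in rel, which is what 'f(a)' abbreviates."""
    (y,) = {y for (x, y) in rel if x == a}   # raises ValueError if not unique
    return y


# Adding a second output for the input 2 destroys the function property.
not_f = f | {(2, 5)}
```

Under this reading, writing `value_at(f, 3)` is the pedantic form and $f(3)$ is the convention everyone uses.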

Notational conventions are everywhere and make it possible to do deep mathematics without getting lost in the jungle of first order predicate logic formulas and formal proofs.

0
On

The existing answers to this question are very good, but I'd like to give another perspective.

It turns out that there is a formalism in which the notation

$$ f (x, y) = x^2 + 83xy + y^7 $$

makes perfect sense and is neither a 'convention' nor an 'abuse of notation'.

Among type theorists, it is common to write a multivariate function not as

$$ f \colon A \times B \to C $$

but as

$$ f \colon A \to (B \to C) $$

where $X\to Y$ is the set of functions from $X$ to $Y$. Here, we read '$f$ is a function from $A$ to the set of functions from $B$ to $C$'. For example, we have: \begin{gather} f \colon \mathbb R \to (\mathbb R \to \mathbb R)\\ f(x)(y) = x^2 + 83xy+y^7 \end{gather}
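For readers who think in code, the curried form is a function that returns a function. The sketch below uses the example from the question; the helpers `curry` and `uncurry` are written out here for illustration and are not taken from any library.

```python
def f_curried(x):
    # f : R -> (R -> R); applying f to x returns a function of y
    def inner(y):
        return x ** 2 + 83 * x * y + y ** 7
    return inner


def uncurry(g):
    """Turn g : A -> (B -> C) into a function on pairs, A x B -> C."""
    return lambda pair: g(pair[0])(pair[1])


def curry(h):
    """Turn h : A x B -> C into A -> (B -> C)."""
    return lambda x: lambda y: h((x, y))


f_pairs = uncurry(f_curried)

# The two forms carry the same information and convert back and forth.
assert f_curried(1)(2) == f_pairs((1, 2)) == 295
assert curry(f_pairs)(1)(2) == 295
```

The round trip through `uncurry` and `curry` is a concrete instance of the correspondence between $A \times B \to C$ and $A \to (B \to C)$.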

Note that we do not have to do any work defining a set of pairs to make this work. If we want to recover the usual way, though, we can do it. Given sets $A,B$, we define $A\times B$ to be the 'universal' set $X$ such that there is a function $$ A \to (B \to X) $$ What I mean by 'universal' is a little complicated. Formally, the map $$ X\mapsto A \to (B \to X) $$ defines a functor from the category of sets to itself. The defining property of $A\times B$ is that it is a representing object for this functor; i.e., we have a natural isomorphism $$ \Phi\colon A \times B \to \_ \Rightarrow A \to (B \to \_) $$ In particular, we can define the pairing function \begin{gather} \_,\_\colon A \to (B \to (A\times B))\\ \_,\_ = \Phi(A\times B)(\text{id}_{A\times B}) \end{gather}

Note that we write $x,y$ rather than $\_,\_(x)(y)$, but this is no less natural than writing $x+y$ instead of $\_+\_(x)(y)$.


Let's see how your example is constructed in this formalism. You were concerned about the definition $$ f(x, y) = x^2 + 83xy + y^7 $$ where $f$ is a function $\mathbb R\times \mathbb R \to \mathbb R$.

In our formalism, $(x, y)$ means the function $\_,\_$ applied first to $x$ and then to $y$. If we write the function $\_,\_$ out in full, then we have: $$ (f\circ (\_,\_))(x)(y) = x^2 + 83xy + y^7 $$

Now consider the commutative square for a natural isomorphism: $$ \require{AMScd} \begin{CD} \mathbb R \times \mathbb R\to \mathbb R\times \mathbb R @>{\Phi}>> \mathbb R \to (\mathbb R \to \mathbb R\times\mathbb R)\\ @V{f\circ\_}VV @VV{f\circ\_}V \\ \mathbb R \times \mathbb R \to \mathbb R @<{\Phi^{-1}}<< \mathbb R \to (\mathbb R \to \mathbb R) \end{CD} $$ where we have reversed the arrow at the bottom since $\Phi$ is an isomorphism. Consider what happens when we apply this to the element $\text{id}_{\mathbb R\times\mathbb R}\colon\mathbb R\times \mathbb R\to\mathbb R\times\mathbb R$. We get: $$ f = \Phi^{-1}(f\circ(\_,\_)) $$ In other words, in order to define $f$, it suffices to define the composite $f\circ(\_,\_)$, and that is exactly what we are doing when we write $$ f(x, y) = x^2 + 83xy + y^7 $$ So even though we're defining not $f$ but its composite with the pairing function, we are still defining $f$ itself because it can be completely recovered from this composition via the natural isomorphism $\Phi^{-1}$. It's a bit like defining a quotient group:

We define $xH * yH$ to be $(x * y) H$

Proposition: this multiplication is well defined...

except that in this case the well-definedness proof can be taken for granted since we use it all the time.

0
On

I think I have seen the following definition in some textbook, which takes care of the first issue:

Definition: If $A$, $B$, and $C$ are sets, $f: A\times B \rightarrow C$ is a function, and $a\in A$, $b\in B$, then $f(a,b)=_\textrm{df}f((a,b))$.

The mathematical content of this definition is to define a bivariate function as a special case of a univariate function. This is not an obvious identification, and I imagine that the concepts of univariate and multivariate function were around before they were subsumed into the single modern set-theoretic definition of function (is this true?). If this definition is accepted, then your example is not an abuse of notation.

The second example is an example of metonymy, which occurs all over mathematics. Metonymy is a linguistic device that is not specific to mathematics; it occurs in all languages. Here is a discussion of its use in mathematics, including your example. Yes, this is an abuse of notation, but the term "abuse of notation" in mathematical English does not carry the negative connotation that "abuse" does in non-mathematical English.

Halmos's Naive Set Theory also has a passage relevant to this second example (p. 55):

A partially ordered set is a set together with a partial order in it. A precise formulation of this "togetherness" goes as follows: a partially ordered set is an ordered pair $(X, \sim)$, where $X$ is a set and $\sim$ is a partial order in $X$. This kind of definition is very common in mathematics; a mathematical structure is almost always a set "together" with some specified other sets, functions, and relations. The accepted way of making such definitions precise is by reference to ordered pairs, triples, or whatever is appropriate. That is not the only way. Observe, for instance, that knowledge of a partial order implies knowledge of its domain. If, therefore, we describe a partially ordered set as an ordered pair, we are being quite redundant; the second coordinate alone would have conveyed the same amount of information. In matters of language and notation, however, tradition always conquers pure reason. The accepted mathematical behavior (for structures in general, illustrated here for partially ordered sets) is to admit that ordered pairs are the right approach, to forget that the second coordinate is the important one, and to speak as if the first coordinate were all that mattered. Following custom, we shall often say something like "let $X$ be a partially ordered set," when what we really mean is "let $X$ be the domain of a partial order." The same linguistic conventions apply to totally ordered sets, i.e., to partially ordered sets whose order is in fact total.
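Halmos's remark that the order alone already determines its domain can be checked on a small example. In this sketch the set `X`, the divisibility order `leq`, and the helper `domain_of` are chosen purely for illustration, and the order is assumed to be reflexive, as in the passage.

```python
# Divisibility on {1, 2, 3, 6}, stored as pairs (a, b) meaning "a divides b"
X = {1, 2, 3, 6}
leq = {(a, b) for a in X for b in X if b % a == 0}


def domain_of(order):
    """Recover the underlying set from the order relation alone."""
    # Every x satisfies x <= x, so each element appears in at least one pair.
    return {a for (a, _) in order} | {b for (_, b) in order}


# The pair (X, leq) is redundant: the second coordinate determines the first.
assert domain_of(leq) == X
```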