What are some counter-intuitive results in mathematics that involve only finite objects?


There are many counter-intuitive results in mathematics, some of which are listed here. However, most of these theorems involve infinite objects, and one can argue that the reason these results seem counter-intuitive is that our intuition does not work properly for infinite objects.

I am looking for examples of counter-intuitive theorems which involve only finite objects. Let me be clear about what I mean by "involving finite objects". The objects involved in the proposed examples should not contain an infinite amount of information. For example, a singleton consisting of a real number is a finite object, however, a real number simply encodes a sequence of natural numbers and hence contains an infinite amount of information. Thus the proposed examples should not mention any real numbers.

I would prefer to have statements which do not mention infinite sets at all. An example of such a counter-intuitive theorem would be the existence of non-transitive dice. On the other hand, allowing examples of the form $\forall n\ P(n)$ or $\exists n\ P(n)$ where $n$ ranges over some countable set and $P$ does not mention infinite sets would provide more flexibility to get nice answers.

What are some examples of such counter-intuitive theorems?

There are 33 answers below.

24
On

Suppose $X$ is any finite set, which will represent a set of voters, and let $Y$ be another finite set, representing decisions or options that the voters can rank. For example, voting on presidential candidates, favorite ice cream, etc. For simplicity, assume that $X=\{1,\ldots, N\}$.

Define a ranking to be a linear ordering on $Y$, and a social welfare function to be a map $$F: L(Y)^N \to L(Y)$$ where $L(Y)$ is the set of all linear orderings on $Y$. Essentially, $F$ shows how to take the rankings of each voter and turn them into a single ranking. Each element of $L(Y)^N$ is an $N$-tuple of rankings, one ranking of $Y$ from each voter. We shall represent such a tuple by $(\leq_n)_{n=1}^N$ and its image by $F((\leq_n)_{n=1}^N)=\leq$.


Since this is to be a voting system, we probably want this to adhere to some rules which enforce the idea that $F$ accurately reflects the rankings of each voter:

  • Unanimity: If every voter ranks $a\in Y$ better than $b\in Y$, then in the output of $F$, society ranks $a$ higher than $b$. Formally, if $a\leq_n b$ for every $n$, then $a\leq b$.

  • Independence of Irrelevant Alternatives: How society ranks $a$ and $c$ should depend only on how each voter ranks $a$ and $c$, not on how anyone ranks some third option $b$. Formally, if $(\leq_n)_{n=1}^N$ and $(\leq_n')_{n=1}^N$ are two tuples of rankings such that the ordering of $a$ and $c$ is the same for each $n$ (i.e. $a\leq_n c$ if and only if $a\leq_n' c$), then the ordering of $a$ and $c$ is the same in society's rankings (i.e. $a \leq c$ if and only if $a\leq' c$).

    Since this is a bit more involved, consider the example of a group ranking the three ice cream flavors of vanilla, chocolate, and strawberry. The group makes its choices, and $F$ says that the highest-ranked flavor is chocolate. Then the group learns that strawberry is out, so everyone moves strawberry to last place. It would be counter-intuitive if vanilla all of a sudden became ranked highest, yet there are social welfare functions for which exactly this happens.

    The intuition is the hope that the group's consensus on how it feels about two options should only depend on how each individual feels about those two options.

    Cases where this still fails are usually indicative of cases where the voting scheme can be gamed in some way, i.e. by voting against your favorite option to avoid your least favorite option or by varying how you rank the remaining options to guarantee their loss.

    A good ranked voting system should be such that you benefit more by saying what you really think than by attempting to game the system. The failure of Independence of Irrelevant Alternatives allows for this gaming.


This brings us to our result:

Arrow's Impossibility Theorem: For $Y$ finite and $|Y|> 2$, the only function $F: L(Y)^N \to L(Y)$ satisfying the above two properties is a dictatorship, i.e. there is a fixed $m$ (which depends only on $F$) such that $1\leq m\leq N$ and $F((\leq_n)_{n=1}^N) = \leq_m$.

One method of proof is by considering filters and using the fact that the only ultrafilters on a finite set are the principal ultrafilters.

It is important to note that Arrow's Impossibility Theorem only applies to ranking systems. There are alternative ways of voting which are not ranking systems and which show more promise.

Moreover, it is debatable whether the hypothesis of Independence of Irrelevant Alternatives actually captures what we want in all cases.
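Arrow's theorem itself is too large to brute-force here, but the classic Condorcet cycle already shows why aggregating rankings by pairwise majority fails to produce a linear order. A small sketch in Python (the three ballots are the standard textbook example, not anything from this answer):

```python
# Three voters with the classic "Condorcet cycle" ballots (best first).
ballots = [("A", "B", "C"),   # voter 1: A > B > C
           ("B", "C", "A"),   # voter 2: B > C > A
           ("C", "A", "B")]   # voter 3: C > A > B

def majority_prefers(x, y):
    """True if a strict majority of voters rank x above y."""
    wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    return wins > len(ballots) / 2

# Pairwise majority produces a cycle, so it is not a ranking at all:
assert majority_prefers("A", "B")
assert majority_prefers("B", "C")
assert majority_prefers("C", "A")
print("majority preference cycles: A > B > C > A")
```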

9
On

100 prisoners problem.

Citing Sedgewick and Flajolet, the problem reads as follows:

The director of a prison offers 100 death row prisoners, who are numbered from 1 to 100, a last chance. A room contains a cupboard with 100 drawers. The director randomly puts one prisoner's number in each closed drawer. The prisoners enter the room, one after another. Each prisoner may open and look into 50 drawers in any order. The drawers are closed again afterwards. If, during this search, every prisoner finds his number in one of the drawers, all prisoners are pardoned. If just one prisoner does not find his number, all prisoners die. Before the first prisoner enters the room, the prisoners may discuss strategy—but may not communicate once the first prisoner enters to look in the drawers. What is the prisoners' best strategy?

Surprisingly, there exists a strategy with survival probability of more than 30%. It is connected to the fact, also non-intuitive, that a large random permutation is quite likely to contain no cycle longer than half its length.
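The cycle-following strategy is easy to simulate; here is a Monte Carlo sketch in Python (each prisoner first opens the drawer with his own number, then the drawer named by the slip he finds there, and so on):

```python
import random

def prisoners_survive(n=100, rng=random):
    """One round: `drawers` is a random permutation of 0..n-1.
    Each prisoner follows the cycle starting at his own number."""
    drawers = list(range(n))
    rng.shuffle(drawers)
    for prisoner in range(n):
        drawer = prisoner
        for _ in range(n // 2):
            if drawers[drawer] == prisoner:
                break
            drawer = drawers[drawer]
        else:
            return False   # some cycle is longer than n/2: everyone dies
    return True

random.seed(0)
trials = 2000
p = sum(prisoners_survive() for _ in range(trials)) / trials
print(p)   # close to 1 - (1/51 + 1/52 + ... + 1/100), about 0.31
```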

16
On

I've always thought the St. Petersburg paradox (posed by Nicolas Bernoulli) was counter-intuitive, and its fame suggests that I'm not the only one who thinks this. Bernoulli offers the following gambling game: you flip an honest coin. If it comes up heads, you win two dollars and get to flip again. If it comes up heads a 2nd time, you win four more dollars and you get to flip again. In general, if it comes up heads $n$ consecutive times, on the $n$th flip you win $2^n$ more dollars and get to flip again. The first time the coin comes up tails the game is over, but you keep your winnings. Bernoulli asked, essentially, what the mathematical expectation of this game is. To keep things finite, let's just ask: should you mortgage your house and pay Mr. Bernoulli \$250,000 to play this game? Although I would strongly advise against doing so, the mathematics "shows" that you should be willing to put up all the money you have or can borrow to play the game.
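The divergence is easy to exhibit: the $n$th flip is reached with probability $2^{-n}$ and, if heads, pays $2^n$ more dollars, so each level of the game contributes exactly one dollar to the expectation. A sketch in Python of the truncated game:

```python
from fractions import Fraction

# Truncating the game at N flips: level n contributes 2^-n * 2^n = 1
# dollar, so the truncated expectation is exactly N dollars.
def truncated_expectation(N):
    return sum(Fraction(1, 2**n) * 2**n for n in range(1, N + 1))

assert truncated_expectation(10) == 10
assert truncated_expectation(1000) == 1000   # grows without bound
print(truncated_expectation(250_000))        # "worth" the mortgage, allegedly
```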

12
On

What surprised me most was the following Arab story.

An inheritance of $35$ camels had to be distributed among $3$ brothers as follows: half for the eldest, a third for the second and a ninth for the youngest. The problem was that the brothers would then have to receive more than $17$ and less than $18$, more than $11$ and less than $12$, and more than $3$ and less than $4$ camels respectively.

"The man who calculated" solved the problem by asking a friend to lend him a camel with which he obtained $36$ camels to distribute following the instructions of the deceased father. Then the brothers received $18$, $12$ and $4$ camels respectively, i.e. more than they had planned so they were very satisfied.

But then the "man who calculated" distributed only $34$ camels, so that after returning the camel loaned by his friend he had one camel left over for himself, as a reward for his calculations.

It was several years after I read this story that I could explain mathematically what happened.
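For readers who do not want to wait several years: the trick is that the fractions in the will do not add up to the whole estate.

```latex
\frac12+\frac13+\frac19=\frac{9+6+2}{18}=\frac{17}{18}<1,
\qquad
36\cdot\frac{17}{18}=34=18+12+4,
\qquad
36-34=2.
```

So the will only disposes of $17/18$ of the estate; rounding the herd up to $36$ makes every share a whole number, and the unassigned $1/18$ is exactly the two camels left over at the end.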

8
On

Monsky's theorem is easy enough to state for anyone to understand, yet arguably counter-intuitive.

Monsky's theorem states that it is not possible to dissect a square into an odd number of triangles of equal area. In other words, a square does not have an odd equidissection.

The problem was posed by Fred Richman in the American Mathematical Monthly in 1965, and was proved by Paul Monsky in 1970.

11
On

Hmm, not sure if this counts as counterintuitive, but what about the handshaking lemma, that the number of people at a party who shake hands an odd number of times is even?
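A quick empirical sanity check in Python (the actual proof is one line: summing all degrees counts every handshake twice, so the sum is even):

```python
import random

# Check on random "parties": the number of odd-degree vertices is even.
random.seed(1)
for _ in range(100):
    n = random.randint(2, 12)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if random.random() < 0.5]          # random handshakes
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    assert sum(degree) == 2 * len(edges)        # each edge counted twice
    odd = sum(d % 2 for d in degree)
    assert odd % 2 == 0
print("odd-degree count was even in all 100 random graphs")
```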

13
On

General formulas in radicals exist for the roots of polynomials up to degree 4, but for no higher degree.

Only 4 colors are required to color any planar map so that adjacent regions receive distinct colors.

Finite-dimensional associative division algebras over the reals have dimension at most 4.

Having more than three regular convex polytopes is a property only of dimensions up to 4.

And more number-4 facts by Saharon Shelah (presented in a Twitter post by Artem Chernikov).

4
On

I think it is counterintuitive, at first glance, that list coloring graphs can be strictly harder than ordinary coloring.

To expand on what that means: a graph is a set of vertices, some pairs of which are adjacent. (Think of the vertices as dots, with adjacent vertices having a line drawn between them.) A proper $k$-coloring of a graph assigns each vertex a number from $1$ to $k$ such that no two adjacent vertices receive the same color. The chromatic number of a graph $G$, written $\chi(G)$, is the smallest $k$ such that $G$ has a proper $k$-coloring.

A $k$-list-assignment on a graph is a function $L$ that assigns each vertex some "palette" of $k$ different numbers. If $L$ is a $k$-list-assignment, then a proper $L$-coloring of $G$ assigns each vertex a color from its list such that, again, no two adjacent vertices receive the same color. The list chromatic number of $G$, written $\chi_l(G)$, is the smallest $k$ such that $G$ has a proper $L$-coloring for every $k$-list-assignment $L$.

Now, intuition (at least, my intuition) suggests that finding an $L$-coloring should be hardest when all the vertices have the same list $L$ -- in other words, when we're really just looking for a proper $k$-coloring. If different vertices have different lists, aren't there just fewer opportunities for collisions to happen? This suggests that maybe $\chi_l(G) = \chi(G)$ for every $G$, since the hardest list assignments "should" just be the ones that give every vertex the same list.

However, this isn't true! Consider the complete bipartite graph $K_{3,3}$, which has six vertices $v_1, v_2, v_3, w_1, w_2, w_3$, where each $v_i$ is adjacent to each $w_j$. This graph clearly has a proper $2$-coloring: color all the vertices $v_i$ with color $1$ and all the vertices $w_j$ with color $2$. But now consider the following list assignment $L$:

$L(v_1) = L(w_1) = \{1,2\}$

$L(v_2) = L(w_2) = \{1,3\}$

$L(v_3) = L(w_3) = \{2,3\}$.

Among the colors $\{1,2,3\}$ appearing in these lists, we would need to use at least two of them on the vertices $v_i$, since no color appears in all three lists, and we would need to use at least two of them on the vertices $w_j$, for the same reason. This means that some color gets used on both a $v_i$ and a $w_j$, which contradicts the assumption that the coloring is proper. So there is no proper $L$-coloring, which means $\chi_l(K_{3,3}) > 2 = \chi(K_{3,3})$. In other words, it's harder to color from these lists than it is to color when every vertex has the same list!
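The argument can also be verified by brute force over all $2^6$ possible assignments; a sketch in Python:

```python
from itertools import product

# The list assignment from the answer above.
L = {"v1": {1, 2}, "v2": {1, 3}, "v3": {2, 3},
     "w1": {1, 2}, "w2": {1, 3}, "w3": {2, 3}}
vs, ws = ["v1", "v2", "v3"], ["w1", "w2", "w3"]

def proper_L_colorings():
    """Yield every assignment from the lists with no v_i, w_j conflict."""
    for colors in product(*(sorted(L[x]) for x in vs + ws)):
        c = dict(zip(vs + ws, colors))
        # In K_{3,3}, every v_i is adjacent to every w_j.
        if all(c[v] != c[w] for v in vs for w in ws):
            yield c

assert list(proper_L_colorings()) == []
print("no proper L-coloring exists, so the list chromatic number exceeds 2")
```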

3
On

The amazing speed of exponential growth. This can be explained to children, but it will always defy your intuition; take, e.g., the famous wheat and chessboard problem. As Addy Pross explains here, this may play a fundamental role in the emergence of life.
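The chessboard numbers, for concreteness (one grain on the first square, doubling on each of the 64 squares):

```python
# Total grains: 1 + 2 + 4 + ... + 2^63 = 2^64 - 1.
total = sum(2**k for k in range(64))
assert total == 2**64 - 1
print(total)   # 18446744073709551615, on the order of 10^19 grains
```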

14
On

I feel like the Monty Hall problem is counter-intuitive the first time you see it.

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door (say No. 1). Then the host, who knows what's behind all the doors, opens another door (say No. 3) that is guaranteed to have a goat. He then says to you, "Do you want to now pick door No. 2?" Is it to your advantage to switch your choice?

The answer is yes: you should ALWAYS switch. The reasoning is as follows: at the beginning you have a $1$ in $3$ chance of selecting the correct door. Crucially, the host can always open a goat door no matter which door you chose, so seeing him do it tells you nothing new about your own door: it still wins with probability $1/3$. Since the probabilities of the two remaining doors must add to $1$, the other door is correct with probability $2/3$. So indeed, switching your guess is to your advantage.
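If you don't trust the argument, it is quick to simulate; a sketch in Python (the host is modeled as always opening a goat door that isn't yours):

```python
import random

def play(switch, rng):
    car  = rng.randrange(3)
    pick = rng.randrange(3)
    # Host opens a door that is neither your pick nor the car.
    opened = next(d for d in range(3) if d != pick and d != car)
    if switch:
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == car

n = 100_000
rng = random.Random(0)
stay = sum(play(False, rng) for _ in range(n)) / n
rng = random.Random(0)
switch = sum(play(True, rng) for _ in range(n)) / n
print(stay, switch)   # roughly 1/3 vs 2/3
```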

8
On

There's this game about paying taxes that I saw in Yale's open course on game theory. link

We have taxpayers and a tax-collecting agency. Each taxpayer can choose whether to cheat, paying $0$, or not cheat, paying $a$. The agency can choose whether to check whether someone has paid tax. If someone cheats and the agency finds out, he has to pay $a$ to the agency and a fine $f$ that does not go to the agency. Checking a taxpayer costs the agency $c$, where $c$ is less than $a$. The payoff matrix is as follows:

                          Tax Payer
                      Cheat      Not Cheat
                 +-------------+-----------+
         Check   | (a-c, -a-f) | (a-c, -a) |
Agency           +-------------+-----------+
       Not Check |   (0, 0)    |  (a, -a)  |
                 +-------------+-----------+

One's intuition may suggest that as $f$ increases, taxpayers will cheat less often. But solving for the Nash equilibrium tells us that in equilibrium, as $f$ increases, the probability that a taxpayer cheats doesn't change; instead, the agency checks less often. Right after $f$ changes, the taxpayers may well cheat less often while the agency checks at its old rate, but given enough time, rational taxpayers and a rational agency will play the Nash equilibrium.

The Nash equilibrium is that the tax payer cheats at a probability of $\frac c a$, and the agency checks at a probability of $\frac a {f+a}$. The tax payer's expected payoff is $-a$. The agency's expected payoff is $a-c$.
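The equilibrium is easy to verify numerically; here is a sketch in Python with illustrative values $a=10$, $c=4$, $f=20$ (my numbers, chosen only to satisfy $c<a$):

```python
from fractions import Fraction as F

a, c, f = F(10), F(4), F(20)     # any values with 0 < c < a work
q = c / a                        # equilibrium P(taxpayer cheats)
p = a / (a + f)                  # equilibrium P(agency checks)

# Taxpayer is indifferent: cheating and not cheating pay the same (-a).
cheat     = p * (-a - f) + (1 - p) * 0
not_cheat = -a
assert cheat == not_cheat == -a

# Agency is indifferent: checking and not checking pay the same (a-c).
check     = a - c                # a-c whether or not the taxpayer cheated
not_check = q * 0 + (1 - q) * a
assert check == not_check == a - c

# Raising the fine f changes p (agency checks less), not q = c/a.
for f2 in (F(40), F(80)):
    assert a / (a + f2) < p      # larger fine -> agency checks less often
print("equilibrium verified: q = c/a, p = a/(a+f)")
```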

20
On

Here is a consequence of the Arc sine law for last visits. Let's assume the game is played with a fair coin.

Theorem (false) In a long coin-tossing game each player will be on the winning side for about half the time, and the lead will pass not infrequently from one player to the other.

The following text is from the classic An Introduction to Probability Theory and Its Applications, volume 1, by William Feller.

  • According to widespread beliefs, a so-called law of averages should ensure the Theorem above. But in fact this theorem is wrong, and contrary to the usual belief the following holds:

    With probability $\frac{1}{2}$ no equalization occurred in the second half of the game regardless of the length of the game. Furthermore, the probabilities near the end point are greatest.

In fact this leads to the Arc sine law for last visits (see e.g. Vol 1, ch.3, section 4, Theorem 1).

Remarkable statements cited from Chapter III: Fluctuations in Coin Tossing and Random Walks:

  • For example, in various applications it is assumed, that observations on an individual coin-tossing game during a long time interval will yield the same statistical characteristics as the observation of the results of a huge number of independent games at one given instant. This is not so.

and later on:

  • Anyhow, it stands to reason that if even the simple coin-tossing game leads to paradoxical results that contradict our intuition, the latter cannot serve as a reliable guide in more complicated situations.

An example:

Suppose that a great many coin-tossing games are conducted simultaneously at the rate of one per second, day and night, for a whole year. On the average, in one out of ten games the last equalization will occur before $9$ days have passed, and the lead will not change during the following $356$ days. In one out of twenty cases the last equalization takes place within $2\frac{1}{4}$ days, and in one out of a hundred cases it occurs within the first $2$ hours and $10$ minutes.

10
On

Wedderburn's little theorem: In its simplest form, it states that any finite division ring is commutative. I find it borderline magical; as if somehow the axioms do not imply the result, but rather it is forced due to some weird combinatorial coincidence. Of course, this feeling of mine says more about my flawed intuition than about the theorem - but that's true for any example of a "counterintuitive mathematical result".

Another example, even more elementary, is the fact that a matrix's rank is well defined, i.e., that for any matrix (even a non-square one), the dimension of its column space is equal to the dimension of its row space. Over time I came to terms with this result, but when I first encountered it, I read the proof again and again, understanding every step, yet still couldn't believe the theorem was true. It was a long time ago, but I clearly remember feeling like I was practicing voodoo.

8
On

Not sure whether this is the kind of thing you were expecting, but here goes:

Some statements about constructive mathematics can seem very counter-intuitive (at first, this is probably because one is misinterpreting what they mean), e.g.:

  • the induction principle holds, but on the other hand: that every non-empty (or inhabited) set of naturals has a smallest element is in general false
  • given a set $A$, consider the statements: (i) "there is a finite set $B$ and an injection $A\to B$", (ii) "there is a finite set $B$ and a surjection $B\to A$". Neither statement implies the other, nor that $A$ is finite
21
On

The hydra game. Quote from the link:

A hydra is a finite tree, with a root at the bottom. The object of the game is to cut down the hydra to its root. At each step, you can cut off one of the heads, after which the hydra grows new heads according to the following rules:

If you cut off a head growing out of the root, the hydra does not grow any new heads.

Suppose you cut off a head like this:

Delete the head and its neck. Descend down by 1 from the node at which the neck was attached. Look at the subtree growing from the connection through which you just descended. Pick a natural number, say 3, and grow that many copies of that subtree, like this:

The counter-intuitive fact: the hydra always dies after finitely many steps, no matter which heads you choose to cut off; every strategy wins. The counter-intuitive meta-fact: this theorem cannot be proved in PA.

2
On

First, a more elementary example: Given two polygons $A$ and $B$ of equal area, I can always cut one into a finite number of polygonal pieces, and rearrange those to form the other; that is, any two polygons of equal area are scissors congruent.

Hilbert's third problem was about extending this result to three dimensions. He conjectured, rightly, that it fails in three dimensions, and this was proved shortly afterwards by his student, Dehn.

Still, even though it was proved rather quickly, and Hilbert's (and others') intuition was right on the mark, I still find it incredibly counterintuitive.


Another geometric example is the refutation of the triangulation conjecture; however, I'm not sure if arbitrary manifolds count as finite objects.

21
On

Stacking Books on a Table Edge

Given a rigid (non-deformable), flat, horizontal surface (e.g., a table), a rigid rectangular parallelepiped (e.g., a block, like a book or a brick) can be placed on the edge of the table so that $49.\overline9\%$ of its weight overhangs the edge:

       one book

Assume that you have a very large supply of identical books.  By moving the first book back toward the center of the table (i.e., away from the edge), you can put a second book on top of the first book so that $74.\overline9\%$ of its weight overhangs the edge:

       two books

By adding more and more books, you can get the top one completely beyond the edge of the table — in fact, as far beyond as you want:

       six books

This is discussed (and illustrated much better) at Wolfram MathWorld.
Spoiler alert: it boils down to the fact that $$\sum_k\frac1k$$ grows without bound.
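A sketch in Python of the harmonic bookkeeping, assuming the classical result that $n$ unit-length books can overhang the edge by $H_n/2 = \frac12(1+\frac12+\dots+\frac1n)$ book lengths:

```python
from fractions import Fraction

def books_needed(overhang):
    """Smallest n whose maximum overhang H_n / 2 exceeds `overhang`
    (measured in book lengths)."""
    total, n = Fraction(0), 0
    while total <= 2 * overhang:      # i.e. while H_n / 2 <= overhang
        n += 1
        total += Fraction(1, n)
    return n

print(books_needed(1))   # 4 books put the top book entirely past the edge
print(books_needed(2))   # 31 books for an overhang of two book lengths
```

Since $H_n$ grows like $\ln n$, each extra book length of overhang multiplies the number of books needed by roughly $e^2 \approx 7.4$.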

0
On

Determinacy of classes of finite games:

The existence of winning strategies for finite games without draws and with perfect information is counter-intuitive at first glance. It is, ultimately, very sound though.

I have been in the situation of explaining this to children and adults with the example of simple games, and most of the time people went from disbelief to recognition that it is in fact obvious.

There are three major reasons for this counter-intuitiveness:

-We have been playing those types of games since we were children, and in our experience, our strategies often ended up defeated.

-Winning strategies for even simple games are often unknown: most of the time they can't be computed, because the number of possible outcomes is too high. So most of the time, the existence of a winning strategy for a game doesn't affect the way you play it.

-The definition of the fact $W$ of being in a winning position is somewhat complex: if you can make a winning move, then $W$, and (if whatever move your opponent makes, then $W$), then $W$. This type of recursive definition can be hard to grasp, and when trying to unveil it, complexity, in the sense of an alternation of quantifiers, really appears, since you get: $\forall$ opponent move $\exists$ move such that $\forall$ opponent move $\exists$ move such that ... such that $\exists$ a winning move for you.
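The recursive definition is easier to see in action on a toy game. A minimal backward-induction sketch in Python, for the subtraction game (my choice of example): players alternately remove 1, 2, or 3 stones, and whoever takes the last stone wins:

```python
from functools import lru_cache

# A position is Winning iff SOME legal move leads to a position that is
# Losing for the opponent -- exactly the recursive definition above.
@lru_cache(maxsize=None)
def winning(n):
    return any(not winning(n - m) for m in (1, 2, 3) if m <= n)

# winning(0) is False: the player to move has no move and has lost.
assert [n for n in range(20) if not winning(n)] == [0, 4, 8, 12, 16]
print("losing positions are exactly the multiples of 4")
```

Backward induction like this computes a winning strategy for any finite perfect-information game without draws; the point of the answer is that for real games the position count makes the computation infeasible, not that the strategy fails to exist.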

2
On

That you only need 23 people in a room for there to be a 50% chance that two of them will share a birthday.
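The computation behind this, sketched in Python:

```python
# P(all n birthdays distinct) = (365/365) * (364/365) * ... * ((365-n+1)/365),
# assuming 365 equally likely birthdays.
def p_shared(n):
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (365 - i) / 365
    return 1 - p_distinct

print(p_shared(22), p_shared(23))   # the probability crosses 50% at n = 23
```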

5
On

The fallacy of the hot hand fallacy

Suppose you'd like to detect whether a coin ever goes on hot streaks or cold streaks, so that after $k$ heads in a row, the probability of the following flip coming up heads is different from the overall probability $p$ of a heads. To test this, you'll flip the coin $n$ times, and after any streak of $k$ heads, you'll record the outcome of the following flip. Let $X$ be the percentage of your recorded flips that came up heads. For concreteness, let's set the values at $p=\frac{1}{2}$, $n=100$, and $k=3$.

Here is the surprise: for these values, $E[X]\approx0.46$ (not $\frac{1}{2}$!!!!). And, in general for any $0<p<1$, $n\geq 3$, and $0<k<n$, $E[X]<p$, and the bias can be quite large for certain values of $n$ and $k$.

This is counterintuitive enough that when Gilovich, Vallone and Tversky wrote their seminal 1985 paper on the hot hand in basketball, measuring whether players went on "hot streaks", they used the exact method above to attempt to detect them, and since the percentage after three hits in a row was not different from the overall percentage, they concluded that there was no evidence of a hot hand. But this was a mistake! If there were no hot hand, they should have observed a significantly *lower* percentage on shots after three hits in a row. In fact, their data do show evidence for a hot hand in many of the cases, according to a more recent paper. This mistake went unchecked for 30 years, with untold numbers of pop-psychology books and articles citing the result as evidence for a "hot-hand fallacy".

Demonstration

Here's a demonstration in R.

f7 <- function(x){ 
  # running total of run length
  # stolen from http://tolstoy.newcastle.edu.au/R/e4/devel/08/04/1206.html
  tmp <- cumsum(x)
  tmp - cummax((!x)*tmp)
  }      

streak <- function(v, k = 3, n = length(v)) {
  # returns a logical vector of length n = length(v) that is TRUE at
  # positions immediately following a run of at least k TRUE entries
  c(FALSE, f7(v)[1:(n-1)] >= k)
}

random_shots <- function(n, p = 0.5) {
# takes n random shots with probability p of success
  runif(n) < p
}

trial <- function(n, k = 3, p = 0.5) {
  s <- random_shots(n, p) 
  mean(s[streak(s, k)])
}

# do simulation 100000 times
results <- sapply(1:100000, function(x) trial(100, 3))
summary(results)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#  0.0000  0.3636  0.5000  0.4615  0.5714  0.8571       3 

What's going on?

One way to see it is to consider the absolute simplest case, where we flip the coin three times, and tally up the flips after a single head in a row.

Outcome   Heads   Flips   Proportion
HHH       2       2       1
HHT       1       2       1/2
HTH       0       1       0
HTT       0       1       0
THH       1       1       1
THT       0       1       0
TTH       0       0       NA
TTT       0       0       NA

The last two outcomes, of course, can't be included in our tally because the proportion of heads is undefined. Now, if we repeat this experiment many times, we'll find that of the sequences that we record, $\frac{2}{6}$ of the time we will have a proportion of $1$, while $\frac{1}{6}$ of the time we'll have $\frac{1}{2}$, for an expected proportion of $$\frac{2}{6}\times{1} + \frac{1}{6}\times\frac{1}{2} = \frac{5}{12} < \frac{1}{2}$$

So, by inspection we can plainly see that in this case the expected proportion is less than 0.5, although at first glance this might still seem unsatisfactory. Yeah, it's less than 0.5 but... why?

I think there are a few ways to hand-wave about this. One has to do with the fact that we have two outcomes with proportion = $1$, but one of those ways has two heads, and the other only has one. So sequences with more heads are weighted the same as sequences with fewer heads, and in this way heads are somehow being underrepresented, leading to the bias.
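For small $n$ the bias can be computed exactly rather than simulated; a brute-force sketch in Python, enumerating all $2^n$ equally likely sequences just as the table above does:

```python
from itertools import product
from fractions import Fraction

def expected_proportion(n, k):
    """Exact E[X]: for each sequence, record the flips that follow k
    consecutive heads, take the proportion of heads among them, and
    average over the sequences where that proportion is defined."""
    props = []
    for seq in product([0, 1], repeat=n):
        recorded = [seq[i] for i in range(k, n)
                    if all(seq[i - j] for j in range(1, k + 1))]
        if recorded:
            props.append(Fraction(sum(recorded), len(recorded)))
    return sum(props) / len(props)

assert expected_proportion(3, 1) == Fraction(5, 12)   # the table above
print(float(expected_proportion(10, 3)))              # well below 1/2
```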

14
On

Simpson's Paradox.

Gender discrimination example. A university has only two graduate departments, Math and English. Men and women apply to both departments, with varying admission rates, as summarized in the table below.

Each department is more likely to admit women than men. But when aggregated, the university as a whole is more likely to admit men than women (and thus open to charges of gender discrimination)!

[table image: admission rates by department and gender]

Drug testing example. We recycle the exact same numbers from before to a context in which the paradox seems even more paradoxical.

There are 300 male patients and 300 female patients suffering from some illness. 200 of the males and 100 of the females receive the new experimental drug. The recovery rates are as given in the table below.

The conclusion from this trial is that if we know the patient is male or female, we should not administer the drug. But absurdly, if we do not know whether the patient is male or female, we should!

[table image: recovery rates by treatment and gender]
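Since the original tables are images, here is one set of invented numbers (my own, not necessarily the answer's) that fits the stated group sizes and exhibits the paradox:

```python
from fractions import Fraction as F

# Illustrative (invented) counts: (recovered, patients) per arm.
male   = {"drug": (160, 200), "no_drug": (90, 100)}   # 80% vs 90%
female = {"drug": (20, 100),  "no_drug": (60, 200)}   # 20% vs 30%

rate = lambda t: F(*t)

# Within EACH sex, the drug does worse...
assert rate(male["drug"])   < rate(male["no_drug"])
assert rate(female["drug"]) < rate(female["no_drug"])

# ...but pooled over both sexes, the drug does better (60% vs 50%),
# because the drug group is dominated by the easier-to-cure males.
pooled_drug    = rate((160 + 20, 200 + 100))
pooled_no_drug = rate((90 + 60, 100 + 200))
assert pooled_drug > pooled_no_drug
print("drug worse in each group, better overall")
```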


For more, see Pearl (2014), who "safely proclaim[s] Simpson's paradox 'resolved.'"

P.S. Simpson's Paradox illustrates both the fallacy of composition ("what is true of the parts must also be true of the whole") and the fallacy of division ("what is true of the whole must also be true of the parts").

6
On

As a non-mathematician it seems pretty obvious to me that there are whole number solutions (where $n>2$) for

$$a^{n}+b^{n}=c^{n}.$$

I'd be shocked if there weren't. There must be some, surely.

3
On

For those who don't find the "harmonic overhang" of Scott's answer counterintuitive, here is a variant that Loren Larson, a retired mathematician and current creator of wooden puzzles, discovered and demonstrated some years ago: It's possible to construct a stable tower of blocks with the property that removing the top block causes the tower below it to collapse.

It's worth spending some time trying to imagine how this is possible, but once you give up, here is the secret:

Start with a standard harmonic overhang (as I recall, Loren's had a couple dozen wooden blocks of dimensions something like $8$ inches by $2$ inches by $1/2$ inch) and carefully nudge it, block by block, to form a spiral. If the tower is tall enough and the spiral is big enough -- Loren worked out the mathematical details -- the top block becomes necessary to counterbalance the portion of the spiral lower down that juts out on the opposite side. Removing it creates an imbalance that causes the lower portion to topple.

2
On

A proper coloring of a graph is an assignment of colors to its vertices in such a way that no two adjacent vertices share the same color.

The chromatic number of a graph is the minimum number of colors for which there is a proper coloring.

Any tree has chromatic number $2$ - imagine starting at a leaf and just alternating red-blue-red-blue.

The girth of a graph is the number of vertices in its smallest cycle.

A tree has infinite girth, as it contains no cycles.

The tree example may cause you to think that large girth means small chromatic number. It seems plausible enough: A graph with large girth "looks like" a tree near any particular vertex, since it will be a long time before the edges leaving that vertex wrap back around to form a cycle. We therefore should be able to alternate red-blue-red-blue locally near a vertex, then just introduce a couple new colors to fix up the places where we get stuck.

Nothing of the sort! Erdős proved in 1959 using probabilistic techniques that there are graphs with arbitrarily high girth and chromatic number. In other words, that "treelike" appearance of high girth graphs has no ultimate control over chromatic number.

5
On

(1). In the Stone Age, Og and Mog each need to make large numbers of arrow-heads and ax-heads. Their products are of equal quality. Og takes 3 units of time to make an arrow-head and 5 units of time to make an ax-head. Mog takes 4 units of time to make an arrow-head and 7 units of time to make an ax-head. Both of them consider their time to be valuable.

Since Og is faster at both, it seems that Mog could not give Og an incentive to trade.

But Mog offers to give 17 arrow-heads for 10 ax-heads, which, for Mog, is trading 68 units of time in return for 70. And for Og, this is trading 50 units of time in return for 51. So they both benefit by the trade.
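The time accounting in the story, spelled out:

```python
# Units of time each toolmaker needs per item.
og  = {"arrow": 3, "ax": 5}
mog = {"arrow": 4, "ax": 7}

# The trade: Mog gives 17 arrow-heads, Og gives 10 ax-heads.
# Mog spends 68 of his time-units making arrows, and receives axes
# that would have cost him 70 of his time-units to make himself:
assert 17 * mog["arrow"] == 68 and 10 * mog["ax"] == 70
# Og spends 50 time-units making axes, and receives arrows that would
# have cost him 51 time-units:
assert 10 * og["ax"] == 50 and 17 * og["arrow"] == 51
print("both sides gain, despite Og being faster at everything")
```

This is the classical phenomenon of comparative advantage: what matters is each maker's *ratio* of costs, not the absolute costs.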

(2). This might not count as finite: In triangle $ABC$ draw the trisectors of the interior angles. Let the trisector lines of angles $B$, $C$ that are closer to the side opposite $A$ meet at $A'$. Define $B'$, $C'$ similarly. Then (Morley's Theorem) $A'B'C'$ is an equilateral triangle. Intuitively it may seem that $A'B'C'$ might have any given shape. There is a nice proof in Introduction to Geometry by Coxeter.

(3). The only solution in positive integers to $X^3=Y^2+2$ is $X=3, Y=5.$ (Pierre de Fermat). Not obvious to me that there aren't any others.
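A brute-force check is easy to run; of course it only verifies the claim up to a bound and proves nothing (the full statement is Fermat's theorem):

```python
from math import isqrt

# Search for positive-integer solutions of X^3 = Y^2 + 2 with X < 100000.
solutions = []
for x in range(1, 100_000):
    y2 = x**3 - 2
    if y2 > 0:
        y = isqrt(y2)
        if y * y == y2:
            solutions.append((x, y))
print(solutions)   # only (3, 5): 27 = 25 + 2
```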

2
On

Here are several involving the construction of regular polygons. For at least three out of the four cases we would not call it counterintuitive, but the results we know today would have thrown the Greeks for a loop.

1) The regular heptagon has no Euclidean construction. The Greeks would have preferred to be able to construct regular polygons generally by Euclid's methods, which would make their properties accessible to elementary proof. Only in modern times did we definitely learn the bad news.

2) The regular enneagon (nonagon) has no Euclidean construction, either. This is a special case of the old angle trisection problem: trisect the central angle of an equilateral triangle and you can make a regular enneagon. Now we know it works in reverse: we prove that a general angle trisection with Euclidean methods cannot exist by showing that the regular enneagon is not Euclidean-constructible.

3) The regular hendecagon does have a neusis construction. Unlike the other cases, this is purely a "modern" counterintuitive result. It was long thought that neusis construction was similar to conic-section construction: you can solve cubic and quartic equations with it. But we know now that some (we don't know about all) irreducible quintic equations are also solvable. Benjamin and Snyder showed that the minimal equation for $\cos(2\pi/11)$ is one such neusis-solvable equation. See Constructing the 11-gon by splitting an angle in five.

4) After the regular pentagon, the next regular prime-sided polygon constructible by Euclid's methods is the 17-gon. It's "obvious" now, but ancient and medieval mathematicians would never have suspected it without the theories developed by Gauss.

3
On

82000. It absolutely boggles the mind, because this is very close to the axioms, so to speak: you need to prove only a few things based on the Peano axioms to get to the following.

The smallest number bigger than 1 whose base 2, 3, and 4 representations consist of zeros and ones is 4. If we ask the same question for bases up to 3, the answer is 3, and for bases up to 2, the answer is 2. But 82000 is the smallest integer bigger than 1 whose expressions in bases 2, 3, 4, and 5 all consist entirely of zeros and ones.

And it is very likely that there is no other such number for bases 2 through 5, and none at all for bases 2 through 6.
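The claim about 82000 is easy to check by machine; a sketch in Python (the search below reproduces the values 3, 4, and 82000 quoted above):

```python
def digits_all_01(n, base):
    """True if every base-`base` digit of n is 0 or 1."""
    while n:
        if n % base > 1:
            return False
        n //= base
    return True

def smallest_01_through(bmax):
    """Smallest integer > 1 whose digits are all 0/1 in every base 2..bmax."""
    n = 2
    while not all(digits_all_01(n, b) for b in range(2, bmax + 1)):
        n += 1
    return n

assert smallest_01_through(3) == 3
assert smallest_01_through(4) == 4
print(smallest_01_through(5))   # 82000
```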


7
On

I don't know whether it qualifies -

If we cut a Möbius strip along its center line, instead of getting two strips we get only one strip (longer, with two full twists). When I first saw this, it was quite counter-intuitive to me.

0
On

Maybe not right on the money, but worth mentioning: (subtly) faulty dissections. For example, as shown here, it is seemingly possible to dissect an $8 \times 8$ square into a $5 \times 13$ rectangle, even though the two areas are $64$ and $65$.

1
On

Langton's ant is a set of rules applied to change "pixels" in a grid. After a finite number of steps (around ten thousand), the rather chaotic behaviour changes into a periodic one, a repeating "highway", as can be seen in the picture below.

[figure: Langton's ant trajectory, chaos followed by the periodic highway]

5
On

Pick a couple of positive integers $a,b$, make a tower of exponents with $a$, and find its value $\bmod b$. For example:

$21^{21^{21^{21}}} \bmod 31$

Now you don't need a very tall tower before extending it makes absolutely no difference to the result. I can replace the top $21$ there with any other positive integer, or (for effect) pile on more layers of exponents, and the answer doesn't change.
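In Python this level-by-level reduction can be sketched as follows (Fermat's little theorem applies at the top level since $\gcd(21,31)=1$, so the exponent only matters modulo $30$; the middle exponent is reduced directly with three-argument `pow`):

```python
def tower_mod(top):
    """21^(21^(21^top)) mod 31, for a small positive integer `top`."""
    b = 21 ** top        # exact integer, small enough to hold for small top
    e = pow(21, b, 30)   # middle exponent reduced mod 30 = phi(31)
    return pow(21, e, 31)

# Changing the top of the tower never changes the answer:
values = {tower_mod(t) for t in range(1, 8)}
print(values)   # {29}
```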

7
On

Recall the Busy Beaver function: $BB(n)$ is the maximal number of steps for which a Turing machine with at most $n$ states can run on the blank tape, among those machines that eventually halt.

$\sf ZFC$ cannot decide the value of $BB(1919)$.1

Namely, if there is a contradiction in $\sf ZFC$, then a Turing machine with fewer than $2000$ states is able to find it. Yes, $1919$ is a large number, but it's not unimaginably large. What this means, though, is that $BB(1919)$ is pretty much entirely unimaginable, because we cannot even give a concrete estimate for it.

(See this and that on Scott Aaronson's blog.)


1. Under the usual caveat that we need to assume that $\sf ZFC$ is consistent of course.

3
On

One of the best (and most useful) is Benford's law, which states that the leading digit of a randomly chosen number is not uniformly distributed among $1,\dots,9$. For example, in base 10, the digit 9 will appear far less often than 1 in the highest place value position, in a "naturally occurring" population of numbers.

Say you wanted to analyse the accounts of a company to identify falsified data. If you found that some set of numbers (e.g. a set of invoices) contained the digits $0-9$ in roughly equal proportions in the highest place value, this would be a red flag that the data might have been falsely generated.

This is because naturally occurring data are typically spread over several orders of magnitude and are roughly uniform on a logarithmic scale. Under Benford's law the leading digit $d$ occurs with probability $\log_{10}(1+1/d)$, so $1$ leads about $30\%$ of the time while $9$ leads less than $5\%$ of the time.
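A quick demonstration in Python using the powers of $2$, whose leading digits are known to follow Benford's law:

```python
from collections import Counter
from math import log10

# Leading digits of 2^1, 2^2, ..., 2^N.
N = 2000
counts, value = Counter(), 1
for _ in range(N):
    value *= 2
    counts[int(str(value)[0])] += 1

for d in range(1, 10):
    observed = counts[d] / N
    benford  = log10(1 + 1 / d)
    print(d, round(observed, 4), round(benford, 4))   # close agreement

assert counts[1] > counts[9]   # 1 leads ~30% of the time, 9 under 5%
```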