Why are the hierarchy theorem proofs called diagonalization?


Proofs of the various hierarchy theorems in theoretical computer science (see e.g. http://www.cs.princeton.edu/theory/complexity/diagchap.pdf) are usually called diagonalization proofs. Why are they given this name?

When I look e.g. at the proof of the time hierarchy theorem, I can't see any connection to Cantor's diagonal argument, two-dimensional matrices or to diagonals in general. So where do diagonals come into play?

There are 3 answers below.

Best answer

In a typical proof of this sort, you are trying to construct a function $g(n)$ which is not equal to any function in some given sequence $f_i(n)$, $i = 0, 1, 2, \ldots$. In principle, to do that, you only need to show:

(*) For each $i$ there is an $n_i$ with $g(n_i) \not = f_i(n_i)$.

Proofs of this kind are often called "diagonalization" informally by computability theorists; we have a very broad outlook on what "diagonalization" means.

Now, since you are constructing $g$, you have at least some control over it. To make the proof of (*) easier, you might go out of your way to make sure you know what $n_i$ will be, in terms of $i$; that makes the rest of the proof more direct. The simplest possible option is to construct $g$ in such a way that you can use $n_i = i$. That is the most "pure" kind of diagonalization.
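The "pure" case above can be sketched in a few lines. This is a minimal illustration, not a hierarchy-theorem proof: the enumeration `f` and the function `g` are hypothetical names chosen here, with `g` built so that $n_i = i$ witnesses (*).

```python
# A minimal sketch of "pure" diagonalization with n_i = i.
# The enumeration f_0, f_1, f_2, ... is represented as a two-argument
# function f(i, n); the concrete choice f_i(n) = i * n is arbitrary.

def f(i, n):
    """The i-th function in the enumeration; here f_i(n) = i * n."""
    return i * n

def g(n):
    """Built to differ from f_n at input n, so n_i = i works in (*)."""
    return f(n, n) + 1

# g differs from every f_i at the diagonal input i:
assert all(g(i) != f(i, i) for i in range(100))
```

Any enumeration would do in place of `f`; the only property used is that `g(n)` is defined by perturbing the value `f(n, n)`.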

If it is not straightforward to guarantee $n_i = i$, you may at least be able to construct $g$ so that it is easy to pick some $n_i$ for each $i$. That is what is being done in the time hierarchy theorem proof linked in the question. Essentially, they need $n_i$ to be a sufficiently large index for a machine that computes the function $f_i$.

Answer

One of the proofs in the linked document proposes to "simulate the execution of $M_x$ on $x$".

The repetition of the "$x$" is why it's called "diagonal".

Consider $$ \begin{array}{cccc} \text{Execute $M_1$ on $1$.} & \text{Execute $M_2$ on $1$.} & \text{Execute $M_3$ on $1$.} & \cdots\cdots \\ \text{Execute $M_1$ on $2$.} & \text{Execute $M_2$ on $2$.} & \text{Execute $M_3$ on $2$.} & \cdots\cdots \\ \text{Execute $M_1$ on $3$.} & \text{Execute $M_2$ on $3$.} & \text{Execute $M_3$ on $3$.} & \cdots\cdots \\ \vdots & \vdots & \vdots & \ddots \end{array} $$ Abbreviated: $$ \begin{array}{cccc} (1,1) & (2,1) & (3,1) & \cdots \\ (1,2) & (2,2) & (3,2) & \cdots \\ (1,3) & (2,3) & (3,3) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{array} $$ The proof is telling you to do the things on the diagonal of this array.
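Reading off that diagonal can be made concrete with a tiny sketch (the helper name `array_entry` is invented here for illustration): the entry in column $x$, row $y$ of the array is the task "execute $M_x$ on $y$", and the diagonal is exactly the tasks with $x = y$.

```python
# The array entry in column x (machine index) and row y (input):
def array_entry(x, y):
    return (x, y)  # abbreviation of "Execute M_x on y."

# Walking the diagonal picks out the repeated-index tasks (x, x),
# i.e. "simulate the execution of M_x on x":
diagonal = [array_entry(x, x) for x in range(1, 5)]
assert diagonal == [(1, 1), (2, 2), (3, 3), (4, 4)]
```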

Answer

The idea does in fact come from Cantor's Diagonal Argument. Recall how that argument works: if $f_n$ is a sequence of binary sequences, then defining $f(n)=1-f_n(n)$ ensures that for every $n$, $f\neq f_n$.

Now given a class of functions (of any sort) with a common domain, if we can enumerate them using their domain, then defining $f(x)=f_x(x)+[\ldots]$ assures us that $f$ is not in that class of functions (where $\ldots$ is some reasonable and permitted operation, like adding $1$ or so).
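Cantor's flip can be run directly. In this sketch the enumeration `f` is a hypothetical stand-in (the $n$-th sequence is taken to be the binary digits of $n$); the point is only that `diag` disagrees with every listed sequence at its own index.

```python
# A sketch of Cantor's diagonal argument over binary sequences.

def f(n, k):
    """The n-th listed binary sequence, evaluated at position k:
    here, bit k of the integer n (0 beyond its binary length)."""
    return (n >> k) & 1

def diag(n):
    """Flip the n-th bit of the n-th sequence: the '1 - f_n(n)' step."""
    return 1 - f(n, n)

# diag is not any of the listed sequences, since it differs from
# the n-th one at position n:
assert all(diag(n) != f(n, n) for n in range(1000))
```

For classes of functions on larger ranges, replacing `1 - f(n, n)` by `f(n, n) + 1` (the "reasonable and permitted operation" of the answer) gives the same conclusion.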

Let me quote the second paragraph of Section $3.5$:

For concreteness, let us say that "diagonalization" is any technique that relies upon the following properties of Turing machines:

  1. The existence of an effective representation of Turing machines by strings.
  2. The ability of one TM to simulate any other without much overhead in running time or space.

This really just says that there is a way to encode Turing machines as strings, and equally as natural numbers, and that there is a Turing machine which takes two numbers as input, decodes the first one (if possible) into a Turing machine, and simulates that machine on the second number as its input.
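The two properties can be caricatured in Python, with source strings standing in for Turing-machine encodings and `eval` standing in for the decoding step. This is only an analogy under those assumptions (and it ignores the running-time overheads that the real theorem must track):

```python
# Property 1: machines have effective string representations.
code = "lambda n: n * n"  # an encoding of one particular "machine"

# Property 2: a "universal" machine that decodes its first argument
# and simulates the decoded machine on its second argument.
def universal(program, n):
    machine = eval(program)  # decoding step; fails on bad encodings
    return machine(n)

assert universal(code, 7) == 49
```

In the diagonal construction, it is this universal machine that lets one machine run "$M_x$ on $x$" for every index $x$.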

So in fact we can think of Turing machines as natural numbers, and therefore we can both index Turing machines and feed them as inputs to other Turing machines. So the above idea of diagonalization works. (Of course, since this topic is more about runtime complexity, you have to worry about the overheads, but as the second point in the excerpt says, this is not a real worry.)