Why is there not a system for computer checking mathematical proofs yet (2018)?


As of 2018, mathematical proofs are still being decided by human consensus, i.e. give the proof to a few capable humans and, if none of them can find any errors, then they agree that the proof is correct and it can be published.

This surely is not a foolproof way of deciding mathematical truths.

One would think that by now there would be some standard way proofs could be written and computer-checked for accuracy.

I read about a few computer checking systems a few years ago that hoped to do things like check Wiles's proof of Fermat's Last Theorem or the Classification of Finite Simple Groups. But little has been heard of them since.

What has happened to this endeavour? Have mathematicians lost interest in having proofs computer-checked?

It used to be that mathematicians strove for the formalisation of all mathematics. And yet it seems that they prefer the sloppy semi-formalism of current mathematics, which cannot be translated into computer language because much of it is ambiguous.

I would have thought that by 2018, every proof submitted to a mathematics journal would be written in a form that could be automatically checked, and conversely that every unproved theorem would be stated in a language that could be fed into a computer.

Even an amateur could then submit a proof to a journal and, at the press of a button, see whether the proof was correct or garbage.

Do you know what the current state of this is?

There are 8 answers below.

6
On BEST ANSWER

Have a look at the Flyspeck Project, in which the proof of Kepler's sphere-packing conjecture was checked by computer precisely to avoid the weaknesses in human review that Martin Argerami's answer downplays, weaknesses that are significant for problems whose solutions necessarily involve large amounts of computation.

12
On

"One would think". No, why would "one think"?

What you claim is sloppy is not so. If a proof matters to enough people, the scrutiny is deep, and there are very few cases where wrong proofs get through. In fact, for results that are important enough, it is very common for people to produce many different proofs, which removes any remaining doubt that the result might be wrong.

More importantly, the main part of the peer-review process is to assess the significance of the results. An automated system publishing lots and lots of correct but useless math would be detrimental to the discipline.

4
On

These systems haven't been forgotten; they are still being actively researched and developed. I personally know someone who works on the Lean theorem prover, and Coq is also under active development.
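To give a flavor of what these systems accept, here is a small machine-checked proof sketch in Lean 4, using only core `Nat` lemmas (the name `add_comm'` is chosen here just to avoid clashing with the library's own `Nat.add_comm`). Lean's kernel, not a human referee, certifies every step:

```lean
-- Commutativity of addition on the natural numbers,
-- proved by induction and checked by Lean's kernel.
theorem add_comm' (m n : Nat) : m + n = n + m := by
  induction n with
  | zero => rw [Nat.add_zero, Nat.zero_add]
  | succ k ih => rw [Nat.add_succ, ih, Nat.succ_add]
```

If any step were wrong, Lean would reject the proof outright rather than let it slip through review.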

4
On

This is really a question calling for opinion so here's my opinion.

Mathematics is a human endeavor. Mathematicians work to understand some interesting and useful abstract notions. You can even start a philosophical discussion about whether mathematical objects "exist". Whether they are interesting or useful is clearly a matter of opinion.

We reason as carefully as we can to convince each other that our theorems are interesting and true (in some usually implicit but unspecified formal sense). The best proofs tell readers why a theorem is true, not just that it's true. It's that kind of understanding that leads to new mathematics.

The foundations on which we build our inferences are always under construction even as we build. What satisfied Archimedes or Euler or Cauchy wouldn't do today. Gödel proved that Hilbert's hope of establishing rigor once and for all was in vain. Sufficient unto the day is the rigor thereof.

Computer (assisted) proof is an interesting and active part of contemporary mathematics, but will never replace humans reasoning mathematically. How would you know (prove?) that the engine in your computer prover was properly programmed, and that you properly programmed the input you fed it? With another program ...?

3
On

I would suggest having a look at Freek Wiedijk's list tracking which of the "top 100 mathematical theorems" have so far been formalized in various theorem prover/proof checker systems such as HOL, Isabelle, Coq, Mizar, Metamath, ProofPower, nqthm/ACL2, PVS, and NuPRL/MetaPRL.

http://www.cs.ru.nl/~freek/100/

And also "what might be the smallest proof checker" (about 500 lines of Python):

https://en.wikipedia.org/wiki/Metamath#Proof_checkers

Those should give you hours (if not days) of reading material :-)
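As a toy illustration of how small a checker's trusted core can be, here is a sketch in the spirit of (but far simpler than) Metamath's mmverify.py. It is not Metamath itself, and all names in it are made up for this example; it checks Hilbert-style derivations built from premises, axioms, and modus ponens:

```python
# Toy Hilbert-style proof checker. A "formula" is either a string
# (a propositional variable) or a nested tuple ("imp", A, B) meaning
# "A implies B". Each proof line pairs a formula with its justification.

def check(proof, axioms=frozenset(), premises=frozenset()):
    """Return True iff every line of `proof` is justified by an axiom,
    a premise, or modus ponens applied to two earlier lines."""
    derived = []
    for formula, rule in proof:
        if rule == ("axiom",):
            ok = formula in axioms
        elif rule == ("premise",):
            ok = formula in premises
        elif rule[0] == "mp":           # ("mp", i, j): from A (line i)
            i, j = rule[1], rule[2]     # and A -> formula (line j)
            impl = derived[j]
            ok = (isinstance(impl, tuple) and impl[0] == "imp"
                  and impl[1] == derived[i] and impl[2] == formula)
        else:
            ok = False                  # unknown justification
        if not ok:
            return False
        derived.append(formula)
    return True

premises = {"p", ("imp", "p", "q")}
good = [
    ("p", ("premise",)),
    (("imp", "p", "q"), ("premise",)),
    ("q", ("mp", 0, 1)),                # modus ponens: p, p -> q  |-  q
]
bad = good + [("r", ("mp", 0, 1))]      # claims the same step also yields r

print(check(good, premises=premises))   # True
print(check(bad, premises=premises))    # False
```

Real checkers such as mmverify.py additionally handle substitution into axiom schemes, which is where most of their lines go; the core loop of "check each line against its stated justification" is exactly this simple.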

0
On

For the same reason that, in computer science, not every paper comes with a clear-cut implementation: there is more to research than the end result.

In mathematics, human proofs take a lot of shortcuts. Those are usually small shortcuts, but if at every single step you had to make every statement explicit as an unambiguous formula, your research would become unreadable and the process of creating new research would become really tedious. There are plenty of Ph.D. students who do formalize a large part of their research, but only once they have a clean "usual" proof on paper.

A proof of a theorem in a theorem prover is ideal, but a lot of work. The same can be said of computer science algorithms, which are often implemented as a barely working script that needs a lot of elbow grease to run. But because the idea behind it works, and can be useful to build upon, it gets published, even though no final product can be shipped.

1
On

The practical value is not necessarily high enough to warrant the cost.

An automated proof-checking tool can help identify flawed proofs, but it can't establish with absolute certainty that a proof is valid. The tool can produce a demonstration that the old proof is invalid (which may indeed be the simpler task), but it cannot produce an unconditional guarantee that the old proof was correct.

The reason for this is the bootstrapping problem. J. Brazile's answer provides excellent context for describing it. That answer references mmverify.py, a proof checker in 500 lines of Python. Auditing 500 lines of Python may be feasible, but how do we execute the Python? The reference implementation, CPython, is over 100,000 lines of code. An error in any one of them could cause the proof checker to erroneously report an invalid proof as "valid".

Likewise, CPython depends upon your operating system. Linux is hard to measure, but a reasonable estimate would be about 1.5 million lines of code. In theory, one bad line in that OS could cause bad results.

Indeed, we see this with seL4, a microkernel whose claim to fame is that its implementation has been formally verified (in Isabelle/HOL). However, the wording of the claim is very precise: roughly, "if seL4 is compiled with a standards-conforming compiler, then it will behave as its specification says it will." Compilers are hard to write. Tremendously hard.

And that doesn't even include issues like the infamous FDIV bug. There is absolutely no guarantee that the hardware implements things correctly. Granted, hardware designers do an amazingly good job, but the result is not mathematically provably perfect.

So, in the end, you can indeed improve the apparent validity of your proof by using a proof checking tool. However, the tool cannot make your proof correct. It may help during the process to help you find things that you miss, but once it's "right," then it still might not actually be right.

The question then becomes how valuable is this extra authority? It can't give you perfect authority, due to these issues, and there is a cost, as Nonyme mentioned in terms of readability. As it happens, it appears the mathematical community at large has not yet found that cost to be a reasonable one to expect from mathematicians.

0
On

Mathematical proofs rest entirely on assumptions: one introduces variables, fixes various constants, and reasons from chosen hypotheses. The problems with creating a system for computer-checking mathematical proofs are:

- Most written proofs are not stated precisely and are derived from other proofs that the reader is assumed to understand. Building up the full backlog of prerequisites for a proof would be a tedious process, and everything would have to be stacked up from the foundations.

- A proof imposes certain conditions at every stage without considering the complete environment; it handles everything part by part, not as a whole. Again, making the system aware of all these sub-cases would be tedious.

- Proof checkers are being made, but the most they can do is validate the results derived in a proof by pushing its conditions to their extremes. Having such a system would be a good thing, but it cannot validate whether the conditions themselves are suitable for the theorems or proofs against which they are tested.

On a personal note, I think it would undermine the societies and groups formed for establishing mathematical proofs and results. (Automation taking its toll, eh?)