The potential of the comprehensive formalization of mathematics has fascinated me for quite some time - even though I am a working mathematician whose research is not part of the area conventionally known as “formal methods”. Last week, I took the opportunity to attend the QED+20 workshop which was part of the Vienna Summer of Logic and which celebrates anniversary of the original QED workshops and manifesto.

20 years ago, the QED project set out to create “a computer system that effectively represents all important mathematical knowledge and techniques”. Today, as even a cursory glance at the field will reveal, we are still a long way from achieving that goal. Nevertheless, as the excellent talks at QED+20 showed, there has been a tremendous amount of progress. I also enjoyed the workshop because the illustrious range of speakers at the event gave a wonderful overview of the many different perspectives of the researchers in the area.

However, for most of the workshop, I was struck by an odd incongruence between what researchers in the formalization of mathematics are doing and what I, as a working mathematician, would be interested in.

## The potential of QED

From my perspective, there are many benefits that a wide-spread machine-processable formalization of mathematics could have. These potential benefits include:

- Ascertaining correctness of mathematical theorems.
- Revolutionizing mathematical publishing.
- Automation of large-scale case analyses and computations in mathematical proofs.
- Semantic search of mathematical methods and results.
- Computerized discovery of new mathematical ideas.

The incongruence that I perceived is this:

To me, correctness is the most boring and least important of these benefits. Yet, it appears to be the one thing that the QED community focuses on almost exclusively.

By contrast, the other items are all very interesting and appear crucial to the advancement of mathematics. However, with some exceptions they do not seem to be what current research in QED is all about. Let me explain what I mean by that by addressing these items one by one.

### Correctness

Mathematical research has two different modes: On the one hand, there is the “hot phase” of research (as Bruno Buchberger likes to call it) where you creatively explore new ideas to gain insight into the problem. On the other hand, there is the “cold phase” where you iron out the details and make your ideas precise in a semi-formal way. Research requires both of these processes. And yet, it is the hot phase that excites me and where actual discovery takes place. The cold phase, on the other hand, is due diligence – hard boring work that is necessary.

Now, just as John Harrison mentioned at one point, it would be fantastic if QED could make it *easier* for me to get the cold phase research over and done with. But instead, QED appears to be all about turning cold phase research into an art form all by itself. Beautiful ideas are ground out to a level of detail, formality and technicality that not only appears excessive and unintelligible (even to mathematicians) but that – as several talks at the workshop made evident – turn the project of formalizing a major proof into an epic undertaking, requiring quite literally man-decades of work.

The main outcome from these huge formalization projects is that a given proof is certified to be correct. For proofs of long-standing conjectures – such as Thomas Hales’ proof of the Kepler conjecture – this certification is worthwhile. But in my every-day work as a mathematician, correctness is not an issue. The interesting thing about a proof of a theorem in a research paper is *not* that it provides a certificate of correctness for a theorem. A proof is interesting when it conveys an *intuitive idea* for why a theorem should hold and when it reveals a *new method* for solving a particular type of problem. Whether the authors got all the details right is beside the point. The interesting question is whether the key idea works, and for this the human semi-formal process of mathematical reasoning suffices. Moreover, to paraphrase an argument of Michael Kohlhase, even if a proof idea turns out to be wrong: Either the idea is irrelevant anyway and then it does not matter that it is wrong. Or the idea is interesting, and then people will look at it, work with it and figure out how to get it right by way of a social process.

### Publishing

The fact that formalism, technicalities and correctness are secondary becomes particularly evident when writing math papers. From my own experience I know the more technical a paper is, the less it gets looked at. By contrast, papers that convey the key ideas clearly get read, even if that entails a very informal presentation of theorems and proofs. This makes writing math papers a frustrating process, as one has to strike a fine balance between between technical correctness and conveying the intuitive ideas – often even within a single sentence. Here, innovation in mathematical publishing could be a huge help. Separating the concerns of conveying intuitive ideas and their technical implementation – possibly by going beyond the static PDF as the delivery format of mathematical articles – would make life a lot easier for authors, readers and computer alike.

### Computations in Proofs

Another area where computers can contribute a lot to mathematical research is carrying out large-scale case analyses and large-scale computations as part of a mathematical proof. Both the proof of the Kepler conjecture and the 4-color theorem are prime examples of this, but they are far from the only ones. Researchers at RISC routinely do such computer proofs in many different areas of mathematics. However, even as far as doing computer proofs is concerned, there seems to be a mismatch between the interests of mathematicians and those of the formal methods community. On the one hand, Thomas Hales reported at QED+20 that a major challenge in formalizing the proof of the Kepler conjecture was getting the “human proof” and “computer proof” parts to play well with each other within the formalization. Apparently, ITP systems are not tailored towards executing formally verified algorithms efficiently and accepting their computation as part of a proof. On the other hand, the mathematical community interested in computer proofs is happy to accept the computations of computer algebra systems or custom-written code as proof, even though none of these systems are formally verified! So while there certainly is interest in computer proofs, I see hardly any interest in *formally verified* computer proofs.

### Semantic Search

As I have written previously, I view the increasing fragmentation of mathematics as a huge challenge. More and more, researchers are unaware of results in other areas of mathematics that would be relevant to their own work. This hampers scientific progress, makes research more redundant and decreases the impact of new results. Here, an efficient semantic search over a comprehensive database of known results could be a tremendous help. If researchers could simply ask a mathematical question to an automated system and be pointed to the relevant literature – especially if the relevant literature is phrased in a vernacular of a different area of mathematics, which they may not be familiar with – research productivity would be drastically increased.

### New Ideas

Finally, there is of course the utopian hope that one days computers can discover genuinely new mathematical ideas that also provide new intuitive insight to human mathematicians. In a limited form, there are already isolated examples of such discoveries. But it seems that in general this utopian scenario is still far far away.

## Flexiformality

In view of this apparent mismatch between the focus of the QED community and my own interest in the subject, I was feeling somewhat out-of-place during most of the workshop. Thus, I was very happy with Michael Kohlhase’s talk towards the end of the day.

Kohlhase argued that when comparing the number of formalized theorems and the number of math research papers published, QED is losing the race by orders of magnitude (which reminded me of a I similar remark I made some time ago). He went on to say that correctness is over-rated (mirroring my sentiment exactly) and that to be able to bring the widespread formalization of mathematics about, we need to relax what we mean by “formal”. He argued that there is a whole spectrum of “flexiformal” mathematical articles, between the informal math paper and the formal proof document in use today. Moreover, he argued that based on flexiformal articles we can achieve a whole range of objectives, such as semantic mathematical search, as long as we do not focus on certifying correctness - which is a non-issue anyway.

These comments were not only very well placed - their were also particularly delightful to me, as I had spent the afternoon brainstorming and scribbling notes (under the vague headline “zero foundations” ) on what I would expect from a useful “flexiformal” mathematical proof document.

## What is a proof anyway?

I think, if we want to make any progress towards QED, then we have to radically rethink what we mean by a human-written mathematical proof document. Currently, a formal mathematical proof is taken to be a text written in a formal language with a precisely defined semantics which then is compiled – using a fixed ITP system with a deterministic behavior – to a low-level proof in terms of the axioms and inference rules of whatever logic the ITP is based on.

This, however, is not what a human-written proof in mathematical research paper is. A proof in a math paper has no well-defined semantics. It is not based on any clear-cut foundations of mathematics whatsoever. It is not “bound” to any fixed library of definitions and theorems or any fixed dictionary of terms. Notation and terminology is inconsistent. Citations are vague. But these are not bugs – these are features.

The purpose of a mathematical article is

notto prove a theorem, it is to convey an idea.

For this purpose, lack of any precise foundations is an essential feature of mathematical writing. The informality makes the writing far more concise and it makes the ideas stand out more clearly. It makes the content far more resilient to changes in the way definitions and theorems are stated and used in the contemporary literature (truly formal proofs “bit-rot” alarmingly fast). And it makes the content accessible to a wider range of readers from a variety of different backgrounds.

Despite all this ambiguity, I still think it is perfectly feasible to define what a mathematical article really is:

A mathematical article is an

advice stringintended to help the reader solve the in general undecidable problem of proving a theorem.

Most importantly, the obligation of proof lies with the *reader* of the article – no matter whether the reader is a human or a computer: It is up to the reader to pick the foundations on which they want to base the proof. It is up to the reader which background knowledge they are going to use. It is up to the reader to pick the theorem that they want to prove in the first place (which may well be a different theorem than the author of the article had in mind). And in the end it is also up to the reader to decide whether the ideas contained in the article will be at all useful for the task the reader has set themselves. In particular, the advice string is in no way assumed to be trustworthy – it does not certify anything.

This “advice string interpretation” of a mathematical article lies somewhere in the middle of the flexiformal spectrum. How such a *flexiformal proof sketch* might look like in detail I do not know, even though I have speculated about this before. The main objectives would be to produce an article format that is

- vastly easier for humans to read,
- vastly easier for humans to write,
- vastly more flexible in its relation to previous knowledge,
- significantly more versatile in its applicability to different problems,
- not necessarily endowed with a precise formal meaning,
- not necessarily correct, but still
- useful to computers as advice for producing proofs.

Of course such a format would be vastly more difficult for a machine to handle than today’s formal proofs. Which brings me to the last topic of the workshop.

## Artificial Intelligence

Fortunately, artificial intelligence is (finally) making huge progress in QED as well. Cezary Kaliszyk and Josef Urban have done amazing work integrating existing provers with machine learning methods for premise selection and turning the whole thing into a service that people can actually use. To mention just one highlight, the hybrid system they constructed can prove automatically almost all of the individual steps that appear in today’s large libraries of declarative proofs.

This is not quite as impressive as it looks at first glance: As I found in my own experiments with declarative proofs, a large part of the work involved in formalizing an existing mathematical proof goes into choosing the intermediate steps in just the right way. Nonetheless, it is already a huge help when systems can justify individual proof steps automatically.

Of course, we still have a long ways to go before such technology could automatically interpret flexiformal proof sketches in order to produce a fully formal proof acceptable to one of today’s ITP systems. And, given that I called into doubt the importance of formal correctness checks, the question arises why we should strive to have machines translate flexiformal proof sketches into low-level formal proofs at all. Perhaps Kohlhase is right, and we can achieve all the objectives that we care about without getting computers to reason at a more formal level than humans would.

However, most interesting applications, for example semantic mathematical search, arise when computers can in fact reason at a deep level about the mathematical content. Precisely because mathematicians in different areas phrase the same ideas in different language, the most interesting synergies will arise when computers can help us draw the non-obvious connections. Of course, nothing says that this reasoning has to be formal. But, in order to make sure that humans and computers do in fact talk about the same thing when reasoning about mathematics, a formal foundation certainly does seem like the most natural common ground.

## Conclusion

The QED community’s focus on correctness may well be a necessary first step that we need to take before we can attack the more exciting applications. However, I think the time is right to start talking about what else QED can do, especially if the QED community want to attract the attention of the wider mathematical community. I am glad that projects like MathHub.info are taking first steps in this direction, and place a particular focus on exploring new kinds of mathematical documents. No matter whether we start from an entirely formal level and build on top of that, or whether we start from an informal level and dig down: Getting working mathematicians interested in QED will require new kinds of flexiformal mathematical articles – in between the extremes of complete formal correctness and informal prose stuck in a PDF file.