Felix Breuer's Blog

Formal proof - first steps with HOL Light

Recently, I have philosophized about how the mathematical community needs to move beyond (new) theorems as its currency of research. One alternative form of currency that I personally find particularly interesting is formal proofs. So, in the last few weeks I have spent some time getting my feet wet with one of the formal proof systems out there: HOL Light.

To learn HOL Light, I decided to pick a simple theorem (from the theory of formal power series) and try to formalize it, both in a procedural and in a declarative style. In this blog post, I document what I have learned during this exercise.

My point of view is that of an everyday mathematician who would like to use formal proof systems to produce formal proofs to go along with his everyday research papers. I realize, of course, that this is not practical yet, given the current state of formal proof systems. But my goal is to get an idea of just how far away this dream is from reality.

During my experiments the hol-info mailing list was a huge help. In particular, I would like to thank Freek Wiedijk, Petros Papapanagiotou and John Harrison.

I will begin by giving a short summary of the informal proof and its two formal versions. This should give you a general idea of what all of this is about. If you want more details, you can then refer to the other parts of this post. Here is an outline:

  • Summary: One Theorem and Three Proofs
  • Meeting HOL Light
  • Formal Definitions
  • A Procedural Proof
  • Mizar in HOL: miz3
  • A Declarative Proof
  • Making Declarative Proofs more Self-Contained

So, without further ado, let’s start with a summary.

Summary: One Theorem and Three Proofs

The theorem that I want to prove formally is simply that every formal power series with non-zero constant term has a reciprocal - and this reciprocal has a straightforward recursive definition. In everyday, informal mathematical language, this theorem and its proof might be stated like this.

Theorem. Let $f=\sum_{n\geq 0} a_n x^n$ be a formal power series with $a_0\not = 0$. Then, the series $g=\sum_{n\geq 0} b_n x^n$ with \begin{eqnarray*} b_0 & = & \frac{1}{a_0} \cr b_n & = & -\frac{1}{a_0}\sum_{k = 1}^n a_k b_{n-k} \; \forall n\geq 1 \end{eqnarray*} is a reciprocal of $f$, i.e., $f\cdot g = 1$.

Proof. To see that $g$ is a reciprocal of $f$, we have to show that \[ f\cdot g = \sum_{n \geq 0} \left( \sum_{k=0}^n a_k b_{n-k} \right) x^n = 1 \] which means that for all $n\in\mathbb{Z}_{\geq 0}$ we have \[ \sum_{k=0}^n a_k b_{n-k} = \left\{ \begin{array}{ll} 1 & \text{if } n=0 \cr 0 & \text{if } n\geq 1 \end{array} \right. \] To show this, we proceed by induction. If $n=0$, then we have \[ \sum_{k=0}^n a_k b_{n-k} = a_0 b_0 = a_0 \frac{1}{a_0} = 1 \] as desired. For the induction step, let $n\in\mathbb{Z}_{\geq 0}$ and assume that the identity holds for $n$. We are now going to show that it also holds for $n+1$. To this end we calculate \begin{eqnarray*} \sum_{k=0}^{n+1} a_k b_{n+1-k} & = & a_0 b_{n+1} + \sum_{k=1}^{n+1} a_k b_{n+1-k} \cr & = & a_0 \left( -\frac{1}{a_0}\sum_{k = 1}^{n+1} a_k b_{n+1-k} \right) + \sum_{k=1}^{n+1} a_k b_{n+1-k} \cr & = & 0. \end{eqnarray*} End of Proof.

This proof is very simple and straightforward. Therefore, many authors would try to be more terse. For example, in Wilf’s classic text book with the wonderful title generatingfunctionology this theorem is Proposition 2.1 and its proof does not even state the above calculation explicitly. Personally, I think the above proof is just about the right length.
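
For a concrete feel for the recursion, take $f = 1 + x$, i.e., $a_0 = a_1 = 1$ and $a_n = 0$ for $n\geq 2$. The recursion gives \begin{eqnarray*} b_0 & = & \frac{1}{a_0} = 1 \cr b_1 & = & -\frac{1}{a_0} a_1 b_0 = -1 \cr b_2 & = & -\frac{1}{a_0}(a_1 b_1 + a_2 b_0) = 1 \end{eqnarray*} and in general $b_n = (-1)^n$, recovering the familiar identity $\frac{1}{1+x} = \sum_{n\geq 0} (-1)^n x^n$.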

You may have noticed that in the above proof, we did not use the induction hypothesis in the induction step. Indeed, the above proof can be done just using case analysis, without induction at all. I will nonetheless stick with the induction version of the proof, because it is more idiomatic when formalized in HOL Light.

Now, let’s see how the above proof looks when rendered in a formal proof language. In fact, we are going to do this proof in two formal proof languages: a procedural and a declarative one.

We first need to make some definitions for working with generating functions (aka formal power series). I chose to model a generating function $f$ simply as a function $f:\mathbb{N}\rightarrow\mathbb{R}$. So, if $f$ corresponds to $\sum_{n\geq 0} a_n x^n$ in the power series notation, then $f(n)=a_n$. With this approach, addition, multiplication and various other things can be defined as follows. See the section on Formal Definitions below for details.

let gf_plus = new_definition `gf_plus = \(f:num->real) (g:num->real) (n:num). (f n) + (g n) :real`;; 
let gf_times = new_definition `gf_times = \(f:num->real) (g:num->real) (n:num). (sum (0 .. n) (\k. (f k)*(g (n-k)))) :real`;;
let gf_one = new_definition `gf_one = \(n:num). if n = 0 then &1 else &0`;;
let gf_reciprocal = new_definition `gf_reciprocal = \(f:num->real) (g:num->real). gf_times f g = gf_one`;;
let recip = define `recip f (0:num) =  ((&1:real)/(f 0) :real) /\  recip f (n+1) = -- (&1/(f 0)) * sum (1..(n+1)) (\k. (f k) * recip f ((n + 1) - k))`;;

With these definitions, the theorem we are trying to prove is this:

`!(f:num->real). ~(f (0:num) = (&0:real)) ==> gf_reciprocal f (recip f)`

Note how the recursive formula for the reciprocal of $f$ has been wrapped in the definition of recip. The relation gf_reciprocal is used to abbreviate the statement that recip f is a reciprocal of f.

Now we are ready to give the two formal versions of our above proof of this theorem. We start with the procedural version. It looks like this:

let GF_RECIP_EXPLICIT_FORM = prove (`!(f:num->real). ~(f (0:num) = (&0:real)) ==> gf_reciprocal f (recip f)`,
  GEN_TAC THEN DISCH_TAC THEN REWRITE_TAC[gf_reciprocal;gf_times;gf_one] THEN REWRITE_TAC[FUN_EQ_THM] 
  THEN INDUCT_TAC THEN ASM_SIMP_TAC[NUMSEG_SING;SUM_SING] THEN CONV_TAC NUM_REDUCE_CONV 
  THEN REWRITE_TAC[recip] THEN SIMP_TAC[REAL_ARITH `x * &1 / x = x / x`] THEN ASM_SIMP_TAC[REAL_DIV_REFL] 
  THEN REWRITE_TAC[MATCH_MP (SPEC `(\k. f k * recip f (x - k))` SUM_CLAUSES_LEFT) (SPEC_ALL LE_0);SUB_0;ADD_0;ADD_AC;ADD1;recip] 
  THEN (FIRST_ASSUM (ASSUME_TAC o (MATCH_MP (REAL_FIELD `!x. ~(x = &0) ==> !a. x * -- (&1 / x) * a = -- a`)))) 
  THEN ASM_REWRITE_TAC[] THEN SIMP_TAC[REAL_FIELD `--a + a = &0`] THEN ASM_SIMP_TAC[ARITH_RULE `~(x + 1 = 0)`]);;

I will explain this proof in detail in the section A Procedural Proof below. For now, the important thing to note here is simply this: Presented in this way, the procedural proof is completely unintelligible. To make sense of procedural proofs, they have to be run interactively. I will walk you through such an interactive run of this proof below and try to explain what is going on in the process. The gist of it is this: Procedural proofs like this one consist of a sequence of proof tactics. Tactics are essentially functions like REWRITE_TAC that are written in OCaml and that produce a part of the proof as their output. They may take pre-proved theorems like FUN_EQ_THM as their arguments.
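
For instance, asking the OCaml toplevel for REWRITE_TAC shows that it is just an ordinary OCaml value, a function from theorem lists to tactics (the toplevel may print the tactic type in an expanded form, but it should look roughly like this):

# REWRITE_TAC;;
val it : thm list -> tactic = <fun>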

This way of writing proofs is opposite to the way humans write informal proofs. Take for example an equational proof, i.e., a proof done by rewriting a term step by step to show an equality. Humans would write down the intermediate expressions obtained during the process. In the procedural style, you write down the rule you apply to move from one expression to the next.

While in some ways this procedural style of carrying out a proof may in fact be closer to how humans think about proofs, this procedural notation has huge drawbacks when it comes to communicating proofs. To me, the most important drawback is that the meaning of the proof is hidden almost entirely within the library of tactics and theorems that it is built upon. Without constantly referring to a dictionary of all these ALL_CAPS_SYMBOLS it is impossible to read or write these proofs at all. This is problematic insofar as it adds an additional layer of complexity to the problem of reading or writing a proof: You not only have to know the subject matter, you also have to bind yourself to a particular “API”. This not only makes proofs less human readable, it also makes procedural proofs stored in this fashion “bit-rot at an alarming rate”.

Of course, the procedural style also has its advantages, most importantly the way it lends itself to automation of proof methods. However, as I stated in the beginning, my primary concern is the translation of everyday informal mathematical articles into a formal language. Which brings us to the declarative proof style.

The declarative version of the proof that I came up with is the following:

let GF_RECIP_EXPLICIT_FORM = thm `;
  !(f:num->real). ~(f (0:num) = (&0:real)) ==> gf_reciprocal f (recip f)
  proof
    let f be num->real;
    assume ~(f 0 = &0) [1];
    !n. sum (0..n) (\k. f k * recip f (n - k)) = (if n = 0 then &1 else &0)
    proof
      sum (0..0) (\k. f k * recip f (0 - k)) = f 0 * recip f (0 - 0) by NUMSEG_SING,SUM_SING;
        .= f 0 * recip f 0;
        .= f 0 * &1 / f 0 by recip;
        .= f 0 / f 0;
        .= &1 by REAL_DIV_REFL,1;
        .= (if 0 = 0 then &1 else &0) [2];
      !n. sum (0..n) (\k. f k * recip f (n - k)) = (if n = 0 then &1 else &0)
          ==> sum (0..SUC n) (\k. f k * recip f (SUC n - k)) = (if SUC n = 0 then &1 else &0) [3]
      proof
        let n be num;
        assume sum (0..n) (\k. f k * recip f (n - k)) = (if n = 0 then &1 else &0);
        !x. ~(x = &0) ==> !a. x * -- (&1 / x) * a = -- a [4];
        sum (0..SUC n) (\k. f k * recip f (SUC n - k)) = (\k. f k * recip f (SUC n - k)) 0 + sum (0 + 1..SUC n) (\k. f k * recip f (SUC n - k)) by SUM_CLAUSES_LEFT,LE_0;
          .= (f 0 * recip f (SUC n)) + sum (1..SUC n) (\k. f k * recip f (SUC n - k)) by ADD_0,SUB_0,ADD_AC; 
          .= (f 0 * recip f (n + 1)) + sum (1..(n + 1)) (\k. f k * recip f ((n + 1) - k));
          .= (f 0 * -- (&1/(f 0)) * sum (1..(n + 1)) (\k. (f k) * recip f ((n + 1) - k))) + sum (1..(n + 1)) (\k. f k * recip f ((n + 1) - k)) by recip;
          .= -- sum (1..(n + 1)) (\k. (f k) * recip f ((n + 1) - k)) + sum (1..(n + 1)) (\k. f k * recip f ((n + 1) - k)) by 1,4;
          .= &0;
          .= (if SUC n = 0 then &1 else &0);
      qed;
    qed by INDUCT_TAC from 2,3;
    (\n. sum (0..n) (\k. f k * recip f (n - k))) = (\n. (if n = 0 then &1 else &0)) [5] by FUN_EQ_THM;
  qed by gf_reciprocal,gf_times,gf_one,1 from 5;
`;;

To my eye, this looks wonderful! This is an entirely formal proof, that

  • is very close to the informal version,
  • is not that much longer than the informal version,
  • is human readable,
  • uses very few references to an external “dictionary” or “API” (not all steps have to be justified!),
  • and all external references refer to theorems and not to tactics.
On top of that, I had a much, much easier time writing this proof than writing the procedural version.

Of course there are still many aspects that can be improved. First of all, while this ASCII syntax is quite readable (much more readable than LaTeX anyway), the ability to edit a fully typeset version of the proof would be a pleasant convenience. I will definitely look into creating a custom transformer for Qute for this syntax.

More importantly, some steps in this proof seem unnecessarily small. For example, when calculating sum (0..0) (\k. f k * recip f (0 - k)), it would be great to go directly to f 0 * recip f 0 without using f 0 * recip f (0 - 0) as a stepping stone. In a similar vein, I would like to avoid stating trivial lemmas like !x. ~(x = &0) ==> !a. x * -- (&1 / x) * a = -- a explicitly, since in the line where it is used, it is easy to infer from context what lemma is needed here. Finally, I think it is very important to reduce API dependencies even further and so I would like to drop explicit references to such “trivial” library theorems as ADD_0 or ADD_AC.

More details about the declarative proof system and the construction of this proof make up the last part of this post.

On the whole, I am extremely impressed with the state of current formal proof systems. While a lot remains to be done until everyday mathematicians not specialized in formal proof can routinely produce formal versions of their papers, this dream appears to be almost within reach.

After this summary, I am now going to get my hands dirty and explain all of this in detail.

Meeting HOL Light

The proof system I used for these experiments is HOL Light. HOL stands for “higher order logic” which basically means classical mathematical logic with the additional feature that quantifiers can range over functions as well and not only over elements of the ground set. (In practice, working with formal higher order logic is not too different from the mix of “colloquial” classical logic and set theory that I, as a mathematician not specialized in logic, am used to working with.) Higher order logic is one of the standard foundations of mathematics used in the formal proof community and several major proof systems such as Isabelle/HOL, HOL4, ProofPower and HOL Light work with it.

In this post, I will not assume familiarity with HOL Light and I hope to keep things reasonably self-contained. Nonetheless, I want to recommend some references as starting points for further reading: Freek Wiedijk has done some very helpful work comparing and contrasting the different provers out there, see for example the list of the Seventeen Provers of the World which he compiled. To get started with HOL Light, I recommend the great HOL Light Tutorial by John Harrison. When doing a proof, it is useful to have the quick reference at hand.

Installing HOL Light is non-trivial. First, you need to have a working installation of OCaml. Second, you will need Camlp5, which has to be compiled with the -strict configuration option. This is often not the case for camlp5 packages that ship with major Linux distributions. Third, you need to fetch HOL Light from its SVN repository and compile it. I will not walk through these steps in detail, but I can recommend a script by Alexander Krauss that does these things for you (and that you can read if you need instructions on how to do this by hand).

After everything is installed, you run OCaml from the HOL Light source directory and then, from the OCaml toplevel, start HOL Light with the command

#use "hol.ml";;
HOL Light will need about a minute to start and will generate lots of output in the process.

In the rest of this section, I will give a very short summary of the basics of HOL Light. As an introduction to HOL Light, this whirlwind tour is of course woefully inadequate. It is mainly intended as a brief orientation for readers unfamiliar with HOL who do not want to read the tutorial first. Either way, I hope that these brief notes will help in making sense of the exercise that this post is about.

  • Proofs in HOL Light are typically developed interactively at the OCaml toplevel.
  • In HOL Light, mathematical expressions in higher order logic are enclosed in backquotes. These expressions have the OCaml type term. For example, if you enter `1 + 1 = 2` at the OCaml toplevel (followed by two semicolons and return) you will see:
    
    # `1 + 1 = 2`;;
    val it : term = `1 + 1 = 2`
  • The HOL Light syntax for higher order logic is straightforward. For example the principle of induction on the natural numbers \[ \forall P: P(0) \wedge \left( \forall n: P(n) \Rightarrow P(S(n)) \right) \Rightarrow (\forall n. P(n)) \] would be written as
    
    !P. P 0 /\ (!n. P n ==> P (SUC n)) ==> (!n. P n)
    where SUC is a predefined constant, $\forall$ is written as ! and $\wedge$ is written as /\. Here is the notation for some other useful symbols: $\exists$ is written as ?, $\vee$ is written as \/, $\neg$ is written as ~ and $\Leftrightarrow$ is written as <=>. Lambda expressions are written using \, e.g., $\lambda x. x + 1$ is written as \x. x + 1.
  • Higher order logic in HOL Light is typed. As long as HOL Light can figure out the types on its own, you don’t need to specify them. If you want to give type information explicitly, you can use the colon. For example, here is the principle of induction on lists:
    
    !P:(A)list->bool. P [] /\ (!h t. P t ==> P (CONS h t)) ==> !l. P l
  • Theorems in HOL Light are objects of type thm. The only way to produce an object of type thm in HOL Light is to prove the corresponding theorem.
  • Inference rules are special OCaml functions that produce theorems. They encapsulate proof methods. For example, ARITH_RULE is a function of type term->thm that uses linear arithmetic methods to prove statements about natural numbers. Using it, we can produce the theorem $\vdash 1 + 1 = 2$ by
    
    # ARITH_RULE `1 + 1 = 2`;;
    val it : thm = |- 1 + 1 = 2
    or we can produce the theorem $x > 0 \vdash x \geq 1$ by
    
    # UNDISCH (ARITH_RULE `x > 0 ==> x >= 1`);;
    val it : thm = x > 0 |- x >= 1
    where UNDISCH takes a theorem of the form $\vdash p \Rightarrow q$ and produces the theorem $p \vdash q$.
  • A goal captures a claim of the form $p_1,\ldots,p_n\vdash q$ that we are currently trying to prove but have not proved yet. In the HOL Light documentation, the distinction is made clear using the ?- notation, e.g., the goal that $x > 0 \vdash x \geq 1$ might be written as x > 0 ?- x >= 1. On the OCaml toplevel, a stack of goals containing this goal would be printed as:
    
    val it : goalstack = 1 subgoal (1 total)
    
      0 [`x > 0`]
    
    `x >= 1`
    where the expression in square brackets represents the assumption of that goal.
  • A tactic is a function that (essentially) takes a goal and produces a list of subgoals, such that a proof of all subgoals produces a proof of the original goal. This allows backward proofs. These are the typical way of working with HOL Light.
  • An example. Suppose we want to prove $x\not=0 \Rightarrow 1 \leq x$ using the lemma $0 < x \Rightarrow 1 \leq x$. We store the lemma in an OCaml variable.
    
    # let lemma = ARITH_RULE `0 < x ==> 1 <= x`;;
    val lemma : thm = |- 0 < x ==> 1 <= x
    Then we set up our goal with the function g.
    
    # g `~(x=0) ==> 1 <= x`;;                   
    Warning: Free variables in goal: x
    val it : goalstack = 1 subgoal (1 total)
    
    `~(x = 0) ==> 1 <= x`
    Next we apply the tactic DISCH_TAC using the function e. This reduces the goal $\vdash x\not=0 \Rightarrow 1 \leq x$ to $x\not=0 \vdash 1 \leq x$.
    
    # e DISCH_TAC;; 
    val it : goalstack = 1 subgoal (1 total)
    
      0 [`~(x = 0)`]
    
    `1 <= x`
    Now we apply our lemma to “simplify” the conclusion of our goal, using MATCH_MP_TAC which is of type thm->tactic. MATCH_MP_TAC uses the theorem $\vdash p \Rightarrow q$ to reduce the goal $\vdash q$ to $\vdash p$, making instantiations as necessary.
    
    # e(MATCH_MP_TAC(lemma));;
    val it : goalstack = 1 subgoal (1 total)
    
      0 [`~(x = 0)`]
    
    `0 < x`
    What is now left to prove is the content of the theorem LT_NZ in the library.
    
    # LT_NZ;;
    val it : thm = |- !n. 0 < n <=> ~(n = 0)
    We make use of this theorem via ASM_REWRITE_TAC of type thm list -> tactic, which rewrites goals by the list of (equational) theorems it is passed, taking assumptions into account.
    
    # e(ASM_REWRITE_TAC[LT_NZ]);;
    val it : goalstack = No subgoals
    The statement that there are no subgoals left means that the proof is complete. We can get the theorem we just proved using top_thm.
    
    # top_thm();;
    val it : thm = |- ~(x = 0) ==> 1 <= x
    (Note that this example is non-sensical insofar as we could have used ARITH_RULE to solve our goal immediately.)
  • If you make a mistake during one of these goalstack-style proofs, you can use b to back up one step, by calling b();;, as in the short transcript below.
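
For instance, right after the ASM_REWRITE_TAC step above, calling b should restore the previous proof state and print roughly the following:

# b();;
val it : goalstack = 1 subgoal (1 total)

  0 [`~(x = 0)`]

`0 < x`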

After this minimal introduction to HOL Light, we can now turn to formalizing the above proof.

Formal Definitions

Before we can start with the proof proper, however, we need to make a couple of definitions.

For the purposes of this exercise a formal power series or generating function will be simply a map $f:\mathbb{N}\rightarrow\mathbb{R}$. Of course you could also view $f$ as a sequence of real numbers or, classically, as the sequence of coefficients of a power series. Addition of formal power series is defined componentwise, i.e., $(f+g)(n)=f(n)+g(n)$, which in HOL Light looks as follows:

let gf_plus = new_definition `gf_plus = \(f:num->real) (g:num->real) (n:num). (f n) + (g n) :real`;;
This simply means that gf_plus is a function that takes two generating functions and returns the generating function that is the componentwise sum of the arguments.

The product of two generating functions is defined via the Cauchy product rule \[ (f\cdot g)(n) = \sum_{k=0}^n f(k)\cdot g(n-k). \] This looks much more natural when phrased in terms of formal series and this is the reason why formal power series are traditionally written in power series notation: \[ \sum_{n\geq 0}a_nx^n \cdot \sum_{n\geq 0} b_n x^n = \sum_{n\geq0} \left(\sum_{k=0}^n a_k b_{n-k}\right) x^n. \] Personally, I think of this “product” more as a “Minkowski sum”, but that will be the subject of another blog post. The HOL Light version of this definition is straightforward:

let gf_times = new_definition `gf_times = \(f:num->real) (g:num->real) (n:num). (sum (0 .. n) (\k. (f k)*(g (n-k)))) :real`;;

Next, we define a constant gf_one which represents the formal power series $1$, i.e., the function $f$ with $f(0)=1$ and $f(n)=0$ for all other $n$.

let gf_one = new_definition `gf_one = \(n:num). if n = 0 then &1 else &0`;;

Now, it is straightforward to define a binary relation gf_reciprocal that specifies when a generating function $g$ is the reciprocal of a generating function $f$, that is, $f\cdot g = 1$.

let gf_reciprocal = new_definition `gf_reciprocal = \(f:num->real) (g:num->real). gf_times f g = gf_one`;;

Finally, the theorem we will prove contains a recursive formula for the reciprocal of a generating function $f$, namely: \begin{eqnarray*} b_0 & = & \frac{1}{a_0} \cr b_n & = & -\frac{1}{a_0}\sum_{k = 1}^n a_k b_{n-k} \; \forall n\geq 1 \end{eqnarray*} We will use the shorthand recip f to denote the function defined by this recursive formula.

let recip = define `recip f (0:num) =  ((&1:real)/(f 0) :real) /\  recip f (n+1) = -- (&1/(f 0)) * sum (1..(n+1)) (\k. (f k) * recip f ((n + 1) - k))`;;

Note that these definitions all return theorems. In addition they have the side effect of modifying the OCaml variable the_definitions, see fusion.ml. This has the drawback that you cannot change existing definitions within a running HOL Light session. You will have to restart HOL Light to change definitions. (If you want to learn more about definitions and state in HOL, you may want to look at these papers.)
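
For example, evaluating gf_one at the toplevel should simply return its defining equation as a theorem (up to the pretty-printer’s exact formatting):

# gf_one;;
val it : thm = |- gf_one = (\n. if n = 0 then &1 else &0)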

A Procedural Proof

The theorem that we want to prove says that if $f(0)\not= 0$ then $\text{recip}(f)$, as defined in the previous section, is a reciprocal of $f$. In HOL this statement reads simply as follows.

!(f:num->real). ~(f (0:num) = (&0:real)) ==> gf_reciprocal f (recip f)

Note that 0 refers to the natural number $0$, whereas &0 refers to the real number $0$ - which are distinct in HOL. Real number literals always have to be prefixed with & in HOL. The type annotations given above are not essential. We could also write simply !f. ~(f 0 = &0) ==> gf_reciprocal f (recip f).
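
If you are ever unsure how a subterm was typed, the function type_of can be used to check. For the two zeros it should report roughly the following:

# type_of `0`;;
val it : hol_type = `:num`
# type_of `&0`;;
val it : hol_type = `:real`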

In this section, we will go through the procedural proof given in the summary at the beginning step by step, and take a look at the goalstack at each point. If you want to expand other proofs in a similar fashion, you may want to take a look at the tool Tactician.

We start out by pushing the theorem we want to prove onto the goalstack.

# g `!(f:num->real). ~(f (0:num) = (&0:real)) ==> gf_reciprocal f (recip f)`;;
val it : goalstack = 1 subgoal (1 total)

`!f. ~(f 0 = &0) ==> gf_reciprocal f (recip f)`

Next, we eliminate the universal quantifier and the implication, turning $f(0) \not = 0$ into an assumption, and we expand the definitions.

# e(GEN_TAC);;

val it : goalstack = 1 subgoal (1 total)

`~(f 0 = &0) ==> gf_reciprocal f (recip f)`

# e(DISCH_TAC);;

val it : goalstack = 1 subgoal (1 total)

  0 [`~(f 0 = &0)`]

`gf_reciprocal f (recip f)`

#  e(REWRITE_TAC[gf_reciprocal;gf_times;gf_one]);;

val it : goalstack = 1 subgoal (1 total)

  0 [`~(f 0 = &0)`]

`(\n. sum (0..n) (\k. f k * recip f (n - k))) =
 (\n. if n = 0 then &1 else &0)`

We have to show that two functions are equal. This boils down to showing that their values are equal for all arguments. This is the content of FUN_EQ_THM, which we apply using REWRITE_TAC.
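
For reference, FUN_EQ_THM states that two functions are equal if and only if they agree on all arguments; at the toplevel it should look like this:

# FUN_EQ_THM;;
val it : thm = |- !f g. f = g <=> (!x. f x = g x)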

# e(REWRITE_TAC[FUN_EQ_THM]);;

val it : goalstack = 1 subgoal (1 total)

  0 [`~(f 0 = &0)`]

`!x. sum (0..x) (\k. f k * recip f (x - k)) = (if x = 0 then &1 else &0)`

Now we start working at the heart of the problem. Just as in the informal proof, we are going to use induction, even though a more elementary case analysis would work as well. To use induction we apply INDUCT_TAC which leaves us with two subgoals: The base of the induction and the induction step.
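
Under the hood, INDUCT_TAC is based on the induction theorem for the natural numbers, which the library stores (if I recall correctly) as num_INDUCTION:

# num_INDUCTION;;
val it : thm = |- !P. P 0 /\ (!n. P n ==> P (SUC n)) ==> (!n. P n)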

# e(INDUCT_TAC);;

val it : goalstack = 2 subgoals (2 total)

  0 [`~(f 0 = &0)`]
  1 [`sum (0..x) (\k. f k * recip f (x - k)) = (if x = 0 then &1 else &0)`]

`sum (0..SUC x) (\k. f k * recip f (SUC x - k)) =
 (if SUC x = 0 then &1 else &0)`

  0 [`~(f 0 = &0)`]

`sum (0..0) (\k. f k * recip f (0 - k)) = (if 0 = 0 then &1 else &0)`

We start working on the base of the induction, i.e., the second subgoal. First, we simplify the equality we have to prove by evaluating the if .. then .. else .. and using the fact that the sum over an interval that contains just $0$ is obtained by evaluating the expression we sum over at $0$. This latter fact follows from the lemmas NUMSEG_SING and SUM_SING.
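
For reference, these two lemmas should look roughly as follows:

# NUMSEG_SING;;
val it : thm = |- !n. n..n = {n}
# SUM_SING;;
val it : thm = |- !f x. sum {x} f = f x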

# e(ASM_SIMP_TAC[NUMSEG_SING;SUM_SING]);;
val it : goalstack = 1 subgoal (2 total)

  0 [`~(f 0 = &0)`]

`f 0 * recip f (0 - 0) = &1`

The next couple of steps are about calculating with numbers and we use a couple of different methods for doing so. NUM_REDUCE_CONV is a so-called conversion, which can be used to simplify subexpressions of a bigger expression, and conversions can be turned into tactics using CONV_TAC. NUM_REDUCE_CONV evaluates expressions over natural numbers and this allows us to reduce $0-0$ to $0$ without knowing the name of a lemma in the HOL library that says so. Afterward, we expand recip according to its definition.

# e(CONV_TAC NUM_REDUCE_CONV);;
val it : goalstack = 1 subgoal (2 total)

  0 [`~(f 0 = &0)`]

`f 0 * recip f 0 = &1`

# e(REWRITE_TAC[recip]);;

val it : goalstack = 1 subgoal (2 total)

  0 [`~(f 0 = &0)`]

`f 0 * &1 / f 0 = &1`
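
As a standalone illustration of the conversion used above, applying NUM_REDUCE_CONV directly to the term in question should produce the corresponding equational theorem:

# NUM_REDUCE_CONV `0 - 0`;;
val it : thm = |- 0 - 0 = 0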

Next, we want to go from $f(0)\cdot \frac{1}{f(0)}$ to $\frac{f(0)}{f(0)}$. Again we don’t know the exact name of a lemma in the library that allows us to get there. However, the library does contain a general purpose inference rule REAL_ARITH that can prove certain identities of real numbers automatically. We use this to prove the lemma we need on the fly. After this, all that is left to show is that $\frac{f(0)}{f(0)}=1$ under the assumption that $f(0)\not=0$. The appropriate lemma in the library is REAL_DIV_REFL. To apply it, we use ASM_SIMP_TAC which uses not only the theorem it is passed as argument, but also takes into account the available assumptions. This, then, completes our proof of the base of the induction, and we continue to the next goal on the stack: the induction step.
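
For reference, REAL_DIV_REFL should state roughly the following:

# REAL_DIV_REFL;;
val it : thm = |- !x. ~(x = &0) ==> x / x = &1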

# e(SIMP_TAC[REAL_ARITH `x * &1 / x = x / x`]);;
val it : goalstack = 1 subgoal (2 total)

  0 [`~(f 0 = &0)`]

`f 0 / f 0 = &1`

# e(ASM_SIMP_TAC[REAL_DIV_REFL]);;
val it : goalstack = 1 subgoal (1 total)

  0 [`~(f 0 = &0)`]
  1 [`sum (0..x) (\k. f k * recip f (x - k)) = (if x = 0 then &1 else &0)`]

`sum (0..SUC x) (\k. f k * recip f (SUC x - k)) =
 (if SUC x = 0 then &1 else &0)`

The first thing that we want to do here is to split off the first summand of the sum. The theorem in the library that says we can split off the first term in a sum is SUM_CLAUSES_LEFT.

# SUM_CLAUSES_LEFT;;
val it : thm = |- !f m n. m <= n ==> sum (m..n) f = f m + sum (m + 1..n) f

We want to use this theorem for $f=\lambda k. f(k)\cdot \text{recip}(f,x-k)$, $m=0$ and $n=n$. We can achieve these specializations as follows. Here LE_0 is the theorem $\forall n. 0 \leq n$ and SPEC_ALL LE_0 is simply the theorem $0 \leq n$. Note that $n$ refers to a natural number here.
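
For reference, this is roughly what LE_0 and its specialization look like at the toplevel:

# LE_0;;
val it : thm = |- !n. 0 <= n
# SPEC_ALL LE_0;;
val it : thm = |- 0 <= n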

# let sum_lemma = (SPEC `(\k. f k * recip f (x - k))` SUM_CLAUSES_LEFT);;
val sum_lemma : thm =
  |- !m n.
         m <= n
         ==> sum (m..n) (\k. f k * recip f (x - k)) =
             (\k. f k * recip f (x - k)) m +
             sum (m + 1..n) (\k. f k * recip f (x - k))
# let split_sum_lemma = MATCH_MP sum_lemma (SPEC_ALL LE_0);;
val split_sum_lemma : thm =
  |- sum (0..n) (\k. f k * recip f (x - k)) =
     (\k. f k * recip f (x - k)) 0 +
     sum (0 + 1..n) (\k. f k * recip f (x - k))

We use our split_sum_lemma to split the sum in question. After some simplifications, the first occurrence of $\text{recip}$ in the expression is in the exact form we need in order to be able to apply the recursive definition of $\text{recip}$.

# e(REWRITE_TAC[split_sum_lemma]);;
val it : goalstack = 1 subgoal (1 total)

  0 [`~(f 0 = &0)`]
  1 [`sum (0..x) (\k. f k * recip f (x - k)) = (if x = 0 then &1 else &0)`]

`f 0 * recip f (SUC x - 0) +
 sum (0 + 1..SUC x) (\k. f k * recip f (SUC x - k)) =
 (if SUC x = 0 then &1 else &0)`

# e(REWRITE_TAC[SUB_0;ADD_0;ADD_AC;ADD1]);;
val it : goalstack = 1 subgoal (1 total)

  0 [`~(f 0 = &0)`]
  1 [`sum (0..x) (\k. f k * recip f (x - k)) = (if x = 0 then &1 else &0)`]

`f 0 * recip f (x + 1) + sum (1..x + 1) (\k. f k * recip f ((x + 1) - k)) =
 (if x + 1 = 0 then &1 else &0)`

# e(REWRITE_TAC[recip]);;

val it : goalstack = 1 subgoal (1 total)

  0 [`~(f 0 = &0)`]
  1 [`sum (0..x) (\k. f k * recip f (x - k)) = (if x = 0 then &1 else &0)`]

`f 0 * --(&1 / f 0) * sum (1..x + 1) (\k. f k * recip f ((x + 1) - k)) +
 sum (1..x + 1) (\k. f k * recip f ((x + 1) - k)) =
 (if x + 1 = 0 then &1 else &0)`

Now we want to cancel the two occurrences of $f(0)$ in the numerator and the denominator. Standard facilities for working with expressions of real numbers, such as REAL_RAT_REDUCE_CONV, do not suffice here, as we have to take the assumption that $f(0)\not=0$ into account. However, the lemma that we need in this situation can be proved automatically by REAL_FIELD.

#  let calc_lemma = (REAL_FIELD `!x. ~(x = &0) ==> !a. x * -- (&1 / x) * a = -- a`);;
2 basis elements and 0 critical pairs
3 basis elements and 1 critical pairs
3 basis elements and 0 critical pairs
val calc_lemma : thm = |- !x. ~(x = &0) ==> (!a. x * --(&1 / x) * a = --a)

We need to specialize this lemma to make use of it. The expression FIRST_ASSUM (ASSUME_TAC o (MATCH_MP calc_lemma)) is idiomatic for this purpose. FIRST_ASSUM tac tries to apply the tactic tac to each assumption in turn, until it finds one on which tac succeeds. In our case ASSUME_TAC o (MATCH_MP calc_lemma) succeeds immediately on the first assumption. MATCH_MP calc_lemma applied to the assumption ~(f 0 = &0) makes the substitution $x=f(0)$ in calc_lemma and produces the theorem !a. f 0 * --(&1 / f 0) * a = --a. This is then added as an assumption to our goal by ASSUME_TAC. Afterward, ASM_REWRITE_TAC is able to perform the simplification we intended.

# e(FIRST_ASSUM (ASSUME_TAC o (MATCH_MP calc_lemma)));;
val it : goalstack = 1 subgoal (1 total)

  0 [`~(f 0 = &0)`]
  1 [`sum (0..x) (\k. f k * recip f (x - k)) = (if x = 0 then &1 else &0)`]
  2 [`!a. f 0 * --(&1 / f 0) * a = --a`]

`f 0 * --(&1 / f 0) * sum (1..x + 1) (\k. f k * recip f ((x + 1) - k)) +
 sum (1..x + 1) (\k. f k * recip f ((x + 1) - k)) =
 (if x + 1 = 0 then &1 else &0)`

# e(ASM_REWRITE_TAC[]);;
val it : goalstack = 1 subgoal (1 total)

  0 [`~(f 0 = &0)`]
  1 [`sum (0..x) (\k. f k * recip f (x - k)) = (if x = 0 then &1 else &0)`]
  2 [`!a. f 0 * --(&1 / f 0) * a = --a`]

`--sum (1..x + 1) (\k. f k * recip f ((x + 1) - k)) +
 sum (1..x + 1) (\k. f k * recip f ((x + 1) - k)) =
 (if x + 1 = 0 then &1 else &0)`

For the last simplification we again create the necessary theorem on the fly, without looking it up in the library.

# e(SIMP_TAC[REAL_FIELD `--a + a = &0`]);;
1 basis elements and 0 critical pairs
val it : goalstack = 1 subgoal (1 total)

  0 [`~(f 0 = &0)`]
  1 [`sum (0..x) (\k. f k * recip f (x - k)) = (if x = 0 then &1 else &0)`]
  2 [`!a. f 0 * --(&1 / f 0) * a = --a`]

`&0 = (if x + 1 = 0 then &1 else &0)`

The last thing we need to do is to show that the right-hand side of this equality is $0$, using the fact that $x+1\not=0$ for any natural number $x$. The proof is then complete and we can retrieve the theorem we just proved using top_thm.

# e(ASM_SIMP_TAC[ARITH_RULE `~(x + 1 = 0)`]);;
val it : goalstack = No subgoals

# top_thm();;
val it : thm = |- !f. ~(f 0 = &0) ==> gf_reciprocal f (recip f)

As we can see, running the procedural proof interactively is crucial to understanding what the proof does. However, working through this exercise, I have often found it excruciatingly hard to perform even simple manipulations of the goalstack in the way I intended. My understanding of HOL’s library of theorems and tactics has to improve considerably, before I can become effective at this. Fortunately, working with declarative proofs was much easier for me.

Mizar in HOL: miz3

The Mizar system pioneered declarative proof languages and has had a lasting influence on declarative proof languages in other systems, such as the Isar language that is part of Isabelle or the various Mizar implementations for the HOL family of provers. The latest incarnation of Mizar for HOL Light is miz3 by Freek Wiedijk.

As you can see from the declarative version of the proof given in the introduction, Mizar-style proofs are largely self-explanatory. Of course some elements still require explanation, but I will not go through them in detail here; for that, I refer you directly to Freek’s paper. Instead, I will use this section to give some tips for installing and using miz3. These tips refer to the version of miz3 available at the time of publication of this post (miz3 releases do not appear to be numbered).

  • You can get miz3 from here. To use miz3, all you need to do is to extract miz3.ml from the archive and place it in your HOL Light source directory.
  • miz3 comes with a Vim interface. This interface is very lightweight and its use is not required: You can always paste miz3 code into your OCaml toplevel. But these notes are mainly about using miz3 via Vim. To install the Vim interface, you need to
    • place the two executables miz3 and miz3f from the bin/ subdirectory of the miz3 distribution on your path, and
    • add source /path/to/exrc to your .vimrc, where the path points to the file exrc from the miz3 distribution.
  • To use miz3, you need to start the OCaml toplevel using the command ocaml unix.cma. On the toplevel, first run #use "hol.ml";; and then #use "miz3.ml";;. This automatically starts the miz3 server and you can begin using Vim.
  • In Vim, you can press Ctrl-C RET to send paragraphs of code (delimited by blank lines) to the OCaml toplevel. Error reports are then displayed right in the Vim buffer. However, some errors are not displayed there, so you should always keep an eye on your toplevel!
  • You should keep the paragraphs you send to miz3 minimal: Separate your miz3 proofs and your (* OCaml comments *) with a blank line. Do the same for definitions and proofs. Otherwise miz3 may throw errors that only appear on the toplevel.
  • GOAL_TAC is another very useful tool to know about in a declarative proof. If you want to know what exact goal you have to prove at any given step of your declarative proof, you can add by GOAL_TAC to the end of the line. After the declarative proof has been processed by miz3, you can inspect the goal at that point by running p();; at the toplevel. This works whether or not your proof has been successful. As GOAL_TAC is a tactic, it is very flexible and can also be used in procedural proofs or as part of a more complicated tactic. A small sketch follows below this list.
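
As a minimal sketch of the GOAL_TAC trick from the last item, take a step from the declarative proof below, say

  .= f 0 * recip f 0;

and change it into

  .= f 0 * recip f 0 by GOAL_TAC;

After sending the paragraph to miz3, running p();; at the toplevel should print the goal that miz3 had to justify at exactly that step.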

After these introductory comments, we can now go on to the declarative version of the proof.

A Declarative Proof

Using a vanilla installation of HOL Light and miz3, a declarative version of the proof might look like this.

let GF_RECIP_EXPLICIT_FORM = thm `;
  !(f:num->real). ~(f (0:num) = (&0:real)) ==> gf_reciprocal f (recip f)
  proof
    let f be num->real;
    assume ~(f 0 = &0) [1];
    !n. sum (0..n) (\k. f k * recip f (n - k)) = (if n = 0 then &1 else &0)
    proof
      sum (0..0) (\k. f k * recip f (0 - k)) = f 0 * recip f (0 - 0) by NUMSEG_SING,SUM_SING;
        .= f 0 * recip f 0;
        .= f 0 * &1 / f 0 by recip;
        .= f 0 / f 0;
        .= &1 by REAL_DIV_REFL,1;
        .= (if 0 = 0 then &1 else &0) [2];
      !n. sum (0..n) (\k. f k * recip f (n - k)) = (if n = 0 then &1 else &0)
          ==> sum (0..SUC n) (\k. f k * recip f (SUC n - k)) = (if SUC n = 0 then &1 else &0) [3]
      proof
        let n be num;
        assume sum (0..n) (\k. f k * recip f (n - k)) = (if n = 0 then &1 else &0);
        !x. ~(x = &0) ==> !a. x * -- (&1 / x) * a = -- a [4] by REAL_FIELD;
        sum (0..SUC n) (\k. f k * recip f (SUC n - k)) = (\k. f k * recip f (SUC n - k)) 0 + sum (0 + 1..SUC n) (\k. f k * recip f (SUC n - k)) by SUM_CLAUSES_LEFT,LE_0;
          .= (f 0 * recip f (SUC n)) + sum (1..SUC n) (\k. f k * recip f (SUC n - k)) by ADD_0,SUB_0,ADD_AC; 
          .= (f 0 * recip f (n + 1)) + sum (1..(n + 1)) (\k. f k * recip f ((n + 1) - k));
          .= (f 0 * -- (&1/(f 0)) * sum (1..(n + 1)) (\k. (f k) * recip f ((n + 1) - k))) + sum (1..(n + 1)) (\k. f k * recip f ((n + 1) - k)) by recip;
          .= -- sum (1..(n + 1)) (\k. (f k) * recip f ((n + 1) - k)) + sum (1..(n + 1)) (\k. f k * recip f ((n + 1) - k)) by 1,4;
          .= &0;
          .= (if SUC n = 0 then &1 else &0);
      qed;
    qed by INDUCT_TAC from 2,3;
    (\n. sum (0..n) (\k. f k * recip f (n - k))) = (\n. (if n = 0 then &1 else &0)) [5] by REWRITE_TAC,FUN_EQ_THM;
  qed by REWRITE_TAC,gf_reciprocal,gf_times,gf_one,1 from 5;
`;;

Note that this version differs slightly from the declarative proof given at the beginning. The difference is that this version uses additional references to the HOL library, making explicit use of the tactics REAL_FIELD and REWRITE_TAC. The proof given in the introduction is a bit less verbose and it is tactic-free, i.e., it only references theorems and not tactics in the library.

Why do I care about a proof being tactic-free? After all, the availability of proof tactics and the corresponding opportunities for automation are a great strength of procedural provers such as HOL Light. Well, on the one hand this exercise is about translating human-written proofs and not about automated reasoning. On the other hand, theorems used in by clauses can be stated in higher order logic and can thus be easily included in a prover-independent library. Tactics, however, depend critically on the prover they were written for and are therefore less portable. I will write another post in the near future on why I think prover independence is so important.

Making Declarative Proofs more Self-Contained

In this section we will see how to tweak miz3 in order to eliminate the tactics REWRITE_TAC and REAL_FIELD from the above declarative proof.

First, we have to examine a bit more closely how miz3 proves each step in a declarative proof:

  • If the by clause contains only theorems and no tactics (or if it is even empty altogether), then miz3 uses the tactic specified in the variable default_prover to prove this step.
  • If the by clause contains a tactic, that tactic is used for the proof and the default prover is ignored.

The standard default prover in miz3 is HOL_BY, which in turn was inspired by the prover contained in the original Mizar. HOL_BY resides in the file Examples/holby.ml in the HOL Light source directory (not the miz3 source directory). It is set as the default prover of miz3 at the very top of miz3.ml.

let default_prover = ref ("HOL_BY", CONV_TAC o HOL_BY);;

HOL_BY is of type thm list -> term -> thm whence CONV_TAC o HOL_BY is of type thm list -> tactic, which is the type expected of a default prover.
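
You can check this at the toplevel, which should report something like:

# HOL_BY;;
val it : thm list -> term -> thm = <fun>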

We now turn to the task of eliminating the explicit reference to REAL_FIELD from the proof. The idea is to improve HOL_BY so that the default prover tries REAL_FIELD automatically. REAL_FIELD is an inference rule. So in order to find a suitable place for integrating REAL_FIELD into HOL_BY, we look for other inference rules in the HOL_BY source. Right at the beginning of the definition of the function HOL_BY itself, we find a bunch of conversions and inference rules. So we just include REAL_FIELD right after REAL_ARITH, like this:

let HOL_BY =
  let BETASET_CONV =
    TOP_DEPTH_CONV GEN_BETA_CONV THENC REWRITE_CONV[IN_ELIM_THM]
  and BUILTIN_CONV tm =
    try EQT_ELIM(NUM_REDUCE_CONV tm) with Failure _ ->
    try EQT_ELIM(REAL_RAT_REDUCE_CONV tm) with Failure _ ->
    try ARITH_RULE tm with Failure _ ->
    try REAL_ARITH tm with Failure _ ->
    try REAL_FIELD tm with Failure _ ->
    failwith "BUILTIN_CONV" in
  fun ths tm ->
    ... etc. ...

And indeed, with this addition, we can remove REAL_FIELD from our declarative proof (after reloading miz3.ml).

Eliminating REWRITE_TAC from our proof turns out to be a bit more subtle. If we remove the explicit occurrences of REWRITE_TAC from the proof, we get an inference timeout. Does this imply that REWRITE_TAC can do something that HOL_BY cannot? No. Even if we replace REWRITE_TAC with ALL_TAC, the proof will succeed, even though ALL_TAC is the tactic that does nothing. The answer to this riddle lies in the code that miz3 wraps around the tactics it is given. We find this code in the function tactic_of_by in miz3.ml.

In tactic_of_by we find the block:

   (try
      0,((FILTER_ASSUMS (fun _,(x,_) -> x <> "=") THEN
        FILTER_ASSUMS
         (fun n,(x,_) ->
            mem x labs or n < hor or (n = 0 & mem "-" labs) or full_asl) THEN
        MAP_ASSUMS (fun l,th -> l,PURE_REWRITE_RULE sets th) THEN
        MIZAR_NEXT' (PURE_REWRITE_TAC sets) THEN
       (fun (asl',w' as g') ->
          let key = name,(map concl thms,map concl thms'),w' in
          try
            if grow then failwith "apply";
            let e,th = apply (!just_cache) key in
            if e = 0 then (ACCEPT_TAC th THEN NO_TAC) g' else
            if e = 2 then raise Timeout else failwith "cached by"
          with
          | Failure "apply" ->
              try
                let (_,_,just as gs) =
                 ((fun g'' ->
                    let gs' = TIMED_TAC (!timeout) (tac thms) g'' in
                    if grow then raise (Grown (steps_of_goals g gs'))
                            else gs') THEN
                  REPEAT (fun (asl'',_ as g'') ->
                    if subset asl'' asl' then NO_TAC g''
                    else FIRST_ASSUM (UNDISCH_TAC o concl) g'') THEN
                  TRY (FIRST (map ACCEPT_TAC thms'')) THEN
                  REWRITE_TAC thms'' THEN NO_TAC) g' in
                let th = just null_inst [] in
                just_cache := (key |-> (0,th)) !just_cache;
                gs
              with
              | Grown _ as x -> raise x
              | x -> if name <> "GOAL_TAC" then just_cache :=
                    (key |-> ((if x = Timeout then 2 else 1),TRUTH))
                      !just_cache;
                  raise x
        )) g)

And the crucial piece in here is the line:

let gs' = TIMED_TAC (!timeout) (tac thms) g'' in

Here tac refers to either the tactic that was found in the by clause or the default prover. TIMED_TAC tries the tactic it is passed for a fixed amount of time. If the tactic does not succeed in the specified interval, a Timeout exception is raised and the tactics that follow are never even tried. As a consequence, a long-running tactic such as HOL_BY may achieve less than an instantaneous tactic such as ALL_TAC, because the exception thrown by the former prevents the last few tactics in tactic_of_by from being called. A workaround, therefore, is to modify the default prover so that HOL_BY is tried for a fixed amount of time strictly less than the timeout. After that, HOL_BY is aborted without raising an exception and the subsequent tactics given in tactic_of_by are still executed.

To implement this workaround, we add the following definition to the end of miz3.ml, right before the definition of reset_miz3:

let TIMEOUT_HOL_BY = fun ths g -> try TIMED_TAC 1 (CONV_TAC (HOL_BY ths)) g with Timeout -> ALL_TAC g;;

This defines a new tactic TIMEOUT_HOL_BY that tries to run HOL_BY and aborts it silently after 1 second. Next, we set the timeout after which tactics are killed and an exception is raised to 10 seconds (so that it does not interfere with TIMEOUT_HOL_BY) and set the default prover to TIMEOUT_HOL_BY, by modifying reset_miz3 as follows:

let reset_miz3 h =
  horizon := h;
  timeout := 10;
  default_prover := ("HOL_BY", TIMEOUT_HOL_BY);
  sketch_mode := false;
  just_cache := undefined;
  by_item_cache := undefined;
  current_goalstack := [];
  server_up();;

After this, we reload miz3.ml and then run reset_miz3 !horizon;; on the toplevel. Our miz3 can now tackle the tactic-free declarative proof given at the beginning of this post!

As I already wrote at the end of the summary, I am very impressed. With current provers it is possible to write proofs that are at the same time formal, human-readable and reasonably short and self-contained. On top of that, further improvements seem to be within reach.

For me, the biggest pleasant surprise was that a declarative proof in miz3 requires relatively few explicit justifications (by clauses) in each step. I think this is very important, as this reduces the volume of the dictionary that a user has to learn to get started, it decreases the danger of bit-rot and it increases portability. I will write more on these issues in a future blog post.
