Proof checking Euclid

Tarski Formalization Project: Testing The Master List

This page explains the "correctness issues" about the master list, and how the program TestMasterList addresses those issues. It contains a high-level description of the program TestMasterList. You can find the code itself from our Tools page. The code relies on MasterList.php, which contains the master list itself, in the form of an array $TarskiTheorems, as well as arrays $TarskiDefinitions and $TarskiAxioms.

Here are the correctness issues for the entire project:

Is the positive form of each theorem in the master list really the correct Skolemization of the negative form?
Does the diagram entry in the master list have the form of an equation with a new constant on the left? And on the right, does it mention only previously-introduced Skolem functions?
Is each Skolem function introduced in exactly one theorem?
Does the master list actually correspond to the statements of the theorems in the book?
Are the input files being correctly generated?
Do they all produce proofs?
And of course: are the proofs produced by OTTER correct? (This is not a novel issue and will not be discussed here.)

Of these issues, the first four concern the master list, and are discussed on this page.

Each theorem has both a NegatedForm and a PositiveForm, and in case the theorem requires one or more new Skolem symbols, those are listed as an array of strings in the SkolemSymbolsfield. The first task of TestMasterListis to check that the PositiveForm is indeed a correct Skolemization of the NegatedForm. Both entries were originally computed by hand.

When we computed (by hand) the positive form, we followed rigid rules that were easy to program. Constants such as $a$ were changed to variables such as $xa$. In some cases we had constants $cu$ or $cx$, where the book proof had $u$ or $x$; in computing the positive form these become $u$ or $x$ again, rather than $xcx$ or $xcu$. Thus, TestMasterList carries out Skolemization on a text-only basis, without parsing terms or clauses. (Any parse errors will turn up when we run the resulting input files.) The program copes correctly when the negative form has one clause that is a disjunction. In the entire Tarski corpus, there is only one theorem whose negative form contains two disjunctions. That one was Skolemized by hand and checked carefully. That the resulting input files do prove is another positive indication.

An issue in Skolemization was the order of arguments to Skolem functions. We carried out Skolemization in two stages: first, we inserted the correct Skolem function symbol with no arguments. For example, reflect(). If the theorem introduces more than one Skolem symbol, then TestMasterList has to contain the knowledge as to which variable in the NegatedForm is to be replaced by which Skolem symbol. This is usually just x,y in that order, but there are some exceptions. The second stage is to supply the correct arguments to each Skolem function. In 2015 we could not change the conventions that we adopted in 2013, as that would have rendered the hints obtained from our existing proofs useless. That necessitated some not very beautiful code that specifies the order per Skolem function. The correctness of this code is evidenced by the fact that it did work with hints obtained from existing proofs. Luckily, in 2013 when we worked on one file at a time, we were consistent about the order of arguments of the same Skolem function in different files.

Each theorem that introduces a Skolem function must introduce a new Skolem symbol, i.e. on that is not introduced by any theorem occurring earlier in the $TarskiTheorems array, and in fact is not even mentioned in any previous theorem or the diagram of any previous theorem. This condition is checked by TestMasterList.

Another concern about the correctness of the master list is the correctness of the diagram. Since the diagram clauses are included in list(sos), they need to be of a syntactic form that is known to be conservative, so that we do not introduce any bogus proofs via the diagram. Adding equations defining a new constant is conservative, i.e., no new theorems can be proved by adding such axioms. However, the meaning of "new constant" is not quite as obvious as one might think. Consider, for example, the diagram for Satz 3.17. There are two equations: $$ e = ip(c,b1,a1,a,p)$$ $$ d = ip(c,b,a,b1,e)$$ The new constants are $d$ and $e$. The equation for $d$ has $e$ on the right, but that is all right. What would be wrong is if the equation for $d$ had $d$ on the right, or if the equation for $e$ had $d$ on the right and the equation for $d$ had $e$ on the right. What we mean by "new constant" is, "constant not occurring in the theorem or the right sides of the previous diagram equations."

The syntactic requirement checked by TestMasterList is that each diagram clause has the form of an equation with a new constant on the left, in the precise sense defined above. TestMasterList checks this condition as well. When we first ran this check, it exposed exactly one violation, in which a diagram contained an illegal equation. (That was a harmless case, but still it showed that our program was doing its job.)

The question, whether the diagram in the master list corresponds to the book's diagram, is not relevant. If you get a proof with any diagram whatsoever (meeting the above conditions) it is fine.

Sometimes OTTER can find a proof if $a=b$, and also if $a \neq b$, but cannot find a proof with neither assumption. Then it sometimes help to add $a=b \lor a \neq b$. After that one can usually remove the tautology, but sometimes not. Therefore, in the master list there is a field in the Theorem data structure called Cases. This is used for tautologies. We made TestMasterList check that the strings in the Cases field really are tautologies. That was necessary since whatever is in the Cases field is added to the hypotheses of the theorem when input files are generated. A typo in a tautology could cause a bogus proof.

We have observed other errors in the master list. For example, the members of various arrays are separated by commas at the end of lines. If one of those commas is accidentally a period intead of a comma, that is not a PHP error, because period concatenates strings. Then an incorrect string is written into the input file. The result might be an OTTER syntax error, but at least once, it was not. Which correctness condition was violated? It was the condition that the form of the theorem in the master list should match the form in the book. Suppose the error occurred in Theorem 5.1. It might be caught when Theorem 5.1 did not prove. But it could also happen that the bogus input file does produce a proof (but not of the real Theorem 5.1, only of the bogus Theorem 5.1). Then we would hope that the error would be caught when Theorem 5.2 or Theorem 5.3 does not prove. (That was how we found this error when it really happened.) But in general, there is no mechanical way to check that each theorem in the master list correctly expresses what Szmielew meant in writing the theorem in the book.

However, if under those circumstances we did still find a proof of Theorem 5.3, the proof would be correct, because the "bogus" theorem 5.1 was still correctly proved and correctly used to prove Theorem 5.3.

Michael Beeson's Research

Tarski Formalization Project: Testing The Master List