Meet the Robinson
An introduction to automatic theorem proving.
Sunday, October 18, 2015 · 8 min read
On October 10, 1996, a program called EQP solved a problem that had stumped mathematicians since 1933, more than sixty years earlier. Using a powerful search algorithm, EQP constructed a proof of the theorem entirely on its own. It ran for around 8 days and used about 30 MB of memory.
The theorem was known as Robbins’ conjecture: a simple question about whether a certain set of equations is equivalent to the axioms of Boolean algebra.
You can read more background here or in the New York Times archives. You can also view the computer-generated proof here.
This article (or series of articles) will shed some light on how EQP worked, and on the fascinating world of automatic theorem proving.
Along the way, we’ll explore ideas that have spanned millennia: formal logic, Gödel’s (In)completeness Theorems, parsing, Section 4.4 of Structure and Interpretation of Computer Programs, P=NP?, type theory, middle school geometry, Prolog, and philosophy. Over time, we’ll build up the tools needed to express a powerful theorem-proving algorithm: Robinson’s Resolution.
This isn’t meant to be very notation-heavy. I’ve avoided using dense notation in favor of sentences, hoping it will be easier for people to jump right in.
Exercises are provided for you to think about interesting questions. Most of them can be answered without a pencil or paper, just some thought. Some of them are introductions to big ideas you can go explore on your own. So really, don’t think of the exercises as exercises, but as the Socratic part of the Socratic dialogue.
- Exercises will be formatted the way this sentence is.
If you want a formal education in all this, consult Russell and Norvig’s textbook, Artificial Intelligence: A Modern Approach. All I’m writing here I learned on my adventures writing my own theorem prover, lovingly named Eddie.
To prove theorems, we really need to nail down what words like “prove” and “theorems” mean. This turns out to be kind of tricky, so the first post will be devoted to building up some formal logic: mainly vocabulary and big ideas.
Logic begins with propositions. Propositions are sentences that can be true or false, like “Charlie is a basketball player.” or “Two plus two is five.”.
Except, we don’t necessarily know whether a proposition is true or false. Do you know whether Charlie is a basketball player? I don’t.
Then again, there are certain propositions that everyone knows are true. Propositions like “Either it is raining or it is not raining.”.
How do we know that that’s always true? One way is to list out all possibilities and see what happens:
It is raining. | It is not raining. | Either it is raining or it is not raining. |
---|---|---|
True | False | True |
False | True | True |
This sort of listing is called a truth table. Each row assigns a truth value to every proposition, and each such assignment is called a model. (There are only two rows here because “It is not raining.” is completely determined by “It is raining.”)
Statements like these are called tautologies, and they seem to be pretty meaningless (like “no news is no news” or “if you don’t get it, you don’t get it”). We’re also going to refer to them as valid statements.
A statement could also be never true. These are called invalid statements (you may also see them called unsatisfiable statements, or contradictions).
Finally, a statement could be true in at least one model: these are called satisfiable. A satisfiable sentence is one that comes out true if certain propositions are true and others are false. Here’s an example:
It is raining. | Mark is strong. | It is raining and Mark is strong. |
---|---|---|
True | True | True |
True | False | False |
False | True | False |
False | False | False |
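By the way, checking these properties is mechanical enough that a computer can do it. Here’s a minimal sketch in Python (purely illustrative, not how EQP or Eddie actually work): it builds the truth table for a sentence by enumerating every model, then classifies the sentence as valid, satisfiable, or invalid.

```python
from itertools import product

def classify(sentence, propositions):
    """Enumerate every model (every row of the truth table) and classify the sentence."""
    rows = []
    for values in product([True, False], repeat=len(propositions)):
        model = dict(zip(propositions, values))  # one row of the truth table
        rows.append(sentence(model))
    if all(rows):
        return "valid"        # true in every model: a tautology
    if any(rows):
        return "satisfiable"  # true in at least one model
    return "invalid"          # true in no model

# “Either it is raining or it is not raining.”
print(classify(lambda m: m["raining"] or not m["raining"], ["raining"]))        # valid

# “It is raining and Mark is strong.”
print(classify(lambda m: m["raining"] and m["strong"], ["raining", "strong"]))  # satisfiable
```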
As we get into more complicated stuff, it’ll be annoying to write out these propositions in full. Also, sometimes we will want to talk about an arbitrary proposition, like a variable. So, we’re going to use capital letters to denote propositions. For example, I might say “let P be the proposition ‘It is raining.’”. You can then replace all instances of P with that proposition.
- If a statement is not invalid then it is (choose one: valid/satisfiable).
- Relate the above statement to why the opposite of “everyone came to the party” isn’t “nobody came to the party”. State the correct opposite.
- Relate the above to De Morgan’s Laws, if you’ve heard of those.
So I cheated a bit above; I made up sentences like “It is raining and Mark is strong.” without talking about what “and” really means.
Logicians generally use three main ‘operators’: “and”, “or”, “not”. You can use them to modify or join propositions and form bigger propositions (called sentences or formulas or expressions).
- Explain why having both “and” and “or” is redundant. Explain why we have them anyway.
We have symbols to denote these: we use ¬A for “not A”, A∧B for “A and B”, and A∨B for “A or B”. It’s easy to get those last two confused at first; a nice mnemonic is that ∨ looks like a branching tree, which relates to choice (“or”).
In practice, this lets you turn sentences like “Either I am dreaming, or the world is ending and we do not have much time left.” into something like this (with D standing for “I am dreaming”, W for “the world is ending”, and T for “we have much time left”):
D ∨ (W ∧ ¬T)
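In code, one natural way to represent sentences like this is as a little tree of operators. Here’s an illustrative sketch in Python (just one possible representation, not necessarily what EQP or Eddie use; it needs Python 3.10+ for the match statement):

```python
from dataclasses import dataclass

# A tiny representation of sentences: named propositions joined by not/and/or.
@dataclass
class Var:
    name: str

@dataclass
class Not:
    arg: object

@dataclass
class And:
    left: object
    right: object

@dataclass
class Or:
    left: object
    right: object

def evaluate(sentence, model):
    """Compute the truth value of a sentence in a model (a dict mapping proposition -> bool)."""
    match sentence:
        case Var(name):
            return model[name]
        case Not(arg):
            return not evaluate(arg, model)
        case And(left, right):
            return evaluate(left, model) and evaluate(right, model)
        case Or(left, right):
            return evaluate(left, model) or evaluate(right, model)

# D ∨ (W ∧ ¬T)
sentence = Or(Var("D"), And(Var("W"), Not(Var("T"))))
print(evaluate(sentence, {"D": False, "W": True, "T": False}))  # True
```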
- Explain how ∨ relates to ∪ and ∧ relates to ∩ in set theory. Use a Venn diagram.
- For some fun trivia, look up the difference between a Venn diagram and an Euler diagram. Which mathematician came first? I’m going to refer to Euler diagrams instead of Venn diagrams in the rest of this post.
Another operator is implication. We say “A implies B” if B is true whenever A is true. We denote this in symbols with “A ⇒ B”.
- Draw an Euler diagram for A ⇒ B.
- Rewrite A ⇒ B in terms of “or” and “not”. Use your Euler diagram to help you figure this one out.
That last exercise was a little tricky. How did you know your answer was correct? The foolproof way is to write out a truth table for A and B. But, as you can imagine, that gets tedious as you add more and more propositions.
- Prove De Morgan’s Laws by showing that the truth tables of both expressions are identical.
- How many rows are in a truth table of an expression with 10 propositions?
And it gets worse. What if you have an infinite number of propositions? Like “1 is a positive number” and “2 is a positive number” and so on ad infinitum? Infinitely long truth tables sound gnarly. Clearly, we need a better way to deal with this.
The better way is to think in terms of rules of inference. Rules of inference are ways to transform expressions.
A rule of inference you’ve probably used is modus ponens, which states that if you have “P is true, and P implies Q” then you can deduce that “Q is true”.
For example, if Rover is a dog, then Rover is an animal. Since Rover is a dog, we can deduce that Rover is an animal.
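Here’s a minimal sketch of what applying modus ponens might look like in code, using a made-up convention where “P implies Q” is stored as the tuple ("implies", P, Q):

```python
def modus_ponens(known):
    """Apply modus ponens once: for every known P and known ("implies", P, Q), conclude Q."""
    conclusions = set()
    for sentence in known:
        if isinstance(sentence, tuple) and sentence[0] == "implies":
            _, p, q = sentence
            if p in known:
                conclusions.add(q)
    return conclusions

known = {"Rover is a dog", ("implies", "Rover is a dog", "Rover is an animal")}
print(modus_ponens(known))  # {'Rover is an animal'}
```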
- Rewrite “P is true, and P implies Q” in terms of the operators introduced above.
- Create a truth table for that where the columns are “P”, “Q”, and “P is true and P implies Q”.
- Circle all the rows where the last column is “True”, and check to see that “Q” is true in those rows.
- Prove, with a truth table, that ((P⇒Q)∧P)⇒Q is a tautology.
Rules of inference are often written as fractions, where the preconditions (also called antecedents or premises) are written as the numerator and the results (also called consequents or conclusions) are written as the denominator:
\[ \frac{P,\; P\implies Q}{Q} \]
Note that sometimes we elide the “and” from a series of premises.
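To generalize the hard-coded modus ponens sketch from earlier, we can represent a rule of inference itself as data: a list of premises and a conclusion, with placeholders that get matched against actual sentences. Here’s a minimal sketch (the convention that placeholder names start with "?" is made up for this example):

```python
from itertools import product

# A rule of inference as data: premises above the line, conclusion below it.
# Names starting with "?" are placeholders to be matched against actual sentences.
MODUS_PONENS = {
    "premises": ["?p", ("implies", "?p", "?q")],
    "conclusion": "?q",
}

def match(pattern, sentence, bindings):
    """Try to extend `bindings` so that `pattern` equals `sentence`; return None on failure."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == sentence else None
        return {**bindings, pattern: sentence}
    if isinstance(pattern, tuple) and isinstance(sentence, tuple) and len(pattern) == len(sentence):
        for p, s in zip(pattern, sentence):
            bindings = match(p, s, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == sentence else None

def substitute(pattern, bindings):
    """Replace placeholders in `pattern` with whatever they were bound to."""
    if isinstance(pattern, tuple):
        return tuple(substitute(p, bindings) for p in pattern)
    return bindings.get(pattern, pattern) if isinstance(pattern, str) else pattern

def apply_rule(rule, known):
    """Yield every conclusion the rule licenses, given the known sentences."""
    for picks in product(known, repeat=len(rule["premises"])):
        bindings = {}
        for premise, sentence in zip(rule["premises"], picks):
            bindings = match(premise, sentence, bindings)
            if bindings is None:
                break
        if bindings is not None:
            yield substitute(rule["conclusion"], bindings)

known = {"it is raining", ("implies", "it is raining", "the ground is wet")}
print(set(apply_rule(MODUS_PONENS, known)))  # {'the ground is wet'}
```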
- Write a pair of rules of inference that correspond to De Morgan’s Laws.
- Why are rules of inference written as fractions? Come up with a “cancellation” rule that allows you to chain rules of inference where one’s conclusion is the other’s premise.
- Are rules of inference reversible? Write a reversible and irreversible rule of inference.
- In the rule of inference written above, “P” and “Q” don’t represent specific propositions. What do they represent? Write an English sentence that expresses what they represent. Explain why they are called metavariables. Explain how this relates to pattern-matching.
- Explain how this allows us to use a finite truth table to say things about a potentially infinite set of propositions.
A rule of inference is valid if its conclusions are true in all models where its premises are true.
- Write an invalid rule of inference.
- Write an invalid rule of inference that looks valid at first glance by modifying modus ponens. Look up modus tollens, affirming the consequent, and denying the antecedent. For extra credit, use one of these fallacies in your homework and see if your teacher notices.
A rule of inference should also be consistent, which means you shouldn’t ever be able to use that rule of inference to prove both A and not-A.
- Write an inconsistent rule of inference.
Now, finally, we have the tools to talk about proofs. In a “logical system”, you pick certain axioms, which are propositions you take to be true. Then you use your rules of inference to show that those axioms imply other, more exciting things. A proof is a chain of applications of rules of inference starting from the axioms, and a theorem is a statement that has a proof.
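To make “proofs are chains of rules of inference” concrete, here’s one more minimal sketch (still using the made-up ("implies", P, Q) tuples from before): it keeps applying modus ponens until it reaches a goal, recording the chain of deductions as it goes.

```python
def prove(axioms, goal):
    """Forward-chain with modus ponens until the goal is derived; return the chain of steps."""
    known = set(axioms)
    proof = []
    changed = True
    while changed and goal not in known:
        changed = False
        for sentence in list(known):
            if isinstance(sentence, tuple) and sentence[0] == "implies":
                _, p, q = sentence
                if p in known and q not in known:
                    known.add(q)
                    proof.append(f"{p!r} and {sentence!r} give {q!r}")
                    changed = True
    return proof if goal in known else None

axioms = {
    "Rover is a dog",
    ("implies", "Rover is a dog", "Rover is a mammal"),
    ("implies", "Rover is a mammal", "Rover is an animal"),
}
for step in prove(axioms, "Rover is an animal"):
    print(step)
```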
- Explain how you can represent parsing as a logical system. In particular, come up with an explicit connection between (1) production rules and rules of inference, and (2) theorems and strings that match a grammar.
- We talk about parse trees, not parse chains. Should we talk about proofs in terms of trees of rules of inference, rather than chains of rules of inference?
One of the first logical systems was Euclid’s postulates. With a handful of simple axioms that anyone would agree with, Euclid built up all of geometry. While earlier thinkers (like Thales) had discovered many of the same results centuries before him, it was Euclid who put all that machinery on a solid, rigorous foundation. In a future episode, we might even go ahead and encode Euclid’s axioms in formal logic.
Meanwhile, in the early 20th century, a crisis was brewing. People were coming up with all sorts of messy paradoxical results. The biggest one was Russell’s Paradox, which goes something like this:
Imagine the set of all sets that don’t contain themselves. So, for example, the set {1} is in that set, while the set of all sets is not in that set (because it contains itself). Does this weird set contain itself?
- Wrap your head around that by coming up with an explanation for why it does and doesn’t contain itself.
- Explain how this relates to the Liar’s Paradox, “This statement is false.”
With these messy loopholes popping up all over the place, Alfred North Whitehead and Bertrand Russell decided it was a good idea to take matters into their own hands and put math on solid foundations, just as Euclid had done for geometry long before.
The result was the Principia Mathematica*, a treatise that stated a set of axioms and rules of inference for set theory, and then built up arithmetic and the rest of math from that.
The Principia was careful to avoid any sets that contained themselves, and so Russell’s Paradox and the Liar’s Paradox could be avoided.
- How does the Liar’s Paradox relate to sets that contain themselves?
It seemed like a sweet deal, until Kurt Gödel came along and broke everything.
(*We stopped using the Principia, by the way. Most mathematicians use the axioms of Zermelo–Fraenkel set theory, abbreviated ZF. But it still has the problems Gödel discovered.)
Modus ponens isn’t a magic bullet.
- Given “A and B”, can you use just modus ponens to prove “A or B”? Does that mean “A or B” is false?
- Come up with another statement that cannot be proved under modus ponens.
This is really bad, if you think about it. It means that there are statements that can be true according to our axioms, but impossible to prove in our logical system. This is what is referred to as incompleteness—the subject of Gödel’s Incompleteness Theorem.
The Incompleteness Theorem says that if your logical system is consistent and can be used to represent arithmetic, then there is a statement that you can neither prove nor disprove within it.
This is horrible. It means that there are statements we can’t prove in math. There are questions without answers!
Here’s a question without an answer: those of you who have read Harry Potter and the Diagon(alization) Alley might remember that there are more real numbers than integers. The Continuum Hypothesis states that there isn’t a set that is “bigger” than the integers but “smaller” than the real numbers. It turns out that we can neither prove nor disprove it: the Continuum Hypothesis is independent of the ZF axioms.
If that sounds a bit abstract, here’s another one: Euclid’s parallel postulate, which says something really obvious about parallel lines, turns out to be independent of his other axioms.
If a line segment intersects two straight lines forming two interior angles on the same side that sum to less than two right angles, then the two lines, if extended indefinitely, meet on that side on which the angles sum to less than two right angles.
Finally, there’s the axiom of choice, another statement that is independent of ZF. It’s pretty controversial: most mathematicians accept it as true, even though it leads to all sorts of weird results. The weirdest one is the Banach–Tarski Paradox, which shows how you can take a sphere, cut it up into five pieces, and reassemble the pieces into two spheres, each the same size as the original.
When we talk about ZF with the axiom of choice, we call it ZFC.
- Relate independent statements to the notion of undecidable problems in computer science, such as the halting problem or computing Kolmogorov complexity. Do the proofs of these require some sort of self-referential argument?
As exciting as the Incompleteness Theorems are, there’s a much less celebrated result of Gödel’s called, oddly enough, the Completeness Theorem, published in 1929. It says that first-order logic (a richer logic than the propositional logic we’ve been using) has a proof system that is complete: every valid statement is provable.
Gödel’s proof, however, doesn’t hand us a rule of inference that a computer can readily put to work: for that, we had to wait until 1965. In the next post, we will discover this magical rule of inference, prove its completeness, and show how to use it to write an automatic theorem prover. And we’re going to find out why the title of this post makes sense.
- This post had very few external links. Find a topic you didn’t understand or something you want to learn more about, and look it up.
- Pick 3 things you liked about this post, and 3 things that could be improved. Tell me these things.