## Meet the Robinson

#### Sunday, October 18, 2015 · 8 min read

On October 10, 1996, a program called EQP solved a problem that had bothered mathematicians since 1933, more than 60 years earlier. Using a powerful algorithm, EQP constructed a proof of a theorem entirely on its own. It ran for around 8 days and used about 30 MB of memory.

The theorem was known as Robbins’ conjecture: a simple question about whether a certain set of equations is equivalent to the axioms of Boolean algebra.

You can read more background here or in the New York Times archives. You can also view the computer-generated proof here.

This article (or series of articles) will shed some light on how EQP worked, and on the fascinating world of automatic theorem proving.

Along the way, we’ll explore ideas that have spanned millennia: formal logic, Gödel’s (In)completeness theorem, parsing, Chapter 4.4 of The Structure and Interpretation of Computer Programs, P=NP?, type theory, middle school geometry, Prolog, and philosophy. Over time, we’ll build up the tools needed to express a powerful theorem-proving algorithm: Robinson’s Resolution.

This isn’t meant to be very notation-heavy. I’ve avoided using dense notation in favor of sentences, hoping it will be easier for people to jump right in.

Exercises are provided for you to think about interesting questions. Most of them can be answered without a pencil or paper, just some thought. Some of them are introductions to big ideas you can go explore on your own. So really, don’t think of the exercises as exercises, but as the Socratic part of the Socratic dialogue.

• Exercises will be formatted the way this sentence is.

If you want a formal education in all this, consult Russell and Norvig’s textbook, Artificial Intelligence: A Modern Approach. Everything I’m writing here, I learned on my adventures writing my own theorem prover, lovingly named Eddie.

To prove theorems, we really need to nail down what words like “prove” and “theorems” mean. This turns out to be kind of tricky, so the first post will be devoted to building up some formal logic: mainly vocabulary and big ideas.

Logic begins with propositions. Propositions are sentences that can be true or false, like “Charlie is a basketball player.” or “Two plus two is five.”.

Except, we don’t necessarily know whether a proposition is true or false. Do you know whether Charlie is a basketball player? I don’t.

Then again, there are certain propositions that everyone knows are true. Propositions like “Either it is raining or it is not raining.”.

How do we know that that’s always true? One way is to list out all possibilities and see what happens:

| It is raining. | It is not raining. | Either it is raining or it is not raining. |
| --- | --- | --- |
| True | False | True |
| False | True | True |

This sort of listing is called a truth table, and it assigns a truth value for each proposition. Each row is called a model.
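If you like, you can have a computer do this listing for you. Here’s a minimal sketch (my own, not EQP’s machinery) that enumerates every model for a sentence, representing a model as a dictionary from proposition names to truth values:

```python
from itertools import product

def truth_table(sentence, names):
    """Yield (model, value) for every True/False assignment to names."""
    for values in product([True, False], repeat=len(names)):
        model = dict(zip(names, values))
        yield model, sentence(model)

# "Either it is raining or it is not raining."
either = lambda m: m["raining"] or not m["raining"]
rows = list(truth_table(either, ["raining"]))
print(all(value for _, value in rows))  # True: it holds in every model
```

Since every row comes out True, the sentence is a tautology.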

Statements like these are called tautologies, and they seem to be pretty meaningless (like “no news is no news” or “if you don’t get it, you don’t get it”). We’re also going to refer to them as valid statements.

A statement could also never be true: these are called invalid statements.

Finally, a statement could possibly be true: these are called satisfiable. A satisfiable sentence is a sentence that could be true if certain propositions are true and others are false. Here’s an example:

| It is raining. | Mark is strong. | It is raining and Mark is strong. |
| --- | --- | --- |
| True | True | True |
| True | False | False |
| False | True | False |
| False | False | False |

As we get into more complicated stuff, it’ll be annoying to write out these propositions in full. Also, sometimes we will want to talk about an arbitrary proposition, like a variable. So, we’re going to use capital letters to denote propositions. For example, I might say “let P be the proposition ‘It is raining.’”. You can then replace all instances of P with that proposition.

• If a statement is not invalid then it is (choose one: valid/satisfiable).
• Relate the above statement to why the opposite of “everyone came to the party” isn’t “nobody came to the party”. State the correct opposite.
• Relate the above to De Morgan’s Laws, if you’ve heard of those.

So I cheated a bit above; I made up sentences like “It is raining and Mark is strong.” without talking about what “and” really means.

Logicians generally use three main ‘operators’: “and”, “or”, “not”. You can use them to modify or join propositions and form bigger propositions (called sentences or formulas or expressions).

• Explain why having both “and” and “or” is redundant. Explain why we have them anyway.

We have symbols to denote these: we use ¬A for “not A”, A∧B for “A and B”, and A∨B for “A or B”. It’s easy to get those last two confused at first; a nice mnemonic is that ∨ looks like a branching tree, which relates to choice (“or”).

In practice, this lets you turn sentences like “Either I am dreaming, or the world is ending and we do not have much time left.” into something like:

D ∨ (W ∧ ¬ T)
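In code, the same translation is one line, with the connectives mapped onto Python’s own `or`, `and`, and `not` (the names D, W, T are just my shorthand for the three propositions):

```python
# D: I am dreaming; W: the world is ending; T: we have much time left.
sentence = lambda D, W, T: D or (W and not T)

# One model: not dreaming, world ending, not much time left.
print(sentence(D=False, W=True, T=False))  # True
```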

• Explain how ∨ relates to ∪ and ∧ relates to ∩ in set theory. Use a Venn diagram.
• For some fun trivia, look up the difference between a Venn diagram and an Euler diagram. Which mathematician came first? I’m going to refer to Euler diagrams instead of Venn diagrams in the rest of this post.

Another operator is implication. We say “A implies B” if B is true whenever A is true. We denote this in symbols with “A ⇒ B”.

• Draw an Euler diagram for A ⇒ B.
• Rewrite A ⇒ B in terms of “or” and “not”. Use your Euler diagram to help you figure this one out.

That last exercise was a little tricky. How did you know your answer was correct? The foolproof way is to write out a truth table for A and B. But, as you can imagine, that gets tedious as you add more and more propositions.

• Prove De Morgan’s Laws by showing that the truth tables of both expressions are identical.
• How many rows are in a truth table of an expression with 10 propositions?
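The brute-force check described above can be sketched in a few lines of Python: two expressions are equivalent if they agree in every model. (I’ll demonstrate it on double negation so as not to spoil the De Morgan exercise; note the `2**n` rows, which is exactly why this gets tedious.)

```python
from itertools import product

def equivalent(s1, s2, n):
    """Do s1 and s2 agree in every model over n propositions?"""
    return all(s1(*vals) == s2(*vals)
               for vals in product([True, False], repeat=n))

# Double negation: ¬¬A is equivalent to A.
print(equivalent(lambda a: not (not a), lambda a: a, 1))  # True
# Sanity check: A is *not* equivalent to ¬A.
print(equivalent(lambda a: a, lambda a: not a, 1))  # False
```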

And it gets worse. What if you have an infinite number of propositions? Like “1 is a positive number” and “2 is a positive number” and so on ad infinitum? Infinitely long truth tables sound gnarly. Clearly, we need a better way to deal with this.

The better way is to think in terms of rules of inference. Rules of inference are ways to transform expressions.

A rule of inference you’ve probably used is modus ponens, which states that if you have “P is true, and P implies Q” then you can deduce that “Q is true”.

For example, if Rover is a dog, then Rover is an animal. Since Rover is a dog, we can deduce that Rover is an animal.
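The Rover example can be run mechanically: keep applying modus ponens until nothing new can be deduced. This is a toy sketch (facts as strings, implications as pairs), not how EQP represents things:

```python
facts = {"Rover is a dog"}
implications = [("Rover is a dog", "Rover is an animal"),
                ("Rover is an animal", "Rover is mortal")]

changed = True
while changed:
    changed = False
    for premise, conclusion in implications:
        # Modus ponens: premise holds and premise implies conclusion,
        # so deduce the conclusion.
        if premise in facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print("Rover is an animal" in facts)  # True
```

Notice that the two implications chained: we also deduced “Rover is mortal” even though no single rule said so directly.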

• Rewrite “P is true, and P implies Q” in terms of the operators introduced above.
• Create a truth table for that where the columns are “P”, “Q”, and “P is true and P implies Q”.
• Circle all the rows where the last column is “True”, and check to see that “Q” is true in those rows.
• Prove, with a truth table, that ((P⇒Q)∧P)⇒Q is a tautology.

Rules of inference are often written as fractions where the preconditions (also called antecedents or premises) are written as numerators and the results (also called consequents or conclusions) are written as denominators:

$\frac{P,\; P\implies Q}{Q}$

Note that sometimes we elide the “and” from a series of premises.

• Write a pair of rules of inference that correspond to De Morgan’s Laws.
• Why are rules of inference written as fractions? Come up with a “cancellation” rule that allows you to chain rules of inference where one’s conclusion is the other’s premise.
• Are rules of inference reversible? Write a reversible and irreversible rule of inference.
• In the rule of inference written above, “P” and “Q” don’t represent specific propositions. What do they represent? Write an English sentence that expresses what they represent. Explain why they are called metavariables. Explain how this relates to pattern-matching.
• Explain how this allows us to use a finite truth table to say things about a potentially infinite set of propositions.
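To make the pattern-matching idea in that last pair of exercises concrete, here’s a toy matcher (my own sketch; real provers use full unification, which we’ll meet later). Sentences are nested tuples, and uppercase strings play the role of metavariables:

```python
def match(pattern, sentence, bindings=None):
    """Return metavariable bindings making pattern equal sentence, or None."""
    bindings = dict(bindings or {})
    if isinstance(pattern, str) and pattern.isupper():  # a metavariable
        if pattern in bindings:
            return bindings if bindings[pattern] == sentence else None
        bindings[pattern] = sentence
        return bindings
    if (isinstance(pattern, tuple) and isinstance(sentence, tuple)
            and len(pattern) == len(sentence)):
        for p, s in zip(pattern, sentence):
            bindings = match(p, s, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == sentence else None

premise = ("implies", "P", "Q")   # the shape of a modus ponens premise
concrete = ("implies", "it is raining", "the ground is wet")
print(match(premise, concrete))
# {'P': 'it is raining', 'Q': 'the ground is wet'}
```

One finite rule, infinitely many instances: P and Q match whatever propositions you throw at them.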

A rule of inference is valid if its conclusions are true in all models where its premises are true.

• Write an invalid rule of inference.
• Write an invalid rule of inference that looks valid at first glance by modifying modus ponens. Look up modus tollens, affirming the consequent, and denying the antecedent. For extra credit, use one of these fallacies in your homework and see if your teacher notices.

A rule of inference should also be consistent, which means you shouldn’t ever be able to use that rule of inference to prove both A and not-A.

• Write an inconsistent rule of inference.

Now, finally, we have the tools to talk about proofs. In a “logical system”, you pick certain axioms, which are propositions that you take to be true. Then you use your rules of inference to show that those axioms imply other, more exciting propositions. A proof is such a chain of rules of inference, and a statement that has a proof is a theorem.

• Explain how you can represent parsing as a logical system. In particular, come up with an explicit connection between (1) production rules and rules of inference, and (2) theorems and strings that match a grammar.
• We talk about parse trees, not parse chains. Should we talk about proofs in terms of trees of rules of inference, rather than chains of rules of inference?

One of the first logical systems was Euclid’s postulates. With a handful of simple axioms that anyone would agree with, Euclid built up all of geometry. While other philosophers (like Thales) had discovered some of the same results centuries before him, Euclid put all that machinery on a solid, rigorous foundation. In a future episode, we might even go ahead and encode Euclid’s axioms in formal logic.

Meanwhile, in the early 20th century, a crisis was brewing. People were coming up with all sorts of messy paradoxical results. The biggest one was Russell’s Paradox, which went somewhat like this:

Imagine the set of all sets that don’t contain themselves. So, for example, the set {1} is in that set, while the set of all sets is not in that set (because it contains itself). Does this weird set contain itself?

• Wrap your head around that by coming up with an explanation for why it does and doesn’t contain itself.
• Explain how this relates to the Liar’s Paradox, “This statement is false.”

With these messy loopholes popping up all over the place, Alfred North Whitehead and Bertrand Russell decided it was a good idea to take matters into their own hands and put math on solid foundations like Euclid did centuries before.

The result was the Principia Mathematica*, a treatise that stated a set of axioms and rules of inference for set theory, and then built up arithmetic and the rest of math from that.

The Principia was careful to avoid any sets that contained themselves, and so Russell’s Paradox and the Liar’s Paradox could be avoided.

• How does the Liar’s Paradox relate to sets that contain themselves?

It seemed like a sweet deal, until Kurt Gödel came along and broke everything.

(*We stopped using the Principia, by the way. Most mathematicians use the axioms of Zermelo–Fraenkel set theory, abbreviated ZF. But it still has the problems Gödel discovered.)

Modus ponens isn’t a magic bullet.

• Given “A and B”, can you use just modus ponens to prove “A or B”? Does that mean “A or B” is false?
• Come up with another statement that cannot be proved under modus ponens.

This is really bad, if you think about it. It means that there are statements that can be true according to our axioms, but impossible to prove in our logical system. This is what is referred to as incompleteness—the subject of Gödel’s Incompleteness Theorem.

The Incompleteness Theorem says that if your logical system is consistent and powerful enough to represent arithmetic, then you can come up with a statement in it that you can neither prove nor disprove.

This is horrible. It means that there are statements we can’t prove in math. There are questions without answers!

Here’s a question without an answer: those of you who have read Harry Potter and the Diagon(alization) Alley might remember that there are more real numbers than integers. The Continuum Hypothesis states that there isn’t a set of numbers that is “bigger” than the integers but “smaller” than the real numbers. It turns out that we can neither prove nor disprove that: the Continuum Hypothesis is independent of the ZF axioms.

If that sounds a bit abstract, here’s another one: Euclid’s parallel postulate, which says something really obvious about parallel lines, turns out to be independent of his other axioms.

> If a line segment intersects two straight lines forming two interior angles on the same side that sum to less than two right angles, then the two lines, if extended indefinitely, meet on that side on which the angles sum to less than two right angles.

Finally, the axiom of choice is pretty controversial. Most mathematicians accept it as true, even though it leads to all sorts of weird results. The weirdest one is the Banach-Tarski Paradox, which shows how you can take a sphere, cut it up into five pieces, and reassemble them to get two spheres.

When we talk about ZF with the axiom of choice, we call it ZFC.

• Relate independent statements to the notion of undecidable problems in computer science, such as the halting problem or computing Kolmogorov complexity. Do the proofs of these require some sort of self-referential argument?

As exciting as the Incompleteness Theorems are, there’s a much less celebrated result of Gödel’s called, oddly enough, the Completeness Theorem, published in 1929. It says that there is a rule of inference that is complete (every true statement is provable) for first-order logic.

Gödel did not, however, show what this rule was: we had to wait until 1965 for it. In the next post, we will discover this magical rule of inference, prove its completeness, and show how to use it to write an automatic theorem prover. And we’re going to find out why the title of this post makes sense.