Introduction

To verify (or prove) theorems using a computer, we need some way to represent a theorem and the steps used to prove it. The Calculus of Constructions (CoC) forms the basis of several prominent proof assistants, such as Coq and Lean¹.

To understand theorem provers better, I am attempting to implement a simple theorem prover in Python. This post will detail the first part of that investigation².

Background

The Curry-Howard correspondence is an isomorphism between proofs and programs. This is a huge subject, but the takeaway is that you can map concepts from proofs and logics to concepts from type systems. For example, a logical formula can be represented by a type, and a proof could be represented by a term in that type system. If we have a type \(A\), and can construct a token \(a\) such that \(a\) is of type \(A\) (using a limited set of admissable rules for constructing said tokens), that’s the same thing as proving the theorem \(A\). This is convenient because computer scientists have developed tools and algorithms for type checking that computers can run, which will give us a higher level of assurance that our proof is correct than just working it out on paper³.

For the purposes of these posts I will start by mostly following Coq’s documentation (and the first chapter of the HoTT book)⁴.

Game Plan

We’re going to need to implement a few different requirements to obtain a working proof assistant.

We need a representation of the terms of our language. There are various different kinds of terms: types, props, and so forth. Our representation will need to be able to handle them all. We will also need to be able to represent global and local contexts (which basically maintain our assumptions and definitions).
We need to be able to tell when two terms are the same. To decide this, we will essentially have to recurse over the structure or the terms and compare at each level. We might also convert both terms to a “normal form” to compare them⁵.

Those first two steps will involve constructing data structures to represent and simplify terms. We will also need to implement the conversion rules to convert terms to different forms.

We need to be able to judge if a global environment is “well-formed” (i.e. we didn’t break any rules when constructing it), and a local context is valid in this global environment. In the Coq documentation, given a global environment \(E\) and a local context \(\Gamma\), this judgement is written \(W(E)[\Gamma]\).
We need to be able to judge if a term \(t\) is well-formed and has type \(T\) (in a given global environment and local context). The Coq documentation uses \(E\) for a global environment, \(\Gamma\) for a local context, and \(E[\Gamma] \vdash t:T\) to represent this judgement.

Requirements 3 and 4 entail implementing the typing rules. These intimidating-looking formulas represent the valid “moves” you make in a derivation. If you want to check if the bottom of each rule is true, the top of each rule must be true. Alternatively, we can think about “building up” proofs starting from the simplest rules.

Consider the simplest rule:

\[\frac{% % }{% W([])[]% } \tag{W-Empty} \]

This means that an empty global environment and an empty local context in that environment is well-formed. Alternatively, this means that we can construct an empty global environment and empty local context without any prerequisites. If we compose all the various rules together, we can derive terms of greater complexity (corresponding to more complex theorems and proofs).

We will need to implement inductive types, and automatically be able to infer the correct inductive hypotheses for a given inductive type.
Any actual proof that mathematicians actually care about will be enormous in this system. So we will probably want to implement a tactic language. This will let us deal with things at a higher level of abstraction, to make proving theorems easier.
Once we have all that, we’ll hopefully know enough to try to automatically search the space of proofs.

Term Representations

Let’s start by representing the language we will write terms in. In CoC, all terms have a type, even the terms that represent types. The types of types are called sorts. There’s an infinite hierarchy of sorts, based on Prop (propositions) and Set (the type of “small sets”). Since Prop and Set are terms, they have types as well.

Sorts

Our main Sorts are:

\[ \begin{align} Prop\text{ }\colon\text{ }&Type(0) \\ Set\text{ }\colon\text{ }&Type(0) \\ Type(i)\text{ }\colon\text{ }&Type(i+1) \end{align} \]

Let’s start a basic implementation in Python.

Here’s some basic constructs.

class Set:
    
    def __init__(self):
        pass
    
    def __repr__(self):
        return "Set"

class Prop:
    
    def __init__(self):
        pass
    
    def __repr__(self):
        return "Prop"

class Type:

    __match_args__ = ('n',) 
    def __init__(self, n=0):
        self.n = n 
    
    def __repr__(self):
        return f"Type({self.n})"
    
class Constant:
   
    __match_args__ = ('name',)
    def __init__(self, name: str):
        self.name = name
    
    def __repr__(self):
        return f'Constant({self.name})'

class Variable:
    
    __match_args__ = ('name',)
    def __init__(self, name: str):
        self.name = name
    
    def __repr__(self):
        return f'Var({self.name})'

Straightforward so far. The Sorts are mostly just symbols, with the exception of Type (which includes a number, for higher types⁶). There’s also SProp, “strict propositions” which I’m mostly ignoring for now but you may see floating around.

Constants and Variables also have names.

Note the use of __match_args__. These give classes the ability to do structured matching⁷, similar to the structured matching in Scala, Erlang, etc.

Complex Types

Now, let’s look at the more interesting examples, starting with Products and Functions:

class ProductType:
    
    __match_args__ = ('variable', 'term1', 'term2')
    def __init__(self, variable, term1, term2):
        self.variable = variable 
        self.term1 = term1 
        self.term2 = term2 
 
    ...

These are the so-called “product types.” If we have variable \(x\) of type \(T\), and \(x\) appears in \(U\), this is the dependent product type⁸. This can be thought of as the universal quantification (\(\forall t:T, U\)). If we don’t have that dependence, this is just the regular product type (“if \(T\) then \(U\)”). This will hopefully be more rigorous when we discuss the various rules for introducing and eliminating well-formed types.

class FunctionType:
    __match_args__ = ('variable', 'term1', 'term2')
    def __init__(self, variable, term1, term2):
        self.variable = variable
        self.term1 = term1 
        self.term2 = term2

    ...

Similarly, there’s function types. If we have a variable \(x\) and terms \(T\) and \(u\), then we have a term \(\lambda x:T.u\).

If you’re following along with the code, you may also want to overload the __repr__ dunder method for these guys, for readability.

class Application:
    __match_args__ = ('first', 'second')
    def __init__(self, first, second):
        self.first = first 
        self.second = second 
    def __repr__(self):
        return f'({self.first} {self.second})'

If we have two terms, we can “apply” one to the other. The Coq documentation says that if \(t\) and \(u\) are terms, then \((t\text{ }u)\) is a term (read that as “\(t\) applied to \(u\)”). Here we use first and second to represent the two terms.

class Let:
    __match_args__ = ('variable', 'term1', 'term2', 'term3')
    def __init__(self, variable, term1, term2, term3):
        self.variable = variable
        self.term1 = term1 
        self.term2 = term2
        self.term3 = term3

Last is Let. This is a “local binding” operation. You set a variable to have a local definition. The Coq documentation renders this as \(\text{let }x = t:T \text{ in } u\). The first argument is the variable (\(x\)). The three terms will represent \(t\), \(T\), and \(u\), respectively.

Summary

Sort = Set | Prop | Type 
Term = Sort | Constant | Variable | ProductType | FunctionType | Application | Let

This is the majority of the different sorts of terms we will need for our theorem prover.

Contexts

These context classes are pretty bare-bones. They just wrap up dictionaries, letting us track terms. I’m going to keep assumed, defined, and derived terms in separate dictionaries⁹.

class LocalContext:
    def __init__(self, name: str, assumed_terms=None, defined_terms=None, derived_terms=None):
        self.name = name
        self.assumed_terms = assumed_terms if assumed_terms else {}
        self.defined_terms = defined_terms if defined_terms else {}
        self.derived_terms = derived_terms if derived_terms else {}
    
    def body(self):
        return {**self.assumed_terms, **self.defined_terms, **self.derived_terms}

class GlobalEnvironment:
    def __init__(self, name: str, assumed_terms=None, defined_terms=None, derived_terms=None):
        self.name = name
        self.assumed_terms = assumed_terms if assumed_terms else {}
        self.defined_terms = defined_terms if defined_terms else {}
        self.derived_terms = derived_terms if derived_terms else {}
    
    def body(self):
        return {**self.assumed_terms, **self.defined_terms, **self.derived_terms}

class Environment:
    def __init__(self, globalEnvironment: GlobalEnvironment, localContext: LocalContext):
        self.globalEnvironment = globalEnvironment
        self.localContext = localContext

Example

Here’s a compound term we can now construct:

x = Variable("x")
y = Variable("y")
z = Variable("z")

# Create a type representing: ∀x:Set. ∀y:(x->Type). ∀z:Type. (y z)
nested_prod = ProductType(
    x, Set(),
    ProductType(
        y, ProductType(Variable("x"), Type(), Type()),
        ProductType(
            z, Type(),
            Application(y, z)
        )
    )
)

Next Steps

In this post, we managed to knock step one off the game plan. Not much of the real meat yet: this is mostly just creating classes.

In the next post, I’ll start on some (but not all) of the conversion rules.

The main source is the Coq documentation.
Also, the HoTT Book
Software Foundations

I found some other interesting projects trying to control theorem provers from Python.

CoqPyt
PyCoq ¹⁰
I also found this Lean Dojo, which I may look into later. Note that it’s Lean, not Coq.

Footnotes

There are other choices. For example, Agda uses a type theory based on Martin-Lof Type Theory, rather than on CoC.↩︎
If you’re reading this, it may be helpful to prove some theorems in a proof assistant to understand this subject a bit better and to motivate what I’m trying to replicate a bit better. So far, I’ve been through a few different sources in an attempt to understand this subject better. Benjamin Pierce’s Software Foundations books were far and away the most helpful resource, but you can also try Lean for an alternate system, or the first chapter of the HoTT book for a mathematical introduction. My goal here is to better understand these systems by actually implementing a (limited-scope) proof assistant/theorem prover, so I will be glossing over some of the theory in favor of engineering.↩︎
There’s an intimidating amount of literature on this subject. I’ll try to break off manageable chunks of it. For now, there’s two important concepts to be aware of. The first key concept is that these systems are built on intuitionist logic. Basically: to prove the existence of an object we need to actually show an example. In particular, by default we lack the law of the excluded middle and double negation (this isn’t actually a problem in practice, but it’s out of scope for this footnote.) The second concept is that of dependent types. Dependent sum and dependent product types extend the better-known sum and product types: terms of these types can depend on certain values. They let us represent quantifiers like “for all” and “there exists”.↩︎
Note that Coq uses the Calculus of Inductive Constructions. I’ll deal with induction in a later post.↩︎
Convertibility is with respect to a given context.↩︎
Since I’m writing code, and not proving theorems, I’ve chosen to start my type indexing from 0, rather than 1. Hopefully, this doesn’t cause trouble later. If it does, I’ll come back and change this.↩︎
Provided you are using Python 3.10 or greater.↩︎
This confused me at first, but the HoTT book describes \(\Sigma\)-types, which represent dependent-pairs (corresponding to existential quantification \(\exists\)). Where are they? It seems that they are not a primitive concept in Coq, they are implemented using inductive types.↩︎
I’ll change this if it turns out to be a dumb move.↩︎
Shouldn’t a half-Coq, half-Python project be called Basilisk?↩︎