Types and Type Theory
August 3rd, 2009
2This is the first article in a multi-part series on the subject of types and type theory.
Programmers are all familiar with types. Usually, their first experience with types comes from variable declarations in a language like Java or C, which look like the examples in (1).
a) int x;
b) char y;
c) bool z;
In each of these examples, we are doing two things:
First we are saying that the symbols ‘x’, ‘y’, and ‘z’ should be recognized by the system (either a compiler or an interpreter) as variable names, so that when their name occurs in a statement their value should be substituted.
Secondly, we are restricting the sorts of things that the variable can refer to. In the examples above, the variable ‘x’ can only refer to integers, ‘y’ can only refer to characters, and ‘z’ can only refer to a boolean value.
Trying to assign a string to the variable ‘x’, say by executing (2), will result in a “type error”.
2) x = “cats”;
Similarly, an operation that only makes sense for st;rings or characters, like “to_upper_case”, will fail when its argument is a variable of type int or bool. Thus, Java and C programmers learn early on that they need to keep track of the types of their variables.
Functional programming relies more heavily on the theory of types, so some programmers are familiar with functional types, like string->int, which is the type of a function that takes a string argument and returns an integer, or (bool->bool)->int, a function that takes a function of type bool->bool as its argument and returns an integer. Functional programmers need to keep track of their variable and function types very closely, and type errors are extremely common.
So if types are such a pain, why do we have them?
Well, consider the problem that a computer programmer faces every day. He or she writes a piece of code to perform a certain task or solve a certain problem. Ideally this code will do what it’s supposed to, but usually it will have bugs or unwanted side-effects. The programmer’s job then turns to finding and removing these bugs. This task almost always takes longer and is far more frustrating than writing the original piece of code.
In a perfect world, it would be possible to tell immediately whether the programmer’s code does what it is supposed to. Imagine a program called the CorrectnessChecker which takes a piece of code and a suitably defined task as input, and outputs true if and only if (iff) the code performs the task exactly as described, otherwise it outputs false. The world would be a better place. If the CorrectnessChecker said that a program did what it was supposed to, it could be put to use immediately. There would be no need to spend time and money on bug-testing and quality assurance.
Unfortunately, the CorrectnessChecker does not exist, and in fact it cannot exist. This sad fact has been known for about 50 years. The proof stems from the fact that if we had a CorrectnessChecker, then we could build a program that solves the Halting Problem, and this program similarly cannot exist. (A self-contained proof that the CorrectnessChecker is impossible comes from considering what happens when we run the CorrectnessChecker with its own source code as input.)
Thankfully for us programmers, all it not lost, as we can closely approximate the utility of the CorrectnessChecker through something called the TypeChecker. The TypeChecker is a program that takes a piece of code as input and returns true iff the code does not exhibit certain kinds of behavior. Usually we use the TypeChecker to weed out common programming errors, like trying to add true to “cat”, applying to_upper_case to 5, or accessing the 10th element in an array with only 5 elements.
However, the ability to detect code with such errors comes at a small cost:
All statements in the code must be annotated with a description of what kinds of values they can yield.
This description is called a type.
So, for example, when we say that a variable ‘x’ has type int, we mean that anywhere we see the variable ‘x’ in the code, we can expect that the value of that variable is an integer. Or, say we know that the function to_upper_case is of type string->string, and we have a string “cat”. Then we know that the value of to_upper_case applied to the string “cat” will itself be a string. Given a piece of code annotated with types, the TypeChecker can detect cases where the code does nonsensical things like adding an integer and a string, or applying to_upper_case to a boolean.
Although the TypeChecker needs every statement annotated with its type, we as programmers do not need to spend time annotating every statement we write. Instead, we can annotate just a few things (like variables and functions) and use a TypeInference program to annotate all the statements based on the initial types we specified and some general inference rules about how types can be combined.
Although type checking is almost universally regarded as useful, individual programming languages differ in how they make use of their type checker. One way to categorize programming languages is whether they are ‘staticly’ or ‘dynamically’ typed. A ‘statically typed’ language is one in which the type checker is run before the compiler or interpreter. If a type error is caught, it is returned to the programmer before any of the code is actually executed. These are languages like Java, C, and my beloved OCaml. Programming in these languages involves a lot of wrestling with the type checker, but once it is satisfied further bug testing is usually rather fast.
In a ‘dynamically typed’ language, the code is executed without the types being checked first. However, the types are carried around in memory, and at each execution step the types of the terms used in that step are checked and execution continues if they are OK. Javascript and Scheme are two dynamically typed languages. Development can sometimes move faster in a dynamically typed language because less time is spent trying to satisfy the type checker. However, because the types must be checked at every execution step, dynamically typed languages often run slower than statically typed languages.
Besides being practically useful to programmers, type theory is deeply connected to theoretical computer science, philosophy, and mathematics. Indeed, one of my favorite pieces of mathematics, the “Curry-Howard Isomorphism” shows that type theory and logic are equivalent. This has interesting mathematical implications, and in practice it allows logicians to use pre-existing type checking algorithms in computer-generated proofs.
In my posts on this topic, I hope to alternate the theoretical aspects of type theory and its practical uses. My next post will be a theoretical one on the subject of the simply-typed lambda calculus, the formal system which underlies almost every (useful) programming language in existence.
Tagged with: type theory, types
Related Posts
Author
2 Comments Leave a comment
A static type system built into a language will certainly limit the set of otherwise valid programs that you can write (whether this is a problem is debatable). Some work has been done to lift these restrictions in languages like Qi, which provides a Turi
Leave a comment
Allowed Tags
_emphasis_
*strong*
??citation??
-deleted text-
+inserted text+
^superscript^
~subscript~
@code@
Add code using a GIST
gist: gistid
A problem with statically typed languages is that the set of types is normally determined in advance by the system. In practice, however, types may be more complex than the system’s built-in notions. For example, in Scheme we have a “list” which is actual
Reply to comment