Programming Languages

Lecture Notes for CS4400/5400

Ferdinand Vesely

Lambda Calculus

Lambda calculus is a theory of functions. What is a function? There are two basic views one can take when characterizing them:

Function as a graph
Function as a value

Considering a function \(f\) as a graph is to consider it as a set of pairs – mappings between input and output values \((x, f(x))\). For example the square function on natural numbers \(^2 : \mathbb{N} \to \mathbb{N}\) can be characterized as a set of pairs \((n, n^2)\):

\[ \{ (0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), ... \} \]

To use a function given as a graph, we look up the output that corresponds to our input. The alternative view is to consider a function as rules – equations which tell us how to compute the output of the function from its input. For example, the square function \(^2 : \mathbb{N} \to \mathbb{N}\) is defined by the equation:

\[ n^2 = n \times n \]

How do we use this function? We replace an expression that looks like the left-hand side with the right-hand side, substituting the actual argument expression for \(n\), and then compute the resulting expression. For example, our calculation might proceed as follows:

\[ \begin{aligned} {4^2}^2 + 3^2 &= (4 \times 4)^2 + 3^2\\ &= \left((4 \times 4) \times (4 \times 4)\right) + 3^2\\ &= \left((4 \times 4) \times (4 \times 4)\right) + (3 \times 3)\\ &... \\ &= 265 \end{aligned} \]

Or, as follows: \[ \begin{aligned} {4^2}^2 + 3^2 &= (4 \times 4)^2 + 3^2\\ &= 16^2 + 3^2\\ &= 16^2 + 9\\ &= 256 + 9\\ &... \\ &= 265 \end{aligned} \]

In any case, the important thing to note is that we replace any occurrence of \(n^2\) for any \(n\) using the defining equation. In general, if we define a function \(f\) by the equation \(f(x) = E\), where \(E\) is some mathematical expression (potentially containing \(x\)), then we use (apply) this function by replacing any occurrence of \(f(D)\) (where \(D\) is a mathematical expression) by \(E[x := D]\), that is the expression \(E\) where all occurrences of \(x\) are replaced by \(D\). This is called substitution of a variable \(x\) in an expression \(E\) for another expression \(D\). E.g., if

\[ f(x) = x + x \]

then:

\[ \begin{aligned} f(20) + f(2 \times 3) &= (x + x)[x := 20] + (x + x)[x := 2 \times 3] \\ &= (20 + 20) + ((2 \times 3) + (2 \times 3)) \\ &... \end{aligned} \]

The next question is, how important is the name of the function? We use names as mnemonics, so that we can

say “let \(f\) be the function defined by the equation \(f(x) = E\)” (where \(E\) is an arbitrary mathematical expression), and
replace any occurrence of \(f\) applied to an argument with an instance of \(E\) where \(x\) is replaced with the argument expression.

We can do this without inventing names, by using functions as anonymous objects – just like we easily use numbers or strings or arrays. In mathematics an anonymous function will be written as \(x \mapsto E\). For example, the square function is \(x \mapsto x \times x\), the above function \(f\) is \(x \mapsto x + x\).

The above exposition applies to programming too. Basically, all sensible “higher-level” programming languages allow us to define functions to abstract a computation by replacing a concrete expression with a variable – a placeholder. In Python we might write:

def square(x):
  return x * x

In C/C++:

int square(int x) {
  return x * x; 
}

In Haskell:

square :: Integer -> Integer
square x = x * x

In Scheme:

(define (square x)
  (* x x))

In any programming language we operate with the rough understanding that whenever square is invoked with an argument, that application might as well be replaced with the body of the function with the argument variable replaced with the actual argument (either before or after evaluating the argument itself). More and more programming languages, particularly those which allow passing functions as arguments, allow creating functions without naming them – so-called anonymous functions. Python and Scheme have lambda:

lambda x : x * x

(lambda (x) (* x x))

OCaml has fun or function:

fun x -> x * x

Haskell has the backslash:

\x -> x * x

C++ has, well…

[](int x){ return x * x; }

As hinted by the Scheme and Python examples, Lambda calculus is the underlying theory behind these anonymous functions. In its pure form, it is exclusively concerned with what it means to apply an abstracted expression (as an anonymous function), to an argument. It studies this as a purely syntactic operation.

Where Python and Scheme have lambda, and OCaml has fun and function, Lambda calculus has \(\lambda\). That is, an anonymous function with the formal parameter \(x\) is constructed using \(\lambda x...\) We can write the squaring function in lambda notation as

\[\lambda x.\ x \times x\]

We say that this is a lambda abstraction that binds the variable \(x\) in \(x \times x\). In other words, \(x\) is bound in \(x \times x\). An application is written (similarly to Scheme, OCaml, or Haskell) by writing the function and argument next to each other (juxtaposition). For example, where in Scheme we could write

((lambda (x) (* x x)) 10)

and in Haskell

(\x -> x * x) 10

In lambda notation we write:

\[ (\lambda x.\ x \times x)\ 10 \]

As I mentioned before, Lambda calculus looks at the application of a function as a syntactic operation, in terms of substitution, as the process of replacing any occurrence of the abstracted variable with the actual argument. For the above, this is replacing any occurrence of \(x\) in \(x \times x\) with \(10\):

\[ \begin{aligned} (\lambda x.\ x \times x)\ 10 &= (x \times x)[x := 10]\\ &= 10 \times 10 \end{aligned} \]

Another way of thinking about the bound variable \(x\) in \(\lambda x.\ x \times x\) is as a placeholder or hole, where the argument “fits”.

\[ (\lambda \boxed{\phantom{x}}.\ \boxed{\phantom{x}} \times \boxed{\phantom{x}})\ 10 = \boxed{10} \times \boxed{10} \]

Pure Lambda Calculus

Here, we will look at the formal theory of pure Lambda Calculus. We will look at the syntax and a notion of computation.

Syntax

The basic syntax of the calculus is really simple:

  <Lambda> ::= <Variable>
             | ( <Lambda> <Lambda> )
             | ( λ <Variable> . <Lambda> )

That is all there really is:¹

variable reference, e.g. \(x\), \(y\), \(z\), \(a\), \(\mathit{square}\)
application, e.g., \((x\ y)\), \(((\lambda x.\ x)\ (\lambda x.\ x))\)
lambda abstraction, e.g.,
- \((\lambda x.\ x)\) – expressing the identity function
- \((\lambda x.\ x\ x)\) – a function that applies its argument to itself

You might ask: what can we do with such a minuscule language? Turns out a lot. As proven by A.M. Turing, this pure version of Lambda calculus is equivalent in computational power to Turing Machines. That means we are able to build up a programming language out of these three constructs. We will look at how to do that in the section on Programming in Pure Lambda Calculus below.

Syntax Conventions and Terminology

Terminology: Take a lambda abstraction:

\[ (\lambda x.\ N) \]

\(\lambda x\) is a binder binding \(x\)
\(N\) is the body of the abstraction

To avoid writing too many parentheses, these conventions are usually taken for granted:

Outermost parentheses are usually dropped: x x, λx. x.
Application associates to the left. That is, (((a b) c) d) is the same as ((a b c) d) is the same as (a b c d), which is the same as a b c d (see previous rule).
Lambda abstraction bodies extend as far to the right as possible. That is, (λa. (λb. ((a b) c))) is the same as λa. λb. a b c.

Beta Reduction

Computation in pure lambda calculus is expressed in a single rule: the \(\beta\)-reduction rule:

\[ (\lambda x.\ M)\ N \longrightarrow_\beta M[x := N] \]

The long arrow stands for “reduces to”. On the left-hand side, we have an application of a lambda abstraction to an arbitrary term. On the right-hand side, we substitute the abstraction’s bound variable with the argument. A term that matches the pattern on the left-hand side (that is, a lambda abstraction applied to something) is called a redex, short for reducible expression. For example:

(λx. x) a -->β a
(λx. x x) (λy. y) -->β (λy. y) (λy. y)
the above reduces further: (λy. y) (λy. y) -->β (λy. y)
not a redex: (x x)
also not a redex: x (λy. y)
also not a redex: (λy. (λx. x) y), although it does contain a redex ((λx. x) y)

Variables: Bound, Free. Closed Expressions

We have already mentioned the notion of a bound variable. A variable is said to be bound in an expression, if it appears under a λ-abstraction binding that particular variable. Or, in other words, it is bound if it appears in the scope of a binder. For example:

x is bound in (λx. x x) – it appears in the scope of the binder λx
both x and y are bound in (λx. λy. x y) – x appears in the scope of λx, y in the scope of λy
x is not bound in (λy. x y), but y is – x does not appear in the scope of any binder here, while y appears in the scope of λy

A free variable is one which appears in a position where it is not bound. For example:

x is free in x x, in λy. x y, or in (λy. y y) x
x is not free in (λx. x x) (λx. x)
x is both bound and free in (λx. x y) x, while y is only free

As you can see above, a variable might be both bound and free in an expression.

An expression which contains no free variables is closed, for example:

λx. x
λx. λy. x y x

A closed lambda expression is also called a combinator.

A variable is called fresh for an expression, if it does not appear free in that expression. For example, x is fresh for y z or (λx. x x).
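The definitions of free variables and closed expressions can be made concrete with a small sketch in Python. The tuple-based term representation and the names `free_vars` and `is_closed` are assumptions made for illustration; they are not part of the calculus itself.

```python
# Hypothetical term representation (an assumption, not from the notes):
#   ("var", name) | ("app", fun, arg) | ("lam", param, body)

def free_vars(term):
    """Return the set of variable names that occur free in term."""
    tag = term[0]
    if tag == "var":
        return {term[1]}
    if tag == "app":
        return free_vars(term[1]) | free_vars(term[2])
    if tag == "lam":
        # The binder removes its own variable from the free variables of the body.
        return free_vars(term[2]) - {term[1]}
    raise ValueError(f"unknown term: {term!r}")

def is_closed(term):
    """A term is closed (a combinator) if it has no free variables."""
    return not free_vars(term)

x, y = ("var", "x"), ("var", "y")
# (λx. x y) x : x occurs both bound and free, y occurs only free
example = ("app", ("lam", "x", ("app", x, y)), x)
identity = ("lam", "x", x)
```

Here `free_vars(example)` contains both x and y, while `is_closed(identity)` holds because the only occurrence of x is bound.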

Names of Bound Variables Don’t Matter

Intuitively, an identity function should be an identity function, no matter what we choose to name its bound variable. That is, (λx. x) should be considered the same as (λy. y) or (λz. z). This is captured in the notion of alpha equivalence: two expressions are α-equivalent if they only differ in the names of their bound variables. This also means that we are free to α-convert any lambda term by consistently renaming bound variables. However, the new names must differ from free variables under the particular binder. We are thus free to convert (λx. x) to, e.g., (λa. a); (λy. z y) to (λx. z x), but not to (λz. z z).
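One possible way to check α-equivalence mechanically is sketched below in Python. The tuple-based term representation and the function `alpha_eq` are assumptions made for this illustration. The idea is to remember, for each bound variable, how deep its binder is, and to compare binder positions instead of names.

```python
# Hypothetical term representation (an assumption, not from the notes):
#   ("var", name) | ("app", fun, arg) | ("lam", param, body)

def alpha_eq(a, b, env_a=None, env_b=None, depth=0):
    """Check whether terms a and b differ only in the names of bound variables."""
    env_a = env_a or {}
    env_b = env_b or {}
    if a[0] != b[0]:
        return False
    if a[0] == "var":
        if a[1] in env_a or b[1] in env_b:
            # Bound variables must point to binders at the same depth.
            return env_a.get(a[1]) == env_b.get(b[1])
        # Free variables must have the same name.
        return a[1] == b[1]
    if a[0] == "app":
        return (alpha_eq(a[1], b[1], env_a, env_b, depth)
                and alpha_eq(a[2], b[2], env_a, env_b, depth))
    # Abstraction: record the depth of the new binder on both sides.
    return alpha_eq(a[2], b[2],
                    {**env_a, a[1]: depth}, {**env_b, b[1]: depth}, depth + 1)

lam_y_zy = ("lam", "y", ("app", ("var", "z"), ("var", "y")))   # λy. z y
lam_x_zx = ("lam", "x", ("app", ("var", "z"), ("var", "x")))   # λx. z x
lam_z_zz = ("lam", "z", ("app", ("var", "z"), ("var", "z")))   # λz. z z
```

With these definitions, `alpha_eq(lam_y_zy, lam_x_zx)` holds, while `alpha_eq(lam_y_zy, lam_z_zz)` does not, because the renaming to z would capture the free z.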

Substitution

We were happy to use substitution in an informal manner up until now:

\(M[x := N]\) means replacing occurrences of the variable \(x\) in the expression \(M\) with the expression \(N\).

Here we want to pin it down. For that, we will need to consider the difference between bound and free variables. Let’s try to start with a naive definition of substitution.

Naive Substitution

There are three syntactic forms, and we need to consider each of them:

Variable: x[y := N] = ?

Application: (M1 M2)[y := N] = ?

Abstraction: (λx. M)[y := N] = ?

Variables are straightforward: we either find the variable to be substituted or we find a different one:

y[y := N] = N
x[y := N] = x if x \(\neq\) y

Application is also relatively simple – we simply substitute in both left-hand and right-hand side:

(M1 M2)[y := N] = (M1[y := N] M2[y := N])

Now, for a lambda abstraction we need to consider the variables involved. We certainly don’t want to override the bound variable of a function:

(λy. M)[y := N] = (λy. M)

The remaining case seems simple enough too:

(λx. M)[y := N] = (λx. M[y := N]) if x \(\neq\) y

If we test this substitution everything seems to be okay:

  (x x)[x := (λy. y)] = (x[x := (λy. y)] x[x := (λy. y)])
                      = (λy. y) (λy. y)

  ((λx. x y) x)[x := (λy. y)] = (λx. x y)[x := (λy. y)] x[x := (λy. y)]
                              = (λx. x y) (λy. y)

However, what happens if the expression that we are substituting contains the bound variable?

  (λy. x)[x := y] = λy. y

We see that in this case, we have just “captured” variable y and changed its status from free to bound. This changes the meaning of a variable – whereas the original meaning of y was given by the context of the left-hand side expression, now it is given by the binder λy. In particular, we changed a constant function—which, after the substitution should return a free y, no matter what argument it is applied to—to an identity function, that just returns whatever its argument is.

From this we see that we need substitution to behave differently when the expression that we are trying to substitute contains free variables that clash with variables bound by a lambda-abstraction.

Safe Substitution

To fix this we can restrict when the last case of our substitution applies:

(λx. M)[y := N] = (λx. M[y := N]) if x \(\neq\) y and if x is not free in N

Now our substitution is “safe”. However, this turns it into a partial function – it is left undefined for cases where the bound variable x appears free in N. To get around this, we can make use of alpha-conversion: we consistently rename the bound variable x to one that doesn’t clash with y or the free variables in N or M. Only then do we perform the actual substitution of y.

(λx. M)[y := N] = (λx'. M[x := x'][y := N]) if x \(\neq\) y and x' is fresh for y, N and M

Now substitution is a total function again. For an implementation, we just need to know how to pick a fresh variable. Notice how we replace the bound variable x with x' and also rename any occurrence of x to x' in the body M. Since x' is chosen so that it does not appear free in M or N, we are avoiding any potential clashes.
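A minimal sketch of safe substitution in Python follows. The tuple-based term representation, the helper `fresh`, and its naming scheme (x0, x1, ...) are assumptions for illustration. As a small optimisation over the rule above, the sketch renames the bound variable only when it actually clashes with a free variable of N.

```python
import itertools

# Hypothetical term representation (an assumption, not from the notes):
#   ("var", name) | ("app", fun, arg) | ("lam", param, body)

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}

def fresh(avoid):
    """Pick a name x0, x1, ... that is not in the set avoid."""
    return next(n for n in (f"x{i}" for i in itertools.count()) if n not in avoid)

def subst(m, y, n):
    """Compute m[y := n] without capturing free variables of n."""
    if m[0] == "var":
        return n if m[1] == y else m
    if m[0] == "app":
        return ("app", subst(m[1], y, n), subst(m[2], y, n))
    x, body = m[1], m[2]
    if x == y:                      # the binder shadows y: nothing to do
        return m
    if x not in free_vars(n):       # no clash: substitute under the binder
        return ("lam", x, subst(body, y, n))
    # Clash: rename the bound variable first, then substitute.
    x2 = fresh(free_vars(body) | free_vars(n) | {y})
    renamed = subst(body, x, ("var", x2))
    return ("lam", x2, subst(renamed, y, n))

# (λy. x)[x := y] must stay a constant function returning the free y.
result = subst(("lam", "y", ("var", "x")), "x", ("var", "y"))
```

The result is λx0. y: the binder was renamed, so the substituted y remains free instead of being captured.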

Reduction Strategies

Beta reduction tells us how to reduce a redex. The missing piece of the puzzle is how to decide where to look for a redex and apply the beta-reduction rule. This is given by reduction strategies.

(The following text is taken, with minor modifications, from Types and Programming Languages)

Full Beta-Reduction

Under this strategy, any redex may be reduced at any time. At each step we pick some redex, anywhere inside the term we are evaluating, and reduce it. For example, consider the term:

(λa. a) ((λb. b) (λz. (λc. c) z))

This term contains three redexes:

(λa. a) ((λb. b) (λz. (λc. c) z))
(λb. b) (λz. (λc. c) z)
(λc. c) z

Under full beta-reduction, we might choose, for example, to begin with the innermost redex, then do the one in the middle, then the outermost:

(λa. a) ((λb. b) (λz. (λc. c) z))
  --> (λa. a) ((λb. b) (λz. z))
  --> (λa. a) (λz. z)
  --> λz. z

λz. z cannot be reduced any further and is a normal form.

Note that under full beta-reduction, each reduction step can have more than one possible result, depending on which redex is chosen.

Normal Order

Under normal order, the leftmost, outermost redex is always reduced first. Our example would be reduced as follows:

(λa. a) ((λb. b) (λz. (λc. c) z))
  --> (λb. b) (λz. (λc. c) z)
  --> λz. (λc. c) z
  --> λz. z

Again, λz. z is the normal form and cannot be reduced further.

Because each redex is chosen in a deterministic manner, each reduction step has one possible result – reduction thus becomes a (partial) function.
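To see normal order as a function from terms to terms, here is a small sketch of a single-step reducer in Python. The tuple-based term representation and the names `step` and `normalize` are assumptions for illustration, and the step limit in `normalize` is there only so that the sketch does not loop on terms without a normal form.

```python
import itertools

# Hypothetical term representation (an assumption, not from the notes):
#   ("var", name) | ("app", fun, arg) | ("lam", param, body)

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}

def subst(m, y, n):
    """Capture-avoiding m[y := n]."""
    if m[0] == "var":
        return n if m[1] == y else m
    if m[0] == "app":
        return ("app", subst(m[1], y, n), subst(m[2], y, n))
    x, body = m[1], m[2]
    if x == y:
        return m
    if x in free_vars(n):
        avoid = free_vars(body) | free_vars(n) | {y}
        x2 = next(v for v in (f"{x}{i}" for i in itertools.count()) if v not in avoid)
        body, x = subst(body, x, ("var", x2)), x2
    return ("lam", x, subst(body, y, n))

def step(t):
    """One normal-order step; returns None if t is in normal form."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":                 # leftmost, outermost redex
            return subst(f[2], f[1], a)
        r = step(f)
        if r is not None:
            return ("app", r, a)
        r = step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":                     # normal order also reduces under binders
        r = step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None

def normalize(t, limit=100):
    """Collect the reduction sequence, giving up after limit steps."""
    trace = [t]
    for _ in range(limit):
        t = step(t)
        if t is None:
            break
        trace.append(t)
    return trace

var = lambda n: ("var", n)
ident = lambda n: ("lam", n, var(n))
inner = ("lam", "z", ("app", ident("c"), var("z")))        # λz. (λc. c) z
term = ("app", ident("a"), ("app", ident("b"), inner))     # the example term
trace = normalize(term)
```

For the example term, `trace` contains the four terms of the reduction sequence shown above, ending in λz. z.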

Call by Name

Call by name puts more restrictions on which redexes are fair game, and disallows reductions inside abstractions. For our example, we perform the same reduction steps as under normal order, but stop short of “going under” the last abstraction.

(λa. a) ((λb. b) (λz. (λc. c) z))
  --> (λb. b) (λz. (λc. c) z)
  --> λz. (λc. c) z

Haskell uses an optimization of call by name, called call by need or lazy evaluation. Under call by name based strategies, arguments are only evaluated if they are needed.

Call by Value

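Under call by value, only outermost redexes are reduced, and each redex is reduced only after its right-hand side (the argument) has been reduced to a value, that is, to a term that cannot be reduced any further under this strategy. In pure lambda calculus, the values are the lambda abstractions.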

(λa. a) ((λb. b) (λz. (λc. c) z))
  --> (λa. a) (λz. (λc. c) z)
  --> λz. (λc. c) z

Evaluation strategies based on call by value are used by the majority of languages: an argument expression is evaluated to a value before it is passed into the function as an argument. Such a strategy is also called strict, because it strictly evaluates all arguments, regardless of whether they are used.
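The difference between the two families of strategies can be observed in Python, which is strict. In the sketch below, call by name is only simulated, by wrapping the argument in a zero-argument function (a thunk); all function names are made up for this illustration.

```python
def const_one_by_value(x):
    # Python evaluates the argument before the call (call by value),
    # even though x is never used.
    return 1

def const_one_by_name(x_thunk):
    # The argument arrives unevaluated as a thunk; it would only be
    # evaluated if the body called x_thunk().
    return 1

def failing_argument():
    raise RuntimeError("argument was evaluated")

# Simulated call by name: the argument is never needed, so it never runs.
by_name = const_one_by_name(lambda: failing_argument())

# Call by value: the argument is evaluated first, and the failure surfaces.
try:
    by_value = const_one_by_value(failing_argument())
except RuntimeError as err:
    by_value = str(err)
```

The by-name call returns 1 without touching the argument, whereas the by-value call fails before the function body is ever entered.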

Programming in Pure Lambda Calculus

Multiple Arguments

So far, we have looked at lambda abstractions which only take a single argument. However, unary functions are only a small part of our experience with programming. We use functions with multiple arguments all the time. How do we pass more than one argument to a lambda?

One approach would be to extend the calculus with a notion of tuples. Perhaps throw in some pattern matching, for good measure:

\[ (\lambda (x, y).\ x\ y)\ (a, b) \]

However, this means that we are abandoning the very minimal core lambda calculus with all its simplicity. And we don’t have to! As we know well by now, applying an abstraction simply replaces its bound variable with the argument that it’s applied to, as in this trivial example:

\[ (\lambda x. x\ y)\ b \longrightarrow (x\ y)[x := b] = (b\ y) \]

What happens if the abstraction actually just returns another abstraction?

\[ \begin{aligned} (\lambda x.\ (\lambda y.\ x\ y))\ b \longrightarrow (\lambda y.\ x\ y)[x := b] = (\lambda y.\ b\ y) \end{aligned} \]

Since the bound variable of the inner abstraction (\(y\)) clashes neither with the variable we are substituting for (\(x\)) nor with the term we are substituting (\(b\)), we simply substitute \(b\) for \(x\) inside the inner abstraction. This yields an abstraction which can be applied to another argument. That is, applying \((\lambda x.\ (\lambda y.\ x\ y))\) to \(b\) returned an abstraction which is “hungry” for another argument. We can now apply that abstraction to another argument:

\[ (\lambda y.\ b\ y)\ a \longrightarrow (b\ y)[y := a] = b\ a \]

Let’s do the same in one expression:

\[ \begin{aligned} ((\lambda x.\ (\lambda y.\ x\ y))\ b)\ a &\longrightarrow ((\lambda y.\ x\ y)[x := b])\ a \\ &= (\lambda y.\ b\ y)\ a \\ &\longrightarrow (b\ y)[y := a]\\ &= (b\ a) \end{aligned} \]

We just applied an abstraction to two arguments. To make this a little easier to see, we can use left-associativity of application and the fact that the scope of a binder goes as far right as possible to rewrite the original expression as

\[ (\lambda x.\ \lambda y.\ x\ y)\ b\ a \]

This technique is called currying (after Haskell Curry, although he was not the first one to come up with it). It is so common that a short-hand is usually introduced for abstractions with more than one argument:

\[ \begin{aligned} (\lambda x\ y.\ ...) &\equiv (\lambda x.\ \lambda y.\ ...)\\ (\lambda x\ y\ z.\ ...) &\equiv (\lambda x.\ \lambda y.\ \lambda z.\ ...)\\ \text{etc.} \end{aligned} \]

If we allow arithmetic in our lambda expressions a nice example will be:

\[ \begin{aligned} \left(\lambda x\ y.\ \frac{x + y}{y}\right) 4\ 2 &\longrightarrow \left(\lambda y.\ \frac{4 + y}{y}\right) 2 \\ &\longrightarrow \frac{4 + 2}{2} \end{aligned} \]

Currying is used as the default for functions of multiple arguments by Haskell and OCaml (determined mostly by their standard libraries). On the other hand, Standard ML’s library uses tuples as default.
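Python functions take tuples of arguments by default, but currying can be written out by hand with nested lambdas. The sketch below mirrors the arithmetic example above; the names `ratio` and `ratio_uncurried` are made up for illustration.

```python
# Curried form: a function of x that returns a function of y,
# corresponding to λx. λy. (x + y) / y.
ratio = lambda x: lambda y: (x + y) / y

# Uncurried form, taking both arguments at once, for comparison.
def ratio_uncurried(x, y):
    return (x + y) / y

partial = ratio(4)     # corresponds to λy. (4 + y) / y
result = partial(2)    # corresponds to (4 + 2) / 2
```

Applying `ratio` to only its first argument yields a function that is still waiting for the second one, just like the intermediate abstraction in the reduction above.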

Data types

We see that we can represent functions with multiple arguments in PLC. Surely, for representing other kinds of data (such as booleans, numbers, data structures), we need to introduce extensions and add these as primitive operations? Not really…

Booleans

Many types of values can be represented using Church encodings. Booleans are probably the simplest and most straightforward:

\[ \begin{aligned} \mathsf{true} &= \lambda t\ f.\ t &\qquad&(= \lambda t.\ \lambda f.\ t)\\ \mathsf{false} &= \lambda t\ f.\ f & &(= \lambda t.\ \lambda f.\ f)\\ \end{aligned} \]

What do these mean? The representation of \(\mathsf{true}\) is a function that takes two arguments and returns the first one. On the other hand, \(\mathsf{false}\) returns its second argument. To make sense of these, we need to put them to work and see how they work with boolean operations.

We start with the conditional: \(\textsf{if-then}\). It should take three arguments and return its second one if the first one evaluates to \(\textsf{true}\), and its third argument otherwise. That is, we are looking for an expression:

\[ \textsf{if-then}\ \textsf{true}\ x\ y \longrightarrow ... \longrightarrow x \]

and

\[ \textsf{if-then}\ \textsf{false}\ x\ y \longrightarrow ... \longrightarrow y \]

Notice something?

\[ \begin{aligned} \textsf{true}\ x\ y &\longrightarrow x\\ \textsf{false}\ x\ y &\longrightarrow y \end{aligned} \]

That means that all \(\textsf{if-then}\) needs to do is to apply its first argument to its second and third argument, since the boolean representation takes care of the selection itself:

\[ \textsf{if-then} = \lambda b\ t\ f.\ b\ t\ f \]

What about boolean operations?

Let’s try to look at conjunction: and. We look for ??? to put in:

(λa b. ???) true true   --> ... --> true
(λa b. ???) true false  --> ... --> false
(λa b. ???) false true  --> ... --> false
(λa b. ???) false false --> ... --> false

First note that true true x --> true for any \(x\), so it seems that λa b. a b x could work if we find an appropriate \(x\):

(λa b. a b x) true true --> (λb. true b x) true --> true true x --> ... --> true

Now note that in all but the first case and should reduce to false. In the second case,

(λa b. a b x) true false --> ... --> true false x --> ... --> false

for any \(x\), so that still works. Now, how can we get false true x --> false? By taking \(x\) to be false:

(λa b. a b false) false true --> ... --> false true false --> ... --> false

The final case also works:

(λa b. a b false) false false --> ... --> false false false --> ... --> false

Hence \[ \textsf{and} = \lambda a\ b.\ a\ b\ \textsf{false} \]

Another way of thinking about the definition of and is to define it in terms of if-then-else. E.g., in Haskell,

and :: Bool -> Bool -> Bool
and a b = if a then b else False

which just says that if the first argument is true then the result of and depends on the second one, and if it’s false the result will be false regardless of the second argument.

Based on this, we can express the and operation using if-then, which we defined above, and show that it is equivalent to the previous definition by simplifying it using normal order reduction:

and =   λa b. if-then a b false
    =   λa b. (λb t f. b t f) a b false
    --> λa b. (λt f. a t f) b false
    --> λa b. (λf. a b f) false
    --> λa b. a b false

Can you come up with a representation of \(\textsf{or}\)? \(\textsf{not}\)?
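The encodings of true, false, if-then and and can be tried out directly with Python lambdas. This is only a sketch: the Python names (`if_then`, `and_`) and the decoding helper `to_bool` are assumptions added for inspection, and Python evaluates strictly, so both branches passed to `if_then` are evaluated before one is selected.

```python
# Church booleans as curried Python lambdas.
true = lambda t: lambda f: t
false = lambda t: lambda f: f

# if-then = λb t f. b t f
if_then = lambda b: lambda t: lambda f: b(t)(f)

# and = λa b. a b false
and_ = lambda a: lambda b: a(b)(false)

def to_bool(b):
    """Decode a Church boolean into a Python bool (helper for inspection only)."""
    return b(True)(False)

# Truth table of and, in the order used in the text.
table = [to_bool(and_(a)(b)) for a in (true, false) for b in (true, false)]
```

The list `table` holds the results for the four argument combinations in the same order as the four cases considered above.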

Pairs

Pairs can be encoded using an abstraction which “stores” its two arguments:

pair = λl r s. s l r

You can think of s as a “chooser” function which either picks l or r. Selectors for the first and second element are then, respectively, defined as:

fst = λp. p (λl r. l)
snd = λp. p (λl r. r)

Take a look at the selector functions we pass to the pair representation. Are they familiar? (Hint: booleans)
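The pair encoding can also be run as is with Python lambdas. In this sketch the stored elements are ordinary Python strings, chosen only to make the result easy to inspect.

```python
# pair = λl r s. s l r
pair = lambda l: lambda r: lambda s: s(l)(r)

# fst = λp. p (λl r. l)    snd = λp. p (λl r. r)
fst = lambda p: p(lambda l: lambda r: l)
snd = lambda p: p(lambda l: lambda r: r)

# A pair "stores" its elements by waiting for a chooser function s.
p = pair("left")("right")
first, second = fst(p), snd(p)
```

Note that `p` itself is just a function: the two elements live in its closure until a selector is supplied.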

Natural Numbers: Church Numerals

Natural numbers are Church-encoded as Church numerals:

zero = λs z. z
one = λs z. s z
two = λs z. s (s z)
three = λs z. s (s (s z))
...

A numeral for \(n\) can be understood as a function that takes some representation of a successor function and some representation of zero and applies the successor to zero \(n\) times.

How about operations on numerals? The successor of a numeral λs z... is computed by inserting one more application of s inside of the abstraction:

succ (λs z. z) --> ... --> λs z. s z
succ (λs z. s z) --> ... --> λs z. s (s z)
succ (λs z. s (s z)) --> ... --> λs z. s (s (s z))
...

We know that succ takes a numeral (which is an abstraction) and returns another numeral, which is again an abstraction:

succ = λn. (λs z. ...n...)

Taking λs z. z as an example input:

(λn. (λs z. ...n...)) (λs z. z)
  --> (λs z. ...(λs z. z)...)
  --> (λs z. s z)

We see that we need to apply an extra s under λs z.:

(λs z. s ...(λs z. z)...) --> ... --> (λs z. s z)

To do this we need to “open” the abstraction representing 0. This can be achieved by passing the outer s and z as arguments. We achieve what we wanted.

(λs z. s ...(λs z. z) s z...) --> (λs z. s ...z...) = (λs z. s z)

Working backwards, we arrive at our successor function:

(λs z. s z) 
  <-- (λs z. s ((λs z. z) s z)) 
  <-- (λn. λs z. s (n s z)) (λs z. z)
  = succ (λs z. z)

Successor can be thus defined as:

succ = λn. (λs z. s (n s z)) = λn s z. s (n s z)

Once we have a successor operation, defining addition is quite simple if we keep in mind that a Church numeral \(m\) applies its first argument (s) to its second argument (z) \(m\) times:

plus = λm n. m succ n

Multiplication follows the same principle:

\[ m * n = \underbrace{n + (... n}_{m \text{ times}} + 0) \]

Hence:

times = λm n. m (plus n) zero
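The numerals and the operations defined so far can be checked with a short Python sketch. The helper `to_int`, which decodes a numeral by passing Python's increment as the successor and 0 as zero, is an assumption added for inspection only.

```python
# Church numerals and arithmetic as curried Python lambdas.
zero = lambda s: lambda z: z
succ = lambda n: lambda s: lambda z: s(n(s)(z))     # λn s z. s (n s z)
plus = lambda m: lambda n: m(succ)(n)               # λm n. m succ n
times = lambda m: lambda n: m(plus(n))(zero)        # λm n. m (plus n) zero

def to_int(n):
    """Decode a numeral by counting how many times it applies its first argument."""
    return n(lambda k: k + 1)(0)

one = succ(zero)
two = succ(one)
three = succ(two)

five = plus(two)(three)
six = times(two)(three)
```

Decoding with `to_int` confirms that `five` applies its successor argument five times and `six` six times.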

We can define subtraction via a predecessor function, which is surprisingly more tricky than the successor function. For a numeral λs z. s (s ... (s z)), the predecessor should return a numeral with one less s. One way of defining a predecessor is via a function that “counts” the number of s applications in a numeral, but also remembers the previous count, that is, one less than the total number of applications of s:

pred = λn. snd (n (λp. pair (succ (fst p)) (fst p)) (pair zero zero))

Here the numeral n (of which we want to compute the predecessor) is applied to two arguments:

The function (λp. pair (succ (fst p)) (fst p)). This function takes a pair (bound to p) containing two numerals. It returns a pair containing the successor of the first element of p, together with its original value. That means that every time the function is applied to a pair containing numerals n and m, it returns a pair with numerals corresponding to n + 1 and n (m is discarded).
A pair containing two zeros: (pair zero zero).

Finally, the second element of the pair is returned – which contains the count of s applications, except for the last one.

Here is an example. We let f = (λp. pair (succ (fst p)) (fst p))

pred three 
    = (λn. snd (n f (pair zero zero))) (λs z. s (s (s z)))
  --> snd ((λs z. s (s (s z))) f (pair zero zero))
  --> snd ((λz. f (f (f z))) (pair zero zero))
  --> snd (f (f (f (pair zero zero))))
    = snd (f (f ((λp. pair (succ (fst p)) (fst p)) (pair zero zero))))
  --> snd (f (f (pair (succ (fst (pair zero zero))) (fst (pair zero zero)))))
  --> ...
  --> snd (f (f (pair (succ zero) (fst (pair zero zero)))))
  --> snd (f (f (pair (succ zero) zero)))
  --> ...
  --> snd (f (f (pair one zero)))
    = snd (f ((λp. pair (succ (fst p)) (fst p)) (pair one zero)))
  --> snd (f (pair (succ (fst (pair one zero))) (fst (pair one zero))))
  --> snd (f (pair (succ one) one))
  --> ...
  --> snd (f (pair two one))
  --> ...
  --> snd (pair (succ two) two)
  --> ...
  --> snd (pair three two)
  --> ...
  --> two

To subtract n from m, we need to take 1 away from m n times.

minus = λm n. n pred m

For completeness, an alternative predecessor definition is as follows (TODO: explain):

pred' = λn f x. n (λg h. h (g f)) (λu. x) (λu. u)

We can check if a numeral is zero:

is-zero = λn.n (λx. false) true

We can define \(\leq\)

leq = λm n. is-zero (minus m n)

And we can define equality:

equal = λm n. and (leq m n) (leq n m)
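The predecessor and the comparison operations can be tested in the same style. The sketch below repeats the earlier encodings so that it is self-contained; the decoding helpers `to_int` and `to_bool` are assumptions added for inspection only.

```python
# Booleans, pairs and numerals as defined earlier in these notes.
true = lambda t: lambda f: t
false = lambda t: lambda f: f
and_ = lambda a: lambda b: a(b)(false)
pair = lambda l: lambda r: lambda s: s(l)(r)
fst = lambda p: p(lambda l: lambda r: l)
snd = lambda p: p(lambda l: lambda r: r)
zero = lambda s: lambda z: z
succ = lambda n: lambda s: lambda z: s(n(s)(z))

# pred = λn. snd (n (λp. pair (succ (fst p)) (fst p)) (pair zero zero))
pred = lambda n: snd(n(lambda p: pair(succ(fst(p)))(fst(p)))(pair(zero)(zero)))
minus = lambda m: lambda n: n(pred)(m)              # λm n. n pred m
is_zero = lambda n: n(lambda x: false)(true)        # λn. n (λx. false) true
leq = lambda m: lambda n: is_zero(minus(m)(n))
equal = lambda m: lambda n: and_(leq(m)(n))(leq(n)(m))

def to_int(n):
    return n(lambda k: k + 1)(0)

def to_bool(b):
    return b(True)(False)

one = succ(zero)
two = succ(one)
three = succ(two)
```

Since the predecessor of zero is zero under this encoding, `minus` never goes below zero, which is exactly what makes `leq` work.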

Recursion

We have seen that we can define booleans, together with a conditional, and numbers, together with arithmetic operations in pure lambda calculus. However, to reach full Turing power, we lack one important ingredient: the ability to loop. To loop in a functional setting, we need the little brother of looping: self-reference.

To see that we can loop, let us look at a term, for which \(\beta\)-reduction never terminates in a normal form. This interesting term, called \(\Omega\), is defined as follows:

Ω = (λx. x x) (λx. x x)

We see that we have an abstraction which applies its argument to itself and which is applied to itself. How does reduction proceed?

(λx. x x) (λx. x x) --> (x x)[x := (λx. x x)] 
  = (λx. x x) (λx. x x) --> (x x)[x := (λx. x x)] 
  = (λx. x x) (λx. x x) --> (x x)[x := (λx. x x)] 
  ...

Immediately after the first reduction step, we are back where we started! Well, we see we can loop forever (diverge), but how is this useful?

In a programming language like OCaml, we are used to defining recursive functions which refer to themselves inside of their body:

let rec fact = fun n -> 
  if n = 0 then 1 else n * fact (n - 1)

How do we achieve this in lambda? While we have been freely using equations to define names for lambda expressions, these were just meta-definitions of names. That is, when we write

fact = λn. if-true (is-zero n) one (mult n (?fact? (pred n)))

we rely on our meta-language and our common understanding of it to replace any occurrence of ?fact? with the right-hand side of the above equation, as many times as needed. (Here if-true is the conditional we called if-then above, and mult is the multiplication times.) But this is not beta-reduction, that is, we are not defining a recursive function as an object in lambda calculus. To get there, we can think of a recursive definition as follows: “Assuming we have a function to call in the recursive case, we can complete the definition”. In Haskell or OCaml, we can simply assume that we already have the function that we are defining. But what is really going on here is that we can abstract the recursive call as an argument – which corresponds to saying “assuming we already have a function to call in the recursive case”:

fact = λf. λn. if-true (is-zero n) one (mult n (f (pred n)))

Now factorial does not refer to itself anymore, we just need to give it a function to call in the else branch. Easy:

fact = (λf. λn. if-true (is-zero n) one (mult n (f (pred n)))) (λn. if-true (is-zero n) one (mult n (f (pred n))))

Wait, but now what about f in the second case? Ah, no problem:

fact = (λf. λn. if-true (is-zero n) one (mult n (f (pred n)))) 
          ((λf. λn. if-true (is-zero n) one (mult n (f (pred n))))
            (λn. if-true (is-zero n) one (mult n (f (pred n)))))

This apparently won’t work… unless we have a way of supplying an argument for f as many times as it’s needed. That is, a way to allow the function to reference itself whenever it needs to. This is where fixpoint combinators come in.

In math, a fixed point of a function \(f\) is an input for which the function returns the input itself:

\[ f(x) = x \]

If the above holds, we say that \(x\) is a fixed point of \(f\). A fixpoint combinator (in general called \(\operatorname{fix}\)) is an operation that computes a fixed point of a function. That is, it is a function for which the following equation holds:

fix f = f (fix f)

This equation just spells out that when a function is applied to its fixpoint, the fixpoint shall be returned. Let’s use the above equation on itself, by replacing occurrences of fix f with the right-hand side:

fix f = f (fix f)
      = f (f (fix f))
      = f (f (f (fix f)))
      = ...

Now glance above: “If only we had a way of supplying an argument for f as many times as it’s needed.” Seems we are onto something. Let’s replace f with our factorial:

fact = λf. λn. if-true (is-zero n) one (mult n (f (pred n)))
 
fix fact
  =   fact (fix fact)
  =   (λf. λn. if-true (is-zero n) one (mult n (f (pred n)))) (fix fact)
  --> (λn. if-true (is-zero n) one (mult n ((fix fact) (pred n))))

This looks promising. The problem? We haven’t defined what fix is, we are just abusing our meta-notation again. In fact, there is more than one possible definition of fix. The simplest one is the Y combinator:

Y = λf. (λx. f (x x)) (λx. f (x x))

Notice how the structure is very similar to \(\Omega\) above. We should check if it is a fixpoint combinator, that is, if it satisfies the fixpoint equation:

Y g = (λf. (λx. f (x x)) (λx. f (x x))) g
    = (λx. g (x x)) (λx. g (x x))
    = g ((λx. g (x x)) (λx. g (x x)))
    = g ((λf. ((λx. f (x x)) (λx. f (x x)))) g)
    = g (Y g)

We have ourselves a fixpoint combinator. Let us try to use it to define our factorial function:

fact0 = (λf. λn. if-true (is-zero n) one (mult n (f (pred n))))
fact = Y fact0

What happens when we try to apply fact to a numeral?

fact three 
  =   Y fact0 three
  =   (λf. (λx. f (x x)) (λx. f (x x))) fact0 three
  --> (λx. fact0 (x x)) (λx. fact0 (x x)) three
  --> fact0 ((λx. fact0 (x x)) (λx. fact0 (x x))) three
  =   fact0 (Y fact0) three
  --> (λn. if-true (is-zero n) one (mult n ((Y fact0) (pred n)))) three
  --> if-true (is-zero three) one (mult three ((Y fact0) (pred three)))
  --> ...
  --> mult three ((Y fact0) (pred three))
  =   mult three (fact0 (Y fact0) (pred three))
  --> ...
  --> mult three (if-true (is-zero (pred three)) one (mult (pred three) ((Y fact0) (pred (pred three)))))
  ...
  -->
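
As an informal cross-check outside the pure calculus: in a lazy language such as Haskell, the fixpoint equation can itself serve as a definition, because the inner occurrence of the fixpoint is only unfolded on demand. The sketch below assumes native integers instead of Church numerals; the names fix', factStep and factorial are illustrative and not part of the notes.

```haskell
-- The fixpoint equation, used directly as a recursive definition.
-- This is productive only because Haskell evaluates lazily.
fix' :: (a -> a) -> a
fix' f = f (fix' f)

-- Non-recursive "template" for factorial, mirroring fact0 above:
-- the parameter f stands for the recursive call.
factStep :: (Integer -> Integer) -> Integer -> Integer
factStep f n = if n == 0 then 1 else n * f (n - 1)

-- Tie the knot, as in fact = Y fact0.
factorial :: Integer -> Integer
factorial = fix' factStep
```

Here factorial 3 unfolds fix' factStep once per recursive call, just like the trace of fact three above.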

However, the \(Y\) combinator does not work under every reduction strategy. Consider what happens with the \(Y\) combinator if we apply the CBV strategy.

Y g =   (λf. (λx. f (x x)) (λx. f (x x))) g
    --> (λx. g (x x)) (λx. g (x x))
    --> g ((λx. g (x x)) (λx. g (x x)))
    --> g (g ((λx. g (x x)) (λx. g (x x))))
    --> g (g (g ((λx. g (x x)) (λx. g (x x)))))
    --> ...

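Under CBV, the self-application inside the argument of g is always reduced before g itself gets a chance to run, so the unfolding never stops. For CBV, we need the Z combinator: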

λf. (λx. f (λy. x x y)) (λx. f (λy. x x y))
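
To see why the extra \(\lambda y\) helps, here is a short worked reduction under CBV (a sketch, with \(g\) standing for an arbitrary term and \(Z\) naming the combinator above). After two steps the argument of \(g\) is an abstraction, which already counts as a value, so \(g\) can be applied:

\[ \begin{aligned} Z\ g &= (\lambda f.\ (\lambda x.\ f\ (\lambda y.\ x\ x\ y))\ (\lambda x.\ f\ (\lambda y.\ x\ x\ y)))\ g\\ &\longrightarrow (\lambda x.\ g\ (\lambda y.\ x\ x\ y))\ (\lambda x.\ g\ (\lambda y.\ x\ x\ y))\\ &\longrightarrow g\ (\lambda y.\ (\lambda x.\ g\ (\lambda y.\ x\ x\ y))\ (\lambda x.\ g\ (\lambda y.\ x\ x\ y))\ y) \end{aligned} \]

The wrapped self-application is unfolded again only when \(g\) applies its argument to something, which is exactly when a recursive call is needed.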

Let bindings

The last useful notation to introduce is the let-binding. We have already implemented let-bindings as part of our arithmetic expressions language – both in a substitution-based and in an environment-based evaluator. Let-bindings can be introduced to pure lambda calculus as syntactic sugar – a construct that is defined by translation to a combination of other constructs in the language. Introducing a let-binding corresponds to creating a λ-abstraction and immediately applying it to the bound expression:

let x = e1 in e2     ≡     (λx. e2) e1

We have to define let as syntactic sugar – we cannot write it as a function, the way we did for if-then, add, etc. Why is that the case?

We can also define a recursive version of let – called let rec in OCaml, letrec in Scheme:

let rec f = e1 in e2   ≡   let f = fix (λf. e1) in e2
                       ≡   (λf. e2) (fix (λf. e1))

Where fix is an appropriate fixpoint combinator (e.g., \(Y\) under CBN, \(Z\) under CBV and CBN).

Most languages also allow specifying function arguments to the left-hand side of the equal sign:

let f x y z = e1 in e2
let rec f x y z = e1 in e2

(define (f x y z) e1)

These can be translated as:

let f x y z ... = e1 in e2   ≡   let f = λx y z. e1 in e2
                             ≡   (λf. e2) (λx y z. e1)
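
The non-recursive translations above can be sketched as a small desugaring function. This is only an illustration under assumed names: the Sugar datatype, its constructors, and desugar are hypothetical, and let rec is left out because it additionally needs a fixpoint combinator.

```haskell
-- Surface syntax with let (hypothetical constructor names).
data Sugar
  = SVar String
  | SLam String Sugar
  | SApp Sugar Sugar
  | SLet String [String] Sugar Sugar  -- let f x y z = e1 in e2 (argument list may be empty)
  deriving (Eq, Show)

-- Core lambda terms without let.
data Lambda
  = Var String
  | Lam String Lambda
  | App Lambda Lambda
  deriving (Eq, Show)

-- Translate away every let, following
--   let f x y z = e1 in e2  ≡  (λf. e2) (λx. λy. λz. e1)
desugar :: Sugar -> Lambda
desugar (SVar x) = Var x
desugar (SLam x e) = Lam x (desugar e)
desugar (SApp e1 e2) = App (desugar e1) (desugar e2)
desugar (SLet f args e1 e2) =
  App (Lam f (desugar e2)) (foldr Lam (desugar e1) args)
```

With an empty argument list, SLet x [] e1 e2 becomes the plain (λx. e2) e1.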

Extensions

While it is useful to show how various important programming concepts and constructs can be expressed in pure lambda-calculus directly, in general, it is a rather inefficient approach.

The approach usually taken in designing lambda-calculus-based languages, is to take the calculus as a core language and add extensions to support various kinds of data.

Pairs

<Lambda> ::= <Variable>
           | (<Lambda> <Lambda>)
           | (λ <Variable> . <Lambda>)
           | ( <Lambda> , <Lambda> )
           | fst <Lambda>
           | snd <Lambda>

Often just written informally as an extension of a previous BNF:

<Lambda> ::= ...
           | ( <Lambda> , <Lambda> )
           | fst <Lambda>
           | snd <Lambda>

Together with the extension to syntax, we need to specify what the meaning of these new constructs is. That is, we need to update the reduction strategy and provide reduction rules for the pairs. For primitive operations, these reduction rules are sometimes called \(\delta\)-rules or \(\delta\)-reduction rules:

fst (l, r) -->δ l
snd (l, r) -->δ r
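
As a minimal sketch, the two \(\delta\)-rules can be written as a one-step function on an extended term datatype. The constructor names Pair, Fst, Snd and the function stepDelta are assumptions made for this example; the function only fires at the root of a term and leaves open whether the components of a pair are reduced first.

```haskell
data Lambda
  = Var String
  | App Lambda Lambda
  | Lam String Lambda
  | Pair Lambda Lambda
  | Fst Lambda
  | Snd Lambda
  deriving (Eq, Show)

-- One δ-step at the root of the term, if a δ-rule applies there.
stepDelta :: Lambda -> Maybe Lambda
stepDelta (Fst (Pair l _)) = Just l   -- fst (l, r) -->δ l
stepDelta (Snd (Pair _ r)) = Just r   -- snd (l, r) -->δ r
stepDelta _ = Nothing                 -- no δ-redex at the root
```

A full reduction strategy would combine this with the \(\beta\)-step and with rules for reducing inside subterms.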

Reduction vs. Evaluation

Reducing to a Normal Form

So far, in connection with Lambda calculus, we have only talked about reduction as a transformation of an expression containing a redex into another expression where the redex has been reduced. To actually evaluate a lambda-expression, we can simply iterate the reduction step (with a particular strategy), until we reach a normal form: an expression that cannot be reduced further. Note that some expressions (such as \(\Omega\)) do not have a normal form – under any reduction strategy. Formally, an iteration of reduction steps \(\longrightarrow\) is written as \(\longrightarrow^{*}\), which stands for “reduces in zero or more steps to”. Mathematically, it is the reflexive-transitive closure of \(\longrightarrow\). It is defined by the following two rules:

M -->* M
M -->* M'    if there is an N such that M --> N and N -->* M'

This can also be expressed as:

an expression reduces in zero or more steps to itself
an expression \(M\) reduces in zero or more steps to the expression \(M'\), if there is an intermediate expression \(N\) such that \(M\) reduces to \(N\) and \(N\) reduces in zero or more steps to \(M'\)

In Haskell, this is expressed as iteration. Assuming that the one-step reduction function implementing a particular strategy has the type Lambda -> Maybe Lambda, we define an iteration function as:

iter :: (a -> Maybe a) -> a -> a
iter step e =
  case step e of
       Just e' -> iter step e'
       Nothing -> e

That is, while we can perform reduction steps on an expression, we do so. If the input expression cannot be reduced (it contains no redex), just return it.

If we have an implementation of a reduction strategy, say, stepNormal :: Lambda -> Maybe Lambda, then we can build a “reducer” by passing it as an argument to iter:

iterNormal :: Lambda -> Lambda
iterNormal = iter stepNormal

We will call such a reducer an iterative evaluator, or normalizer.
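
Because some terms (such as \(\Omega\)) have no normal form, iter may loop forever on them. One common workaround, sketched here with a hypothetical helper iterFuel that is not part of the notes, is to bound the number of steps and report failure when the bound is exhausted. The toy step functions halve and spin stand in for a real reduction strategy.

```haskell
-- Like iter, but gives up after `fuel` successful steps.
iterFuel :: Int -> (a -> Maybe a) -> a -> Maybe a
iterFuel fuel step e =
  case step e of
    Nothing -> Just e                           -- normal form reached
    Just e'
      | fuel <= 0 -> Nothing                    -- out of fuel: give up
      | otherwise -> iterFuel (fuel - 1) step e'

-- Toy "reduction": halve a non-zero even number, stop otherwise.
halve :: Integer -> Maybe Integer
halve n
  | n /= 0 && even n = Just (n `div` 2)
  | otherwise = Nothing

-- Toy "reduction" with no normal form: there is always a next step.
spin :: Integer -> Maybe Integer
spin n = Just (n + 1)
```

For instance, iterFuel 10 halve 40 steps through 20 and 10 and stops at 5, while iterFuel 100 spin 0 runs out of fuel and returns Nothing.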

Evaluation

Instead of performing evaluation through gradual reduction (normalization) of lambda terms, we can write a recursive evaluator in the same style as we have done for our languages with arithmetic, booleans and let-bindings. This kind of evaluator takes an expression (term, expressing a computation) and either succeeds, returning a value, or fails. On the other hand, the iterative reduction function returned the same type of result as its input – an expression. The criterion for deciding when a lambda expression is reduced fully (evaluated) was that there was no further reduction possible – the expression has reached its normal form. That means, for the iterative evaluator, we didn’t have to worry about distinguishing values from expressions syntactically – once the evaluator finished reducing, we took the normal form and decided if it makes sense or not.

For a recursive evaluator of lambda terms, we need to decide what its return type should be. That is, we need to syntactically distinguish values from unevaluated expressions (computations). What are the values of pure lambda calculus?

A value is (a representation of) an entity that has no computation left to perform and can only be inspected or used in another computation. It also does not depend on the context in which it appears – dependence on context would imply possible computation. For lambda calculus, we only have 3 syntactic forms, meaning we only have 3 candidates for what constitutes a value.

Application cannot be a value – it is a computation term, which can be, potentially, reduced. A single variable reference also cannot be a value – its meaning wholly depends on the context, in which it appears. The only option is lambda abstraction. In particular, an abstraction that is closed, i.e., does not contain any free variables. Having lambda-abstractions as values is consistent with Church encodings which we have explored above – each value is a closed abstraction containing only bound variables.

  <LambdaValue> ::= λ <Variable> . <Lambda>   -- constructor `VLam`

data LambdaValue = VLam Variable Lambda

As previously, we add values to the Lambda datatype:

data Lambda = Var Variable         -- <Lambda> ::= <Variable>
            | Lam Variable Lambda  --            | ( λ <Variable> . <Lambda> )
            | App Lambda Lambda    --            | ( <Lambda> <Lambda> )
            | Val LambdaValue      --            | <LambdaValue>

Now we can implement a call-by-name evaluator for lambda terms, relying on substitution to deal with application.

evalCBN :: Lambda -> Maybe LambdaValue
evalCBN (Var x) = Nothing   -- all variables should be substituted
evalCBN (Val v) = Just v
evalCBN (Lam x e) = Just (VLam x e)
evalCBN (App e1 e2) = 
  case evalCBN e1 of
       Just (VLam x e) -> evalCBN (subst x e2 e) -- leave argument unevaluated
       Nothing -> Nothing

For call-by-value, we use

evalCBV :: Lambda -> Maybe LambdaValue
evalCBV (Var x) = Nothing
evalCBV (Val v) = Just v
evalCBV (Lam x e) = Just (VLam x e)
evalCBV (App e1 e2) = 
  case evalCBV e1 of
       Just (VLam x e) -> 
         case evalCBV e2 of    -- evaluate the argument before substituting
              Just v2 -> evalCBV (subst x (Val v2) e)
              Nothing -> Nothing
       Nothing -> Nothing
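
Both evaluators rely on a subst function. The following is a minimal self-contained sketch, assuming the argument order subst x s e ("replace free x by s in e") suggested by the calls above. It also assumes that only closed terms are ever substituted, which holds when evaluating a closed program because these evaluators never reduce under a lambda; under that assumption no renaming is needed. The evaluator is repeated so that the block runs on its own.

```haskell
type Variable = String

data LambdaValue = VLam Variable Lambda
  deriving (Eq, Show)

data Lambda = Var Variable
            | Lam Variable Lambda
            | App Lambda Lambda
            | Val LambdaValue
            deriving (Eq, Show)

-- Naive substitution: correct only if s is closed (assumption),
-- since then no free variable of s can be captured.
subst :: Variable -> Lambda -> Lambda -> Lambda
subst x s (Var y)
  | x == y = s
  | otherwise = Var y
subst x s (Lam y e)
  | x == y = Lam y e                     -- x is shadowed, stop here
  | otherwise = Lam y (subst x s e)
subst x s (App e1 e2) = App (subst x s e1) (subst x s e2)
subst x s (Val (VLam y e))
  | x == y = Val (VLam y e)
  | otherwise = Val (VLam y (subst x s e))

evalCBN :: Lambda -> Maybe LambdaValue
evalCBN (Var _) = Nothing
evalCBN (Val v) = Just v
evalCBN (Lam x e) = Just (VLam x e)
evalCBN (App e1 e2) =
  case evalCBN e1 of
    Just (VLam x e) -> evalCBN (subst x e2 e)  -- argument stays unevaluated
    Nothing -> Nothing

-- Ω = (λx. x x) (λx. x x), which has no normal form.
omega :: Lambda
omega = App w w
  where w = Lam "x" (App (Var "x") (Var "x"))
```

Under call-by-name, evalCBN (App (Lam "x" (Lam "y" (Var "y"))) omega) returns a value because the diverging argument is never used; a call-by-value evaluator would loop on the same term.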

What is the benefit of a recursive evaluator vs. reduction-based iterative evaluator?

Environment-based Evaluation

We can also use environments to implement a recursive evaluator for lambda calculus. Here is a basic implementation of a call-by-value one.

evalCBV :: Env LambdaValue -> Lambda -> Maybe LambdaValue
evalCBV env (Val v) = Just v
evalCBV env (Var x) = get x env
evalCBV env (Lam x e) = Just (VLam x e)
evalCBV env (App e1 e2) = 
  case evalCBV env e1 of
       Just (VLam x e) ->
         case evalCBV env e2 of
              Just v2 -> evalCBV (add x v2 env) e  -- bind the abstracted variable to the argument's value
              Nothing -> Nothing
       Nothing -> Nothing

Scoping

Our evaluator might seem ok at first sight. However, there is a problem.

Using our intuition from Haskell (and Racket), what shall we expect to be the value resulting from the expression below?

let x = 2 in
let f y = y + x in
let x = 3 in
f 1

In Haskell, this expression evaluates to 3.

Using our desugaring rule introduced earlier, this expands to the pure lambda calculus expression

(λx. (λf. (λx. f 1) 3) (λy. (+ y x))) 2

In Haskell, this is represented as (for a lambda calculus extended with numbers and addition as primitives):

e = App (Lam "x"                                        -- let x = 2 in
          (App (Lam "f"                                 -- let f = ... in
                  (App (Lam "x"                         -- let x = 3 in
                         (App (Var "f") (Val (Num 1)))) -- f 1
                       (Val (Num 3))))
               (Lam "y" ((Add (Var "y") (Var "x"))))))
        (Val (Num 2))

What happens if we ask our evaluator to evaluate the above expression?

> evalCBV empty e
Just (Num 4)

What we implemented here is dynamic scoping. The value of a variable is determined by the most recent binding of that variable at runtime. In general, under dynamic scoping the value of a variable cannot be determined statically, that is, just by looking at the code. This is usually counterintuitive for us. We are used to lexical (or static) scoping, that is, the scope of a variable is given by its location in the source code of the program. Most programming languages use lexical scoping, because this allows us to easily reason about the values of variables when we read source code. Examples of languages with dynamic scoping are: Emacs Lisp, LaTeX, bash. Some languages allow choosing the scoping discipline when defining a variable, such as Common Lisp or Perl. How do we ensure static scoping?

Closures

The problem arises because we are passing around a lambda value and the original association between its free variables and their values is lost. We need a way to keep this association: closures.

Closures package up a lambda abstraction with the environment that was valid at the time of evaluating the abstraction. They can be implemented simply as a pairing of a lambda abstraction with an environment:

data LambdaValue = Clo Variable Lambda (Env LambdaValue)

data Lambda = Var Variable
            | Lam Variable Lambda
            | App Lambda Lambda
            | Val LambdaValue

evalCBV :: Env LambdaValue -> Lambda -> Maybe LambdaValue
evalCBV env (Var x) = get x env
evalCBV env (Val v) = Just v
evalCBV env (Lam x e) = Just (Clo x e env) -- save the current environment
evalCBV env (App e1 e2) = 
  case evalCBV env e1 of
       Just (Clo x e env') ->              -- the closure remembers the corresponding environment
         case evalCBV env e2 of
              Just v2 -> evalCBV (add x v2 env') e  -- evaluate in the closure's environment updated with the argument binding 
              Nothing -> Nothing
       Nothing -> Nothing

Now all abstraction bodies are evaluated with the environment that belongs to them statically. And, indeed, the example expression evaluates as expected:

> evalCBV empty e
Just (Num 3)
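
The two behaviours can be compared side by side in one self-contained sketch. It makes several assumptions that are not in the notes: environments are association lists, numbers and Add are added as primitives, closures are used in both modes, and a hypothetical Scoping flag selects which environment the body of an abstraction is evaluated in.

```haskell
type Variable = String
type Env a = [(Variable, a)]

data Value = Num Integer
           | Clo Variable Lambda (Env Value)
           deriving (Eq, Show)

data Lambda = Var Variable
            | Lam Variable Lambda
            | App Lambda Lambda
            | Add Lambda Lambda
            | Val Value
            deriving (Eq, Show)

data Scoping = Dynamic | Static

eval :: Scoping -> Env Value -> Lambda -> Maybe Value
eval _ env (Var x) = lookup x env
eval _ _ (Val v) = Just v
eval _ env (Lam x e) = Just (Clo x e env)   -- always save the environment
eval s env (Add e1 e2) =
  case (eval s env e1, eval s env e2) of
    (Just (Num n1), Just (Num n2)) -> Just (Num (n1 + n2))
    _ -> Nothing
eval s env (App e1 e2) =
  case (eval s env e1, eval s env e2) of
    (Just (Clo x body saved), Just v2) ->
      let base = case s of
                   Static  -> saved   -- environment stored in the closure
                   Dynamic -> env     -- environment of the caller
      in eval s ((x, v2) : base) body
    _ -> Nothing

-- let x = 2 in let f y = y + x in let x = 3 in f 1
example :: Lambda
example =
  App (Lam "x"
        (App (Lam "f"
               (App (Lam "x" (App (Var "f") (Val (Num 1))))
                    (Val (Num 3))))
             (Lam "y" (Add (Var "y") (Var "x")))))
      (Val (Num 2))
```

With these definitions, eval Static [] example yields Just (Num 3) and eval Dynamic [] example yields Just (Num 4), matching the two results shown above.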

De Bruijn Indices

As we have talked about before, the name of a bound variable is insignificant – as long as there is a consistency between the binder and the variable reference. This is called alpha-equivalence. Using strings as variables seems natural: it corresponds to the mathematical notation with variable names. However, as a computer representation this is inefficient: there are infinitely many ways to represent any lambda term: λx. x, λy. y, λz. z, λzzzz. zzzz, λlongervariablename. longervariablename, …

Moreover, representing variable names as strings forces us to complicate the definition of substitution and define functions for obtaining fresh variable names. In an implementation, we would ideally want to do away with these complications. Sure, we could simply use natural numbers to represent variables and this would simplify picking a fresh variable name – e.g., by taking the maximum of all free variables and adding 1. But we would still have to complicate the definition of substitution, and we would still have the problem of multiple representations of alpha-equivalent lambda terms. There is another alternative.

When we look at a lambda abstraction:

\[ \lambda x.\ (\lambda y.\ x\ y\ (\lambda x.\ \lambda z.\ x\ z\ y)) \]

we really use the occurrence of \(x\) in the binder as a marker and a variable reference as a reference back to the marker, that is, each variable can be viewed as referring back to the binder that bound it:

      +-------------------+
      |                   |
      +----+   +------+   |
      |    |   |      |   |
λx. (λy. x y (λx. λz. x z y))
 |       |         |    |
 +-------+         +----+

This means that we do not really need to use names to know where an argument value should be inserted:

      +-------------------+
      |                   |
      +----+   +------+   |
      |    |   |      |   |
λ_. (λ_. * * (λ_. λ_. * * *))
 |       |         |    |
 +-------+         +----+

Now the question is, how do we represent these connections without using names? The Dutch mathematician Nicolaas Govert de Bruijn had the idea that each variable reference should represent the number of binders between it and the binder that bound it. If there is no other binder between the reference and its binder, the count is 0 and we can refer to the binder by that number. If there is one binder between, we refer to the variable’s binder by 1, etc.

      +-------------------+
      |                   |
      +----+   +------+   |
      |    |   |      |   |
λ_. (λ_. 1 0 (λ_. λ_. 1 0 2))
 |       |         |    |
 +-------+         +----+

This leads to a simplification of the syntax: since we do not need to mark binders using variables, lambdas do not carry any variable names:

    +-----------------+
    |                 |
    +----+  +-----+   |
    |    |  |     |   |
λ. (λ. 1 0 (λ. λ. 1 0 2))
|      |       |    |
+------+       +----+

Thus the syntax of Lambda expressions using de Bruijn indices is as follows:

<DLambda> ::= <Index>                -- variable reference
            | <DLambda> <DLambda>    -- application is as before
            | λ. <DLambda>           -- lambda abstraction does not refer to the bound variable explicitly

Haskell:

data DLambda = DVar Integer
             | DApp DLambda DLambda
             | DLam DLambda

Here are a few more examples:

Any identity function (λx. x, λy. y, …) is λ. 0
λx. x x is λ. 0 0
The Church encodings of true and false are λ. λ. 1 and λ. λ. 0, respectively
Church numerals are λ. λ. 0, λ. λ. 1 0, λ. λ. 1 (1 0), λ. λ. 1 (1 (1 0))
The Y combinator, λf. (λx. f (x x)) (λx. f (x x)) is λ. (λ. 1 (0 0)) (λ. 1 (0 0))
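
The translation from named terms to de Bruijn indices can be sketched as follows. The function toDeBruijn and the list-of-binders argument are assumptions made for this example: the list holds the names of the enclosing binders, innermost first, so the index of a variable is its first position in that list. Free variables are rejected with Nothing.

```haskell
import Data.List (elemIndex)

data Lambda = Var String
            | Lam String Lambda
            | App Lambda Lambda

data DLambda = DVar Integer
             | DApp DLambda DLambda
             | DLam DLambda
             deriving (Eq, Show)

-- ctx lists the names of the enclosing binders, innermost first.
toDeBruijn :: [String] -> Lambda -> Maybe DLambda
toDeBruijn ctx (Var x) = DVar . toInteger <$> elemIndex x ctx
toDeBruijn ctx (Lam x e) = DLam <$> toDeBruijn (x : ctx) e
toDeBruijn ctx (App e1 e2) = DApp <$> toDeBruijn ctx e1 <*> toDeBruijn ctx e2

-- The Y combinator with names: λf. (λx. f (x x)) (λx. f (x x))
yNamed :: Lambda
yNamed = Lam "f" (App half half)
  where half = Lam "x" (App (Var "f") (App (Var "x") (Var "x")))
```

Since elemIndex returns the first match, a reference always resolves to the innermost binder with that name, which is how shadowing is handled.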

What is the advantage of using de Bruijn indices? It certainly isn’t (human) readability. Maybe you have noticed that all alpha-equivalent terms share a single representation. This is a major advantage when implementing lambda calculus, since we do not need to care about renaming of bound variables. Another advantage is that “environments” for evaluating lambda expressions are simplified – they are just stacks of values:

data DValue = DClo DLambda [DValue]

eval :: [DValue] -> DLambda -> Maybe DValue
eval env (DVar i) | 0 <= i && fromIntegral i < length env = Just (env !! fromIntegral i)   -- lookup the value (indices are Integer, lists are indexed by Int)
                  | otherwise = Nothing
eval env (DApp e1 e2) = 
  case eval env e1 of
       Nothing -> Nothing
       Just (DClo e env') -> case eval env e2 of
                                  Nothing -> Nothing
                                  Just v2 -> eval (v2 : env') e
eval env (DLam e) = Just (DClo e env)

Imperative Features: Programming with State

The language features we have considered so far were pure in the sense that their evaluation didn’t yield any side effect. (Well, there was one possible side effect: failure.)

Now we will look at features associated with imperative languages. In an imperative language, computation proceeds by processing statements (or commands) which explicitly modify the machine state. These changes in state can be modifying memory, consuming input, producing output, etc.

Imperative Variables I

The first side-effect that we introduce is modifying memory via imperative variables. How do these differ from applicative variables? Probably the simplest way to think about the difference is that applicative variables bind to (stand for) values, imperative variables stand for memory cells. How do we model them?

The simplest approach is to model memory – the store – as a mapping between variables and values, like we did with environments. The difference is in how we use them. Environments were used as read-only entities: the evaluator (at each stage) would receive an environment and might perform recursive calls with a modified environment, but it could not pass a modified environment back to its context. A store, on the other hand, is treated as a read-and-write entity. At each stage, the evaluator can not only read what is stored in a variable, but also modify the store and return the updated version.

We will demonstrate using a small imperative language with basic arithmetic, assignment and retrieval of variables. To compose statements, we introduce a sequencing command: Seq s1 s2 first executes statement s1 and then s2. Notice how the execution of s2 uses an updated store produced by the execution of s1. Note that our expressions are still pure: they can read from the store, but not update it. This is apparent from the type of the expression evaluator evalExpr :: Store -> Expr -> Maybe Value. Contrast this with the type of the evaluator for statements:

data Stmt = Assign String Expr
          | Seq Stmt Stmt
          deriving Show

data Expr = Val Value
          | Var String
          | Add Expr Expr
          deriving Show

data Value = Num Integer
           deriving Show

type Store = Env Value

evalExpr :: Store -> Expr -> Maybe Value
evalExpr sto (Val v) = Just v
evalExpr sto (Var x) = get x sto
evalExpr sto (Add e1 e2) =
  case evalExpr sto e1 of
       Just (Num n1) -> case evalExpr sto e2 of
                             Just (Num n2) -> Just (Num (n1 + n2))
                             _ -> Nothing
       _ -> Nothing

execStmt :: (Store, Stmt) -> Maybe Store
execStmt (sto, Assign x e) = 
  case evalExpr sto e of
       Just v -> Just (add x v sto)
       Nothing -> Nothing
execStmt (sto, Seq c1 c2) = 
  case execStmt (sto, c1) of
       Just sto' -> execStmt (sto', c2)
       Nothing -> Nothing

Note how the function execStmt does not return any value – the result of the computation is determined purely by the effect it had on the store.

Printing and Output Streams

We model output streams using a list of values.

data Stmt = ...
          | Print Expr
          deriving Show

data Expr = ...  -- same as before

data Value = ... -- as before

type Store = ... -- as before

type Out = [Value]

evalExpr :: Store -> Expr -> Maybe Value
evalExpr sto (Val v) = ... -- all cases as before

-- We need to handle the output stream. Notice the cases for Seq and Print
execStmt :: (Store, Stmt) -> Maybe (Store, Out)
execStmt (sto, Assign x e) = 
  case evalExpr sto e of
       Just v -> Just (add x v sto, [])
       Nothing -> Nothing
execStmt (sto, Seq c1 c2) = 
  case execStmt (sto, c1) of
       Just (sto', out1) -> 
         case execStmt (sto', c2) of
              Just (sto'', out2) -> Just (sto'', out1 ++ out2)
              Nothing -> Nothing
       Nothing -> Nothing
execStmt (sto, Print e) =
  case evalExpr sto e of
       Just v -> Just (sto, [v])
       Nothing -> Nothing

Loops

data Stmt = ... 
          | While Expr Stmt
          deriving Show

data Expr = ... 
          | Le Expr Expr  -- adding a boolean operation
          deriving Show

data Value = ...
           | Bool Bool    -- booleans to be used in the condition expression
           deriving Show

type Store = Map String Value

type Out = [Value]

evalExpr :: Store -> Expr -> Maybe Value
...
evalExpr sto (Le e1 e2) =
  case evalExpr sto e1 of
       Just (Num n1) -> case evalExpr sto e2 of
                             Just (Num n2) -> Just (Bool (n1 <= n2))
                             _ -> Nothing
       _ -> Nothing

execStmt :: (Store, Stmt) -> Maybe (Store, Out)
...
execStmt (sto, While e c) =
  case evalExpr sto e of
       Just (Bool False) -> Just (sto, [])
       Just (Bool True) -> 
         case execStmt (sto, c) of
              Just (sto', out1) ->
                case execStmt (sto', While e c) of
                     Just (sto'', out2) -> Just (sto'', out1 ++ out2)
                     _ -> Nothing
              _ -> Nothing
       _ -> Nothing
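
To try the pieces together, here is a self-contained sketch of the language so far (assignment, sequencing, printing and loops) with a small test program. Several details are assumptions made for this example: the store is an association list with hypothetical helpers get and add, do notation in the Maybe monad replaces the nested case expressions, and the loop is unfolded into Seq c (While e c), which behaves the same as the definition above.

```haskell
type Env a = [(String, a)]

get :: String -> Env a -> Maybe a
get = lookup

add :: String -> a -> Env a -> Env a
add x v env = (x, v) : filter ((/= x) . fst) env  -- replace any old binding

data Value = Num Integer | Bool Bool deriving (Eq, Show)
data Expr = Val Value | Var String | Add Expr Expr | Le Expr Expr deriving Show
data Stmt = Assign String Expr | Seq Stmt Stmt | Print Expr | While Expr Stmt
  deriving Show

type Store = Env Value
type Out = [Value]

evalExpr :: Store -> Expr -> Maybe Value
evalExpr _ (Val v) = Just v
evalExpr sto (Var x) = get x sto
evalExpr sto (Add e1 e2) =
  case (evalExpr sto e1, evalExpr sto e2) of
    (Just (Num n1), Just (Num n2)) -> Just (Num (n1 + n2))
    _ -> Nothing
evalExpr sto (Le e1 e2) =
  case (evalExpr sto e1, evalExpr sto e2) of
    (Just (Num n1), Just (Num n2)) -> Just (Bool (n1 <= n2))
    _ -> Nothing

execStmt :: (Store, Stmt) -> Maybe (Store, Out)
execStmt (sto, Assign x e) = do
  v <- evalExpr sto e
  return (add x v sto, [])
execStmt (sto, Seq c1 c2) = do
  (sto', out1) <- execStmt (sto, c1)
  (sto'', out2) <- execStmt (sto', c2)   -- c2 sees the store updated by c1
  return (sto'', out1 ++ out2)
execStmt (sto, Print e) = do
  v <- evalExpr sto e
  return (sto, [v])
execStmt (sto, While e c) =
  case evalExpr sto e of
    Just (Bool False) -> Just (sto, [])
    Just (Bool True) -> execStmt (sto, Seq c (While e c))  -- one unfolding
    _ -> Nothing

-- i := 1; while i <= 3 do (print i; i := i + 1)
countToThree :: Stmt
countToThree =
  Seq (Assign "i" (Val (Num 1)))
      (While (Le (Var "i") (Val (Num 3)))
             (Seq (Print (Var "i"))
                  (Assign "i" (Add (Var "i") (Val (Num 1))))))
```

Running execStmt ([], countToThree) produces the output stream [Num 1, Num 2, Num 3] and a final store in which i holds Num 4.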

Imperative Variables II: Block Scope

Our first representation of memory was a direct mapping from names to stored values
Consequence:
- A variable defined inside, e.g., a while loop or a branch of an if statement, is visible and accessible outside of that loop, or branch
This does not correspond to what happens in most common imperative languages

Languages like C, Java, … enforce block scope, that is, a variable is only visible in the block in which it was defined (or declared)

// C (Java would reject the inner declaration of x: a local variable
// cannot be redeclared in a nested block)
int x = 20;

do {
  int x = 10; // different variable - different storage cell
              // inside this block, shadows the outer x
  int y = 1;
} while (0);

// here, x is still 20
printf("%d\n", y); // error: y doesn't exist here

Block scope is similar to let expressions
Another issue: in languages like Python, or Java (for object types), the semantics of assignment is to set a reference – instead of copying the object stored in the memory location

E.g., the Python program

x = [1, 2, 3]
y = x
print(y)
y.append(4)
print(x)

prints

[1, 2, 3]
[1, 2, 3, 4]

because x and y refer to the same object

How do we combine block scope with imperative variables?

Block scope is captured well by (read-only) environments
A store captures variables which can be modified
Idea:
- Environment for mapping variables to “addresses”
- Store for mapping addresses to (maybe) values

type Address = Integer

type Store = Map Address Value
type Env = Map String Address

Implementation: we add a new construct Block which bundles variable declarations with statements that can use these variables. We keep our Assign construct, however, we’ll redefine its meaning (implementation). We will also change the semantics of variable references (Var)

data Stmt = Assign String Expr -- we're keeping our assignment statement
          ...
          | Block Decl Stmt    -- a block is a bunch of variable declarations followed by statements
          deriving Show

data Decl = NewVar String Decl -- a declaration list is either a new variable followed by more declarations
          | Empty              -- or empty
          deriving Show

data Expr = ...                -- expressions remain the same...
          | Var String         -- except we will change the meaning of variable lookup
          | ...               

data Value = Num Integer       -- values are the same
           deriving Show

type Out = [Value]


-- There are multiple ways we can represent allocation
-- The questions to answer are:
--   a) how do we select the next address?
--   b) what do we store at the freshly allocated address?
-- Here we use the following:
--   a) We take the highest address plus one
--   b) We store the number 0. Note that this choice can have various consequences: Do we need to distinguish between initialized and uninitialized storage cells?

alloc :: Store -> (Address, Store)
alloc [] = (first, add first (Num 0) empty)
  where first = 0
alloc m = (new, add new (Num 0) m)
  where new = (maximum (keys m)) + 1
        keys [] = [] -- get all the keys from a map
        keys ((k, _) : m) = k : keys m

-- process declarations
procDecl :: Store -> Env -> Decl -> (Store, Env)
procDecl sto env (NewVar x d) = 
  procDecl sto' env' d 
  where (addr, sto') = alloc sto  -- allocate a new address in the store
        env' = add x addr env     -- bind the requested name to the new address
procDecl sto env Empty = (sto, env)

evalExpr :: Env -> Store -> Expr -> Maybe Value
...
evalExpr env sto (Var x) = -- NEW 
  do addr <- get x env     -- first we find the address of the variable
     get addr sto          -- then we find the value in the store
...

execStmt :: Env -> (Store, Stmt) -> Maybe (Store, Out)
execStmt env (sto, Assign x e) =  -- NEW
  case evalExpr env sto e of      -- first evaluate the expression to assign
       Just v -> 
         case get x env of        -- find the address of the variable
              Just addr -> Just (add addr v sto, []) -- "overwrite" the address in the store
              _ -> Nothing
       Nothing -> Nothing
execStmt env (sto, Seq c1 c2) = ... -- same as before
execStmt env (sto, Print e) = ... -- same as before
execStmt env (sto, Block d s) = -- NEW: the structure of a block is similar to let
  let (sto', env') = procDecl sto env d -- process declarations (allocating variables in the store)
  in execStmt env' (sto', s)            -- execute statements in the store
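
The following self-contained sketch puts the environment and store together and runs a program that mirrors the shadowing example from the beginning of this section. It is an illustration under stated assumptions rather than the implementation from the notes: both maps are association lists, do notation replaces the nested case expressions, and cells allocated by a block are never freed.

```haskell
type Address = Integer
type Store = [(Address, Value)]
type Env = [(String, Address)]
type Out = [Value]

data Value = Num Integer deriving (Eq, Show)
data Expr = Val Value | Var String | Add Expr Expr
data Decl = NewVar String Decl | Empty
data Stmt = Assign String Expr | Seq Stmt Stmt | Print Expr | Block Decl Stmt

-- Allocate a fresh cell, initialised to 0, at the highest address plus one.
alloc :: Store -> (Address, Store)
alloc sto = (new, (new, Num 0) : sto)
  where new = if null sto then 0 else maximum (map fst sto) + 1

procDecl :: Store -> Env -> Decl -> (Store, Env)
procDecl sto env Empty = (sto, env)
procDecl sto env (NewVar x d) = procDecl sto' ((x, addr) : env) d
  where (addr, sto') = alloc sto

evalExpr :: Env -> Store -> Expr -> Maybe Value
evalExpr _ _ (Val v) = Just v
evalExpr env sto (Var x) = do
  addr <- lookup x env   -- name to address
  lookup addr sto        -- address to value
evalExpr env sto (Add e1 e2) = do
  Num n1 <- evalExpr env sto e1
  Num n2 <- evalExpr env sto e2
  return (Num (n1 + n2))

execStmt :: Env -> (Store, Stmt) -> Maybe (Store, Out)
execStmt env (sto, Assign x e) = do
  v <- evalExpr env sto e
  addr <- lookup x env
  return ((addr, v) : filter ((/= addr) . fst) sto, [])  -- overwrite the cell
execStmt env (sto, Seq c1 c2) = do
  (sto', out1) <- execStmt env (sto, c1)
  (sto'', out2) <- execStmt env (sto', c2)
  return (sto'', out1 ++ out2)
execStmt env (sto, Print e) = do
  v <- evalExpr env sto e
  return (sto, [v])
execStmt env (sto, Block d s) =
  let (sto', env') = procDecl sto env d
  in execStmt env' (sto', s)   -- env' is dropped again when the block ends

-- { var x; x := 20; { var x; x := 10; print x }; print x }
shadowing :: Stmt
shadowing =
  Block (NewVar "x" Empty)
    (Seq (Assign "x" (Val (Num 20)))
      (Seq (Block (NewVar "x" Empty)
             (Seq (Assign "x" (Val (Num 10))) (Print (Var "x"))))
           (Print (Var "x"))))
```

Executing execStmt [] ([], shadowing) prints Num 10 inside the inner block and Num 20 after it, because the two declarations of x are bound to different addresses.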

Formal Operational Semantics

Types and Type Systems

This part is partially based on notes by Norman Ramsey and on the book Types and Programming Languages by Benjamin Pierce

What is a type?

In a Haskell-like language (simplified by ignoring type classes)

1 + 5 :: Integer

"hello " ++ "world" :: String

n > 10 :: Bool -- if n :: Integer

if n > 10 then "Yes" else "No" :: String  -- if n :: Integer

\x -> x + 10 :: Integer -> Integer

Types classify program phrases according to the kinds of values they compute
They are predictions about values
Static approximation of runtime behavior – conservative

Why types?

Static analysis: detect (potential) runtime errors before code is run:
```
1 + True

10 "hello" -- application of the number 10 to a string?
```
- E.g., Python is happy to accept the following function definition:
```
def f(x):
  if x < 10:
    x(x, 10)
  else: 
    "Hello " + x
```
  What happens at runtime, when the function is called as f(4)?
Enforcing access policy (private/protected/public methods)
Guiding implementation (type-based programming)
Documentation: types tell us a lot about functions and provide a documentation that is repeatedly checked by the compiler (unlike comments)
Help compilers choose (more/most) efficient runtime value representations and operations
Maintenance: if we change a function’s type, the type checker will direct us to all use sites that need adjusting

What is a type system?

A tractable syntactic method for proving the absence of certain program behavior.

They are studied on their own as a branch of mathematics/logic: type theory
Original motivation: avoiding Russell’s paradox
Type systems are generally conservative:
```
1 + (if True then 10 else "hello")
```
would behave OK at runtime, but is, nevertheless, typically rejected by a static type checker (e.g., Haskell’s)

What kind of errors are typically not detected by type systems?

Division by zero
Selecting the head of an empty list
Out-of-bounds array access
Non-termination

Consideration: a program which mostly runs numeric computations will benefit from a strong static type system less than a program which transforms various data structures

Terminology: type systems

Dynamic: types are checked at runtime, typically when operations are performed on values
Static: types are checked before program is (compiled and) run

What is a type safe language?

Can a dynamically typed language be safe?
Is a statically typed language automatically type safe?

Dynamic:

Consider Python

>>> 1 + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

>>> "hello" + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "int") to str

Rejects applying + to incompatible arguments – protects from behavior that is incompatible with the abstractions of integers and strings

Static:

Consider C:
```
int array[] = {1, 2, 3, 4};
char string[] = "Hello, world!";

printf("%d %d %d %d\n", array[0], array[1], array[2], array[3]);

array + 2;
string + 3;
```
Here the compiler will merely complain about unused values on lines 6 and 7. But what does the expression array + 2 mean? Of course, a C programmer knows that an array decays to a pointer to its first element in such an expression, and so adding two merely shifts where the pointer points to. But is it compatible with the abstraction of an array?

So, again: what is a type-safe language?

A (type-)safe language is one that protects its own abstractions.

This means that a safe language won’t allow a programmer to apply operations which are not “sensible” for the given arguments, potentially leading to unexpected/undefined behavior at runtime.

Specifying Type Systems

Abstract syntax + Types + Typing rules (+ auxiliary operations)

A type system is typically specified as a set of rules which allow assigning types to abstract syntax trees – similarly to how evaluators assign values to abstract syntax trees
The goal: determine what type an expression (program phrase) has – if it has one
A type judgement – mathematically:

\[ \vdash e : t \]

Read: “\(e\) has type \(t\)”

A “type judgement” – in Haskell
```
typeOf e = t
```
Typing rules tell us how to arrive at the above conclusion for a particular expression
This is based on syntax – in other words type checking (and typing rules) are, typically, syntax-directed
On paper, typing rules are usually expressed as inference rules:

\[ \frac{\text{1st premise} \quad \text{2nd premise} \quad \text{3rd premise ...}}{\text{conclusion}} \]

Such a rule can be read as “If 1st premise AND 2nd premise AND 3rd premise … are all true, then the conclusion is also true”
If we can show that the premises hold, the rule allows us to conclude what is below the line
If a rule has no premises, it is an axiom
Here are some examples, written mathematically

\[ \frac{}{\vdash 3 : \text{Integer}} \]

“The type of the value 3 is Integer”

\[ \frac{n \text{ is an integer value}}{\vdash n : \text{Integer}} \]

“If \(n\) is an integer value, then the type of \(n\) is Integer”

\[ \frac{\vdash e_1 : \text{Integer} \quad \vdash e_2 : \text{Integer}}{\vdash e_1 + e_2 : \text{Integer}} \]

“If the type of \(e_1\) is Integer and the type of \(e_2\) is Integer, then the type of expression \(e_1 + e_2\) is also Integer”
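
As a small worked example (a sketch that uses only the rules just given), the rules can be stacked into a derivation: each premise of the addition rule is itself justified by the rule for integer values.

\[ \frac{\dfrac{1 \text{ is an integer value}}{\vdash 1 : \text{Integer}} \quad \dfrac{2 \text{ is an integer value}}{\vdash 2 : \text{Integer}}}{\vdash 1 + 2 : \text{Integer}} \]

Read from the bottom up, this is how a type checker proceeds: to type \(1 + 2\), it first has to type both operands.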

We can (and will) view these inference rules as a fancy way of writing Haskell functions
Let us first define datatypes for expressions (for now only integer numbers and addition) and types (only integers)
```
data Expr = Num Integer
          | Add Expr Expr

data Type = TyInt
```

The above two rules as a Haskell type checker:

typeOf :: Expr -> Maybe Type   -- an expression might not have a type
typeOf (Num n) = return TyInt
typeOf (Add e1 e2) =
  do TyInt <- typeOf e1
     TyInt <- typeOf e2
     return TyInt

Note that the return type of typeOf is Maybe Type
- This is to allow for the possibility that an expression might not have a type (although in this trivial language, all expressions do)

We use the do notation (together with return) to simplify the definition. The above is equivalent to:

typeOf :: Expr -> Maybe Type   -- an expression might not have a type
typeOf (Num n) = Just TyInt
typeOf (Add e1 e2) =
  case typeOf e1 of
       Just TyInt -> case typeOf e2 of
                          Just TyInt -> Just TyInt
                          _ -> Nothing
       _ -> Nothing

To make the connection a little more explicit, we will write inference rules as a mix of Haskell and math:

------------------
 |- Num n : TyInt


 |- e1 : TyInt    |- e2 : TyInt
--------------------------------
 |- Add e1 e2 : TyInt

More typing rules (we add a few new expression shapes + a new type for booleans):

data Expr = ...
          | Bool Bool
          | And Expr Expr
          | Not Expr
          | Leq Expr Expr
          | If Expr Expr Expr

data Type = ...
          | TyBool


--------------------
 |- Bool b : TyBool


 |- e1 : TyBool   |- e2 : TyBool
---------------------------------
 |- And e1 e2 : TyBool


 |- e : TyBool
-------------------
 |- Not e : TyBool


 |- e1 : TyInt   |- e2 : TyInt
-------------------------------
 |- Leq e1 e2 : TyBool


 |- e1 : TyBool   |- e2 : t   |- e3 : t
----------------------------------------
 |- If e1 e2 e3 : t
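The rules above can be transcribed into Haskell in the same way as the rules for Num and Add. The following is a minimal sketch, not necessarily the definition used in the course: it assumes the datatypes derive Eq (so that the two branches of If can be compared) and spells out the constructors elided by `...` above.

```haskell
-- Expressions and types from the notes, with the elided constructors filled in.
data Expr = Num Integer
          | Add Expr Expr
          | Bool Bool
          | And Expr Expr
          | Not Expr
          | Leq Expr Expr
          | If Expr Expr Expr
          deriving (Show, Eq)

data Type = TyInt
          | TyBool
          deriving (Show, Eq)   -- Eq is an assumption; it is needed for the If case

typeOf :: Expr -> Maybe Type
typeOf (Num _) = return TyInt
typeOf (Add e1 e2) =
  do TyInt <- typeOf e1          -- a failed pattern match yields Nothing
     TyInt <- typeOf e2
     return TyInt
typeOf (Bool _) = return TyBool
typeOf (And e1 e2) =
  do TyBool <- typeOf e1
     TyBool <- typeOf e2
     return TyBool
typeOf (Not e) =
  do TyBool <- typeOf e
     return TyBool
typeOf (Leq e1 e2) =
  do TyInt <- typeOf e1
     TyInt <- typeOf e2
     return TyBool
typeOf (If e1 e2 e3) =
  do TyBool <- typeOf e1
     t2 <- typeOf e2
     t3 <- typeOf e3
     -- the rule uses the same t for both branches, so the types must agree
     if t2 == t3 then return t2 else Nothing
```

Each premise of a rule becomes one line of the do block, and the conclusion becomes the final return.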

How do we apply these rules?

We build derivations!
But what are derivations?
A derivation is a (proof) tree built by consistently replacing variables in inference rules by concrete terms
At the bottom of the tree is the typing judgment we are trying to show

Examples:

A numeric literal
```
------------------
 |- Num 3 : TyInt
```
Nothing else needed here, since the rule is an axiom and doesn’t have any conditions (premises)

Addition of two numbers

 |- Num 3 : TyInt   |- Num 4 : TyInt
-------------------------------------
 |- Add (Num 3) (Num 4) : TyInt

Boolean expression:
```
 |- Bool True : TyBool           |- Bool False : TyBool  |- Bool True : TyBool
-----------------------------   -----------------------------------------------
 |- Not (Bool True) : TyBool     |- And (Bool False) (Bool True) : TyBool
-------------------------------------------------------------------------------
 |- And (Not (Bool True)) (And (Bool False) (Bool True)) : TyBool
```
Prettier:

\[ \cfrac{\cfrac{\vdash \text{Bool True} : \text{TyBool}} {\vdash \text{Not (Bool True)} : \text{TyBool}} \quad \cfrac{\vdash \text{Bool False} : \text{TyBool} \quad \vdash \text{Bool True} : \text{TyBool}} {\vdash \text{And (Bool False) (Bool True)} : \text{TyBool}} } {\vdash \text{And (Not (Bool True )) (And (Bool False) (Bool True))} : \text{TyBool}} \]

Conditional involving booleans and integers

 |- Num 3 : TyInt   |- Num 4 : TyInt
-------------------------------------
 |- Leq (Num 3) (Num 4) : TyBool
---------------------------------------
 |- Not (Leq (Num 3) (Num 4)) : TyBool    |- Num 3 : TyInt   |- Num 5 : TyInt
------------------------------------------------------------------------------
 |- If (Not (Leq (Num 3) (Num 4))) (Num 3) (Num 5) : TyInt

A failing one:
```
 |- Bool True : TyBool  |- Num 3 : TyInt
-----------------------------------------
 |- Add (Bool True) (Num 3) : ?
```
We have no rule to apply here. The only rule for Add requires both operands to have type TyInt, so we would need Bool True to have type TyInt, and there is no rule that allows us to derive this. Hence, the above expression cannot be type-checked.
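Derivations can themselves be represented as data. The sketch below is only an illustration: the Derivation type and the functions derive and conclusionType are hypothetical names, not part of the notes, and the language is cut down to four expression shapes to keep it short. A derivation records the sub-derivations for the premises together with the judgement in the conclusion.

```haskell
data Expr = Num Integer
          | Add Expr Expr
          | Bool Bool
          | Not Expr
          deriving (Show, Eq)

data Type = TyInt
          | TyBool
          deriving (Show, Eq)

-- Hypothetical representation of a proof tree:
-- the derivations of the premises, and the judgement |- e : t they justify.
data Derivation = Derivation [Derivation] Expr Type
  deriving (Show, Eq)

-- The type appearing in the conclusion of a derivation.
conclusionType :: Derivation -> Type
conclusionType (Derivation _ _ t) = t

-- Build a derivation for an expression, or fail if no rule applies.
derive :: Expr -> Maybe Derivation
derive e@(Num _)  = return (Derivation [] e TyInt)    -- axiom
derive e@(Bool _) = return (Derivation [] e TyBool)   -- axiom
derive e@(Add e1 e2) =
  do d1 <- derive e1
     d2 <- derive e2
     if conclusionType d1 == TyInt && conclusionType d2 == TyInt
       then return (Derivation [d1, d2] e TyInt)
       else Nothing
derive e@(Not e1) =
  do d1 <- derive e1
     if conclusionType d1 == TyBool
       then return (Derivation [d1] e TyBool)
       else Nothing
```

For Add (Num 3) (Num 4), derive returns a tree with two axiom leaves. For Add (Bool True) (Num 3) it returns Nothing, mirroring the failed derivation above.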

Type-checking Involving Variables

Syntax extensions:

data Expr = ...
          | Var Variable
          | Let Variable Expr Expr

How do we deal with variables?

We need to keep track of types assigned to variables
Idea: Like for an (environment-based) evaluator for expressions, use an environment
The environment maps variables to types

type TyEnv = Map Variable Type

Example rules:

  t <- get x tenv        
------------------- 
 tenv |- Var x : t  



 tenv |- e1 : t1    add x t1 tenv |- e2 : t2
--------------------------------------------
         tenv |- Let x e1 e2 : t2

In Haskell:

typeOf :: TyEnv -> Expr -> Maybe Type
...
typeOf tenv (Add e1 e2) =     -- previous cases need to be refactored to use tenv
  do TyInt <- typeOf tenv e1
     TyInt <- typeOf tenv e2
     return TyInt
...
typeOf tenv (Var x) = get x tenv      -- NEW: variable lookup
typeOf tenv (Let x e1 e2) =           -- NEW: let-binding
  do t1 <- typeOf tenv e1             -- get type of e1
     t2 <- typeOf (add x t1 tenv) e2  -- get the type of e2, assuming x : t1
     return t2
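To experiment with these cases, here is a self-contained sketch. It assumes the environment is a Data.Map from the containers package, and the helpers get, add and emptyEnv are hypothetical definitions with the argument order used in the rules above. The course's own Map module may differ.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

type Variable = String

data Expr = Num Integer
          | Bool Bool
          | Add Expr Expr
          | Var Variable
          | Let Variable Expr Expr
          deriving (Show, Eq)

data Type = TyInt
          | TyBool
          deriving (Show, Eq)

type TyEnv = Map Variable Type

-- Hypothetical helpers; the names follow the rules, the definitions are assumptions.
get :: Variable -> TyEnv -> Maybe Type
get = Map.lookup

add :: Variable -> Type -> TyEnv -> TyEnv
add = Map.insert

emptyEnv :: TyEnv
emptyEnv = Map.empty

typeOf :: TyEnv -> Expr -> Maybe Type
typeOf _ (Num _)  = return TyInt
typeOf _ (Bool _) = return TyBool
typeOf tenv (Add e1 e2) =
  do TyInt <- typeOf tenv e1
     TyInt <- typeOf tenv e2
     return TyInt
typeOf tenv (Var x) = get x tenv        -- unbound variables have no type
typeOf tenv (Let x e1 e2) =
  do t1 <- typeOf tenv e1               -- type of the bound expression
     typeOf (add x t1 tenv) e2          -- body is checked assuming x : t1
```

With these definitions, typeOf emptyEnv (Let "x" (Num 3) (Add (Var "x") (Num 4))) yields Just TyInt, while binding x to Bool True in the same expression yields Nothing.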

Simply-Typed Lambda Calculus (STLC)

Syntax extensions:

Note that abstractions need to specify the type of the bound variable – there is no way for the type-checker to guess it (at this stage)

data Expr = ...
          | Lam Variable Type Expr
          | App Expr Expr

data Type = ...
          | TyArrow Type Type

TyArrow:

The new type constructor, TyArrow, represents a function type:

TyArrow TyInt TyBool is the type of a function that takes an integer (TyInt) and returns a boolean (TyBool). In Haskell (also in some other languages and in type theory), this is written Integer -> Bool

TyArrow (TyArrow TyInt TyBool) (TyArrow TyInt TyBool) corresponds to (Integer -> Bool) -> (Integer -> Bool), that is, the type of a function that takes a function from integers to booleans and returns a function from integers to booleans.

Due to currying, we normally understand this as a function that takes a function from integers to booleans, then an integer, and returns a boolean. Note that this also means that the arrow -> is right-associative and the above Haskell type can be equivalently written as (Integer -> Bool) -> Integer -> Bool. Also note that this is the opposite of how application associates, which is to the left.

Note on associativity:

Function type – RIGHT: t1 -> t2 -> t3 -> t4 is the same as t1 -> (t2 -> t3 -> t4) is the same as t1 -> (t2 -> (t3 -> t4))

Function application – LEFT: f a b c is the same as (f a) b c is the same as ((f a) b) c
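Both conventions can be observed directly in Haskell. In this small sketch the names applyTo, applyTo', example1 and example2 are made up for illustration.

```haskell
-- The two signatures below denote the same type,
-- because -> associates to the right.
applyTo :: (Integer -> Bool) -> Integer -> Bool
applyTo p n = p n

applyTo' :: (Integer -> Bool) -> (Integer -> Bool)
applyTo' = applyTo   -- accepted only because the two types coincide

-- Application associates to the left, so these two expressions are the same.
example1, example2 :: Bool
example1 = applyTo even 4
example2 = (applyTo even) 4
```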

Rules

 add x t1 tenv |- e : t2        
------------------------------------
 tenv |- Lam x t1 e : TyArrow t1 t2


 tenv |- e1 : TyArrow t2 t1    tenv |- e2 : t2'   t2 == t2' 
----------------------------------------------------
             tenv |- App e1 e2 : t1

The fixpoint operator:

No fixpoint combinator (e.g., Y or Z) can be type-checked in STLC, so it has to be added as a primitive operation

data Expr = ...
          | Fix Expr

 tenv |- e : TyArrow t t'   t == t'
------------------------------------
          tenv |- Fix e : t
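The three STLC rules (abstraction, application and Fix) might be transcribed into Haskell as follows. This is a sketch under assumptions: the environment is a Data.Map used directly through Map.lookup and Map.insert, the datatypes derive Eq so that types can be compared, and only a few of the earlier expression shapes are kept.

```haskell
import qualified Data.Map as Map

type Variable = String

data Type = TyInt
          | TyBool
          | TyArrow Type Type
          deriving (Show, Eq)

data Expr = Num Integer
          | Add Expr Expr
          | Var Variable
          | Lam Variable Type Expr
          | App Expr Expr
          | Fix Expr
          deriving (Show, Eq)

type TyEnv = Map.Map Variable Type

typeOf :: TyEnv -> Expr -> Maybe Type
typeOf _ (Num _) = return TyInt
typeOf tenv (Add e1 e2) =
  do TyInt <- typeOf tenv e1
     TyInt <- typeOf tenv e2
     return TyInt
typeOf tenv (Var x) = Map.lookup x tenv
typeOf tenv (Lam x t1 e) =
  do t2 <- typeOf (Map.insert x t1 tenv) e   -- check the body assuming x : t1
     return (TyArrow t1 t2)
typeOf tenv (App e1 e2) =
  do TyArrow t2 t1 <- typeOf tenv e1         -- e1 must be a function
     t2' <- typeOf tenv e2
     if t2 == t2' then return t1 else Nothing -- argument type must match
typeOf tenv (Fix e) =
  do TyArrow t t' <- typeOf tenv e           -- e must map t to t
     if t == t' then return t else Nothing
```

For example, Lam "x" TyInt (Add (Var "x") (Num 1)) gets the type TyArrow TyInt TyInt, and applying it to Num 2 gets TyInt.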

Polymorphism

Limitations of monomorphic type systems:

What we’ve seen so far are monomorphic types, meaning each type expression denotes exactly one concrete type
What is the problem?
- identity
- operations on lists
- operations on pairs
- …

Consider length of a list – we need:

length_int :: [Integer] -> Integer
length_bool :: [Bool] -> Integer

Are we done?

What about length of lists of functions from integers to integers?

Length of lists of functions from integers to booleans?

Length of lists of functions from (function from integers to booleans) to integers?

Etc.

Even simpler: identity

id_int :: Integer -> Integer
id_bool :: Bool -> Bool
id_intToBool :: (Integer -> Bool) -> (Integer -> Bool)

Functions which perform operations such as counting the number of elements of a list, or swapping the elements of a pair, do not depend on the type of the elements in the list or pair

What do we need?

Back to an untyped language? :-(
We would really like to specify a type of functions that work for lists (or pairs) containing any type
In other words, we need a way to say “for any type t, list of t”

In yet other words,

length :: forall a. [a] -> Integer
id :: forall a. a -> a

Ingredients:
1. Type variables
2. Abstracting type variables (quantifying)
In Haskell (or ML, OCaml, …), polymorphic types are inferred for you and you (usually) do not need to say that you want a polymorphic function
Another implementation of the same idea is Java generics
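In Haskell itself, the quantifier can be written out explicitly. The sketch below assumes GHC with the ExplicitForAll extension; the primed names length' and id' are chosen only to avoid clashing with the Prelude.

```haskell
{-# LANGUAGE ExplicitForAll #-}

-- One definition serves lists of every element type a.
length' :: forall a. [a] -> Integer
length' []       = 0
length' (_ : xs) = 1 + length' xs

-- The identity function, for every type a.
id' :: forall a. a -> a
id' x = x
```

The same length' can be applied to [1, 2, 3], to [True], or to a list of functions, with no need for length_int, length_bool and so on.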

Basics:

Additional syntax for types

type TyVariable = String

data Type = ...
          | TyVar TyVariable
          | TyForall TyVariable Type

For TyForall, we can use a more economical alternative:

data Type = ...
          | TyVar TyVariable
          | TyForall [TyVariable] Type

We are now able to abstract the type of a function. But how do we actually give the information to the type-abstracted function?

Idea: we pass what type we actually intend to use at runtime.

Consequence: We need type abstraction and type application on the expression level

New syntax:

data Expr = ...
          | TAbs TyVariable Expr
          | TApp Expr Type

We obtain: polymorphic lambda calculus (with extensions) aka System F.

How do we perform application? Substitute the types

Typing rules:

 tenv |- e : t
---------------------------------
 tenv |- TAbs a e : TyForall a t

 tenv |- e : TyForall a t'
----------------------------------
 tenv |- TApp e t : tsubst a t t'

Here we use type substitution to substitute types (careful about Forall!)

typeOf tenv (TAbs a e) =
  do t <- typeOf tenv e
     return (TyForall a t)
typeOf tenv (TApp e t) = 
  do TyForall a t' <- typeOf tenv e
     tsubst a t t'
typeOf tenv (Add e1 e2) =     -- previous cases need to be refactored to use tenv
  do TyInt <- typeOf tenv e1
     TyInt <- typeOf tenv e2
     return TyInt
typeOf tenv (Num _) = return TyInt

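tsubst :: TyVariable -> Type -> Type -> Maybe Type
tsubst _ _ TyInt  = return TyInt     -- base types contain no type variables
tsubst _ _ TyBool = return TyBool
tsubst a s (TyVar b) | a == b    = return s
                     | otherwise = return (TyVar b)
tsubst a s (TyForall b t)
  | a == b = return $ TyForall b t   -- a is shadowed by the binder: nothing to do
  | freeInType b s = Nothing         -- b occurs free in s and would be captured
  | otherwise = TyForall b <$> tsubst a s t
tsubst a s (TyArrow t1 t2) = 
  do t1' <- tsubst a s t1
     t2' <- tsubst a s t2
     return (TyArrow t1' t2')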
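The helper freeInType is used above but not defined. A self-contained sketch of type substitution, with one plausible definition of freeInType, could look as follows. The argument order of freeInType (variable first, then type) and the Eq instance are assumptions. The capture check is conservative: it rejects the substitution whenever the bound variable occurs free in the substituted type, instead of renaming the bound variable.

```haskell
type TyVariable = String

data Type = TyInt
          | TyBool
          | TyArrow Type Type
          | TyVar TyVariable
          | TyForall TyVariable Type
          deriving (Show, Eq)

-- Does the type variable a occur free in the given type? (assumed definition)
freeInType :: TyVariable -> Type -> Bool
freeInType _ TyInt           = False
freeInType _ TyBool          = False
freeInType a (TyArrow t1 t2) = freeInType a t1 || freeInType a t2
freeInType a (TyVar b)       = a == b
freeInType a (TyForall b t)  = a /= b && freeInType a t

-- tsubst a s t replaces the free occurrences of a in t by s.
-- It returns Nothing if the substitution would capture a variable of s.
tsubst :: TyVariable -> Type -> Type -> Maybe Type
tsubst _ _ TyInt  = return TyInt
tsubst _ _ TyBool = return TyBool
tsubst a s (TyVar b)
  | a == b    = return s
  | otherwise = return (TyVar b)
tsubst a s (TyArrow t1 t2) = TyArrow <$> tsubst a s t1 <*> tsubst a s t2
tsubst a s (TyForall b t)
  | a == b         = return (TyForall b t)   -- a is rebound: leave the body alone
  | freeInType b s = Nothing                 -- b in s would become bound
  | otherwise      = TyForall b <$> tsubst a s t
```

For instance, substituting TyVar "b" for "a" in TyForall "b" (TyArrow (TyVar "a") (TyVar "b")) is rejected, because the free b would be captured by the quantifier.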

How do we evaluate type abstraction and type application? We can either use substitution or add another environment

Other kinds of polymorphism

What we talked about is parametric polymorphism (let-polymorphism is a restricted form of it)
Another kind of polymorphism: ad hoc
- Allows a polymorphic value to exhibit different behaviors, depending on the actual type
- Overloading: associates a single function symbol with many implementations
- Compiler (or the runtime system) chooses an appropriate implementation for each application of the function – based on the types of the arguments
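In Haskell, ad hoc polymorphism is provided by type classes. The following sketch uses a made-up class Describe, purely for illustration: the single function symbol describe has one implementation per instance, and the one to use is chosen from the type of the argument.

```haskell
-- Hypothetical class: one overloaded function symbol.
class Describe a where
  describe :: a -> String

-- Implementation used when the argument is a Bool.
instance Describe Bool where
  describe True  = "yes"
  describe False = "no"

-- A different implementation, used when the argument is an Integer.
instance Describe Integer where
  describe n = "the integer " ++ show n
```

Unlike a parametrically polymorphic function such as the identity, describe behaves differently at each type and is not defined for types without an instance.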

Later, we will add extensions that make many things simpler and also allow us to build realistic programming languages.