# Programming Languages

Lecture Notes for CS4400/5400

# Lambda Calculus

Lambda calculus is a theory of functions. What is a function? There are two basic views one can take when characterizing them:

1. Function as a graph
2. Function as a value

Considering a function $$f$$ as a graph is to consider it as a set of pairs – mappings between input and output values $$(x, f(x))$$. For example the square function on natural numbers $$^2 : \mathbb{N} \to \mathbb{N}$$ can be characterized as a set of pairs $$(n, n^2)$$:

$\{ (0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), ... \}$

Using a function as a graph is to find an output that corresponds to our input. The alternative view to take is to consider a function as rules – equations, which tell us how to compute the output of the function from its input. For example, the square function $$^2 : \mathbb{N} \to \mathbb{N}$$ is defined by the equation:

$n^2 = n \times n$

How do we use this function? We substitute an expression that looks like the left-hand side with the right-hand side, replacing the argument $$n$$ with the expression and then computing the resulting expression. For example, our calculation might proceed as follows:

\begin{aligned} {4^2}^2 + 3^2 &= (4 \times 4)^2 + 3^2\\ &= \left((4 \times 4) \times (4 \times 4)\right) + 3^2\\ &= \left((4 \times 4) \times (4 \times 4)\right) + (3 \times 3)\\ &\dots \\ &= 265 \end{aligned}

Or, as follows: \begin{aligned} {4^2}^2 + 3^2 &= (4 \times 4)^2 + 3^2\\ &= 16^2 + 3^2\\ &= 16^2 + 9\\ &= 256 + 9\\ &... \\ &= 265 \end{aligned}

In any case, the important thing to note is that we replace any occurrence of $$n^2$$, for any $$n$$, using the defining equation. In general, if we define a function $$f$$ by the equation $$f(x) = E$$, where $$E$$ is some mathematical expression (potentially containing $$x$$), then we use (apply) this function by replacing any occurrence of $$f(D)$$ (where $$D$$ is a mathematical expression) by $$E[x := D]$$, that is, the expression $$E$$ where all occurrences of $$x$$ are replaced by $$D$$. This is called substitution of the expression $$D$$ for the variable $$x$$ in the expression $$E$$. E.g., if

$f(x) = x + x$

then:

\begin{aligned} f(20) + f(2 \times 3) &= (x + x)[x := 20] + (x + x)[x := 2 \times 3] \\ &= (20 + 20) + ((2 \times 3) + (2 \times 3)) \\ &\dots \end{aligned}

The next question is: how important is the name of the function? We use names as mnemonics, so that we can

1. say “let $$f$$ be the function defined by the equation $$f(x) = E$$” (where $$E$$ is an arbitrary mathematical expression), and
2. replace any occurrence of $$f$$ applied to an argument with an instance of $$E$$ where $$x$$ is replaced with the argument expression.

We can do this without inventing names, by using functions as anonymous objects – just like we easily use numbers or strings or arrays. In mathematics an anonymous function will be written as $$x \mapsto E$$. For example, the square function is $$x \mapsto x \times x$$, the above function $$f$$ is $$x \mapsto x + x$$.

The above exposition applies to programming too. Basically, all sensible “higher-level” programming languages allow us to define functions to abstract a computation by replacing a concrete expression with a variable – a placeholder. In Python we might write:

def square(x):
    return x * x

In C/C++:

int square(int x) {
    return x * x;
}

In Haskell:

square :: Integer -> Integer
square x = x * x

In Scheme:

(define (square x)
  (* x x))

In any programming language, we operate with the rough understanding that whenever square is invoked with an argument, that application might as well be replaced with the body of the function, with the parameter replaced by the actual argument (either before or after evaluating the argument itself). More and more programming languages, particularly those which allow passing functions as arguments, allow creating functions without naming them: so-called anonymous functions. Python and Scheme have lambda:

lambda x : x * x
(lambda (x) (* x x))

OCaml has fun or function:

fun x -> x * x

Haskell has \:

\x -> x * x

C++ has, well…

[](int x){ return x * x; }

As hinted by the Scheme and Python examples, Lambda calculus is the underlying theory behind these anonymous functions. In its pure form, it is exclusively concerned with what it means to apply an abstracted expression (as an anonymous function), to an argument. It studies this as a purely syntactic operation.

Where Python and Scheme have lambda, and OCaml has fun and function, Lambda calculus has $$\lambda$$. That is, an anonymous function with the formal parameter $$x$$ is constructed using $$\lambda x.\ \dots$$ We can write the squaring function in lambda notation as

$\lambda x.\ x \times x$

We say that this is a lambda abstraction that binds the variable $$x$$ in $$x \times x$$. In other words, $$x$$ is bound in $$x \times x$$. An application is written (similarly to Scheme, OCaml, or Haskell) by writing the function and argument next to each other (juxtaposition). For example, where in Scheme we could write

((lambda (x) (* x x)) 10)

or in Haskell

(\x -> x * x) 10

in lambda notation we write:

$(\lambda x.\ x \times x)\ 10$

As I mentioned before, Lambda calculus looks at the application of a function as a syntactic operation, in terms of substitution, as the process of replacing any occurrence of the abstracted variable with the actual argument. For the above, this is replacing any occurrence of $$x$$ in $$x \times x$$ with $$10$$:

\begin{aligned} (\lambda x.\ x \times x)\ 10 &= (x \times x)[x := 10]\\ &= 10 \times 10 \end{aligned}

Another way of thinking about the bound variable $$x$$ in $$\lambda x.\ x \times x$$ is as a placeholder or hole, where the argument “fits”.

$(\lambda \boxed{\phantom{x}}.\ \boxed{\phantom{x}} \times \boxed{\phantom{x}})\ 10 = \boxed{10} \times \boxed{10}$

## Pure Lambda Calculus

Here, we will look at the formal theory of pure Lambda Calculus. We will look at the syntax and a notion of computation.

### Syntax

The basic syntax of the calculus is really simple:

  <Lambda> ::= <Variable>
             | ( <Lambda> <Lambda> )
             | ( λ <Variable> . <Lambda> )

That is all there really is:<sup>1</sup>

• variable reference, e.g. $$x$$, $$y$$, $$z$$, $$a$$, $$\mathit{square}$$
• application, e.g., $$(x\ y)$$, $$((\lambda x.\ x)\ (\lambda x.\ x))$$
• lambda abstraction, e.g.,
  • $$(\lambda x.\ x)$$ – expressing the identity function
  • $$(\lambda x.\ x\ x)$$ – a function that applies its argument to itself
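To make the three syntactic forms concrete, here is a minimal sketch of how lambda terms could be represented as data. The tagged-tuple encoding and the helper names `var`, `app`, `lam` and `show` are assumptions made for illustration; they are not part of the calculus itself.

```python
# Terms as tagged tuples (an assumed encoding, one tag per grammar production):
#   ("var", name) | ("app", fun, arg) | ("lam", param, body)

def var(name):
    return ("var", name)

def app(fun, arg):
    return ("app", fun, arg)

def lam(param, body):
    return ("lam", param, body)

def show(term):
    """Render a term fully parenthesized, exactly as the grammar dictates."""
    tag = term[0]
    if tag == "var":
        return term[1]
    if tag == "app":
        return f"({show(term[1])} {show(term[2])})"
    return f"(λ{term[1]}. {show(term[2])})"

identity = lam("x", var("x"))                  # (λx. x)
self_app = lam("x", app(var("x"), var("x")))   # (λx. (x x))

print(show(app(identity, identity)))  # ((λx. x) (λx. x))
print(show(self_app))                 # (λx. (x x))
```

Note that `show` prints every pair of parentheses; the conventions for dropping them are purely a matter of presentation.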

You might ask: what can we do with such a minuscule language? Turns out a lot. As proven by A.M. Turing, this pure version of Lambda calculus is equivalent in computational power to Turing Machines. That means we are able to build up a programming language out of these three constructs. We will look at how to do that in the section on Programming in Pure Lambda Calculus below.

#### Syntax Conventions and Terminology

Terminology: Take a lambda abstraction:

$(\lambda x.\ N)$

1. $$\lambda x$$ is a binder binding $$x$$
2. $$N$$ is the body of the abstraction

To avoid writing too many parentheses, these conventions are usually taken for granted:

1. Outermost parentheses are usually dropped: x x, λx. x.
2. Application associates to the left. That is, (((a b) c) d) is the same as ((a b c) d) is the same as (a b c d), which is the same as a b c d (see previous rule).
3. Lambda abstraction bodies extend as far to the right as possible. That is, (λa. (λb. ((a b) c))) is the same as λa. λb. a b c.

### Beta Reduction

Computation in pure lambda calculus is expressed in a single rule: the $$\beta$$-reduction rule:

$(\lambda x.\ M)\ N \longrightarrow_\beta M[x := N]$

The long arrow stands for “reduces to”. On the left-hand side, we have an application of a lambda abstraction to an arbitrary term. On the right-hand side, we substitute the abstraction’s bound variable with the argument. A term that matches the pattern on the left-hand side (that is, a lambda abstraction applied to something) is called a redex, short for reducible expression. For example:

• (λx. x) a -->β a
• (λx. x x) (λy. y) -->β (λy. y) (λy. y)
• the above reduces further: (λy. y) (λy. y) -->β (λy. y)
• not a redex: (x x)
• also not a redex: x (λy. y)
• also not a redex: (λy. (λx. x) y), although it does contain a redex ((λx. x) y)
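The difference between being a redex and containing one can be sketched in a few lines. The encoding of terms as tagged tuples, `("var", name)`, `("app", fun, arg)` and `("lam", param, body)`, is an assumption of this sketch, as are the function names.

```python
def is_redex(term):
    """A redex is an application whose function part is an abstraction."""
    return term[0] == "app" and term[1][0] == "lam"

def contains_redex(term):
    """True if the term itself or any of its subterms is a redex."""
    if is_redex(term):
        return True
    if term[0] == "app":
        return contains_redex(term[1]) or contains_redex(term[2])
    if term[0] == "lam":
        return contains_redex(term[2])
    return False  # a lone variable contains no redex

x, y, a = ("var", "x"), ("var", "y"), ("var", "a")
id_x = ("lam", "x", x)

applied_identity = ("app", id_x, a)            # (λx. x) a
self_application = ("app", x, x)               # x x
var_applied = ("app", x, ("lam", "y", y))      # x (λy. y)
under_binder = ("lam", "y", ("app", id_x, y))  # λy. (λx. x) y

print(is_redex(applied_identity))    # True
print(is_redex(under_binder))        # False
print(contains_redex(under_binder))  # True
```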

### Variables: Bound, Free. Closed Expressions

We have already mentioned the notion of a bound variable. A variable is said to be bound in an expression, if it appears under a λ-abstraction binding that particular variable. Or, in other words, it is bound if it appears in the scope of a binder. For example:

• x is bound in (λx. x x) – it appears in the scope of the binder λx
• both x and y are bound in (λx. λy. x y) – x appears in the scope of λx, y in the scope of λy
• x is not bound in (λy. x y), but y is – x does not appear in the scope of any binder here, while y appears in the scope of λy

A free variable is one which appears in a position where it is not bound. For example:

• x is free in x x, in λy. x y, or in (λy. y y) x
• x is not free in (λx. x x) (λx. x)
• x is both bound and free in (λx. x y) x, while y is only free

As you can see above, a variable might be both bound and free in an expression.

An expression which contains no free variables is closed, for example:

• λx. x
• λx. λy. x y x

A closed lambda expression is also called a combinator.

A variable is called fresh for an expression, if it does not appear free in that expression. For example, x is fresh for y z or (λx. x x).
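The definitions above translate almost directly into a recursive function. The following is a sketch, assuming terms are encoded as tagged tuples `("var", name)`, `("app", fun, arg)` and `("lam", param, body)`; the names `free_vars`, `is_closed` and `is_fresh` are chosen here for illustration.

```python
def free_vars(term):
    """Set of variables that occur free in a term."""
    tag = term[0]
    if tag == "var":
        return {term[1]}
    if tag == "app":
        return free_vars(term[1]) | free_vars(term[2])
    return free_vars(term[2]) - {term[1]}  # λx. M binds x inside M

def is_closed(term):
    """A closed term (a combinator) has no free variables."""
    return not free_vars(term)

def is_fresh(name, term):
    """A variable is fresh for a term if it does not appear free in it."""
    return name not in free_vars(term)

x, y = ("var", "x"), ("var", "y")
mixed = ("app", ("lam", "x", ("app", x, y)), x)                 # (λx. x y) x
closed = ("lam", "x", ("lam", "y", ("app", ("app", x, y), x)))  # λx. λy. x y x

print(sorted(free_vars(mixed)))                        # ['x', 'y']
print(is_closed(closed))                               # True
print(is_fresh("x", ("lam", "x", ("app", x, x))))      # True
```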

### Names of Bound Variables Don’t Matter

Intuitively, an identity function should be an identity function, no matter what we choose to name its bound variable. That is, (λx. x) should be considered the same as (λy. y) or (λz. z). This is captured in the notion of alpha equivalence: two expressions are α-equivalent, if they only differ in the names of their bound variables. This also means, that we are free to α-convert any lambda term by consistently renaming bound variables. However, the new names must differ from free variables under the particular binder. We are thus free to convert (λx. x) to, e.g., (λa. a); (λy. z y) to (λx. z x), but not to (λz. z z).
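One possible way to check α-equivalence mechanically is to ignore the names of bound variables and compare only which binder each occurrence refers to. The sketch below does this by recording, for each side, the nesting depth of every binder. The tagged-tuple encoding of terms and the name `alpha_eq` are assumptions of this example.

```python
def alpha_eq(s, t, env_s=None, env_t=None, depth=0):
    """True if s and t differ only in the names of their bound variables."""
    env_s = env_s or {}
    env_t = env_t or {}
    if s[0] != t[0]:
        return False
    if s[0] == "var":
        if s[1] in env_s or t[1] in env_t:
            # At least one side is bound: both must refer to the same binder.
            return env_s.get(s[1]) == env_t.get(t[1])
        return s[1] == t[1]  # both free: the names must agree
    if s[0] == "app":
        return (alpha_eq(s[1], t[1], env_s, env_t, depth)
                and alpha_eq(s[2], t[2], env_s, env_t, depth))
    # Abstraction: remember at which depth each side's binder was introduced.
    return alpha_eq(s[2], t[2],
                    {**env_s, s[1]: depth},
                    {**env_t, t[1]: depth},
                    depth + 1)

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")

print(alpha_eq(("lam", "x", x), ("lam", "y", y)))        # True
print(alpha_eq(("lam", "y", ("app", z, y)),
               ("lam", "x", ("app", z, x))))             # True
print(alpha_eq(("lam", "y", ("app", z, y)),
               ("lam", "z", ("app", z, z))))             # False
```

The last pair fails because renaming y to z would turn the free z into a bound one.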

### Substitution

We were happy to use substitution in an informal manner up until now:

$$M[x := N]$$ means replacing occurrences of the variable $$x$$ in the expression $$M$$ with the expression $$N$$.

Here we want to pin it down. For that, we will need to consider the difference between bound and free variables. Let’s try to start with a naive definition of substitution.

#### Naive Substitution

There are three syntactic forms, and we need to consider each of them:

Variable: x[y := N] = ?

Application: (M1 M2)[y := N] = ?

Abstraction: (λx. M)[y := N] = ?

Variables are straightforward: we either find the variable to be substituted or we find a different one:

1. y[y := N] = N

2. x[y := N] = x if x $$\neq$$ y

Application is also relatively simple – we simply substitute in both left-hand and right-hand side:

3. (M1 M2)[y := N] = (M1[y := N] M2[y := N])

Now, for a lambda abstraction we need to consider the variables involved. We certainly don’t want to override the bound variable of a function:

4. (λy. M)[y := N] = (λy. M)

The remaining case seems simple enough too:

5. (λx. M)[y := N] = (λx. M[y := N]) if x $$\neq$$ y

If we test this substitution everything seems to be okay:

  (x x)[x := (λy. y)] = (x[x := (λy. y)] x[x := (λy. y)])
= (λy. y) (λy. y)

((λx. x y) x)[x := (λy. y)] = (λx. x y)[x := (λy. y)] x[x := (λy. y)]
= (λx. x y) (λy. y)

However, what happens if the expression that we are substituting contains the bound variable?

  (λy. x)[x := y] = λy. y

We see that in this case, we have just “captured” variable y and changed its status from free to bound. This changes the meaning of a variable – whereas the original meaning of y was given by the context of the left-hand side expression, now it is given by the binder λy. In particular, we changed a constant function—which, after the substitution should return a free y, no matter what argument it is applied to—to an identity function, that just returns whatever its argument is.

From this we see that we need substitution to behave differently when the expression that we are trying to substitute contains free variables that clash with variables bound by a lambda abstraction.

#### Safe Substitution

To fix this we can restrict when the last case of our substitution applies:

5. (λx. M)[y := N] = (λx. M[y := N]) if x $$\neq$$ y and if x is not free in N

Now our substitution is “safe”. However, this turns it into a partial function – it is left undefined for cases where the bound variable x appears free in N. To get around this, we can make use of alpha-conversion: we consistently rename the bound variable x to a variable x' that doesn’t clash with free variables in N or M.

5. (λx. M)[y := N] = (λx'. M[x := x'][y := N]) if x $$\neq$$ y and x' is fresh for y, N and M

Now substitution is a total function again. For an implementation, we just need to choose how to pick a fresh variable.
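The rules above can be sketched as a single recursive function. This version only renames the binder when it would actually capture a free variable of N, and it picks fresh names by appending primes; both choices are assumptions of this sketch, and other policies work equally well. Terms are assumed to be tagged tuples `("var", name)`, `("app", fun, arg)` and `("lam", param, body)`.

```python
def free_vars(term):
    """Set of variables that occur free in a term."""
    tag = term[0]
    if tag == "var":
        return {term[1]}
    if tag == "app":
        return free_vars(term[1]) | free_vars(term[2])
    return free_vars(term[2]) - {term[1]}

def fresh(name, avoid):
    """Append primes to `name` until it is not in `avoid` (one possible policy)."""
    candidate = name
    while candidate in avoid:
        candidate += "'"
    return candidate

def subst(term, y, n):
    """Compute term[y := n] without capturing free variables of n."""
    tag = term[0]
    if tag == "var":
        return n if term[1] == y else term
    if tag == "app":
        return ("app", subst(term[1], y, n), subst(term[2], y, n))
    x, body = term[1], term[2]
    if x == y:
        return term  # y is rebound here, so there is nothing to replace
    if x in free_vars(n):
        # The binder would capture a free variable of n: rename it first.
        renamed = fresh(x, free_vars(n) | free_vars(body) | {y})
        body = subst(body, x, ("var", renamed))
        x = renamed
    return ("lam", x, subst(body, y, n))

# The problematic example from the text: (λy. x)[x := y]
result = subst(("lam", "y", ("var", "x")), "x", ("var", "y"))
print(result)  # ('lam', "y'", ('var', 'y'))
```

The result is still a constant function returning the free y, as intended.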

### Reduction Strategies

Beta reduction tells us how to reduce a redex. The missing piece of the puzzle is how to decide where to look for a redex and apply the beta-reduction rule. This is given by reduction strategies.

(The following text is taken, with minor modifications, from Types and Programming Languages)

#### Full Beta-Reduction

Under this strategy, any redex may be reduced at any time. At each step we pick some redex, anywhere inside the term we are evaluating, and reduce it. For example, consider the term:

(λa. a) ((λb. b) (λz. (λc. c) z))

This term contains three redexes:

• (λa. a) ((λb. b) (λz. (λc. c) z))
• (λb. b) (λz. (λc. c) z)
• (λc. c) z

Under full beta-reduction, we might choose, for example, to begin with the innermost redex, then do the one in the middle, then the outermost:

(λa. a) ((λb. b) (λz. (λc. c) z))
--> (λa. a) ((λb. b) (λz. z))
--> (λa. a) (λz. z)
--> λz. z

λz. z cannot be reduced any further and is a normal form.

Note that under full beta-reduction, each reduction step can have more than one possible result, depending on which redex is chosen.

#### Normal Order

Under normal order, the leftmost, outermost redex is always reduced first. Our example would be reduced as follows:

(λa. a) ((λb. b) (λz. (λc. c) z))
--> (λb. b) (λz. (λc. c) z)
--> λz. (λc. c) z
--> λz. z

Again, λz. z is the normal form and cannot be reduced further.

Because each redex is chosen in a deterministic manner, each reduction step has one possible result – reduction thus becomes a (partial) function.

#### Call by Name

Call by name puts more restrictions on which redexes are fair game, and disallows reductions inside abstractions. For our example, we perform the same reduction steps as normal order, but stop short of “going under” the last abstraction.

(λa. a) ((λb. b) (λz. (λc. c) z))
--> (λb. b) (λz. (λc. c) z)
--> λz. (λc. c) z

Haskell uses an optimization of call by name, called call by need or lazy evaluation. Under call by name based strategies, arguments are only evaluated if they are needed.

#### Call by Value

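Under call by value, only outermost redexes are reduced, and each redex is only reduced after its right-hand side (the argument) has been fully reduced to a value. In the pure calculus, a value is a lambda abstraction; it does not have to be a normal form.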

(λa. a) ((λb. b) (λz. (λc. c) z))
--> (λa. a) (λz. (λc. c) z)
--> λz. (λc. c) z

Evaluation strategies based on call by value are used by the majority of languages: an argument expression is evaluated to a value before it is passed into the function as an argument. Such a strategy is also called strict, because it strictly evaluates all arguments, regardless of whether they are used.
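The three deterministic strategies can be compared side by side as one-step functions that return `None` when no step applies. This is only a sketch: terms are assumed to be tagged tuples, and substitution is deliberately the naive one, which is safe here only because the example term is closed and uses distinct binder names. All function names are chosen for this example.

```python
def subst(term, y, n):
    """Naive term[y := n]; adequate only for the capture-free example below."""
    tag = term[0]
    if tag == "var":
        return n if term[1] == y else term
    if tag == "app":
        return ("app", subst(term[1], y, n), subst(term[2], y, n))
    if term[1] == y:
        return term
    return ("lam", term[1], subst(term[2], y, n))

def is_redex(t):
    return t[0] == "app" and t[1][0] == "lam"

def beta(redex):
    (_, (_, x, body), arg) = redex
    return subst(body, x, arg)

def step_normal(t):
    """Normal order: leftmost, outermost redex first, also under binders."""
    if is_redex(t):
        return beta(t)
    if t[0] == "app":
        f = step_normal(t[1])
        if f is not None:
            return ("app", f, t[2])
        a = step_normal(t[2])
        return None if a is None else ("app", t[1], a)
    if t[0] == "lam":
        b = step_normal(t[2])
        return None if b is None else ("lam", t[1], b)
    return None

def step_cbn(t):
    """Call by name: never reduce under a binder or inside an argument."""
    if is_redex(t):
        return beta(t)
    if t[0] == "app":
        f = step_cbn(t[1])
        return None if f is None else ("app", f, t[2])
    return None

def step_cbv(t):
    """Call by value: reduce a redex only once its argument is an abstraction."""
    if t[0] != "app":
        return None
    f = step_cbv(t[1])
    if f is not None:
        return ("app", f, t[2])
    a = step_cbv(t[2])
    if a is not None:
        return ("app", t[1], a)
    if is_redex(t) and t[2][0] == "lam":
        return beta(t)
    return None

def reduce_all(step, t):
    """Apply `step` until it reports that no further step is possible."""
    while True:
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt

ident = lambda v: ("lam", v, ("var", v))
inner = ("lam", "z", ("app", ident("c"), ("var", "z")))  # λz. (λc. c) z
term = ("app", ident("a"), ("app", ident("b"), inner))   # the running example

print(reduce_all(step_normal, term) == ident("z"))  # True: λz. z
print(reduce_all(step_cbn, term) == inner)          # True: λz. (λc. c) z
print(reduce_all(step_cbv, term) == inner)          # True: λz. (λc. c) z
```

Call by name and call by value end at the same term here, but they get there through different intermediate terms, as the traces above show.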

## Programming in Pure Lambda Calculus

### Multiple Arguments

So far, we have looked at lambda abstractions which only take a single argument. However, unary functions are only a small part of our experience with programming. We use functions with multiple arguments all the time. How do we pass more than one argument to a lambda?

One approach would be to extend the calculus with a notion of tuples. Perhaps throw in some pattern matching, for good measure:

$(\lambda (x, y).\ x\ y)\ (a, b)$

However, this means that we are abandoning the very minimal core lambda calculus with all its simplicity. And we don’t have to! As we know well by now, applying an abstraction simply replaces its bound variable with the argument that it’s applied to, as in this trivial example:

$(\lambda x. x\ y)\ b \longrightarrow (x\ y)[x := b] = (b\ y)$

What happens if the abstraction actually just returns another abstraction?

\begin{aligned} (\lambda x.\ (\lambda y.\ x\ y))\ b \longrightarrow (\lambda y.\ x\ y)[x := b] = (\lambda y.\ b\ y) \end{aligned}

Since the bound variable of the inner abstraction ($$y$$) conflicts neither with the variable we are substituting for ($$x$$) nor with the term we are substituting ($$b$$), we simply substitute $$b$$ for $$x$$ inside the inner abstraction. This yields an abstraction which can be applied to another argument. That is, applying $$(\lambda x.\ (\lambda y.\ x\ y))$$ to $$b$$ returned an abstraction which is “hungry” for another argument. We can now apply that abstraction to another argument:

$(\lambda y.\ b\ y)\ a \longrightarrow (b\ y)[y := a] = b\ a$

Let’s do the same in one expression:

\begin{aligned} ((\lambda x.\ (\lambda y.\ x\ y))\ b)\ a &\longrightarrow ((\lambda y.\ x\ y)[x := b])\ a \\ &= (\lambda y.\ b\ y)\ a \\ &\longrightarrow (b\ y)[y := a]\\ &= (b\ a) \end{aligned}

We just applied an abstraction to two arguments. To make this a little easier to see, we can use left-associativity of application and the fact that the scope of a binder goes as far right as possible to rewrite the original expression as

$(\lambda x.\ \lambda y.\ x\ y)\ b\ a$

This technique is called currying (after Haskell Curry, although he was not the first one to come up with it). It is so common that a shorthand is usually introduced for abstractions with more than one argument:

\begin{aligned} (\lambda x\ y.\ ...) &\equiv (\lambda x.\ \lambda y.\ ...)\\ (\lambda x\ y\ z.\ ...) &\equiv (\lambda x.\ \lambda y.\ \lambda z.\ ...)\\ \text{etc.} \end{aligned}

If we allow arithmetic in our lambda expressions a nice example will be:

\begin{aligned} \left(\lambda x\ y.\ \frac{x + y}{y}\right) 4\ 2 &\longrightarrow \left(\lambda y.\ \frac{4 + y}{y}\right) 2 \\ &\longrightarrow \frac{4 + 2}{2} \end{aligned}

Currying is used as the default for functions of multiple arguments by Haskell and OCaml (determined mostly by their standard libraries). On the other hand, Standard ML’s library uses tuples as default.
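Currying can be tried out directly in any language with first-class functions. Here is a small Python sketch of the arithmetic example above; the names `curried`, `add_to_four_over` and `uncurried` are made up for illustration.

```python
# λx y. (x + y) / y, written as a function returning a function
curried = lambda x: lambda y: (x + y) / y

# Supplying the first argument yields a function "hungry" for the second one.
add_to_four_over = curried(4)      # corresponds to λy. (4 + y) / y

# The tuple-based alternative takes both arguments at once.
uncurried = lambda pair: (pair[0] + pair[1]) / pair[1]

print(curried(4)(2))          # 3.0
print(add_to_four_over(2))    # 3.0
print(uncurried((4, 2)))      # 3.0
```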

### Data types

We see that we can represent functions with multiple arguments in pure lambda calculus. Surely, for representing other kinds of data (such as booleans, numbers, data structures), we need to introduce extensions and add these as primitive operations? Not really…

#### Booleans

Many types of values can be represented using Church encodings. Booleans are probably the simplest and most straightforward:

\begin{aligned} \mathsf{true} &= \lambda t\ f.\ t &\qquad&(= \lambda t.\ \lambda f.\ t)\\ \mathsf{false} &= \lambda t\ f.\ f & &(= \lambda t.\ \lambda f.\ f)\\ \end{aligned}

What do these mean? The representation of $$\mathsf{true}$$ is a function that takes two arguments and returns the first one. On the other hand, $$\mathsf{false}$$ returns its second argument. To make sense of these, we need to put them to work and see how they work with boolean operations.

We start with the conditional: $$\textsf{if-then}$$. It should take three arguments and return its second one if the first one evaluates to $$\textsf{true}$$, and its third argument otherwise. That is, we are looking for an expression:

$\textsf{if-then}\ \textsf{true}\ x\ y \longrightarrow ... \longrightarrow x$

and

$\textsf{if-then}\ \textsf{false}\ x\ y \longrightarrow ... \longrightarrow y$

Notice something?

\begin{aligned} \textsf{true}\ x\ y &\longrightarrow x\\ \textsf{false}\ x\ y &\longrightarrow y \end{aligned}

That means that all $$\textsf{if-then}$$ needs to do is to apply its first argument to its second and third argument, since the boolean representation takes care of the selection itself:

$\textsf{if-then} = \lambda b\ t\ f.\ b\ t\ f$

Let’s try to look at conjunction: and. We look for ??? to put in:

(λa b. ???) true true   --> ... --> true
(λa b. ???) true false  --> ... --> false
(λa b. ???) false true  --> ... --> false
(λa b. ???) false false --> ... --> false

First note that true true x --> true for any $$x$$, so it seems that λa b. a b x could work if we find an appropriate $$x$$:

(λa b. a b x) true true --> (λb. true b x) true --> true true x --> ... --> true

Now note that in all but the first case and should reduce to false. In the second case,

(λa b. a b x) true false --> ... --> true false x --> ... --> false

for any $$x$$, so that still works. Now, how can we get false true x --> false? By taking $$x$$ to be false:

(λa b. a b false) false true --> ... --> false true false --> ... --> false

The final case also works:

(λa b. a b false) false false --> ... --> false false false --> ... --> false

Hence $\textsf{and} = \lambda a\ b.\ a\ b\ \textsf{false}$
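Since Python's lambda is untyped, these encodings can be transcribed nearly verbatim and checked against the truth table. The following is a sketch; the trailing underscore in `and_` only avoids a clash with the Python keyword, and `to_bool` is a helper added here to decode a Church boolean into a native one.

```python
true = lambda t: lambda f: t
false = lambda t: lambda f: f

if_then = lambda b: lambda t: lambda f: b(t)(f)
and_ = lambda a: lambda b: a(b)(false)

def to_bool(b):
    """Decode a Church boolean by letting it choose between True and False."""
    return b(True)(False)

print(if_then(true)("x")("y"))      # x
print(if_then(false)("x")("y"))     # y
print(to_bool(and_(true)(true)))    # True
print(to_bool(and_(true)(false)))   # False
print(to_bool(and_(false)(true)))   # False
print(to_bool(and_(false)(false)))  # False
```

One caveat: Python evaluates arguments strictly, so both branches passed to `if_then` are evaluated before the boolean selects one of them.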

Another way of thinking about the definition of and is to define it in terms of if-then-else. E.g., in Haskell,

and :: Bool -> Bool -> Bool
and a b = if a then b else False

which just says that if the first argument is true then the result of and depends on the second one, and if it is false the result will be false regardless of the second argument.

Based on this, we can express the and operation using if-then, which we defined above, and show that it is equivalent to the previous definition by simplifying it using normal order reduction:

and =   λa b. if-then a b false
    =   λa b. (λb t f. b t f) a b false
    --> λa b. (λt f. a t f) b false
    --> λa b. (λf. a b f) false
    --> λa b. a b false

Can you come up with a representation of $$\textsf{or}$$? $$\textsf{not}$$?

#### Natural Numbers: Church Numerals

Natural numbers are Church-encoded as Church numerals:

zero = λs z. z
one = λs z. s z
two = λs z. s (s z)
three = λs z. s (s (s z))
...

A numeral for $$n$$ can be understood as a function that takes some representation of a successor function and some representation of zero and applies the successor to zero $$n$$ times.

How about operations on numerals? The successor of a numeral λs z... is computed by inserting one more application of s inside of the abstraction:

succ (λs z. z) --> ... --> λs z. s z
succ (λs z. s z) --> ... --> λs z. s (s z)
succ (λs z. s (s z)) --> ... --> λs z. s (s (s z))
...

We know that succ takes a numeral (which is an abstraction) and returns another numeral, which is again an abstraction:

succ = λn. (λs z. ...n...)

Taking λs z. z as an example input:

(λn. (λs z. ...n...)) (λs z. z)
--> (λs z. ...(λs z. z)...)
--> (λs z. s z)

We see that we need to apply an extra s under λs z.:

(λs z. s ...(λs z. z)...) --> ... --> (λs z. s z)

To do this we need to “open” the abstraction representing 0. This can be achieved by passing the outer s and z as arguments. We achieve what we wanted.

(λs z. s ...(λs z. z) s z...) --> (λs z. s ...z...) = (λs z. s z)

Working backwards, we arrive at our successor function:

(λs z. s z)
<-- (λs z. s ((λs z. z) s z))
<-- (λn. λs z. s (n s z)) (λs z. z)
= succ (λs z. z)

Successor can be thus defined as:

succ = λn. (λs z. s (n s z)) = λn s z. s (n s z)

Once we have a successor operation, defining addition is quite simple if we keep in mind that a Church numeral $$m$$ applies its first argument (s) to its second argument (z) $$m$$ times:

plus = λm n. m succ n

Multiplication follows the same principle:

$m \times n = \underbrace{n + n + \dots + n}_{m \text{ times}} + 0$

Hence:

times = λm n. m (plus n) zero

We can have subtraction via a predecessor function, which is quite a tricky one:

pred = λn f x. n (λg h. h (g f)) (λu. x) (λu. u)
minus = λm n. n pred m

We can check if a numeral is zero:

is-zero = λn. n (λx. false) true

We can define $$\leq$$

leq = λm n. is-zero (minus m n)

And we can define equality:

equal = λm n. and (leq m n) (leq n m)
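All of the numeral operations above can be transcribed into Python lambdas and checked on small inputs. This is a sketch: the helpers `to_int`, `to_bool` and `from_int` are additions that convert between Church encodings and native values, and `and_` carries an underscore only because `and` is a Python keyword.

```python
true = lambda t: lambda f: t
false = lambda t: lambda f: f
and_ = lambda a: lambda b: a(b)(false)

zero = lambda s: lambda z: z
succ = lambda n: lambda s: lambda z: s(n(s)(z))
plus = lambda m: lambda n: m(succ)(n)
times = lambda m: lambda n: m(plus(n))(zero)
pred = lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)
minus = lambda m: lambda n: n(pred)(m)
is_zero = lambda n: n(lambda x: false)(true)
leq = lambda m: lambda n: is_zero(minus(m)(n))
equal = lambda m: lambda n: and_(leq(m)(n))(leq(n)(m))

def to_int(n):
    """Decode a numeral: use 'add one' as successor and 0 as zero."""
    return n(lambda k: k + 1)(0)

def to_bool(b):
    return b(True)(False)

def from_int(k):
    """Build the Church numeral for a non-negative Python integer."""
    n = zero
    for _ in range(k):
        n = succ(n)
    return n

two, three = from_int(2), from_int(3)

print(to_int(plus(two)(three)))     # 5
print(to_int(times(two)(three)))    # 6
print(to_int(pred(three)))          # 2
print(to_int(minus(two)(three)))    # 0
print(to_bool(leq(two)(three)))     # True
print(to_bool(equal(two)(three)))   # False
```

Note that `minus` is truncated: subtracting a larger numeral from a smaller one yields zero, which is exactly what makes `leq` work.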

#### Pairs

Pairs can be encoded using an abstraction which “stores” its two arguments:

pair = λl r s. s l r

You can think of s as a “chooser” function which either picks l or r. Selectors for the first and second element are then, respectively, defined as:

fst = λp. p (λl r. l)
snd = λp. p (λl r. r)

Take a look at the selector functions we pass to the pair representation. Are they familiar? (Hint: booleans)
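The pair encoding can be tried out in the same way. Below is a short sketch in Python; the stored values are ordinary strings, chosen only to make the output easy to read.

```python
pair = lambda l: lambda r: lambda s: s(l)(r)
fst = lambda p: p(lambda l: lambda r: l)
snd = lambda p: p(lambda l: lambda r: r)

p = pair("left")("right")   # a function waiting for a chooser

print(fst(p))               # left
print(snd(p))               # right
print(fst(snd(pair(1)(pair(2)(3)))))  # 2
```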

### Recursion

We have seen that we can define booleans, together with a conditional, and numbers, together with arithmetic operations in pure lambda calculus. However, to reach full Turing power, we lack one important ingredient: the ability to loop. To loop in a functional setting, we need the little brother of looping: self-reference.

To see that we can loop, let us look at a term, for which $$\beta$$-reduction never terminates in a normal form. This interesting term, called $$\Omega$$, is defined as follows:

Ω = (λx. x x) (λx. x x)

We see that we have an abstraction which applies its argument to itself and which is applied to itself. How does reduction proceed?

(λx. x x) (λx. x x) --> (x x)[x := (λx. x x)]
= (λx. x x) (λx. x x) --> (x x)[x := (λx. x x)]
= (λx. x x) (λx. x x) --> (x x)[x := (λx. x x)]
...

Immediately after the first reduction step, we are back where we started! Well, we see we can loop forever (diverge), but how is this useful?

In a programming language like OCaml, we are used to defining recursive functions which refer to themselves inside of their body:

let rec fact = fun n ->
if n = 0 then 1 else n * fact (n - 1)

How do we achieve this in lambda? While we have been freely using equations to define names for lambda expressions, these were just meta-definitions of names. That is, when we write

fact = λn. if-true (is-zero n) one (mult n (?fact? (pred n)))

we rely on our meta-language and our common understanding of it to replace any occurrence of ?fact? with the right-hand side of the above equation, as many times as needed. (Here if-true is the conditional we called if-then above, and mult is times.) But this is not beta-reduction, that is, we are not defining a recursive function as an object in lambda calculus. To get there, we can think of a recursive definition as follows: “Assuming we have a function to call in the recursive case, we can complete the definition”. In Haskell or OCaml, we can simply assume that we already have the function that we are defining. But what is really going on here is that we can abstract the recursive call as an argument – which corresponds to saying “assuming we already have a function to call in the recursive case”:

fact = λf. λn. if-true (is-zero n) one (mult n (f (pred n)))

Now factorial does not refer to itself anymore, we just need to give it a function to call in the else branch. Easy:

fact = (λf. λn. if-true (is-zero n) one (mult n (f (pred n)))) (λn. if-true (is-zero n) one (mult n (f (pred n))))

Wait, but now what about f in the second case? Ah, no problem:

fact = (λf. λn. if-true (is-zero n) one (mult n (f (pred n))))
((λf. λn. if-true (is-zero n) one (mult n (f (pred n))))
(λn. if-true (is-zero n) one (mult n (f (pred n))))) 

This apparently won’t work… unless we have a way of supplying an argument for f as many times as it’s needed. That is, a way to allow the function to reference itself whenever it needs to. This is where fixpoint combinators come in.

In math, a fixed point of a function $$f$$ is an input for which the function returns the input itself:

$f(x) = x$

If the above holds, we say that $$x$$ is a fixed point of $$f$$. A fixpoint combinator (in general called $$\operatorname{fix}$$) is an operation that computes a fixed point of a function. That is, it is a function for which the following equation holds:

fix f = f (fix f)

This equation just spells out that when a function is applied to its fixpoint, the fixpoint shall be returned. Let’s use the above equation on itself, by replacing occurrences of fix f with the right-hand side:

fix f = f (fix f)
= f (f (fix f))
= f (f (f (fix f)))
= ...

Now glance above: “If only we had a way of supplying an argument for f as many times as it’s needed.” Seems we are onto something. Let’s replace f with our factorial:

fact = λf. λn. if-true (is-zero n) one (mult n (f (pred n)))

fix fact
=   fact (fix fact)
=   (λf. λn. if-true (is-zero n) one (mult n (f (pred n)))) (fix fact)
--> (λn. if-true (is-zero n) one (mult n ((fix fact) (pred n))))

This looks promising. The problem? We haven’t defined what fix is, we are just abusing our meta-notation again. In fact, there is more than one possible definition of fix. The simplest one is the Y combinator:

Y = λf. (λx. f (x x)) (λx. f (x x))

Notice how the structure is very similar to $$\Omega$$ above. We should check if it is a fixpoint combinator, that is, if it satisfies the fixpoint equation:

Y g = (λf. (λx. f (x x)) (λx. f (x x))) g
= (λx. g (x x)) (λx. g (x x))
= g ((λx. g (x x)) (λx. g (x x)))
= g ((λf. ((λx. f (x x)) (λx. f (x x)))) g)
= g (Y g)

We have ourselves a fixpoint combinator. Let us try to use it to define our factorial function:

fact0 = (λf. λn. if-true (is-zero n) one (mult n (f (pred n))))
fact = Y fact0

What happens when we try to apply fact to a numeral?

fact three
=   Y fact0 three
=   (λf. (λx. f (x x)) (λx. f (x x))) fact0 three
--> (λx. fact0 (x x)) (λx. fact0 (x x)) three
--> fact0 ((λx. fact0 (x x)) (λx. fact0 (x x))) three
=   fact0 (Y fact0) three
--> (λn. if-true (is-zero n) one (mult n ((Y fact0) (pred n)))) three
--> if-true (is-zero three) one (mult three ((Y fact0) (pred three)))
--> ...
--> mult three ((Y fact0) (pred three))
=   mult three (fact0 (Y fact0) (pred three))
--> ...
--> mult three (if-true (is-zero (pred three)) one (mult (pred three) ((Y fact0) (pred (pred three)))))
...

However, the $$Y$$ combinator does not work under every reduction strategy. Consider what happens with the $$Y$$ combinator if we apply the call-by-value (CBV) strategy:

Y g =   (λf. (λx. f (x x)) (λx. f (x x))) g
--> (λx. g (x x)) (λx. g (x x))
--> g ((λx. g (x x)) (λx. g (x x)))
--> g (g ((λx. g (x x)) (λx. g (x x))))
--> g (g (g ((λx. g (x x)) (λx. g (x x)))))
--> ...

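Because CBV evaluates the argument of g before applying g, this unfolding never stops. For CBV, we need the Z combinator, which delays the self-application x x by wrapping it in an abstraction:

Z = λf. (λx. f (λy. x x y)) (λx. f (λy. x x y))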
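Python evaluates arguments before calls, so it behaves like a call-by-value language and makes a convenient test bed for both combinators. The sketch below uses native integers and Python's conditional expression instead of Church encodings, an assumption made to keep the example short; the helper `y_diverges` is likewise only for demonstration.

```python
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))
Z = lambda f: (lambda x: f(lambda y: x(x)(y)))(lambda x: f(lambda y: x(x)(y)))

# The non-recursive "template" of factorial: f stands for the recursive call.
fact0 = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)

fact = Z(fact0)
print(fact(5))  # 120

def y_diverges():
    """Under strict evaluation, Y keeps unfolding x(x) before f is ever called."""
    try:
        Y(fact0)
    except RecursionError:
        return True
    return False

print(y_diverges())  # True
```

In `Z`, the self-application `x(x)` sits under `lambda y`, so it is only evaluated when the recursive call is actually made.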

#### Let bindings

The last useful notation to introduce is the let-binding. We have already implemented let-bindings as part of our arithmetic expressions language, both as a substitution-based and as an environment-based evaluator. Let-bindings can be introduced to pure lambda calculus as syntactic sugar – a construct that is defined by translation to a combination of other constructs in the language. Introducing a let-binding corresponds to creating a λ-abstraction and immediately applying it to the bound expression:

let x = e1 in e2 ≡ (λx. e2) e1

We have to define let as syntactic sugar – we cannot write it as a function, the way we did for if-then, plus, etc. Why is that the case?
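Defining a construct by translation can be sketched as a small pass over the syntax tree. The example assumes terms are tagged tuples, with an extra form `("let", x, e1, e2)` standing for let x = e1 in e2; both the encoding and the name `desugar` are illustrative choices.

```python
def desugar(term):
    """Translate let-bindings away, leaving only the three core forms."""
    tag = term[0]
    if tag == "var":
        return term
    if tag == "app":
        return ("app", desugar(term[1]), desugar(term[2]))
    if tag == "lam":
        return ("lam", term[1], desugar(term[2]))
    if tag == "let":
        _, x, e1, e2 = term
        # let x = e1 in e2  ≡  (λx. e2) e1
        return ("app", ("lam", x, desugar(e2)), desugar(e1))
    raise ValueError(f"unknown term: {term!r}")

x, a = ("var", "x"), ("var", "a")
example = ("let", "x", a, ("app", x, x))   # let x = a in x x

print(desugar(example))
# ('app', ('lam', 'x', ('app', ('var', 'x'), ('var', 'x'))), ('var', 'a'))
```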

We can also define a recursive version of let – called let rec in OCaml, letrec in Scheme:

let rec f = e1 in e2 ≡ let f = fix (λf. e1) in e2
≡ (λf. e2) (fix (λf. e1))


Where fix is an appropriate fixpoint combinator (e.g., $$Y$$ under CBN, $$Z$$ under CBV and CBN).

## Extensions

While it is useful to show how various important programming concepts and constructs can be expressed in pure lambda-calculus directly, in general, it is a rather inefficient approach.

The approach usually taken in designing lambda-calculus-based languages, is to take the calculus as a core language and add extensions to support various kinds of data.

#### Pairs

<Lambda> ::= <Variable>
           | (<Lambda> <Lambda>)
           | (λ <Variable> . <Lambda>)
           | ( <Lambda> , <Lambda> )
           | fst <Lambda>
           | snd <Lambda>

Often just written informally as an extension of a previous BNF:

<Lambda> ::= ...
           | ( <Lambda> , <Lambda> )
           | fst <Lambda>
           | snd <Lambda>

Together with the extension to syntax, we need to specify what the meaning of these new constructs is. That is, we need to update the reduction strategy and provide reduction rules for the pairs. For primitive operations, these reduction rules are sometimes called $$\delta$$-rules or $$\delta$$-reduction rules:

fst (l, r) -->δ l
snd (l, r) -->δ r
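In an implementation, δ-rules show up as extra cases in the step function. Below is a sketch of just the two rules above, applied at the root of a term. The tagged-tuple forms `("pair", l, r)`, `("fst", t)` and `("snd", t)` and the name `step_delta` are assumptions of this example.

```python
def step_delta(term):
    """Apply a δ-rule at the root of the term if one matches, else None."""
    if term[0] in ("fst", "snd") and term[1][0] == "pair":
        _, left, right = term[1]
        return left if term[0] == "fst" else right
    return None

l, r = ("var", "l"), ("var", "r")

print(step_delta(("fst", ("pair", l, r))))   # ('var', 'l')
print(step_delta(("snd", ("pair", l, r))))   # ('var', 'r')
print(step_delta(("fst", ("var", "p"))))     # None
```

The last call returns `None` because fst is applied to something that is not (yet) a pair, so no δ-rule matches.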

## Reduction vs. Evaluation

### Reducing to a Normal Form

So far, in connection with Lambda calculus, we have only talked about reduction as a transformation of an expression containing redexes into another expression where the redex has been reduced. To actually evaluate a lambda-expression, we can simply iterate the reduction step (with a particular strategy), until we reach a normal form: an expression that cannot be reduced further. Note that some expressions (such as $$\Omega$$) do not have a normal form – under any reduction strategy. Formally, an iteration of reduction steps $$\longrightarrow$$ is written as $$\longrightarrow^{*}$$, which stands for “reduces in zero or more steps to”. Mathematically, it is a reflexive-transitive closure of $$\longrightarrow$$. It is defined by the following two rules:

1. M -->* M
2. M -->* M' if there is an N such that M --> N and N -->* M'

This can also be expressed as:

• an expression reduces in zero or more steps to itself
• an expression $$M$$ reduces in zero or more steps to the expression $$M'$$, if there is an intermediate expression $$N$$ and $$M$$ reduces to $$N$$ and $$N$$ reduces in zero or more steps to $$M'$$

In Haskell, this is expressed as iteration. Assuming that the one-step reduction function implementing a particular strategy has the type Lambda -> Maybe Lambda, we define an iteration function as:

iter :: (a -> Maybe a) -> a -> a
iter step e =
  case step e of
    Just e' -> iter step e'
    Nothing -> e

That is, while we can perform reduction steps on an expression, we do so. If the input expression cannot be reduced (it contains no redex), just return it.

If we have an implementation of a reduction strategy, say, stepNormal :: Lambda -> Maybe Lambda, then we can build a “reducer” by passing it as an argument to iter:

iterNormal :: Lambda -> Lambda
iterNormal = iter stepNormal
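The same driver can be sketched in Python, with `None` playing the role of Nothing. To keep the example self-contained, it is exercised with a toy step function on integers rather than on lambda terms; the names `iterate` and `halve` are made up for this illustration.

```python
def iterate(step, e):
    """Apply `step` repeatedly until it returns None, then return the last result."""
    while True:
        nxt = step(e)
        if nxt is None:
            return e
        e = nxt

def halve(n):
    """Toy one-step 'reduction': halve positive even numbers, otherwise stop."""
    return n // 2 if n > 0 and n % 2 == 0 else None

print(iterate(halve, 48))  # 3
print(iterate(halve, 7))   # 7
```

A loop is used instead of recursion because Python does not optimize tail calls, so long reduction sequences would otherwise hit the recursion limit.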

## De Bruijn Indices

1. Later, we will add extensions that make many things simpler and also allow us to build realistic programming languages.