Week 8: Fer-de-lance (closed collaboration)
Fer-de-lance, aka FDL, aka Functions Defined by Lambdas, is an egg-eater-like language with anonymous, first-class functions.
Setup
You can use the starter code from github (which might need a bit of work around the implementation of print
)
or extend/modify your own code for egg-eater
as you prefer.
Your Additions
fdl
starts with the egg-eater
and has two significant syntactic changes.
-
First, it makes function definitions
(defn (f x1 ... xn) e)
a form of expression that can be bound to a variable and passed as a parameter, -
Second, it adds the notion of a
(fn (x1 ... xn) e)
expression for defining anonymous or nameless functions,
pub enum Expr {
...
Fun(Defn),
Call(String, Vec<Expr>),
}
pub struct Defn {
pub name: Option<String>,
pub params: Vec<String>,
pub body: Box<Expr>,
}
For example
(defn (f it)
(it 5))
(let (foo (fn (z) (* z 10)))
(f foo)
)
(defn (compose f g)
(fn (x) (f (g x))))
(defn (inc x)
(+ x 1))
(let (f (compose inc inc))
(f input))
You can write recursive functions as
(defn (f it x)
(it x))
(defn (fac n)
(if (= n 0) 1 (* n (fac (+ n -1)))))
(f fac input)
For a longer example, see map.snek fold.snek
Semantics
Functions should behave just as if they followed a substitution-based semantics. This means that when a function is constructed, the program should store any "free" variables that they reference that aren't part of the argument list, for use when the function is called. This naturally matches the semantics of function values in languages like OCaml, Haskell and Python.
Runtime Errors
There are several updates to runtime errors as a result of adding first-class functions:
-
You should throw an error with
arity mismatch
when there is mismatch in the number of arguments at a call. -
The value in function position may not be a function (for example, a user may erroneously apply a number), which should trigger a runtime error error that reports
"not a function
.
Implementation
Memory Layout
Functions/Closures should be stored in the heap as a tuple
-----------------------------------------------
| code-ptr | arity | var1 | var2 | ... | varN |
-----------------------------------------------
For example, in this program:
(let* ((x 10)
(y 12)
(f (fn (z) (+ x (+ y z)))))
(f 5))
The memory layout of the fn
would be:
----------------------------------
| <address> | 1 | 20 | 24 |
----------------------------------
-
There is one argument (
z
), so1
is stored for arity. -
There are two free variables—
x
andy
—so the corresponding values are stored in contiguous addresses (20
to represent10
and24
to represent 12).
Function Values
Function values are stored in variables and registers
as the address of the first word in the function's memory,
but with an additional 5
(101
in binary) added to the
value to act as a tag.
Hence, the value layout is now:
0xWWWWWWW[www0] - Number
0xWWWWWWW[w111] - True
0xWWWWWWW[w011] - False
0xWWWWWWW[w001] - Pair
0xWWWWWWW[w101] - Function
Computing and Storing Free Variables
An important part of saving function values is figuring out the set of free variables that need to be stored, and storing them on the heap.
Our compiler needs to generated code to store all of the free variables in a function – all the variables that are used but not defined by an argument or let binding inside the function.
So, for example, x
is free and y
is not in:
(fn (y) (+ x y))
In this next expression, z
is free, but x
and y
are not, because x
is bound by the let
expression.
(fn (y) (let (x 10) (+ x (+ y z))))
Note that if these examples were the whole program, well-formedness would signal an error that these variables are unbound. However, these expressions could appear as sub-expressions in other programs, for example:
(let* ((x 10)
(f (fn (y) (+ x y))))
(f 10))
In this program, x
is not unbound – it has a binding
in the first branch of the let
. However, relative
to the lambda
expression, it is free, since there
is no binding for it within the lambda
’s arguments
or body.
You should fill in the function free_vars
that returns
the set of free variables in an Expr
.
fn freeVars(e: &Expr) -> HashSet<String>
You may need to write one or more helper functions
for free_vars
, that keep track of an environment.
Then free_vars
can be used when compiling Defn
to fetch the values from the surrounding environment,
and store them on the heap.
In the example of heap layout above, the free_vars
function should return the set hashset!{"x", "y"}
, and that
information can be used in conjunction with
env
to perform the necessary mov
instructions.
This means that the generated code for a Defn
will look much like it did in class but with
an extra step to move the stored variables
into their respective tuple slots.
Restoring Saved Variables
The description above outlines how to store the free variables of a function. They also need to be restored when the function is called, so that each time the function is called, they can be accessed.
In this assignment we'll treat the stored variables as if they were a special kind of local variable, and reallocate space for them on the stack at the beginning of each function body.
So each function body will have an additional part
of the prelude to restore
the variables onto the stack,
and their uses will be compiled just as local variables are.
This lets us re-use much of our infrastructure of stack
offsets and the environment.
The outline of work here is:
-
At the prelude of the function body, get a reference to the function closure's address from which the free variables' values can be obtained and restored,
-
Add instructions to the prelude of each function that restore the stored variables onto the stack, given this address
-
Assuming this stack layout, compile the function's body in an environment that will look up all variables, whether stored, arguments, or let-bound, in the correct location
The second and third points are straightforward applications of ideas we've seen already – copying appropriate values from the heap into the stack, and using the environment to make variable references look at the right locations on the stack.
The first point requires a little more design work.
If we try to fill in the body of temp_closure_1
above, we immediately run into the issue of where we
should find the stored values in memory.
We'd like some way to, say, move the address of the
function value into rax
so we could start copying
values onto the stack.
But how do we get access to the function value?
To solve this, we are going to augment the calling convention in Fer-de-lance to pass along the closure-pointer as the zero-th parameter when calling a function.
So, for example, in call like:
(f 4 5)
We would generate code for the caller like:
mov rax, [rbp-16] ;; (wherever 'f' is stored)
<code to check that rax is tagged 101, and has arity 2>
push 10 ;; 2nd param = 5
push 8 ;; 1st param = 4
mov rax, [rbp-16] ;;
push rax ;; 0th param = closure
sub rax, 5 ;; remove tag
mov rax, [rax] ;; load code-pointer from closure
call rax ;; call the function
Now the function's closure is available on the stack, accessible just as the 0th argument so we can use that in the prelude for restoration.
Recommended TODO List
-
Move over code from past assignment and/or lecture code to get the basics going. There is intentionally less support code this time to put less structure on how errors are reported, etc. Feel free to start with code copied from past labs. Note that the initial state of the tests will not run even simple programs until you get things started.
-
Implement the compilation of
Defn
andCall
, ignoring free variables. You'll deal with storing and checking the arity and code pointer, and generating and jumping over the instructions for a function. Test as you go. -
Implement
free_vars
, testing as you go. -
Implement storing and restoring of variables in the compilation of
Defn
(both in the function prelude and the place where the closure is created). -
Figure out how to extend your implementation to support recursive functions; it should be a straightforward extension if you play your cards correctly in the implementation of
Defn
(what should you "bind" the name of the function to, in the body of the function?)