CS221/321 Lecture 4, Oct 7, 2010

Section 2. Expanding the language -- variables and let bindings
-----------------------------------------------------------------------

Add let expressions such as

   let x = 3+2 in x * 2

that bind a variable to a value within the scope of a body. The
general form is

   let x = e1 in e2

where e1 and e2 are arbitrary expressions, and (free) occurrences of x
in the body expression e2 are covered by the local let definition.

We can call this extended language SAEL for "simple arithmetic
expressions with let". Here is the abstract syntax.

----------------------------------------------------------------------
Figure 2.1: abstract syntax of SAEL
----------------------------------------------------------------------
  v   ::= x, y, z, ...         (alphanumeric variables)
  n   ::= 0, 1, 2, ...         (natural numbers)
  bop ::= Plus, Times, ...     (primitive binary operators)

  e   ::= Num(n) | Var(v) | Bapp(bop, e, e) | Let(v, e, e)
----------------------------------------------------------------------

We are replacing the former Plus and Times syntax constructors with a
new form Bapp(bop,e1,e2) that generalizes the Plus(e1,e2) and
Times(e1,e2) primitive operator forms. "Bapp" stands for "Binary
operator APPlication". Plus and Times become members of a predefined
(and possibly larger) set of primitive (binary) operators:

   bop ::= Plus | Times | ...

and the old Plus and Times expressions are translated as follows:

   Plus(e1, e2)   ==>  Bapp(Plus, e1, e2)
   Times(e1, e2)  ==>  Bapp(Times, e1, e2)

There is nothing special about binary operations -- in general we
might also have primitive unary operations, or in fact n-ary
operations for arbitrary n. But just to keep the rules simpler we will
stick for the time being to only binary primitive ops.

How do we evaluate these new forms of expression? To deal with
variables, there are two choices:

  (1) Eliminate the variable before we have to evaluate it, so we end
      up only evaluating variable-free basic expressions.
  (2) Provide a mechanism (an environment) that tells us how to
      interpret or translate a variable into a value.

We will start with approach (1), where we will eliminate variables by
"substitution".

Here is a big-step rule for evaluating let:

  (Lv)  Let(v, e1, e2) ⇓ n  <=  e1 ⇓ n1  &  [Num(n1)/v]e2 ⇓ n

The notation "[Num(n1)/v]e2" indicates the result of substituting the
value expression Num(n1) for the variable v wherever v occurs (free)
in e2.

This rule specifies that the definiens e1 be evaluated before
substitution, and we call this a "by-value" substitution. An
alternative is to substitute the definiens itself for v in the body.
This is specified by the following "by-name" rule:

  (Ln)  Let(v, e1, e2) ⇓ n  <=  [e1/v]e2 ⇓ n

----------------------------------------------------------------------
Figure 2.2: SAEL[BSv] - "By-Value" big-step semantics for SAEL
----------------------------------------------------------------------
  (1) Num(n) ⇓ n

  (2) Bapp(bop, e1, e2) ⇓ n  <=  e1 ⇓ n1  &  e2 ⇓ n2  &  prim(bop,n1,n2) = n

  (3) Let(v, e1, e2) ⇓ n  <=  e1 ⇓ n1  &  [Num(n1)/v]e2 ⇓ n
----------------------------------------------------------------------

----------------------------------------------------------------------
Figure 2.3: SAEL[BSn] - "By-Name" big-step semantics for SAEL
----------------------------------------------------------------------
  (1) Num(n) ⇓ n

  (2) Bapp(bop, e1, e2) ⇓ n  <=  e1 ⇓ n1  &  e2 ⇓ n2  &  prim(bop,n1,n2) = n

  (3) Let(v, e1, e2) ⇓ n  <=  [e1/v]e2 ⇓ n
----------------------------------------------------------------------

----------------
Terminology: The terms "by-value" and "by-name" come from Algol 60,
which used a "call-by-name" parameter-passing convention for function
calls. Most modern languages use "call-by-value" for function
parameter passing. Here we are dealing with eliminating local variable
definitions, but we will see that this is closely related to function
parameter binding.
----------------

How about extending the small-step dynamic semantics for let?
The new by-value rules are

                     e1 ↦ e1'
  (Lv1)  -------------------------------------
           Let(x, e1, e2) ↦ Let(x, e1', e2)


  (Lv2)  --------------------------------------
           Let(x, Num(n), e2) ↦ [Num(n)/x]e2

Call the system obtained by adding these rules to SAE[SS], SAEL[SSv].

And the by-name rule is

  (Ln)   ------------------------------
           Let(x, e1, e2) ↦ [e1/x]e2

Call the system obtained by adding rule (Ln) to SAE[SS], SAEL[SSn].

----------------------------------------------------------------------
Figure 2.4: SAEL[SSv] - "By-Value" small-step semantics for SAEL
----------------------------------------------------------------------
  (1) Bapp(bop, Num(n1), Num(n2)) ↦ Num(p)   where p = prim(bop,n1,n2)

  (2) Bapp(bop, e1, e2) ↦ Bapp(bop, e1', e2)  <=  e1 ↦ e1'

  (3) Bapp(bop, Num(n1), e2) ↦ Bapp(bop, Num(n1), e2')  <=  e2 ↦ e2'

  (4) Let(x, e1, e2) ↦ Let(x, e1', e2)  <=  e1 ↦ e1'

  (5) Let(x, Num(n), e2) ↦ [Num(n)/x]e2
----------------------------------------------------------------------

----------------------------------------------------------------------
Figure 2.5: SAEL[SSn] - "By-Name" small-step semantics for SAEL
----------------------------------------------------------------------
  (1) Bapp(bop, Num(n1), Num(n2)) ↦ Num(p)   where p = prim(bop,n1,n2)

  (2) Bapp(bop, e1, e2) ↦ Bapp(bop, e1', e2)  <=  e1 ↦ e1'

  (3) Bapp(bop, Num(n1), e2) ↦ Bapp(bop, Num(n1), e2')  <=  e2 ↦ e2'

  (4) Let(x, e1, e2) ↦ [e1/x]e2
----------------------------------------------------------------------


Which is more efficient, by-value or by-name?
----------------------------------------------

Consider these examples, where big represents a large expression that
is expensive to evaluate.

  (1) let x = big in 3

  (2) let x = big in (x + x) + x

Under by-name evaluation, (1) reduces in one step to 3,

   let x = big in 3  ↦  [big/x]3 = 3

without evaluating big at all, while by-value must fully evaluate big
and then throw the result away, because the bound variable x does not
appear in the body. So by-name wins for this one.
On the other hand, with (2) by-name gives

   let x = big in (x + x) + x  ↦  (big + big) + big

and we've created three copies of big, each of which will have to be
evaluated. But under by-value

   let x = big in (x + x) + x  ↦*  let x = Num(n) in (x + x) + x
                               ↦   (Num(n) + Num(n)) + Num(n)

so big is only evaluated once. So for example (2), by-value wins.


Basic terminology: bound variable, free variable, scope
-------------------------------------------------------

   let x = e1 in e2

The body expression e2 is the "scope" of the binding of x.

       definiens
           ↓
   let x = e1 in ... x ... x ...
       ↑             ↑     ↑
    binding            bound
   occurrence       occurrences

An occurrence of a variable x is "bound" if it is in the body of a let
expression whose definition binds x. Otherwise, the occurrence of x is
"free". A given expression can have both bound and free occurrences
of x.

   let x = 2 * x in x + 1
       |       |    |
       |       |  bound
       |     free
    binding

Any bound occurrence of a variable is associated with a unique
definition in an enclosing let expression.

        ____________________
       /                    \
   let x = 5 in let x = 2 * x in x + 1
                    \____________/

FV: a function that computes the set of free variables of an
expression:

   FV(x)                = {x}
   FV(Num(n))           = {}
   FV(Bapp(bop,e1,e2))  = FV(e1) ⋃ FV(e2)
   FV(Let(x,e1,e2))     = FV(e1) ⋃ (FV(e2)\{x})

Defn: An expression e is "closed" if it has no free variables,
i.e. FV(e) = {}.


Substitution
------------

Defn: [e/x]e' -- each free occurrence of x in e' is replaced by e.

   [e/x]x                = e
   [e/x]y                = y                          (y != x)
   [e/x]Num(n)           = Num(n)
   [e/x]Bapp(bop,e1,e2)  = Bapp(bop, [e/x]e1, [e/x]e2)
   [e/x]Let(x, e1, e2)   = Let(x, [e/x]e1, e2)
   [e/x]Let(y, e1, e2)   = Let(y, [e/x]e1, [e/x]e2)   (y != x)

But what about the following example?

   (3) [y/x](let y = 3 in x + y) = let y = 3 in [y/x](x+y)
                                 = let y = 3 in y + y

The variable y, which was free in the substituted expression y, has
been "captured" by the let definition of y when it is substituted for
x in the scope of the let definition. This is a mistake: the
substituted y should still refer to whatever y meant outside the let,
not to the local definition y = 3.
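
The capture in example (3) can be reproduced mechanically. The
following is a minimal sketch that transcribes the six substitution
clauses above directly into SML, with no renaming of bound variables.
The names naiveSub, ex3 and captured are illustrative assumptions and
do not come from the course code.

```sml
(* Sketch only: naiveSub, ex3 and captured are hypothetical names. *)
type variable = string

datatype bop = Plus | Times

datatype expr
  = Num of int
  | Var of variable
  | Bapp of bop * expr * expr
  | Let of variable * expr * expr

(* naiveSub (e, x, e') computes [e/x]e' clause by clause,
 * WITHOUT renaming bound variables, so capture is possible *)
fun naiveSub (e, x, e' as Num _) = e'
  | naiveSub (e, x, e' as Var y) = if x = y then e else e'
  | naiveSub (e, x, Bapp(bop, e1, e2)) =
      Bapp(bop, naiveSub(e, x, e1), naiveSub(e, x, e2))
  | naiveSub (e, x, Let(y, e1, e2)) =
      if x = y
      then Let(y, naiveSub(e, x, e1), e2)   (* x is rebound: leave body alone *)
      else Let(y, naiveSub(e, x, e1), naiveSub(e, x, e2))

(* example (3): [y/x](let y = 3 in x + y) *)
val ex3 = Let("y", Num 3, Bapp(Plus, Var "x", Var "y"))

val captured = naiveSub(Var "y", "x", ex3)
(* captured = Let("y", Num 3, Bapp(Plus, Var "y", Var "y")) *)
```

Before the substitution, ex3 has x as its only free variable; one
would expect the result to have y free instead, but the result above
is closed. The free y has been captured.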

Change of bound variable:

   let x = 3 in x + 1  ===  let y = 3 in y + 1

If we uniformly replace a bound variable by another bound variable,
the two let expressions are equivalent (will always evaluate to the
same result). This change of bound variable is called "α-conversion".

But not all bound variable changes are valid:

   let x = 3 in x + y  =/=  let y = 3 in y + y

Here the free variable y in the body on the left was captured and
became bound when we changed the bound variable to y. So we can
replace a bound variable x with another bound variable y iff y does
not occur free in the scope of the x binding.

To avoid the free variable capture illustrated by example (3), we must
modify the last clause in the defn of substitution.

   [e/x]Let(y,e1,e2)
      = Let(y, [e/x]e1, [e/x]e2)           if y != x and y ∉ FV(e)
      = Let(z, [e/x]e1, [e/x]([z/y]e2))    if y != x and y ∈ FV(e),
                                           where z != x, z ∉ FV(e)
                                           and z ∉ FV(e2)

The new variable z must avoid FV(e) as well as FV(e2); otherwise the
renamed binding would capture a free z of e just as y did. Since we
have an infinite supply of variables, we can always find such a
variable z.


Implementation

----------------------------------------------------------------------
Program 2.1 "by-value" big-step evaluator for SAEL (prog_2_1.sml)
----------------------------------------------------------------------
(* uses module VarSet for sets of variables
 * (assumed here to provide empty, single, union and diff) *)

type variable = string

datatype bop = Plus | Times

datatype expr
  = Num of int
  | Var of variable
  | Bapp of bop * expr * expr
  | Let of variable * expr * expr

(* newVar : variable -> variable *)
local (* following two declarations are local to the body between "in" and "end" *)
  val count = ref 0  (* a counter; an updateable integer reference *)
  fun next () = (!count before count := !count + 1)
      (* return current count and then increment it *)
in (* following declaration is exported *)
  fun newVar (base: string) = base ^ Int.toString(next())
      (* concatenate base string with string representation of next count value *)
end

(* freeVars : expr -> variable set *)
fun freeVars (Num n) = VarSet.empty
  | freeVars (Var v) = VarSet.single v
  | freeVars (Bapp(_, e1, e2)) = VarSet.union(freeVars e1, freeVars e2)
  | freeVars (Let(v, e1, e2)) =
      VarSet.union(freeVars e1, VarSet.diff(freeVars e2, VarSet.single v))

(* sub : expr * variable * expr -> expr
 * substitution of an expr for a variable in an expr *)
fun sub (e1, v, e2 as Num n) = e2
  | sub (e1, v, e2 as Var v') = if v = v' then e1 else e2
  | sub (e1, v, Bapp(bop, eL, eR)) = Bapp(bop, sub(e1, v, eL), sub(e1, v, eR))
  | sub (e1, v1, Let(v2, eDef, eBody)) =
      (* always rename the bound variable v2 to a newly generated v3,
       * so that no free variable of e1 is captured *)
      let val v3 = newVar(v2)
       in Let(v3, sub(e1, v1, eDef), sub(e1, v1, sub(Var v3, v2, eBody)))
      end

(* values are expressions of the form Num(n) for an integer n *)
fun isValue(Num n) = true
  | isValue(_) = false

(* A Big-Step evaluator *)

(* FreeVar is raised if a free variable is encountered during evaluation *)
exception FreeVar

fun prim(Plus, n1, n2) = n1 + n2
  | prim(Times, n1, n2) = n1 * n2

(* eval: expr -> expr;
 * Assume: argument is a closed term
 * the result will be a value *)
fun eval(e as Num _) = e  (* already fully evaluated *)
  | eval(Var x) = raise FreeVar  (* a free variable => open expression *)
  | eval(Bapp(bop, e1, e2)) =
      (* evaluate both arguments, left to right *)
      let val Num(n1) = eval e1
          val Num(n2) = eval e2
       in Num(prim(bop, n1, n2))  (* perform primitive op and package as value exp *)
      end
  | eval(Let(x,def,body)) =
      (* substitute the value of def in the body and evaluate result *)
      eval(sub(eval def, x, body))

(* example: let x = 3 in x+x *)
val ex1 = Let("x", Num(3), Bapp(Plus, Var("x"), Var("x")))

(* eval ex1 ==> Num(6) *)
----------------------------------------------------------------------

To get a "by-name" big-step evaluator for SAEL (using rule (Ln)
instead of (Lv)), all we have to do is change the last clause of the
eval function in the above program:

  | eval(Let(x,def,body)) =
      (* substitute the def for x in body and eval result *)
      eval(sub(def, x, body))

[Note: Both variants of eval are given in prog_2_1.sml, renamed evalBV
and evalBN, respectively.]

Question: Do evalBV and evalBN always produce the same result?

   ∀ e . evalBV e ≅ evalBN e

(where the relation symbol ≅ means both sides converge to the same
value, or both sides diverge (fail to terminate)).

Questions: Is by-value big-step semantics terminating?
           Is by-name big-step semantics terminating?

That is, does evaluation of every expression terminate?
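
The by-value small-step rules of Figure 2.4 can be implemented in the
same style as the big-step evaluator. The following is only a sketch
and is not taken from prog_2_1.sml: the names subVal, step and run are
assumptions made for illustration. Because rule (5) only ever
substitutes a closed value Num(n), no variable can be captured, so
this sketch uses a simple substitution without renaming.

```sml
(* Sketch only: subVal, step and run are hypothetical names. *)
type variable = string

datatype bop = Plus | Times

datatype expr
  = Num of int
  | Var of variable
  | Bapp of bop * expr * expr
  | Let of variable * expr * expr

fun prim (Plus, n1, n2) = n1 + n2
  | prim (Times, n1, n2) = n1 * n2

(* subVal (n, x, e) computes [Num(n)/x]e.  Num(n) has no free
 * variables, so bound variables never need to be renamed. *)
fun subVal (n, x, e as Num _) = e
  | subVal (n, x, e as Var y) = if x = y then Num n else e
  | subVal (n, x, Bapp(bop, e1, e2)) =
      Bapp(bop, subVal(n, x, e1), subVal(n, x, e2))
  | subVal (n, x, Let(y, e1, e2)) =
      Let(y, subVal(n, x, e1), if x = y then e2 else subVal(n, x, e2))

(* step e = SOME e' if e |-> e' by rules (1)-(5) of Figure 2.4,
 * and NONE if no rule applies (a value, or stuck on a free variable) *)
fun step (Num _) = NONE
  | step (Var _) = NONE
  | step (Bapp(bop, Num n1, Num n2)) =                    (* rule (1) *)
      SOME (Num (prim(bop, n1, n2)))
  | step (Bapp(bop, Num n1, e2)) =                        (* rule (3) *)
      Option.map (fn e2' => Bapp(bop, Num n1, e2')) (step e2)
  | step (Bapp(bop, e1, e2)) =                            (* rule (2) *)
      Option.map (fn e1' => Bapp(bop, e1', e2)) (step e1)
  | step (Let(x, Num n, e2)) =                            (* rule (5) *)
      SOME (subVal(n, x, e2))
  | step (Let(x, e1, e2)) =                               (* rule (4) *)
      Option.map (fn e1' => Let(x, e1', e2)) (step e1)

(* run : expr -> expr; take steps until no rule applies *)
fun run e =
  case step e of
      NONE => e
    | SOME e' => run e'

(* example: let x = 3 in x+x *)
val ex1 = Let("x", Num 3, Bapp(Plus, Var "x", Var "x"))
val result = run ex1
```

On ex1 the first step uses rule (5) to produce
Bapp(Plus, Num 3, Num 3), and the second uses rule (1) to produce
Num 6, which agrees with the big-step result for the same expression.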