CMSC 22610: Implementation of Computer Languages I Winter, 2011 Course Project Summary. I. The course project is to implement a small subset of ML called MinML. The project is divided into four subprojects: 1. lexical analyzer 2. parser 3. type-checker 4. code-generator In 4, we will generate code for a simple virtual machine rather than a hardware instruction architecture like x86. ---------------------------------------------------------------------- II. Description of MinML The sample language we will implement is a miniture version of ML that we will call MinML. We exclude many features of full ML: * module system * abstype and withtype declarations * exceptions * record types * clausal form of function definitions * nested patterns * operator overloading (except equality) * symbolic identifiers (except for ==, <, etc.) * refs and arrays The basis library of predefined types and functions is also minimal. The features that MinML does share with ML are: * first-class functions * algebraic datatypes (with limited pattern matching) * polymorphic types, with automatic type inference Types ----- Primitive types supported: integers (Int), Booleans (Bool), and strings (Str). Compound types supported: tuples, algebraic data types (e.g. List). Identifiers ----------- There are 4 classes of identifiers: vid : value variables; alphanumeric, initial lower-case cid : data constructors; alphanumeric, initial upper-case tid : types and type constructors; alphanumeric, initial upper-case tyv : type variables: alphanumeric, initial lower-case These are nonterminal symbols in the grammar below. Examples: vid : x, y, a, b, u32, foo, fact, arg1, ... cid : Nil, Cons, True, False, X13, ... tid : Int, Bool, List, Pair, T, R435, ... tyv : a, b, s, t, x, y, s3, foo, ... There is also a fixed set of symbolic identifiers that are used for predefined primitive operations. primid: +, -, *, /, %, ~, -- arithmetic ops ||, &&, -- boolean ops ==, <>, <, >, <=, >=, -- relational ops @ -- string concatenation These are also infix operator symbols (except for ~) having various conventional precedences. Semantically, the symbols in primid are treated the same as the other value variables in vid. Constants --------- n : integer constants, e.g. 1, 23, ~14 s : string constants, e.g. "a", "a-13", "Fred Smith", ... The Boolean constants True and False are considered to be constructors, i.e. members of the identifier class cid. They are predefined. Programs -------- Top level program structure is a sequence of declarations followed by an expression, separated by semicolons. Concrete Syntax of MinML ------------------------ This is a preliminary, informal grammar for MinML. It uses the following conventions: [ ]? -- an optional occurrence of [ ]* -- zero or more occurrences of [ ]+ -- one or more occurrences of Vertical bars, |, are used in the grammar metanotation to separate variants of a phrase class (nonterminal symbol) like TopDecl. They also can appear as literal terminal symbols as in the rules for datatype and case, where they appear in quotes ("|"). Prog ::= [ TopDecl ; ]* Exp TopDecl ::= type tid [TypeParams]? = Type | datatype tid [TypeParams]? = ConsDecl [ "|" ConsDecl ]* | ValueDecl TypeParams ::= tyv | ( tyv [ , tyv ]* ) Type ::= Type -> Type | AtomicType [ * AtomicType ]* AtomicType ::= tid | tyv | tid ( Type [ , Type ]* ) | ( Type ) ConsDecl ::= cid [ ( Type ) ]? ValueDecl ::= val TuplePat = Exp | fun FunDef [ and FunDef ]* FunDef ::= vid TuplePat = Exp Const ::= n | s | cid Pat ::= Const | cid TuplePat | TuplePat TuplePat ::= AtomicPat | ( AtomicPat [ , AtomicPat ]* ) AtomicPat ::= vid -- variable | _ -- wildcard Exp ::= vid | Const | Exp || Exp | Exp && Exp | Exp == Exp | Exp <> Exp | Exp < Exp | Exp <= Exp | Exp > Exp | Exp >= Exp | Exp @ Exp | Exp + Exp | Exp - Exp | Exp / Exp | Exp % Exp | ~ Exp | Exp Exp | ( Exp [ , Exp ]* ) | ( Exp [ ; Exp ]* ) | if Exp then Exp else Exp | let [ValueDecl]+ in Exp [ ; Exp]* end | case Exp of Match [ "|" Match ]* end | fn vid => Exp Match ::= Pat => Exp n ::= 0 | 1 | 2 | ... s ::= Semantics --------- MinML is a call-by-value (i.e. "strict") functional language. It is approximately a small subset of Standard ML, and evaluation behaves the same as in Standard ML. ---------------------------------------------------------------------- Examples: * Expressions: 37 True "abc" x x * y + z/2 foo (x-1,"x"@z) if x == y - 3 then fact z else 47 case m of Nil => True | Cons(x,xs) => False end let val x = 3 in x + x end * Declarations: val x = 3 val (u,v) = f 8 datatype Unit = Unit datatype Bool = True | False datatype List (a) = Nil | Cons (a * List(a)) type Pair (a) = a * a fun not b = case b of True => False | False => True fun fact n = if n == 0 then 1 else n * fact(n-1) fun append(m,n) = case m of Nil => n | Cons(x,xs) => Cons(x,append(xs,n)) ---------------------------------------------------------------------- Document History ---------------- [1/13/2011, 15:13] Put vertical bar characters within grammar rules in double quotes to distinguish them from vertical bars in grammar notation, where they separate production right-hand-sides. [1/13/2011, 15:30] Changed symbol for "not equal" operation from "/=" to "<>" to conform to SML notation.