CS221/321 Programming Languages
Fall 2010, TuTh 1:30pm, Ryerson 251
David MacQueen
http://www.classes.cs.uchicago.edu/current/22100-1

==================================================================

Why study (principles of) programming languages? (from ACM SIGPLAN)

* To learn widely applicable design and implementation techniques.
* To be able to create new domain-specific languages (DSLs) or
  virtual machines.
* To learn new computational models and speed language learning.
* To be able to choose the right language(s) for a task.

------

* To have a deeper understanding of the fundamental tools of design
  and programming (the raw materials out of which all software
  artifacts are constructed).
* To acquire knowledge and tools that can be used to think formally
  and precisely about complex systems.
* "Programming is applied logic."

==================================================================

How to study programming languages?

* Common, traditional "botanical" approach.
  - Categorize languages by collections of features, or more broadly
    by "paradigms".
  - Develop a taxonomy to organize the collection of known languages.
  - Largely descriptive.
  - Look at a few small programs written in a few representative
    and/or popular languages.
  - Choose some features, and compare how languages X, Y, and Z
    implement those features.
  - Associate complexes of features with "paradigms" (imperative/
    procedural, object-oriented, functional, logic programming, ...).

* Foundational approach.
  - Try to understand the principles of construction and operation
    of languages in general.
  - Develop precise formal models of languages and language features
    that can be used to mathematically/logically determine important
    properties.

* How do we understand how a language or language feature works?
  - implementation (maybe simplified, idealized)
  - formal models
      operational: transition systems, inference rules, abstract machines
      denotational: mappings to mathematical domains
      "axiomatic": specialized logics for capturing the meanings of programs
  - our main methodology for describing programming languages will be
    "structural operational semantics" (Plotkin)

======================================================================

Useful Mathematical Background
(Basic things you should know already.)

* "mathematical maturity"
* sets
  - empty set, subset, singleton set, powerset
  - union, intersection, complement (sets as a boolean algebra)
  - cartesian products, disjoint sums
* relations
  - binary, n-ary
* functions (binary relations)
  - domain, codomain/range
  - total, partial, 1-1, onto
  - inverse function, composition of functions
* orderings (binary relations)
  - partial and total
  - well-founded
  - monotonic functions (functions preserving order)
* induction
  - definitions
  - proofs
  - fixpoints
  - structural induction
* basic 1st-order logic
  - variables, terms, relations
  - and, or, not, implication
  - universal, existential quantifiers

======================================================================

Programming

It is helpful to make concepts more concrete by trying to implement
them (represent them in a program). In this course, we use Standard ML
(SML) as the language for implementing examples and exercises. If you
do not already know SML, you will learn it soon (it's easy!). As a
"metalanguage", designed to work easily on "object" languages, ML is
ideal for our purposes (and, e.g., for writing compilers). It is also
an example of a "functional" language; Scheme is another example.

There will be an out-of-class SML tutorial the first week. There are
links on the class web page to information on SML and SML/NJ, the
particular compiler we use here. I assume that you have learned at
least a couple of programming languages (e.g. C and Scheme).
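
To see how inductive definitions and structural induction will show up
concretely, here is a minimal sketch in SML (the datatype and function
names are illustrative, not part of the course material): the natural
numbers defined inductively, with addition defined by structural
recursion over that definition.

```sml
(* An inductive definition: a nat is either Zero or the
   successor of a nat. *)
datatype nat = Zero | Succ of nat

(* Addition by structural recursion on the first argument:
   one clause per constructor of the datatype. *)
fun add (Zero, n)   = n
  | add (Succ m, n) = Succ (add (m, n))
```

A proof that add (m, Zero) = m for every m proceeds by structural
induction on m, mirroring the shape of the definition: the Zero case
is immediate from the first clause, and the Succ case follows from
the induction hypothesis.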
======================================================================

CS221 vs CS321

There will be additional reading and projects for grad students,
including additional tutorial meetings outside of regular lectures.

======================================================================

A little historical background

* Programming languages is a young subject. There were NO programming
  languages when I was born!

* Almost all the creators of today's programming languages are still
  alive, many of them still actively involved in the subject.

* I'll include occasional short history lessons. A well-educated person
  should know something of the history of the subject they are studying.

* Our foundational approach to studying programming languages
  originated in Britain (London, Oxford, Edinburgh, Cambridge) in the
  60s and 70s:

      Christopher Strachey
      Peter Landin
      Tony Hoare
      Rod Burstall
      Robin Milner

  I worked with Burstall and Milner in Edinburgh from the mid-70s
  through the 80s, where I learned how to understand programming
  languages. I came there with a PhD in Mathematical Logic
  (computability theory).

  In the USA, early pioneers include John McCarthy and John Reynolds
  (working at Argonne, later at Syracuse).

======================================================================

A first, very simple, example: simple arithmetic expressions

E.g. 2 + 13 * 4

1. syntax
   - determining if an expression is syntactically correct
   - if it is correct, parsing it to determine its syntactic structure
   - representing its structure for further analysis
   - lexical structure and syntactic structure

2. static semantics
   - weeding out syntactically correct programs that don't make sense
     (vacuous in this simple case; any phrase that can be parsed can
     be evaluated)

3. dynamic semantics
   - evaluating expressions to produce a number

Let's see how to process an expression taking the form of the string
"2 + 13 * 4".
Step 1: Lexical analysis

We do "lexical analysis" to break up the string of characters into a
sequence of "tokens" or symbols:

    "2 + 13 * 4"  ==>  Num 2, Oper +, Num 13, Oper *, Num 4

where Num and Oper are classes of tokens with associated values
(numbers for Num and operations for Oper). We might alternatively
translate "+" and "*" into their own special token classes:

    Num 2, Plus, Num 13, Times, Num 4

Step 2: Parsing to discover syntactic structure

Now we "parse" this sequence of tokens into a representation of the
expression as a tree (which I will present textually using
indentation):

    Plus
      Num 2
      Times
        Num 13
        Num 4

This corresponds to the nested expression structure 2 + (13 * 4).
An alternative tree would be

    Times
      Plus
        Num 2
        Num 13
      Num 4

corresponding to (2 + 13) * 4. We choose the first form based on the
convention that multiplication binds tighter than addition (a
conventional "operator precedence" rule).

Step 3: Static semantic analysis

Once we have successfully parsed into a tree, there is no "static
semantic" analysis in this case, because any such expression tree is
valid -- there are no bogus expressions to be filtered out.

Step 4: Dynamic semantics (evaluation)

Only certain subterms are "ready" for evaluation: those that consist
of an operator applied to two number arguments, such as 13 * 4. These
"ready" subterms are called "redexes", and their evaluation to a
number is also called "reduction". In our example the subexpression
"13 * 4", or

    Times
      Num 13
      Num 4

is a redex. We reduce it by replacing it with its value, 52, or more
precisely, by a new subexpression (Num 52):

    Plus                   Plus
      Num 2       ==>        Num 2
      Times                  Num 52
        Num 13
        Num 4

Now the top-level Plus expression is also a redex, and can be reduced
in turn:

    Plus
      Num 2      ==>    Num 54
      Num 52

This sequence of two reductions transforms the original expression
into a simplified expression, Num 54, representing its value.
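
The steps above can be sketched in SML, the course's metalanguage.
This is a minimal illustration, not the course's official code: the
datatype and function names are my own, the lexer handles only digits,
"+", "*", and spaces, and for brevity the parser treats both operators
as right-associative (which does not change the values computed here).
Evaluation is done "big-step", performing all the reductions at once.

```sml
(* Token classes, using the second alternative above. *)
datatype token = NUM of int | PLUS | TIMES

(* Expression trees: Num leaves, Plus and Times nodes. *)
datatype expr = Num of int
              | Plus of expr * expr
              | Times of expr * expr

(* Step 1: lexical analysis over a character list. *)
fun lex [] = []
  | lex (#" " :: cs) = lex cs
  | lex (#"+" :: cs) = PLUS :: lex cs
  | lex (#"*" :: cs) = TIMES :: lex cs
  | lex (c :: cs) =
      if Char.isDigit c
      then let (* collect a maximal run of digits *)
             fun digits (c :: cs, ds) =
                   if Char.isDigit c then digits (cs, ds @ [c])
                   else (ds, c :: cs)
               | digits ([], ds) = (ds, [])
             val (ds, rest) = digits (c :: cs, [])
           in NUM (valOf (Int.fromString (implode ds))) :: lex rest
           end
      else raise Fail ("unexpected character " ^ str c)

(* Step 2: recursive-descent parsing, with * binding tighter
   than +.  Each function returns the parsed tree together with
   the remaining tokens. *)
fun parseExpr ts =                        (* expr ::= term [+ expr] *)
      let val (e1, ts1) = parseTerm ts
      in case ts1
           of PLUS :: ts2 =>
                let val (e2, ts3) = parseExpr ts2
                in (Plus (e1, e2), ts3) end
            | _ => (e1, ts1)
      end
and parseTerm ts =                        (* term ::= factor [* term] *)
      let val (e1, ts1) = parseFactor ts
      in case ts1
           of TIMES :: ts2 =>
                let val (e2, ts3) = parseTerm ts2
                in (Times (e1, e2), ts3) end
            | _ => (e1, ts1)
      end
and parseFactor (NUM n :: ts) = (Num n, ts)
  | parseFactor _ = raise Fail "expected a number"

(* Step 4: big-step evaluation of an expression tree to a number. *)
fun eval (Num n) = n
  | eval (Plus (e1, e2))  = eval e1 + eval e2
  | eval (Times (e1, e2)) = eval e1 * eval e2

(* eval (#1 (parseExpr (lex (explode "2 + 13 * 4")))) = 54 *)
```

Step 3 needs no code: every tree built by the parser is valid, as
noted above.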
A less cumbersome way to write this reduction sequence uses the
original disambiguated "surface syntax":

    2 + (13 * 4)  ==>  2 + 52  ==>  54

In some terms, there may be more than one redex. Consider

    3 + 1 + 13 * 4

Using standard precedence and associativity rules (where + associates
to the left), this is equivalent to (3 + 1) + (13 * 4), or

    Plus
      Plus
        Num 3
        Num 1
      Times
        Num 13
        Num 4

In this expression, both argument subexpressions of the top-level
Plus are redexes. We can choose one at random to reduce (say by
flipping a coin), or we can adopt a standard strategy for picking the
next redex, such as left to right. Using the left-to-right reduction
strategy yields

    (3 + 1) + (13 * 4)  ==>  4 + (13 * 4)  ==>  4 + 52  ==>  56.
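
The left-to-right strategy can itself be made concrete. Here is a
sketch in SML of a single reduction step, repeating the expr datatype
so the fragment stands alone; the function names are illustrative.
Note how the step function encodes the strategy: it reduces the
leftmost redex it finds, trying the left subexpression before the
right one.

```sml
datatype expr = Num of int
              | Plus of expr * expr
              | Times of expr * expr

(* step e = SOME e' if e contains a redex (reducing the leftmost
   one), or NONE if e is already fully reduced. *)
fun step (Num _) = NONE
  | step (Plus (Num m, Num n))  = SOME (Num (m + n))
  | step (Times (Num m, Num n)) = SOME (Num (m * n))
  | step (Plus (e1, e2)) =
      (case step e1
         of SOME e1' => SOME (Plus (e1', e2))
          | NONE => (case step e2
                       of SOME e2' => SOME (Plus (e1, e2'))
                        | NONE => NONE))
  | step (Times (e1, e2)) =
      (case step e1
         of SOME e1' => SOME (Times (e1', e2))
          | NONE => (case step e2
                       of SOME e2' => SOME (Times (e1, e2'))
                        | NONE => NONE))

(* Reduce repeatedly until no redex remains. *)
fun normalize e =
    case step e of SOME e' => normalize e' | NONE => e

(* normalize (Plus (Plus (Num 3, Num 1), Times (Num 13, Num 4)))
   performs exactly the left-to-right sequence shown above,
   producing Num 56. *)
```

A coin-flipping strategy would differ only in the two operator cases:
instead of always trying the left subexpression first, it would pick
which side to try at random.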