CS221/321
Lecture 11, Nov 23, 2010

Section 6. Mutable Storage

We could consider mutable storage and assignments using three
different approaches:

(1) Move to a new elementary language: a "simple imperative
language" similar to the Bims language introduced in Huttel, Chapter
3. You can read about this approach, including the addition of
control structures, variable bindings, and procedures, in Huttel
Chapters 3 through 7. A feature of this approach is the separation
of syntax constructions into expressions and "statements" or "commands".

(2) Add mutable store and assignments using ref values (references)
as found in Standard ML. This is obviously the most direct way of
dealing with stores starting with the language PTFun that we already
have.

(3) Model stores and assignments using monads, as is done in "pure"
functional languages like Haskell.

I'll start with approach (2) and talk about (3) next week in the last
lecture of the course.

----------------------------------------------------------------------

References:

First, let's treat Ref as a basic syntactic form, as we treated Fst,
Inl, etc. before introducing polymorphism. This treatment will be 
similar to Harper, Chapter 14.

Syntactically, we add three new forms, corresponding to the concrete
syntax

   ref(e)       -- creation of an initialized ref cell
   !e           -- dereferencing, returning the contents of a ref cell
   e1 := e2     -- assignment, updating the contents of a ref cell

For small-step semantics, we also need to add a new syntactic category
of "locations" (or memory addresses). 

   l ∈ Locations

These location expressions will not occur in original programs, but
will be introduced by reductions in the small-step
semantics. Locations designate places in the mutable memory where 
values can be stored.

A memory M will be a finite map from locations to values:

  M: Locations → Values

(Memories are often called "states", as in Huttel.) Evaluation of
expressions will be done with respect to a current state, and the
evaluation of certain expressions, namely those involving assignments,
may modify the current state.

Example: achieving recursion by updating memory

    f = ref(λx:Int. x)       (f : Ref(Int→Int))

    fact = λx:Int. if x = 0 then 1 else x * (!f)(x - 1)  (fact: Int→Int)

    fact(4) ==> 4 * (λx.x)(3) ==> 12

    f := fact

    fact(4) ==> 24
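
The same back-patching trick can be sketched in Python. This is only an
illustration under assumptions: the class Ref is a hypothetical stand-in
for an ML ref cell, and Python does not check the types Ref(Int→Int) and
Int→Int shown above.

```python
class Ref:
    """A hypothetical one-slot mutable cell, standing in for ref(e)."""
    def __init__(self, contents):
        self.contents = contents

# f = ref(λx:Int. x)
f = Ref(lambda x: x)

# fact = λx:Int. if x = 0 then 1 else x * (!f)(x - 1)
def fact(x):
    return 1 if x == 0 else x * f.contents(x - 1)

# f still holds the identity, so this computes 4 * (λx.x)(3).
before = fact(4)

# f := fact
f.contents = fact

# The call through f now reaches fact itself, so the function recurses.
after = fact(4)
```

Here before is 12 and after is 24, matching the two evaluations of
fact(4) above.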


----------------------------------------------------------------------
Abstract Syntax:

   l  ::=  _ (members of Locations)

   τ  ::=  ... |  Ref(τ)

   e  ::=  ... |  Ref(e)  |  DRef(e) |  Set(e1,e2)  |  l

   v  ::=  ... |  l
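
As a rough sketch, the new syntactic forms could be represented by
Python dataclasses. The class and field names are assumptions made for
illustration, Num stands in for the elided "..." cases, and locations
are represented by integers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Num:          # integer literal, standing in for the "..." cases
    n: int

@dataclass(frozen=True)
class Ref:          # Ref(e): allocate a cell initialized with e
    e: object

@dataclass(frozen=True)
class DRef:         # DRef(e): read the contents of a cell
    e: object

@dataclass(frozen=True)
class Set:          # Set(e1, e2): overwrite the contents of a cell
    e1: object
    e2: object

@dataclass(frozen=True)
class Loc:          # l: a location (assumed to be an integer here)
    name: int

def is_value(e):
    """v ::= ... | l  (Num is the only other value in this fragment)."""
    return isinstance(e, (Num, Loc))

def has_locations(e):
    """True if some location occurs in e; original programs contain none."""
    if isinstance(e, Loc):
        return True
    if isinstance(e, (Ref, DRef)):
        return has_locations(e.e)
    if isinstance(e, Set):
        return has_locations(e.e1) or has_locations(e.e2)
    return False
```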

----------------------------------------------------------------------


Typing expressions that include locations requires that we be able to
type locations. Locations resemble free variables, and so we will 
introduce a new kind of typing environment for locations:

   Λ : Locations → Types

In any memory M compatible with Λ, a given location l can only
contain values of type Λ(l):

   ⊦ M(l) : Λ(l)

Typing judgements must be modified to include a memory typing Λ
as well as a (free) variable typing Γ:

   Λ; Γ ⊦ e : τ 


----------------------------------------------------------------------
Typing Rules:

	    Λ; Γ ⊦ e : τ
(RT1) ------------------------
       Λ; Γ ⊦ Ref(e) : Ref(τ)

        Λ; Γ ⊦ e : Ref(τ)
(RT2) --------------------
       Λ; Γ ⊦ DRef(e) : τ

	Λ; Γ ⊦ e1 : Ref(τ)    Λ; Γ ⊦ e2 : τ 
(RT3) -------------------------------------
           Λ; Γ ⊦ Set(e1,e2) : Unit

           Λ(l) = τ 
(RT4) -------------------
       Λ; Γ ⊦ l : Ref(τ)


Plus modified versions of previous typing rules with Λ added to the
contexts. For instance:

          Γ(x) = τ 
(RT5)  --------------
        Λ; Γ ⊦ x : τ 


              Λ; Γ[x:τ1] ⊦ e : τ2
(RT6)  --------------------------------
         Λ; Γ ⊦ Fun(x,τ1,e) : τ1 → τ2


         Λ; Γ ⊦ e1 : τ1 → τ    Λ; Γ ⊦ e2 : τ1
(RT7)  ----------------------------------------
               Λ; Γ ⊦ App(e1,e2) : τ


         Λ; Γ ⊦ e1 : Bool    Λ; Γ ⊦ e2 : τ    Λ; Γ ⊦ e3 : τ
(RT8)  -----------------------------------------------------
                     Λ; Γ ⊦ If(e1,e2,e3) : τ


Note that in a derivation under these rules, the Λ context remains
fixed for all rules throughout the derivation.  Another way of putting
this is that the scope of Λ is the whole expression.
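
Rules (RT1) through (RT5) can be read as a recursive checker. The
following Python sketch is one possible reading, not a reference
implementation: the names typeof, lam, gamma, and RefT are assumptions,
base types are plain strings, and only a fragment of the language is
covered. Note that lam is passed unchanged to every recursive call,
reflecting the fixed Λ context.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RefT:         # the type Ref(τ); base types are the strings "Int", "Unit"
    t: object

@dataclass(frozen=True)
class Num:
    n: int

@dataclass(frozen=True)
class Var:
    x: str

@dataclass(frozen=True)
class Ref:
    e: object

@dataclass(frozen=True)
class DRef:
    e: object

@dataclass(frozen=True)
class Set:
    e1: object
    e2: object

@dataclass(frozen=True)
class Loc:
    name: int

def typeof(lam, gamma, e):
    """Return τ such that Λ; Γ ⊦ e : τ, or raise TypeError."""
    if isinstance(e, Num):
        return "Int"
    if isinstance(e, Var):                          # (RT5)
        if e.x not in gamma:
            raise TypeError("unbound variable")
        return gamma[e.x]
    if isinstance(e, Ref):                          # (RT1)
        return RefT(typeof(lam, gamma, e.e))
    if isinstance(e, DRef):                         # (RT2)
        t = typeof(lam, gamma, e.e)
        if not isinstance(t, RefT):
            raise TypeError("DRef applied to a non-reference")
        return t.t
    if isinstance(e, Set):                          # (RT3)
        t1 = typeof(lam, gamma, e.e1)
        t2 = typeof(lam, gamma, e.e2)
        if t1 != RefT(t2):
            raise TypeError("Set: cell type and value type disagree")
        return "Unit"
    if isinstance(e, Loc):                          # (RT4)
        if e.name not in lam:
            raise TypeError("location not in dom(Λ)")
        return RefT(lam[e.name])
    raise TypeError("unknown expression form")
```

For example, typeof({0: "Int"}, {}, Set(Loc(0), Num(1))) yields "Unit",
while Set(Loc(0), Ref(Num(1))) is rejected under the same Λ.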
----------------------------------------------------------------------


Evaluation:
-----------

A small-step dynamic semantics must use a transition relation that
involves memories as well as expressions:

   (M, e) ↦ (M', e')

Transitions will always modify the expression (e' != e), and sometimes
the memory will also be modified.


----------------------------------------------------------------------
Evaluation [CBV]:

These are the new rules involving the new operators
Ref, DRef, and Set.

Search rules:

             (M,e) ↦ (M',e')
(RE1)  ------------------------------
        (M, Ref(e)) ↦ (M', Ref(e'))


               (M,e) ↦ (M',e')
(RE2)  --------------------------------
        (M, DRef(e)) ↦ (M', DRef(e'))


              (M,e1) ↦ (M',e1')
(RE3)  -------------------------------------
        (M, Set(e1,e2)) ↦ (M', Set(e1',e2))


               (M,e2) ↦ (M',e2')
(RE4)  -------------------------------------
        (M, Set(v1,e2)) ↦ (M', Set(v1,e2'))


Redex rules:

             (l = fresh(M))
(RE5)  --------------------------
        (M,Ref(v)) ↦ (M[l=v],l)


              (l ∈ dom(M))
(RE6)  -------------------------
        (M,DRef(l)) ↦ (M,M(l))


              (l ∈ dom(M))
(RE7)  ----------------------------
        (M,Set(l,v)) ↦ (M[l=v],())


We also inherit modified versions of the standard transition
rules for TFun, such as these rules for App:

         (v1 = Fun(x,τ,e); v2 ∈ Value)
(RE8)  ---------------------------------
        (M, App(v1,v2)) ↦ (M, [v2/x]e)


                 (M,e1) ↦ (M',e1')
(RE9)  -------------------------------------
         (M,App(e1,e2)) ↦ (M',App(e1',e2))


                  (M,e2) ↦ (M',e2')
(RE10)  -------------------------------------  (v1 ∈ Value)
          (M,App(v1,e2)) ↦ (M',App(v1,e2'))
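
Rules (RE1) through (RE7) can be sketched as a one-step function on
configurations. This is a minimal illustration under assumptions: the
names step, run, and fresh are hypothetical, memories are Python dicts
keyed by integer locations, and functions and application are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Num:
    n: int

@dataclass(frozen=True)
class Unit:         # the value ()
    pass

@dataclass(frozen=True)
class Ref:
    e: object

@dataclass(frozen=True)
class DRef:
    e: object

@dataclass(frozen=True)
class Set:
    e1: object
    e2: object

@dataclass(frozen=True)
class Loc:
    name: int

def is_value(e):
    return isinstance(e, (Num, Unit, Loc))

def fresh(M):
    """Pick a location outside dom(M) (one possible choice of fresh)."""
    return max(M, default=-1) + 1

def step(M, e):
    """One transition (M, e) ↦ (M', e'). M itself is never mutated."""
    if isinstance(e, Ref):
        if is_value(e.e):                               # (RE5)
            l = fresh(M)
            return {**M, l: e.e}, Loc(l)
        M1, e1 = step(M, e.e)                           # (RE1)
        return M1, Ref(e1)
    if isinstance(e, DRef):
        if isinstance(e.e, Loc) and e.e.name in M:      # (RE6)
            return M, M[e.e.name]
        M1, e1 = step(M, e.e)                           # (RE2)
        return M1, DRef(e1)
    if isinstance(e, Set):
        if not is_value(e.e1):                          # (RE3)
            M1, e1 = step(M, e.e1)
            return M1, Set(e1, e.e2)
        if not is_value(e.e2):                          # (RE4)
            M1, e2 = step(M, e.e2)
            return M1, Set(e.e1, e2)
        if isinstance(e.e1, Loc) and e.e1.name in M:    # (RE7)
            return {**M, e.e1.name: e.e2}, Unit()
    raise ValueError("no transition: e is a value or is stuck")

def run(M, e):
    """Iterate step until e is a value."""
    while not is_value(e):
        M, e = step(M, e)
    return M, e
```

Running run({}, DRef(Ref(Num(3)))) first allocates location 0 by (RE5)
and then reads it back by (RE6), ending in the configuration
({0: Num(3)}, Num(3)).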


----------------------------------------------------------------------


Type Soundness
--------------

We need to define a relation between memories M and location typings
Λ, that expresses the property that M "conforms to" Λ.

Defn 6.1: ⊦ M : Λ  iff 
  (1) dom(M) = dom(Λ)
  (2) ∀l ∈ dom(Λ). Λ;∅ ⊦ M(l): Λ(l)

That is, ⊦ M:Λ if they have the same set of locations as their
domains, and at each location, the value stored in M has the type
specified by Λ.

Defn 6.2: Λ ⊦ (M,e) : τ  iff  ⊦ M:Λ  &  Λ;∅ ⊦ e: τ 
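
Defn 6.1 can be checked mechanically for simple memories. The sketch
below makes several assumptions: stored values are either Python ints
(of type "Int") or pairs ("loc", l), the type Ref(τ) is the pair
("Ref", τ), and the function names are hypothetical.

```python
def type_of_value(lam, v):
    """Type of a closed value under Λ, or None if v is not typable."""
    if isinstance(v, int):
        return "Int"
    _tag, l = v                     # v = ("loc", l)
    if l not in lam:
        return None                 # dangling location
    return ("Ref", lam[l])          # as in rule (RT4)

def conforms(M, lam):
    """⊦ M : Λ  (Defn 6.1)."""
    # (1) dom(M) = dom(Λ)
    if M.keys() != lam.keys():
        return False
    # (2) every stored value has the type that Λ assigns to its location
    return all(type_of_value(lam, M[l]) == lam[l] for l in lam)

# Location 0 holds an Int; location 1 holds a reference to location 0.
M = {0: 7, 1: ("loc", 0)}
lam = {0: "Int", 1: ("Ref", "Int")}

ok = conforms(M, lam)
missing = conforms(M, {0: "Int"})                       # domains differ
wrong = conforms(M, {0: "Int", 1: "Int"})               # M(1) is not an Int
```

In this example ok is True, while missing and wrong are both False.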


Theorem 6.1 [Preservation]:
   Λ ⊦ (M,e): τ & (M,e) ↦ (M',e')  =>
     ∃Λ'. Λ ⊆ Λ' &  Λ' ⊦ (M',e'): τ.


Theorem 6.2 [Progress]: 
  Λ ⊦ (M,e) : τ  => e is a value or ∃M',e'. dom(M) ⊆ dom(M') & (M,e) ↦ (M',e').


Note that in both of these statements, it is assumed that e is
closed w.r.t. variables, but e may contain "free" location names.
(In fact, all location names are free, since there is no construct
that "binds" a location name.)

We will need the usual Inversion Lemma for the new typing judgements,
which we will assume without stating it in detail.

Proof of Preservation:
----------------------
We assume the hypotheses:

(H1) (M,e) ↦ (M',e')
(H2) Λ ⊦ (M,e): τ   

and proceed by induction on the derivation of (H1).

Base Case: (H1) is derived using rule (RE5).
Then 
  (1) e = Ref(v) for some value v
  (2) e' = l, where l is a fresh location, i.e.,
  (3) l ∉ dom(M)
  (4) M' = M[l=v]

  (5) Λ ⊦ (M,Ref(v)) : τ   by (1) and (H2)

  (6) τ = Ref(τ') for some τ', and
  (7) Λ ⊦ (M,v): τ'   by (5) and Inversion of (RT1)
  (8) Λ;∅ ⊦ v: τ'     by (7) and Defn 6.2

  (9) Let Λ' = Λ[l: τ']
  (10) ⊦ M : Λ    by (5) and Defn 6.2
  (11) Λ ⊆ Λ'    by (9), (3), and (10), since l ∉ dom(M) = dom(Λ)
  (12) ⊦ M' : Λ'  by (4), (8), (9), (10), and Defn 6.1
                  (typings under Λ remain valid under Λ' ⊇ Λ)

  (13) Λ';∅ ⊦ l : Ref(τ')   by (9) and (RT4)
  (14) Λ' ⊦ (M',l): Ref(τ')   by (12), (13), and Defn 6.2.

  (15) ∃Λ'. Λ ⊆ Λ' &  Λ' ⊦ (M',e'): τ  by (11), (14), (2), and (6).  [X]
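
The construction in this case can be replayed on a concrete memory. The
Python sketch below is illustrative only and rests on assumptions:
values are ints (type "Int") or pairs ("loc", l), Ref(τ) is the pair
("Ref", τ), fresh locations are chosen as max + 1, and the names alloc
and conforms are hypothetical.

```python
def type_of_value(lam, v):
    """Type of a closed value under Λ, or None if v is not typable."""
    if isinstance(v, int):
        return "Int"
    _tag, l = v
    return ("Ref", lam[l]) if l in lam else None

def conforms(M, lam):
    """⊦ M : Λ  (Defn 6.1)."""
    return M.keys() == lam.keys() and all(
        type_of_value(lam, M[l]) == lam[l] for l in lam)

def alloc(M, lam, v):
    """Mirror the (RE5) case: build M' = M[l=v] and Λ' = Λ[l: τ']."""
    l = max(M, default=-1) + 1          # a location not in dom(M)
    tau = type_of_value(lam, v)         # τ' such that v has type τ' under Λ
    return {**M, l: v}, {**lam, l: tau}, l

M, lam = {0: 7}, {0: "Int"}

# Step (M, Ref(l0)) where l0 is location 0, which holds an Int.
M2, lam2, l = alloc(M, lam, ("loc", 0))

extends = all(lam2.get(k) == t for k, t in lam.items())    # Λ ⊆ Λ'
still_conforms = conforms(M2, lam2)                        # ⊦ M' : Λ'
result_type = type_of_value(lam2, ("loc", l))              # type of e' = l
```

Here Λ' assigns Ref(Int) to the new location, so the result l has type
Ref(Ref(Int)), which is the type of the original expression Ref(l0).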


Ind Case: (H1) is derived using rule (RE1).
Then 
  (1) e = Ref(e1) for some e1
  (2) e' = Ref(e1') for some e1', where
  (3) (M,e1) ↦ (M',e1')

  (IH) ∀(Λ1,τ1).
       Λ1 ⊦ (M,e1): τ1 => ∃Λ1'. Λ1 ⊆ Λ1' &  Λ1' ⊦ (M',e1'): τ1

  (4) Λ ⊦ (M,Ref(e1)) : τ   by (1) and (H2)

  (5) τ = Ref(τ') for some τ', and
  (6) Λ ⊦ (M,e1): τ'   by (4) and Inversion of (RT1)

  (7) ∃Λ1'. Λ ⊆ Λ1' &  Λ1' ⊦ (M',e1'): τ'  by (6) and (IH)
  (8) Let Λ' be a witness for (7), so
  (9) Λ ⊆ Λ' and
  (10) Λ' ⊦ (M',e1'): τ'

  (11) Λ' ⊦ (M',Ref(e1')): Ref(τ')   by (10), (RT1), and Defn 6.2
  (12) Λ' ⊦ (M',e'): τ    by (11), (2), and (5)

  (13) ∃Λ'. Λ ⊆ Λ' &  Λ' ⊦ (M',e'): τ  by (9) and (12). [X]


The other cases are similar.  [XX]

----------------------------------------------------------------------

Proof of Theorem 6.2: Progress
------------------------------

We start with the hypothesis:

  (H) Λ ⊦ (M,e) : τ

By Definition 6.2, this expands into a pair of hypotheses:

  (H1) ⊦ M : Λ 
  (H2) Λ; ∅ ⊦ e : τ

The proof proceeds by induction on the derivation of (H2).

Base Case: (H2) by rule (RT4).
  (1) e = l for some location l, by Case Hyp.
  (2) e is a value, by defn of value [X]

Base Case: (H2) by rule (RT5).
  This is impossible, since Γ = ∅.

Ind. Case: (H2) by rule (RT1).
  (1) e = Ref(e1), and
  (2) τ = Ref(τ1)  by Case Hyp., where
  (3) Λ; ∅ ⊦ e1 : τ1

  (IH) e1 is a value or  (M,e1) ↦ (M',e1')

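  Case (IH1): e1 is a value.
    (4) Let l = fresh(M), so l ∉ dom(M)
    (5) (M, Ref(e1)) ↦ (M[l=e1], l), by (RE5)
    (6) (M, e) ↦ (M',e') where M' = M[l=e1] and e' = l, by (1), (5).  [X]

  Case (IH2): (M,e1) ↦ (M',e1').
    (7) (M, Ref(e1)) ↦ (M', Ref(e1')), by (RE1)
    (8) (M, e) ↦ (M',e') where e' = Ref(e1'), by (1), (7). [X]

Ind. Case: (H2) by (RT2).
   This is similar to the (RT1) case, except when e1 is a value.
   Then e1 = l for some location l by the Canonical Forms Lemma,
   l ∈ dom(Λ) = dom(M) by Inversion of (RT4) and (H1), and so
   (M, DRef(l)) ↦ (M, M(l)) by (RE6).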

Ind. Case: (H2) by (RT7).
  (1) e = App(e1,e2) by Case Hyp.
  (2) Λ; ∅ ⊦ e1 : τ1 → τ for some τ1, and
  (3) Λ; ∅ ⊦ e2 : τ1 by Inversion of (RT7)

  (IH1) e1 a value or (M,e1) ↦ (M',e1')
  (IH2) e2 a value or (M,e2) ↦ (M',e2')

  Case (IH1a) e1 a value
    (4) e1 = Fun(x,τ1,e3), by Canonical Forms Lemma
    Case (IH2a) e2 a value.
      (5) (M, App(e1,e2)) ↦ (M, [e2/x]e3) by (RE8)
      (6) (M, e) ↦ (M, e') where e' = [e2/x]e3 by (1), (5). [X]
    Case (IH2b) (M,e2) ↦ (M',e2')
      (7) (M, App(e1,e2)) ↦ (M', App(e1,e2')) by (RE10)
      (8) (M, e) ↦ (M', e') where e' = App(e1,e2') by (1), (7). [X]

  Case (IH1b) (M,e1) ↦ (M',e1')
      (9) (M, App(e1,e2)) ↦ (M', App(e1',e2)) by (RE9)
      (10) (M, e) ↦ (M', e') where e' = App(e1',e2) by (1), (9). [X]

Other inductive cases are similar to (RT1) or (RT7). 
                                                       [XX]


----------------------------------------------------------------------

Polymorphic Typings for State primitives.

In PTFun, we can treat Ref, DRef, and Set as primitive functions
with the following polymorphic types:

     Ref :  ∀t. t → Ref(t)

     DRef : ∀t. Ref(t) → t

     Set :  ∀t. Ref(t) * t → Unit

E.g.  Ref[Int](Num 3)


Polymorphic typings in ML:

     ref : 'a -> 'a ref

     !   : 'a ref -> 'a 

     :=  : 'a ref * 'a -> unit

 E.g. ref 3
    
Example:

   let val r = ref(fn x => x)          [ r : ('a -> 'a) ref ]
    in r := (fn x: int => x + 1);      [ r : (int -> int) ref ]
       !r true                         [ r : (bool -> bool) ref ]
   end

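References have introduced unsoundness in the type system!!!
If r is generalized to the polymorphic type ('a -> 'a) ref, each line
type-checks at its own instance of 'a, yet the last line applies the
function fn x: int => x + 1 to true.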

After years of experimentation with fixes for this problem, the ML
community settled on the "value restriction":

   A variable declaration (like "val r = ref(fn x => x)") can
   only have its type generalized (made polymorphic) if the
   definiens is a value expression (which it is not, in this
   case).


This issue does not affect PTFun with polymorphically typed
Ref, DRef, and Set primitives, because to make r polymorphic
we will have to explicitly abstract over a type parameter, 
as in:

   let r = Λt.Ref[t → t](λx: t.x)    [ r : ∀t.Ref(t → t) ]
    in Set(r[Int], (λx: Int.x + 1));  [ r[Int] : Ref(Int → Int) ]
       DRef(r[Bool]) true             [ r[Bool] : Ref(Bool → Bool) ]
   end

Since the Λ-abstraction defining r is a value, the application
of the Ref constructor is suspended, and so the actual allocation
of the ref-cell does not take place until r is applied to a type.
There are two such applications: r[Int] and r[Bool]; these produce
two different ref-cells, one containing a function on Ints and the
other a function on Bools. So there is no type conflict.
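
The suspension can be imitated in Python by modeling the type
abstraction as a function of a type argument. This is only an analogy
under assumptions: Python erases types, the argument t is ignored, and
the class Ref is a hypothetical stand-in for a ref cell.

```python
class Ref:
    """A hypothetical one-slot mutable cell."""
    def __init__(self, contents):
        self.contents = contents

# let r = Λt. Ref[t → t](λx: t. x)
# The body, including the allocation, runs only when r is instantiated.
def r(t):
    return Ref(lambda x: x)

r_int = r("Int")       # r[Int]:  allocates one cell
r_bool = r("Bool")     # r[Bool]: allocates a second, distinct cell

# Set(r[Int], λx: Int. x + 1) updates only the first cell.
r_int.contents = lambda x: x + 1

# DRef(r[Bool]) true still finds the identity function.
result = r_bool.contents(True)
```

Because r_int and r_bool are distinct cells, the assignment to one is
not visible through the other, and result is True.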

======================================================================


And now for something different -- monads!

See state.sml.


======================================================================
======================================================================

Summary
-------

What have we learned about programming languages?

* Careful, incremental development of basic concepts.
  simple arithmetic expressions  -- a (seemingly) trivial language SAE
  let: local variables, bindings, scope, free and bound variables
    substitution, free variable capture
    environments [SAEL]
  functions: abstraction and application  [Fun]
  conditional expressions (boolean values, relational operators)
  recursive functions (e.g. factorial)
    the Y combinators
  types: basic types, type checking (TFun)

* Inductive proof techniques
  

What have we not learned?

* Lots of details about lots of "real" languages.

* Have not surveyed various "programming paradigms" (except
  the "functional programming" paradigm). No object-oriented,
  no logic programming, etc.