We’ve seen how lists are built with []
and
(:)
, and deconstructing with patterns using []
and (:)
. Now we’ll see a general way for users to define
new types that have one or more kinds of values.
As an example, currency converter. For simplicity, only two kinds of currency.
data Currency
= USD Float -- US Dollars
| JPY Float -- Japanese Yen
deriving (Eq, Show)
The data
keyword defines a new algebraic data type
(ADT), or simply datatype, in this case called
Currency
. Values of type Currency
are either
the data constructor called USD
and a single
Float
or the data constructor JPY
and a single
Float
. The smallest denomination of US currency, one cent,
is one-hundredth of a dollar; the smallest denomination of Japanese
currency is one yen. We choose to track JPY
with
Float
s to provide finer-grained precision; these will be
rounded to Integer
values later.
The deriving (Eq, Show)
clause instructs Haskell to
automatically implement a function called
show :: Currency -> String
which is described by the
Show
type class, and an equality function
(==) :: Currency -> Currency -> Bool
which described
by the Eq
type class. We’ll talk more about these later.
Try the interactions below with and without the deriving
clause.
> USD 11.99
> JPY 1353
> USD 11.99 == USD 11.99
> USD 11.99 == JPY 1353
To extract data wrapped by data constructors, we use patterns involving the data constructors and variables. For example, the following function allows adding two values in either currency.
yenPerDollar = 112.86 -- as of 9/27/17
dollarsPerYen = 1 / yenPerDollar
add :: Currency -> Currency -> Currency
add (USD d1) (USD d2) = USD (d1 + d2)
add (JPY y1) (JPY y2) = JPY (y1 + y2)
add (USD d) (JPY y) = USD (d + y * dollarsPerYen)
add (JPY y) (USD d) = JPY (y + d * yenPerDollar)
Note that any amount has two valid representations; the last two equations choose to retain the base currency of the first argument.
In general, a datatype definition contains one or more data
constructors C, each beginning with
a capital letter (e.g. A
, B
, C
,
D
below). The name of the new type T (e.g. T
below) must also
start with a capital letter.
data T
= A
| B Int
| C Bool String
| D (Char, String)
deriving (Eq, Show)
Each data constructor wraps zero or more values:
A
does not carry any additional data,B
carries a single Int
,C
carries one Bool
and one
String
, andD
carries one pair containing a Char
and a
String
.Data constructors are functions that, when applied to arguments of
the specified types, produce values of the datatype (in this case,
T
).
> :t A
> :t B
> :t C
> :t D
Three kinds of patterns p can be used to match and destruct values of a datatype:
_
;
andFor example:
tToInt :: T -> Int
tToInt A = 0
tToInt (B i) = 1
tToInt (C b s) = 2
tToInt (D (c, s)) = 3
At run-time, a variable pattern x (of some type S) successfully matches any value v (of type S); v is then bound to x in the right-hand side of the equation.
The wildcard pattern _, like a variable, matches any value but does not introduce a name for it.
A data constructor pattern C p_1 …
p_n (of some type S) matches only those values v
(of
type S) that are constructed with C and whose n values match the
patterns p_1 through p_n, respectively. If C carries no data, then there are no
subsequent patterns (i.e. n
= 0).
Even though a datatype constructor may take multiple arguments
(e.g. C
in our example) and, thus, may be partially applied
like other functions, all data constructors must be “fully applied” to
enough patterns according to its definition.
The definition of tToInt
above defines multiple
equations — one for each data constructor — to handle all possible
argument values of type T
. Another way to destruct values
of type T
is to use a case
expression.
tToInt :: T -> Int
tToInt t =
case t of
A -> 0
B i -> 1
C b s -> 2
D (c, s) -> 3
Each branch of case
expression must return the same type
of value (in this case, Int
). The equations in the original
version of tToInt
above are syntactic sugar for this
version using case
.
A data
definition can refer to refer to one or more
(lowercase) type variables (a
, b
, etc.
below).
data T a b c ... = ...
The type variables can be referred to in the types of the data constructors.
As an example to motivate polymorphic ADTs, recall our
head
and tail
functions from before.
head :: [a] -> a
head (x:xs) = x
tail :: [a] -> [a]
tail (x:xs) = xs
By omitting equations for the empty list, these functions implicitly fail: they crash with inexhaustive pattern match errors at run-time.
Versions with more explicit failure:
head :: [a] -> a
head [] = undefined
head (x:xs) = x
tail :: [a] -> [a]
tail [] = undefined
tail (x:xs) = xs
> :t undefined
Whoa! undefined
has every type? This is only
okay because undefined
throws an exception at run-time,
halting program execution at that point.
This isn’t much better than before. Using undefined
is,
however, a cool trick in the type system that allows exceptions to be
raised anywhere in the code, while still accurately typechecking other
expressions. Using undefined
is often helpful during
program development, allowing you to put “placeholders” for expressions
that you haven’t finished yet while still allowing the rest of the
program to type check and run.
Now, a better way than crashing to represent failure.
data Maybe a
= Nothing
| Just a
deriving (Eq, Show)
Notice that Maybe
is polymorphic; it takes one type
variable a
. A value of type Maybe a
is either
Nothing
with no additional data, or Just
with
a single value of type a
.
> :t Just True
> :t Just 1
> :t Just (+)
> :t Nothing
> :t Nothing :: Maybe Bool
> :t Nothing :: Maybe (Maybe (Maybe Bool))
We can use Maybe
values to deal with “errors” much more
effectively.
maybeHead :: [a] -> Maybe a
maybeHead [] = Nothing
maybeHead (x:xs) = Just x
maybeTail :: [a] -> Maybe [a]
maybeTail [] = Nothing
maybeTail (x:xs) = Just xs
This allows (requires) callers of these functions to handle each of the two cases explicitly.
The Maybe
type is extremely useful and is defined in the
standard Prelude
. Its cousin Either
is also
quite useful.
Datatypes can also be recursive.
data List a
= Nil
| Cons a (List a)
deriving (Eq, Show)
What does this type remind you of?
The built-in lists are just like the List
type above,
but with special syntax:
[]
and (:)
, rather than names starting
with capital latters, are data constructors,[1,2,3,4]
is syntactic sugar for nested
calls to (:)
,[]
, rather than a name starting with
a capital letter, and[a]
is syntactic sugar for
[] a
.For example:
> :t []
> :t [] :: [a]
> :t [] :: [] a
> :t "abc"
> :t "abc" :: [Char]
> :t "abc" :: [] Char
> let (x:xs) = [1, 2, 3]
> let ((:) x xs) = [1, 2, 3]
data Pair a b = Pair a b deriving (Eq, Show)
data Triple a b c = Triple a b c deriving (Eq, Show)
data Tuple4 a b c d = Tuple4 a b c d deriving (Eq, Show)
Built-in pairs are just like these datatypes, again with special syntactic support.
> :t ('a', 2 :: Int)
> :t (,) 'a' 2 :: (,) Char Int
> :t (,)
> :t (,,)
> :t (,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,)
> let (a, b, c) = (1, 2, 3)
> let ((,,) a b c) = (1, 2, 3)
Sometimes it is useful to have a dummy value.
data Unit = Unit deriving (Eq, Show)
The built-in unit value and type are just like this datatype, again with special syntactic support.
> :t ()
> :t () :: ()
Even booleans can be (and are) defined as a datatype.
data Bool
= False
| True
deriving (Eq, Show)
If-expressions, multi-way if-expressions, and guards are syntactic
sugar for pattern matching on Bool
s.
absoluteValue n
| n >= 0 = n
| otherwise = -n
-- source file {-# LANGUAGE MultiWayIf #-}
-- ghc command-line option -XMultiWayIf
-- ghci :set -XMultiWayIf
absoluteValue n =
if | n >= 0 -> n
| otherwise -> -n
absoluteValue n =
if n >= 0 then n
else if otherwise then -n
else undefined
absoluteValue n =
case n >= 0 of
True -> n
False ->
case otherwise of
True -> -n
False -> undefined
Recall the currency example above. We were careful to use
d
s for dollar Float
s and y
s for
yen Float
s, but it’s easy to accidentally, for example, add
these two kinds of Float
s, leading to a non-sensical, and
probably hard-to-detect-and-debug, result.
One approach might be to define type aliases.
type Dollars = Float
type Yen = Float
But remember, type aliases do not create new types, just synonyms.
So, Dollars
and Yen
, which are synonymous with
Float
and with each other, can still be misused.
Another option is to define wrapper (or “box”) types with single data
constructors, for the purpose of forcing the programmer to construct and
destruct the different Float
types, with the usual support
from the typechecker to prevent errors.
data Currency
= USD Dollars -- US Dollars
| JPY Yen -- Japanese Yen
deriving (Eq, Show)
data Dollars = DollarsFloat Float deriving (Eq, Show)
data Yen = YenFloat Float deriving (Eq, Show)
Note, it is common to see definitions like the following:
data Dollars = Dollars Float deriving (Eq, Show)
data Yen = Yen Float deriving (Eq, Show)
Notice how the type and data constructor names are the same; this is not a problem, because types and expressions “live” in different places in the language syntax, so there’s never any ambiguity.
Now, we redefine add
as follows, in terms of several
helper functions that significantly reduce the chances that we mistreat
the actual Float
values.
add :: Currency -> Currency -> Currency
add (USD d1) (USD d2) = USD (addDollars d1 d2)
add (JPY y1) (JPY y2) = JPY (addYen y1 y2)
add (USD d) (JPY y) = USD (addDollars d (convertYenToDollars y))
add (JPY y) (USD d) = JPY (addYen y (convertDollarsToYen d))
addDollars :: Dollars -> Dollars -> Dollars
addDollars (Dollars d1) (Dollars d2) = Dollars (d1 + d2)
addYen :: Yen -> Yen -> Yen
addYen (Yen y1) (Yen y2) = Yen (y1 + y2)
convertYenToDollars :: Yen -> Dollars
convertYenToDollars (Yen y) = Dollars (y * dollarsPerYen)
convertDollarsToYen :: Dollars -> Yen
convertDollarsToYen (Dollars d) = Yen (d * yenPerDollar)
When data constructors carry multiple values, it can be hard to remember which components are meant to represent what (especially when multiple components have the same types). Furthermore, it can be tedious to write accessor functions that retrieve particular components carried by a data constructor.
Haskell offers an embellishment of data constructors that addresses
these two issues. Consider the following definition where the data
constructors A
and B
use record
syntax to give names to its data values.
data T'
= A'
| B' Int
| C' { foo :: Bool, baz :: String }
| D' { bar :: Char, baz :: String }
deriving (Show)
All data constructors can be used to construct values as before.
> :t A'
> :t B'
> :t C'
> :t D'
But now, record construction can also be used to create
A
and B
values.
> C' { foo = True, baz = "hello" }
> D' { bar = 'a', baz = "hello" }
> D' { baz = "hello", bar = 'a' }
In addition, accessor functions have been automatically generated.
> :t foo
> :t bar
> :t baz
Furthermore, there is a way to create a new record by copying all of its fields except for some.
> let c' = C' { foo = True, baz = "hello" }
> c' { baz = "goodbye" }
When multiple data constructors, update-copy expressions may crash:
> c' { bar = 'b' }
Should the following be an acceptable data
definition?
data Data = One { data :: Int } | Two { data :: Bool }
NamedFieldPuns
and RecordWildCards
can be handy when programming with record syntax. And punny code can
spark joy.
We will start seeing the keyword newtype
popping up in
code examples later on. This is a detail we don’t need to worry about
for now.
Data constructors tag, or label, the values they carry in order to distinguish them from values created with different data constructors for the same type. As we have seen, it is sometimes useful to define a new datatype even with only one data constructor.
In such cases, tagging and untagging (or constructing and destructing, or boxing and unboxing) values is useful for enforcing invariants while programming, but these operations add unnecessary run-time overhead: there is only one kind of value, so they ought to be free of labels at run-time.
Haskell allows datatypes with exactly one constructor (such as
Dollars
and Yen
), and which carries exactly
one value, to be defined with the keyword newtype
in place
of data
, such as
newtype Box a = Box a
For the purposes of programming, newtype
is almost
exactly the same as data
. But it tells the compiler to
optimize the generated code by not including explicit Box
labels at run-time. We will get into the habit of using
newtype
whenever we define a datatype with one constructor,
without delving into the subtle differences between
using newtype
and data
.
Furthermore, we will often write expressions of the form
Box . doSomething . unbox
to unwrap (unbox
), transform (doSomething
),
and rewrap (Box
) values. So, we will use records so that
unwrapping functions are generated automatically.
newtype Box a = Box { unbox :: a }