We have implemented several functions for reading values:
read :: Read a => String -> a
Although a “parsing” function of the form…
type Parser a = String -> a
… works for simple types of data, often we’ll need to more incrementally, and non-locally, manipulate the input string in order to determine how it should be interpreted. More like a stateful function:
type Parser a = String -> (a, String)
We will furthermore often need to return multiple possible parses:
type Parser a = String -> [(a, String)]
And, as we’ve done many times before, create a new type…
data Parser a = Parser (String -> [(a, String)])
… without the run-time overhead…
newtype Parser a = Parser (String -> [(a, String)])
… and with record syntaxs to automatically generating a selector for retrieving the function inside:
newtype Parser a = Parser { runParser :: String -> [(a, String)] }
So, we’ve got a “combination” of State
and
List
. We’ll talk more about this another day.
As you’ll see in a compilers course, a traditional way to write a parser is to factor it into into lexing (which breaks a string into a stream of tokens) and parsing (which consumes tokens according to a grammar for the language). Domain-specific languages for lexers and parsers are often used to generate code that integrates with the rest of the compiler implementation (type checking, evaluation or code generation, etc).
In contrast, the approach here builds and composes parsers using the building blocks of functional programming we have been studying.
char :: Char -> Parser Char
char c = Parser $ \s -> case s of
[] -> []
(a:as) -> if c == a then [(a,as)] else []
> runParser (char 'C') "Cool"
> runParser (char 'C') "Not Cool"
Clean this up with a cute pattern called Fokker’s trick:
char c = Parser $ \s -> case s of
[] -> []
(a:as) -> [ (a,as) | c == a ]
We can make this work more generally:
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser $ \s -> case s of
[] -> []
(a:as) -> [ (a,as) | p a ]
char c = satisfy (c==)
alpha = satisfy isAlpha
digit = satisfy isDigit
space = satisfy isSpace
> runParser alpha "Hello!"
> runParser alpha "!Hello!"
string :: String -> Parser String
string str = Parser $ \s ->
let (pre, suf) = splitAt (length str) s in
if str == pre then [(pre, suf)] else []
Again, use Fokker’s trick:
string str = Parser $ \s ->
[ (pre, suf) | let (pre, suf) = splitAt (length str) s, str == pre ]
> runParser (string "Hello") "Hello!"
> runParser (string "Hello") " Hello!"
A token is a constant string that may some meaning. For
example, in Haskell, the token True
refers to one of the
data constructors of the Bool
datatype, and the token
let
marks the beginning of a let-expression.
The token
parser assigns meaning to a
String
token:
token :: String -> a -> Parser a
token str a = Parser $ \s0 ->
[ (a, s0) | (_, s1) <- runParser (string str) s ]
true = token "True" True
false = token "False" False
> runParser true "True"
> runParser true "False"
Given some existing parsers, we might like to compose them. For example, sequentially…
hello :: Parser String
hello = Parser $ \s0 ->
[ ("Hello", s5)
| (a1,s1) <- runParser (char 'H') s0
, (a2,s2) <- runParser (char 'e') s1
, (a3,s3) <- runParser (char 'l') s2
, (a4,s4) <- runParser (char 'l') s3
, (a5,s5) <- runParser (char 'o') s4
]
… or in parallel:
bool :: Parser Bool
bool = Parser $ \s0 ->
runParser true s0 ++ runParser false s0
Let’s implement Monad
first and then take the free instances of
Functor
and Applicative
.
instance Functor Parser where {fmap f x = pure f <*> x}
instance Applicative Parser where {pure = return; (<*>) = ap}
instance Monad Parser where
return :: a -> Parser a
return a = Parser $ \s0 -> [(a, s0)]
(>>=) :: Parser a -> (a -> Parser b) -> Parser b
pa >>= f = Parser $ \s0 ->
[ (b, s2) | (a, s1) <- runParser pa s0
, (b, s2) <- runParser (f a) s1 ]
Exercise: Implement the
Functor
and Applicative
instances
directly.
We can now rewrite token
more succinctly:
token str tok =
-- fmap (\_ -> tok) (string str)
-- fmap (const tok) (string str)
const tok <$> string str
hello :: Parser String
hello = do
_ <- char 'H'
_ <- char 'e'
_ <- char 'l'
_ <- char 'l'
_ <- char 'o'
pure "Hello"
Or, because each successive parser is “independent” of the previous,
use the Applicative
interface:
hello :: Parser String
hello =
pure (\_ _ _ _ _ -> "Hello")
<*> char 'H'
<*> char 'e'
<*> char 'l'
<*> char 'l'
<*> char 'o'
The Alternative
class describes types that — in addition to being applicative functors —
have an identity element called empty
, and a binary
associative operator called (<|>)
:
class Applicative t => Alternative t where
empty :: t a
(<|>) :: t a -> t a -> t a
The Maybe
instance is defined to return the first
Just
value, if any:
> foldr (<|>) empty [Nothing, Just 2, Just 23]
Just 2
There is a useful definition of Alternative
for
parsers:
instance Alternative Parser where
empty :: Parser a
empty = Parser $ \s -> []
(<|>) :: Parser a -> Parser a -> Parser a
(Parser f) <|> (Parser g) = Parser $ \s -> f s ++ g s
bool :: Parser Bool
bool = true <|> false
> runParser bool "TrueFalse"
> runParser bool "FalseTrue"
Control.Applicative
defines several utility functions, including:
some :: Alternative t => t a -> t [a]
many :: Alternative t => t a -> t [a]
optional :: Alternative t => t a -> t (Maybe a)
We can use these to define parsers that recognize one-or-more, zero-or-more, or zero-or-one matches, respectively:
> runParser (some bool) "TrueTrue!"
> runParser (many bool) "TrueTrue!"
> runParser (optional bool) "TrueTrue!"
The definitions of some
and many
are a bit
clever, using pure
and (<|>)
. Try to
implement them, and then take a peek at the library source code.
To do in class and/or at home. (Note: Don’t worry about order of matches in the example interactions; list is modeling “non-determinism.”)
option :: a -> Parser a -> Parser a
Compared to optional
:
> runParser (optional (char 'A')) $ "ABC"
[(Nothing,"ABC"),(Just 'A',"BC")
> runParser (option 'X' (char 'A')) "ABC"
[('A',"BC"),('X',"ABC")]
(<++) :: Parser a -> Parser a -> Parser a
(<|>)
, this “left-biased choice”
should only run the right parser (and return its results) only if the
left parser fails (returns no results).munch :: (Char -> Bool) -> Parser String
Unlike (many . satisfy)
, munch
should
be “greedy”:
> runParser (many $ satisfy isAlpha) "cs223"
[("cs","223"),("c","s223"),("","cs223")]
> runParser (munch isAlpha) "cs223"
[("cs","223")]
sepBy :: Parser a -> Parser sep -> Parser [a]
sepBy1 :: Parser a -> Parser sep -> Parser [a]
For example:
> runParser (sepBy (munch isAlpha) (char ',')) "a,ab,abc"
[(["a","ab","abc"],""),(["a","ab"],",abc"),(["a"],",ab,abc"),([],"a,ab,abc")]
Hint: Implement one in terms of the other.
between :: Parser open -> Parser close -> Parser a -> Parser a
For example:
> runParser (between (char '(') (char ')') (string "Hello")) "(Hello)!"
[("Hello","!")]
skipSpaces :: Parser ()
For example:
> runParser skipSpaces " ... "
[((),"... ")]
So far, we have understood this class as follows:
class Read a where
read :: String -> a
...
But actually, read
is defined as a library function outside of the class:
class Read a where
-- readsPrec :: Int -> ReadS a
readsPrec :: Int -> String -> [(a, String)]
...
The type ReadS
is an alias for the “raw” parsing functions we have been working with.
The Int
argument pertains to precedence levels, which often
arises in practice but which we will not consider here. So, given a
parserT :: Parser T
for some data type T
, we
can implement the following:
instance Read T where
readsPrec _ = runParser parserT
Compared to read
, here is a version that deals with
errors explicitly rather than crashing:
readEither :: Read a => String -> Either String a
readEither s0 =
case reads s0 of
[(a,"")] -> Right a
[(a,s1)] -> Left $ "suffix not consumed: [" ++ s1 ++ "]"
[] -> Left $ "no successful parses"
_ -> Left $ "multiple successful parses"
See readMaybe
and readEither
in Text.Read
for similar functionality. (The function reads
is equivalent to: readsPrec 0
)
Let’s write a simple parser for a simple type of lists.
data ListInt
= Nil
| Cons Int ListInt
instance Show ListInt where
show Nil = "[]"
show (Cons n ns) = show n ++ ":" ++ show ns
instance Read ListInt where
readsPrec _ = runParser listInt
listInt :: Parser ListInt
listInt = nil <|> cons
Start with parsing empty list:
nil :: Parser ListInt
nil = do
string "[]"
pure Nil
Now parse cons:
int :: Parser Int
int = do
n <- some $ satisfy isDigit
pure $ read n
cons :: Parser ListInt
cons = do
n <- int
char ':'
ns <- listInt
pure $ Cons n ns
Okay, now let’s add sugared versions:
sweetList :: Parser ListInt
sweetList = do
char '['
ns <- int `sepBy` (char ',')
char ']'
pure $ foldr Cons Nil ns
Clean up sweetList
with between
:
sweetList = do
ns <- between (char '[') (char ']') (sepBy int (char ','))
pure $ foldr Cons Nil ns
This works on its own, but…
listInt = nil <|> cons <|> sweetList
>> read "[]" :: List
*** Exception: Prelude.read: ambiguous parse
Problem is sweetList
might match []
, but so
does nil
. So, could use sepBy1
instead:
ns <- between (char '[') (char ']') (sepBy1 int (char ','))
But, in any case, these different parsers accept disjoint sets of strings, so no need to run all.
listInt = nil <++ cons <++ sweetList
In our implementations of nil
, int
,
cons
, and sweetList
above, we used
(>>=)
(via do
-notation) to stitch
together multiple parsers. But none of those implementations actually
require the full power of the Monad
interface (i.e. the
ability of one parser to depend on the results of the previous).
nil
, int
, and sweetList
can be
written via the Functor
interface:
nil = const Nil <$> string "[]"
int = read <$> (some $ satisfy isDigit)
sweetList =
foldr Cons Nil <$>
between (char '[') (char ']') (sepBy int (char ','))
cons
cannot, because the pure function we want to apply,
a wrapper around ConsList
, takes one more than one
argument. Enter Applicative
:
cons =
(\n _ ns -> Cons n ns)
<$> int
<*> char ':'
<*> listInt
Having to create a lambda just to ignore some of the results is
clunky. Here is where the Applicative
operator
(<*)
comes in handy.
(<*>) :: Applicative t => t (a -> b) -> t a -> t b
(<*) :: Applicative t => t b -> t a -> t b
tb <* ta = pure (\b _ -> b) <*> tb <*> ta
(<*) = liftA2 const
For example, rather than writing…
(\a1 _ a3 _ a5 -> e) <$> x1 <*> x2 <*> x3 <*> x4 <*> x5
… can write
(\a1 a3 a5 -> e) <$> x1 <* x2 <*> x3 <* x4 <*> x5
Notice that sprinkling in (<*>)
“keeps” the result
of the second parser whereas sprinkling in (<*)
“ignores” the result.
May also choose to forgo (<$>)
before the first
argument so that there are only two operators to look at:
pure (\a1 a3 a5 -> e) <*> x1 <* x2 <*> x3 <* x4 <*> x5
Formatting this expression with newlines results in a nice “parser pipeline”:
pure (\a1 a3 a5 -> e)
<*> x1
<* x2
<*> x3
<* x4
<*> x5
Now, back to cons
:
cons :: ReadP List
cons =
pure Cons
<*> int
<* char ':'
<*> listInt
And we could choose to write parser pipelines for all the rest, too,
to create a similar feel (even though we don’t need
Applicative
for int
and
sweetList
):
int :: ReadP Int
int =
pure read
<*> (some $ satisfy isDigit)
cons :: ReadP List
cons =
pure Cons
<*> int
<* char ':'
<*> listInt
sweetList :: ReadP List
sweetList =
pure (foldr Cons Nil)
<*> between (char '[') (char ']') (sepBy int (char ','))
When continuing with larger Haskell developments beyond this course, take note of the following library implementation of parser combinators.
Though elegant, the functional parsing implementation — which maintains lists of potential results and performs backtracking — is not the most efficient. The Text.ParserCombinators.ReadP library offers a more efficient and full-featured implementation.
> :info ReadS -- type alias for String -> [(a,String)]
> :info ReadP
newtype ReadP a = R (forall b. (a -> P b) -> P b)
The type, involving forall b
inside a data
constructor, is an example of something called a higher-rank
type. We’re not going to talk about how this internal representation
works here. Just note that ReadP
works like
Parser
, but there are no (public) data constructors.
Small differences in names, functionality, types compared to our
Parser
. Indeed, the combinators we implemented above
emulate those from ReadP
.
Parser a |
ReadP a |
---|---|
Parser |
readS_to_P (avoid using this!) |
runParser |
readP_to_S |
char |
char |
satisfy |
satisfy |
string |
string |
token |
|
(<$>) |
(<$>) |
(<*>) |
(<*>) |
(>>=) |
(>>=) |
(<|>) (Alternative) |
(<|>) , (+++) |
many (Alternative) |
many |
some (Alternative) |
many1 |
optional (Alternative) |
optional (works differently) |
option |
option |
(<++) |
(<++) |
munch |
munch |
sepBy / sepBy1 |
sepBy / sepBy1 |
between |
between |
skipSpaces |
skipSpaces |
look (lookahead efficiently) |
|
… |
Compared to the minimal complete definition
readsPrec :: Int -> ReadS a
there is a “new-style” parser interface
readPrec :: ReadPrec a
with a ReadPrec
type that manages precedence of ReadP
parsers more
efficiently.
Have fun digging into the details!