« It has been one of those days | Main | Keeping up with the parking meters »

July 02, 2011

Recursive parsers in FParsec

Probably world + dog knows this already, but it didn’t jump out at me in the FParsec documentation or in Google, so I thought I’d write it down anyway.

The problem

Many languages contain recursive structures.  A list in Scheme or Lisp can contain sub-lists; a JSON object can contain sub-objects.

In FParsec, a parser for a linguistic structure is represented by a variable of parser type.  Here’s a simple example:

let number = many1 digit |>> Number

(FParsec does include built-in parsers for common number types – I just wanted a simple example.  And the Number constructor, like List in the next fragment, is just a constructor for a hypothetical AST node.)

Now consider a parser for a Scheme list of numbers – that is, a list of numbers enclosed in brackets and separated by spaces.  This is a bit more complicated:

let sepOrEnd = (spaces1 <|> (spaces >>. followedByString ")"))
let listElement = parseExpr .>> sepOrEnd
let series = spaces >>. many listElement |>> List
let list = between (pchar '(') (pchar ')') series

(I am sure there are more elegant ways to do this, but I don’t want to get hung up on that right now I’m trying to avoid line breaks.)

And we’ll say that an expression can be either a number or a list:

let expr = choice [ number; list ]

In reality, of course, a list should be able to contain any kind of expression.  So we need to redefine listElement as follows:

let listElement = expr .>> sepOrEnd

But now disaster strikes.  F# refuses to compile listElement, complaining that expr isn’t defined.  And there’s no way to move things around to avoid this.  expr depends on list, which depends on series, which depends on listElement, which depends on expr.

You might think we could get out of this with a let rec, making the relevant parsers mutually recursive, but let rec only works for recursive functions.  FParsec parsers are values.  Think of it as if you were trying to define four C# member or local variables.  C# is happy to let you define four methods in terms of each other, but not for you initialise to initialise four variables in terms of each other.

The solution

To get around this, FParsec provides you with a way to effectively forward declare a parser.  If I could ‘declare’ the expr parser before the listElement parser, so that listElement knew about it, but defer the implementation until after the list parser, then that would break the mutual dependency.  F# doesn’t have the notion of forward declaration, but FParsec simulates it using a function.

To forward declare a parser, call the createParserForwardedToRef function.  This function returns a pair.  The first element of the pair is a parser you can use without having defined it.  It’s the forward declaration, if you like.  The second element of the pair is a reference cell you can later overwrite with the implementation of the parser.  FParsec takes care of hooking up the ‘forward declaration’ parser to the eventual implementation.

This sounds more confusing than it actually is, so let’s see it in action.

let expr, exprImpl = createParserForwardedToRef()

Now expr is a parser which is hooked up to exprImpl, and exprImpl is a reference which we can overwrite later.  Because expr is a proper honest-to-goodness variable as far as F# is concerned, and is now defined before listElement, the compiler is now happy for us to use it in listElement:

let listElement = expr .>> sepOrEnd

Of course, now we can no longer write let expr = … when we want to implement expr parser, because expr is already declared.  And this is where exprImpl comes in.  Because this is a reference, we can overwrite its contents with the parser we actually need:

do exprImpl := choice [ number ; list ]

Note the use of do and := instead of let and =.  We are not rebinding exprImpl – it’s not mutable so F# wouldn’t allow that – but because it’s a reference cell, we can overwrite its contents.

Because of the way FParsec set up the expr ‘forward declaration’ – in reality, because of the implementation FParsec generated for the expr parser – all calls to expr actually end up going to exprImpl.  As a result, when exprImpl invokes expr (indirectly through listElement), it ends up invoking itself.  And we have a recursive parser, just as we needed.

Final code

let expr, exprImpl = createParserForwardedToRef()
 
let number = many1 digit |>> Number
 
let sepOrEnd = (spaces1 <|> (spaces >>. followedByString ")"))
let listElement = expr .>> sepOrEnd
let series = spaces >>. many listElement |>> List
let list = between (pchar '(') (pchar ')') series
 
do exprImpl := choice [ number; list ]

July 2, 2011 in Software | Permalink

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a00d8341c5c9b53ef014e898886fe970d

Listed below are links to weblogs that reference Recursive parsers in FParsec:

Comments

That's a very nice tutorial for defining recursive FParsec parsers with createParserForwardedToRef!

Thanks a lot!

Posted by: Stephan at Jul 2, 2011 8:25:11 PM