« I write like everybody in the whole world, ever | Main | Pass the crop circle palmistride, please »

July 18, 2010

F# pattern matching for beginners, part 4: lists and recursion

We looked in part two at structural patterns and how you could use them to decompose a tuple or discriminated union, extracting the values of interest into variables on a case-by-case basis.

The limitation of tuples and discriminated unions is that their structure is fixed.  A 2-tuple (pair) is a different type from a 3-tuple (triple): you’d have to write different patterns to match them.  What if you wanted a pattern to decompose a collection that might be of arbitrary size?

List and array patterns

List and array patterns match lists and arrays respectively, with a pattern for each element of the list or array.  A list pattern looks eerily like a F# list literal, with square brackets around it and semicolons separating the element patterns:

let listingTroll list =
  match list with
  | [] -> "empty"
  | [a] -> sprintf "just %A" a
  | [a; _] -> sprintf "two elements beginning with %A" a
  | _ -> "lots of elements"
> listingTroll ["alice"; "bob"];; 
val it : string = "two elements beginning with "alice"" 
> listingTroll [1; 2; 3];; 
val it : string = "lots of elements" 
> listingTroll [1];; 
val it : string = "just 1" 
> listingTroll [];; 
val it : string = "empty"

An array pattern looks similar, except that it uses pipe characters inside the square brackets ([| … |]).

As always, we can use whatever patterns we like in each slot of the list or array pattern – wildcard, variable, literal, tuple, discriminated union, list, array, whatever.

The trouble with the list and array patterns is that they perform slot-by-slot matching, so although you can now have patterns for different numbers of slots, you still have to anticipate the number of slots at compile time.  The list and array patterns don’t let you write a pattern match to handle lists or arrays of unknown length.

The cons pattern

The oddly named cons pattern provides a way to match on the head and tail of a list – that is, the first element and the remaining elements.  Its syntax is two slots separated by a double colon: head :: tail.

(The name is for symmetry with the cons operator, which is also a double colon.  The cons operator constructs a list from a head and a tail.  For example, 1 :: [2; 3] evaluates to [1; 2; 3].)

The important thing about the cons pattern is that it doesn’t care how long the tail is, or even if the tail is empty: it just checks that there is a head element, splits that off, and deems everything else to be the tail.

Initially, this may seem a bit restrictive, because cons only lets you get at the first (head) element, and everything else remains in a list:

let consTroll list =
  match list with
  | [] -> "empty"
  | head :: _ -> sprintf "starts with %A" head

This may seem unhelpful, because if we know the size of the tail list in advance, then we haven’t really gained anything by using the cons pattern, and if we don’t know the size of the tail list in advance, the only way we can work with it is by using the cons pattern again.

Wait.  Can you hear singing?

Enter recursion

If you’ve been programming for longer than about three minutes, you’ll be familiar with recursion.  The essence of recursion is that you reduce a problem to a similar but simpler or smaller problem, which is then either trivial or susceptible to the same kind of reduction.

What does this have to do with list processing using the cons pattern?  The cons pattern decomposes a list into a single element and a shorter list.  And a shorter list is a smaller problem; it might even be so short that it’s empty, which gives you a trivial problem that can be handled by the fixed-length list pattern.  So the cons pattern enables you to pattern match recursively, splitting off one head element at each stage.  Combined with the trivial case of the empty list, this allows you to use pattern matching to create recursive functions that work on lists of unknown length.

This may sound a bit theoretical, so let’s look at a simple example.

let rec length list =
  match list with
  | [] -> 0
  | _ :: tail -> 1 + length tail

You can guess what this does.  Yes, you can.

> length [1; 2; 4; 8];; 
val it : int = 4 

But how does it do it?

Let’s dispose quickly of a tangential but important detail.  The length function is declared not using let, but let rec.  F# requires the rec wart when a function calls itself: if you don’t provide it, F# will complain on the last line that the length function isn’t defined yet.

Now, on to the pattern matching.  The length function defines two cases.

The first case is a list pattern.  You can recognise this by the square brackets.  Since there are no other patterns inside the square brackets, this will match only the empty list.  And an empty list surely has a length of 0.

The second case is a cons pattern.  This will match any list with a head that matches the wildcard pattern – that is, any head at all – and any tail.  In effect, the second case matches any non-empty list.  (Remember, the cons pattern allows an empty tail.  So even if the list has only one element, the cons pattern still matches.)  But the second case captures the tail of the list – that is, everything except the head – into the variable tail, and calls the length function on that.  In the second case, the length of the list is the length of the head, which is always 1, plus the length of the tail, whatever that might be.

This is an obvious case of recursion.  The length function either reduces its argument to a simpler case – a shorter list – or detects a trivial case and returns immediately – the empty list.

Here are a couple of less trivial examples:

let rec doAll f list =
  match list with
  | [] -> []
  | head :: tail -> f head :: doAll f tail
> doAll (fun n -> n * n) [1; 2; 3];; 
val it : int list = [1; 4; 9] 
let rec aggregate f seed list =
  match list with
  | [] -> seed
  | head :: tail -> aggregate f (f seed head) tail
> aggregate (fun m n -> m + n) 0 [1; 2; 3; 4];; 
val it : int = 10 

The doAll and aggregate functions take a function and a list and apply that function across the list.  You’ll recognise them as the LINQ Select and Aggregate methods.  They work by matching the cons pattern to recurse from a list to a shorter list, and bottoming out the recursion by matching the empty list.  They really differ only in the way that they combine intermediate results to make the final one and in the base case result.  For example, doAll combines the intermediate results by consing them up into a new list, and its base case is the empty list (because applying a function to all members of an empty list would surely result in an equally empty list).

So doAll does its work like this:

  • doAll square [2; 3]: cons pattern matches with head = 2 and tail = [3], so the value will be 4 :: doAll square [3].  (4 is square 2.)
    • doAll square [3]: cons pattern matches with head = 3 and tail = [], so the value will be 9 :: doAll square [].
      • doAll square []: empty list pattern matches, so the value is [].
    • 9 :: [] evaluates to [9].


  • 4 :: [9] evaluates to [4; 9] and this is the result of the entire match expression.

aggregate does its work in a similar way, but returning the seed value in the empty case, and combining results using the aggregation function instead of by consing.

So the cons pattern allows us to recursively decompose a list, and use the result expression to recursively stitch the decomposed list back together again.

What I have just told you is wrong

I’ve tried to present the use of the cons pattern and recursion in as simple a way as possible.  However, naive recursion doesn’t work with very long lists, because the stack overflows.  In reality, therefore, F# prefers to transform recursive expressions so it can use tail recursion, which is basically a kind of recursion that doesn’t require stack frames.  Unfortunately, this transformation makes the code rather more complicated, and distracts from the pattern matching side of things, so I’ve chosen to skip it for now.  If you’re writing F# code to operate on moderately large data sets, though, it’s something you’ll need to know about.

Enough with these new-fangled lists.  What about arrays of unknown length?

You are stuffed.  But on the positive side, I hear Visual Basic has really good support for arrays.  Check it out!

July 18, 2010 in Software | Permalink

TrackBack

TrackBack URL for this entry:
https://www.typepad.com/services/trackback/6a00d8341c5c9b53ef01348580d4dc970c

Listed below are links to weblogs that reference F# pattern matching for beginners, part 4: lists and recursion:

Comments

Haha... it's not often that an attempted joke on a technical blog really makes me LOL, but this was good... :D
---------------------------------

What about arrays of unknown length?

You are stuffed. But on the positive side, I hear Visual Basic has really good support for arrays. Check it out!


Posted by: Javaman59 at Sep 2, 2010 1:58:56 PM

Finally someone who explains the x::list pattern. Very informative article

Posted by: Chris at Feb 15, 2011 4:41:00 AM

Finally someone who explains the x::list pattern. Very informative article

Posted by: Chris at Feb 15, 2011 4:41:00 AM