« Action, Func, void and unit | Main | Discriminated unions and interfaces »

February 25, 2012

Discriminated unions in F#

F# has lots of different ways of expressing composite data: tuples, records, classes, etc.  One way that doesn’t have a direct equivalent in C# or Visual Basic is the discriminated union.

A discriminated union is used when there are different ‘cases’ of the data type, potentially with different kinds of data depending on the case.  That probably sounds rather abstract, so let’s see a simple example:

type Variant =
       | Numeric of int
       | Text of string
       | Empty

There are three ‘cases’ of the Variant type: the Numeric case, which carries an integer value; the Text case, which carries a string value; and the Empty case, which doesn’t carry any additional data.

We can create instances of the Variant type using the ‘constructors’ corresponding to the cases:

> let one = Numeric 1;;
val one : Variant = Numeric 1
> let hello = Text "hello";;
val hello : Variant = Text "hello"
> let empty = Empty;;
val empty : Variant = Empty

If a case carries additional data, then we pass that additional data to the constructor.  If it doesn’t, then the case is effectively a singleton instance, so we’re really just doing a variable assignment.

To process the Variant type, we can use pattern matching:

let print v =
    match v with
    | Numeric n -> printfn "Num %d" n
    | Text s    -> printfn "Txt %s" s
    | Empty     -> printfn "Empty"

Now we’ve seen discriminated unions at work, let’s see how they compare to that ubiquitous C# data structure, the class.

In C#, Variant would be an abstract base class, with three subclasses, Numeric, Text and Empty.  Numeric would have an integer field, Text would have a string field, and Empty would have no fields.

This already points up one important difference between discriminated unions and class hierarchies: discriminated unions are closed.  Users of the F# Variant type can’t create a new ‘case,’ whereas users of the C# Variant type can create a new derived class.  So discriminated unions are good for scenarios where you want to control the set of permitted cases; if you want the type to be extensible, you need to go to a class.

Ignoring this detail, here’s a naive stab at implementing the Variant type in C#:

public abstract class Variant
{
}

public class Numeric : Variant
{
   public int Value { get; set; }
}

// similarly for Text and Empty subclasses

However, this implementation has quite different behaviour from the F# discriminated union.  Discriminated unions have value semantics: that is, an instance of Variant is a value just like an instance of Int32 is a value.  Two Variants that are both Numeric 99 have the same value, and so should be considered equal; furthermore, Numeric 100 is a different value from Numeric 99, and must therefore be a different instance.  The first observation means that the Variant types need to implement custom equality; the second means they have to be immutable.  (The immutability issue may seem a bit subtle, but the reasons for wanting immutability go deep; maybe a subject for a future post.)  So our naive implementation needs quite a bit of fixing:

public class Variant : IEquatable<Variant>
{
   public override Equals(object other) { … }
   public override Equals(Variant other) { /* don’t ask */ }
   public override int GetHashCode() { … }
}

public class Numeric : Variant
{
   private readonly int _value;

   public Numeric(int value)
   {
     _value = value;
   }

   public int Value
   {
     get { return _value; }
   }
}

// similarly for Text subclass

This points up a second important difference between discriminated unions and classes: discriminated unions are much easier to write!  The F# Variant type was entirely defined and implemented in four lines.  In the C# Variant class, it takes four lines just to implement a single attribute of a single subclass (one for the field, one for the constructor assignment, one for the property declaration and one for the property getter implementation)!  And good luck writing a bug-free Equals method on your first go (and maintaining it as you add new classes).  Discriminated unions give you value semantics for free; class hierarchies make you work for it.

On the other hand, the fact that C# makes you write all that code out by hand does give you more control over the behaviour of your type.  If you want to ensure that numeric variants are always positive, you can add a check in the Numeric constructor.  You can’t do that with a discriminated union.  A discriminated union is a dumb container of values; a class is a smart container of business logic.

Anyway, now we’ve got the Variant type implemented, it’s time to do some work with it.  But thinking about implementing the ‘print variant’ function above, there are two very different ways we could tackle this.  The idiomatic approach in C# is to make Print a virtual method on the Variant base class and override it in the derived classes:

public abstract class Variant
{
   public abstract void Print();
}

public class Numeric : Variant
{

   public override void Print()
   {
     Console.WriteLine("Num " + Value);
   }
}

// Similarly for Text and Empty subclasses

This works fine as long as you’re the one defining the class hierarchy.  (It’s a bit verbose, because it repeats the ‘public override void Print()’ signature in every derived class.  But it does work.)  Unfortunately, if you have to add a function to a class hierarchy that somebody else has defined, then things get very dreary:

public static void Print(Variant v)
{
   Numeric n = v as Numeric;
   if (n != null)
   {
     Console.WriteLine("Num " + n.Value);
     return;
   }

   // similarly for Text and Empty subclasses
}

With a discriminated union, you don’t get the choice: there’s no way to define a virtual method and override it on individual cases, so you always have to take the separate function approach.  But thanks to pattern matching, implementing a function across the cases of a discriminated union in F# doesn’t incur all the intrusive logic of the C# version, and it’s actually more concise than even the C# virtual method version.

To sum up, discriminated unions provide a more compact and efficient way than full-fat classes of representing simple heterogeneous data.  You can implement a discriminated union in far less code than a class, and the code to work with a discriminated union is also more concise than the class equivalent.  However, discriminated unions are suitable only when the set of possible cases is known in advance – they aren’t openly extensible the way a class hierarchy is – and they don’t provide a way to encapsulate behaviour within the type.  For example, you would never make the Control hierarchy a discriminated union, because users need to be able to define their own range of controls, and the new control types need to carry their own behaviour (e.g. rendering) within them.  On the other hand, for types like the F# Option type (used for ‘either a value of x, or no value’), the Haskell Either type (traditionally used to express ‘either a result of x, or a failure reason of y’), or even much wider unions like LINQ expression types or node types in an Abstract Syntax Tree, discriminated unions are an ideal choice.

February 25, 2012 in Software | Permalink

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a00d8341c5c9b53ef016762ec1c93970b

Listed below are links to weblogs that reference Discriminated unions in F#:

Comments

Very good introductory article. It explains the concept in a very straight forward manner.

Just a comment (because I think it is an important note) that you can add methods to discriminated unions (in the same way you can add methods to C# classes), allowing them to be a little better than simple value containers:

type X = | Foo | Bar
with member x.isFoo =
match x with | Foo -> true | Bar -> false

Posted by: Chris at Feb 25, 2012 1:16:43 PM

Yes and no. I agree it's important and relevant that you can add methods to DUs, but those methods still need to be implemented 'visitor style.' You can't write different implementations on different cases a la C# virtual members: you need to do the pattern matching even though you're writing it as a member method rather than a module-level function. So in practice I feel the fact that it's a member method rather than a function doesn't make a huge amount of difference. And I feel it's more idiomatic to use functions because they play more nicely with higher-level functions (e.g. you'd prefer to write "List.iter print vts" than "List.iter (fun vt -> vt.print()) vts").

But yes, you're right: it is possible to bundle up 'behaviour' in the DU type via member methods and I should have mentioned it. Thanks for bringing it up!

Posted by: Ivan at Feb 25, 2012 1:37:59 PM