December 27, 2008

C# covariance and contravariance by example

One of the new features of C# 4.0 is generic covariance and contravariance.

Admit it: your heart rate just went up.  This is about the most intimidating-sounding, beardy-academic feature to come into C# since they stopped calling it "C Octothorpe."  (Which Windows Live Writer wants me to change to "C Clodhopper."  Unusual dictionary you have there, chaps.)  You feel that at any moment Philip Wadler is going to spring out of a bush and explain to you that it refers to contravariant functors on the poset category of types.  And then he'll make you sit an exam on it.  And then you'll realise that you came out without any trousers on and you're being hunted for sport by a jar of marmalade.

In fact, covariance and contravariance are scary terms for a very simple and familiar concept.

Imagine you have the following function:

void Apply(Transform t) { ... }

When you call this function, you don't have to pass a Transform.  You can also pass a reference of any type derived from Transform -- say, RotateTransform or ScaleTransform:

Transform t = new Transform();
Apply(t);  // okay
RotateTransform rt = new RotateTransform();
Apply(rt);  // also okay

Why is this okay?  To the C# compiler, it's okay because RotateTransform derives from Transform.  But conceptually, the reason it's okay is that if Apply can deal with any arbitrary Transform, it can certainly deal with a RotateTransform.

But now suppose Apply looks like this:

void Apply(IEnumerable<Transform> ts) { ... }

By the same logic, when you call this function, you don't have to pass something typed as IEnumerable<Transform>.  The C# compiler knows it's okay to pass anything that derives from (implements) IEnumerable<Transform> -- say, List<Transform> or ReadOnlyCollection<Transform>.

But after this point, what you should conceptually be able to do diverges from what the C# compiler will let you do.  Conceptually, you should also be able to pass an IEnumerable<RotateTransform>:

IEnumerable<RotateTransform> rts = new List<RotateTransform>();
Apply(rts);

After all, if Apply can deal with a sequence of arbitrary Transforms, it can surely deal with a sequence of RotateTransforms, right?

Right.

But the C# compiler, in C# 3 and earlier, doesn't see it that way.  Although you and I can work out that IEnumerable<RotateTransform> is compatible with IEnumerable<Transform>, it doesn't derive from IEnumerable<Transform>.  So the C# compiler goes into fits of CS1503 errors and flounces off in a huff.

In summary, the C# compiler understands that it's okay to vary the type of an argument, but doesn't understand that it's okay to vary the type of a generic type parameter.  And generic variance just means fixing that: teaching the compiler that it is okay to pass an IEnumerable<RotateTransform> to a function that expects an IEnumerable<Transform>.
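
Here is a minimal sketch of the call that C# 3 rejects and C# 4 accepts.  The Transform classes are empty stand-ins invented so the snippet is self-contained, and this Apply returns a count purely so that the result can be checked; neither detail comes from a real library.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the transform types used in this post.
class Transform { }
class RotateTransform : Transform { }

static class CovarianceDemo
{
    // Accepts any sequence of Transforms and reports how many it was given.
    public static int Apply(IEnumerable<Transform> ts)
    {
        int count = 0;
        foreach (Transform t in ts) count++;
        return count;
    }

    public static int Run()
    {
        IEnumerable<RotateTransform> rts =
            new List<RotateTransform> { new RotateTransform(), new RotateTransform() };

        // CS1503 under C# 3; accepted under C# 4 because IEnumerable<T> is covariant in T.
        return Apply(rts);
    }
}
```

Nothing is copied or converted at the call site: Apply sees the very same list, just through a less specific type.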

Hang on, though.  The feature is called covariance and contravariance.  Why do we need two names for this stuff?  Isn't it just inheritance?  Let's look in more detail.

When you vary the type of an argument, you can only vary it in the direction of more derived.  Passing a RotateTransform to a function that expects a Transform is okay.  Passing an Object to a function that expects a Transform is not okay.  This is probably so ingrained you don't even have to think about it.

When you vary the type of the generic type parameter to IEnumerable<Transform>, the same rule applies.  A function that can deal with a sequence of Transforms can deal with a sequence of RotateTransforms, but not with a sequence of arbitrary objects.  Same rule, right?

Right.  But only because of a specific characteristic of IEnumerable<T>.  In other cases, it turns out that the rule has to be the other way round: you can only vary the type parameter in the direction of less derived.

For example, suppose we have a function that takes an IComparer<T>:

void Compare(IComparer<Transform> c) { ... }

If we use the IEnumerable rule and call this function with an IComparer<RotateTransform>, we have a problem.  The function expects to be able to use the IComparer to compare arbitrary Transform objects.  If it decides to compare a ScaleTransform and a TranslateTransform, our IComparer<RotateTransform> will be dreadfully embarrassed.  We can't use derived types after all.

On the other hand, suppose we call this function with an IComparer<object>.  How will it cope?  Very nicely, thank you.  IComparer<object> can compare arbitrary objects, so it can easily cope with the specific requirement of comparing Transforms.  So we can pass a base type in place of the expected type.
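
As a rough sketch of that second case, assuming C# 4 or later: TypeNameComparer is a made-up comparer that orders any two objects by type name, and the Transform classes are empty stand-ins.  This Compare function returns the comparison result only so that it can be checked.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the transform types used in this post.
class Transform { }
class ScaleTransform : Transform { }
class TranslateTransform : Transform { }

// Hypothetical comparer that can order any two objects at all (by type name).
class TypeNameComparer : IComparer<object>
{
    public int Compare(object x, object y)
    {
        return string.CompareOrdinal(x.GetType().Name, y.GetType().Name);
    }
}

static class ContravarianceDemo
{
    // The function from the post: it feeds arbitrary Transforms to the comparer.
    public static int Compare(IComparer<Transform> c)
    {
        return c.Compare(new ScaleTransform(), new TranslateTransform());
    }

    public static int Run()
    {
        IComparer<object> anything = new TypeNameComparer();

        // Accepted under C# 4 because IComparer<T> is contravariant in T.
        return Compare(anything);
    }
}
```

Swap in an IComparer<RotateTransform> and the call to Compare no longer compiles, which is exactly the embarrassment described above being caught early.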

So sometimes the rule is that we can only vary in the direction of more derived, and sometimes the rule is that we can only vary in the direction of less derived (more base).  How do we -- and the C# compiler -- know which rule applies in any given case?

The answer -- simplifying somewhat -- is that it depends on whether the generic type parameter appears in output or input positions.

In IEnumerable<Transform>, Transform appears in an output position.  (It appears in the return value of GetEnumerator.)  Now if Transform appears in an output position, then any user of the generic type -- such as the Apply method -- expects to be receiving Transforms, and knows how to deal with them.  So it can certainly deal with derived types such as RotateTransform.  Moreover, Transform appears only in an output position.  So the Apply method can't bust out a ScaleTransform and try to get our RotateTransform-specific implementation to accept it: there's no in-parameter through which Apply can try to feed us the ScaleTransform.

In IComparer<Transform>, Transform appears in input positions.  (It appears as the inputs to the interface's Compare method.)  Now if Transform appears in an input position, then a user of the generic type -- such as our Compare function above -- is going to give us Transforms, and expect us to deal with them.  So we need to be able to deal with Transforms at least, but if we can deal with more things -- i.e. a base type -- then that's not going to do any harm.  Moreover, Transform appears only in an input position.  So the IComparer implementation can't return a base type instance to its user: there's no out-parameter or return value through which we could sneak out an Object which our Compare function wouldn't be expecting.

So the rule is: if a type parameter appears only in an output position, you can vary it in the more derived direction, and if a type parameter appears only in an input position, you can vary it in the less derived (base type) direction.

And in fact this is the terminology that gets used in the C# 4 language.  The interfaces we've been discussing would now have the following signatures:

interface IEnumerable<out T> { ... }
interface IComparer<in T> { ... }

The annotations tell the compiler that the annotated parameter appears only in output or input positions, as appropriate.  When you define the generic type, the compiler verifies that this is indeed the case.  When it performs type checking, the compiler allows variance of the type parameter up or down the class hierarchy according to how the parameter is annotated.
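
The same annotations work on interfaces you write yourself.  The sketch below uses two invented interfaces, IProducer and IConsumer, plus empty stand-in Transform classes; none of these names come from the framework.

```csharp
// Hypothetical interfaces, invented for illustration.
interface IProducer<out T>
{
    T Produce();              // T in an output position: allowed with "out"
    // void Consume(T item);  // would be rejected: T in an input position
}

interface IConsumer<in T>
{
    void Consume(T item);     // T in an input position: allowed with "in"
}

// Hypothetical stand-ins for the transform types used in this post.
class Transform { }
class RotateTransform : Transform { }

class RotateProducer : IProducer<RotateTransform>
{
    public RotateTransform Produce() { return new RotateTransform(); }
}

class TransformCounter : IConsumer<Transform>
{
    public int Count;
    public void Consume(Transform item) { Count++; }
}

static class AnnotationDemo
{
    public static bool Run()
    {
        // "out": a producer of the more derived type will do.
        IProducer<Transform> p = new RotateProducer();

        // "in": a consumer of the less derived type will do.
        TransformCounter counter = new TransformCounter();
        IConsumer<RotateTransform> c = counter;
        c.Consume(new RotateTransform());

        return p.Produce() is RotateTransform && counter.Count == 1;
    }
}
```

Uncommenting the Consume line in IProducer should make the interface declaration itself fail to compile, before anyone has tried to use it.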

Note, incidentally, that these annotations are per type parameter, and different parameters can have different annotations.  For example, in Converter<TInput, TOutput>, TInput appears only in input positions, and TOutput -- well, you can guess where TOutput appears.  So suppose we have a method like this:

void Parse(Converter<string, Transform> c) { ... }

We could give it a Converter<object, Transform> because a converter that can cope with arbitrary objects will eat strings for breakfast.  We could give it a Converter<string, RotateTransform> because if the Parse method is braced to get an arbitrary Transform back then it will be perfectly happy to get a RotateTransform.  And of course we can vary both type parameters and give it a Converter<object, RotateTransform>.
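
A small sketch of that last case, varying both parameters at once.  FromAnything is a made-up converter, the Transform classes are empty stand-ins, and this Parse returns the converter's result only so that it can be inspected.

```csharp
using System;

// Hypothetical stand-ins for the transform types used in this post.
class Transform { }
class RotateTransform : Transform { }

static class ConverterDemo
{
    // The method from the post: it hands the converter a string
    // and expects some kind of Transform back.
    public static Transform Parse(Converter<string, Transform> c)
    {
        return c("rotate 90");
    }

    // Hypothetical converter: accepts any object, always returns a RotateTransform.
    static RotateTransform FromAnything(object input)
    {
        return new RotateTransform();
    }

    public static bool Run()
    {
        Converter<object, RotateTransform> general = FromAnything;

        // TInput varies towards the base type, TOutput towards the derived type.
        Transform result = Parse(general);
        return result is RotateTransform;
    }
}
```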

Note also that if a type parameter appears in both input and output positions, it can't be varied in either direction.  Consider a method that takes an IList<Transform>:

void MysteryFunc(IList<Transform> ts) { ... }

Can we safely pass it an IList<object>?  No, because its implementation might look like this:

void MysteryFunc(IList<Transform> ts)
{
  Transform t = ts[0];
}

If ts is a List<object> and ts[0] happens to be a Llama, then MysteryFunc will rapidly get a lot more mysterious.  So how about the other direction?  Can we safely pass it an IList<RotateTransform>?  Once again the answer is no:

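void MysteryFunc(IList<Transform> ts)
{
  ts.Add(new ScaleTransform());
}

If ts is really a List<RotateTransform>, that Add call is asking it to store a ScaleTransform, which it can't safely do.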

For what it's worth, if you use arrays, C# does allow you to vary the array type in the "more derived" direction, for example passing a string[] where an object[] is expected.  As you now know, this would be okay if the array element type appeared only in output positions, but this is not the case with arrays, which leads to trouble:

void PimpMyArray(object[] objs)
{
  objs[0] = new Llama();
}

PimpMyArray(new string[1]);

The compiler is happy.  The program, and the Llama, experience the ignominious fate of an ArrayTypeMismatchException.
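
If you want to watch it happen, here is a self-contained version of the same snippet.  The Llama class is an empty placeholder, and the try/catch exists only to turn the exception into a checkable result.

```csharp
using System;

// Hypothetical placeholder type: anything that is not a string will do.
class Llama { }

static class ArrayDemo
{
    static void PimpMyArray(object[] objs)
    {
        objs[0] = new Llama();   // type-checked at run time, not at compile time
    }

    public static bool Run()
    {
        try
        {
            PimpMyArray(new string[1]);   // compiles: string[] converts to object[]
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;                  // the Llama never made it into the string[]
        }
    }
}
```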

Very well, one last detail.  I remarked that Transform appeared only in an output position of IEnumerable<Transform>.  Conceptually, that's true: if you have an IEnumerable<Transform>, you can only get Transforms out, you can't put them in.  Syntactically, it's a complete lie.  What appears in an output position is actually IEnumerator<Transform>.  The reason I was able to get away with this cheat is that, in IEnumerator<Transform>, Transform also appears only in output positions.  (It's the type of the Current property, and Current is read-only.)  An output of an output is going to be an output.  If we had an IComparerFactory<T> interface where an IComparer<T> appeared as an output, bearing in mind that in IComparer T appears as an input, things would get messier.  At this point my head for one starts spinning and I enter a bizarre world where I briefly think I understand Haskell monads, and then that damn jar of marmalade comes after me again.  See Eric Lippert's helpfully named article Higher Order Functions Hurt My Brain if you want to know what happens in this kind of situation.
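
For the curious, here is a sketch of how the IComparerFactory<T> thought experiment appears to shake out under the C# 4 rules.  Every type below is invented for illustration.  The assumption worth flagging is that an output whose own type parameter is an input counts as an input overall, so the factory ends up annotated "in".

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the transform types used in this post.
class Transform { }
class RotateTransform : Transform { }

// Hypothetical interface from the thought experiment.
// T only ever appears as an input of the output, so "in" is accepted here.
interface IComparerFactory<in T>
{
    IComparer<T> Create();
}

// Hypothetical comparer that orders Transforms by type name.
class ByTypeName : IComparer<Transform>
{
    public int Compare(Transform x, Transform y)
    {
        return string.CompareOrdinal(x.GetType().Name, y.GetType().Name);
    }
}

class TransformComparerFactory : IComparerFactory<Transform>
{
    public IComparer<Transform> Create() { return new ByTypeName(); }
}

static class FactoryDemo
{
    public static int Run()
    {
        // A factory of Transform comparers serves as a factory of RotateTransform comparers.
        IComparerFactory<RotateTransform> f = new TransformComparerFactory();
        IComparer<RotateTransform> c = f.Create();
        return c.Compare(new RotateTransform(), new RotateTransform());
    }
}
```

The conversion runs in the contravariant direction even though the IComparer sits in a return position, which is roughly where the head-spinning starts.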

The scary terms?  If you care, covariance refers to the output case, and contravariance to the input case.  My mnemonic for these is that covariance goes in the same direction as ordinary argument type variance (more derived is okay), and contravariance goes in the opposite ("contra") direction.  Eric Lippert explains the formal terminology and the type-theoretical underpinnings in part one of his eleven-part (so far) series.

But from the strictly pragmatic point of view, all you need to know is that "covariance and contravariance" means that you can now pass inexact generic types when it's safe to do so, just as you can pass inexact argument types when it's safe to do so.  And that's not too scary at all.

Comments

Great post Ivan.... deserves to be read far and wide...

I got burnt by a lack of generic type variance while working with some ASP.NET MVC stuff a few weeks ago...

My 'lovely' architecture fell to bits... Looking forward to .NET 4.0 :-)

Posted by: Chris Auld at Dec 30, 2008 10:32:40 AM

Nice post! And thanks for the links.

The only clarifying point I would add to your excellent post is that generic variance only works if there are reference type conversions on the varying type arguments. That is, you can convert from IE<Foo> to IE<Bar> only if Foo goes to Bar via a built-in reference type conversion. User-defined conversions, boxing conversions, unboxing conversions, and so on, cannot be used in variance, so IE<int> does not go to IE<object> even though int goes to object via a boxing conversion.

The reason for this is that a variant conversion is nothing more than a "reinterpretation" of the bits that are in memory. All those other kinds of conversions actually transform the bit pattern in memory, from a 32 bit int to possibly a 64 bit handle to a boxed int, say. Variance only works because we can enforce the condition that we are merely telling the CLR to reinterpret existing memory, not creating new data.

Posted by: Eric Lippert at Jan 27, 2009 4:53:10 AM

nice! i'm gonna make my own blog

Posted by: arikips at May 23, 2009 5:45:33 PM

you suck.

Posted by: poo at Jun 9, 2010 2:30:12 AM

Thanks, this really helped!

Posted by: Punit at Aug 18, 2011 10:35:59 AM

Thank you so much! I was reading into C# generics variance since hours and somehow couldn't get into my brain why contravariance is needed (sometimes you just "stand on the pipe" as germans say) - after reading your article I've finally got it. So simple.. You saved my day :)

Posted by: Maverick at Sep 19, 2012 3:21:13 AM

Covariance and contravariance does not support value type

Posted by: .Net Training at Jan 17, 2013 8:43:58 PM