Union types and covariance, or why we need intersections

ORM Search Validator Reactive Repositories Others

A couple of days ago I spotted an interesting hole in Ceylon's type checker. I started out with something like this:

Sequence<Float>|Sequence<Integer> numbers = .... ;
Sequence<Numbers> numbers2 = numbers;

The Sequence interface is covariant in its type parameter. So, since Float and Integer are both subtypes of Number, both Sequence<Float> and Sequence<Integer> are subtypes of Sequence<Number>. Then the compiler correctly reasons that the union of the two types is also a subtype of Sequence<Number>. Fine. Clever compiler.

Now, here's the hole:

value first = numbers.first; //compiler error: member does not exist
value first2 = numbers2.first; //ok, infers type Number

When it encountered the union type, the member resolution algorithm was looking for a common produced type of the type constructor Sequence in the hierarchies of each member of the union type. But since Sequence<Float> isn't a subtype of Sequence<Integer> and Sequence<Integer> isn't a subtype of Sequence<Float>, it simply wasn't finding a common supertype. This resulted in the totally counterintuitive (and, in my view, pathological) result that the member resolution algorithm could not assign a type to the member first of the type Sequence<Float>|Sequence<Integer>, but it could assign a type after widening to the supertype Sequence<Number>.

Of course, there might be many potential common supertypes of a union type. There's no justification for the member resolution algorithm to pick Sequence<Number> in preference to a sequence of any other common supertype of Float and Integer. We've got an ambiguity.

So I quickly realized that I had an example which was breaking two of the basic principles of Ceylon's type system:

It is possible to assign a unique type to any expression without cheating and looking at where the expression appears and how it is used.
All types used or inferred internally by the compiler are denoteable. That is, they can all be expressed within the language itself.

These principles are specific to Ceylon, and other languages with generics and type inference don't follow them. But failing to adhere to them - and especially to the second principle - results in extremely confusing error messages (Java's horrid capture-of type errors, for example). These two principles are a major reason why I say that Ceylon's type system is simpler than some other languages with sophisticated static type systems: when you get a typing error, we promise you that it's an error that humans can understand.

Fortunately I was able to discover a useful relationship between union types and covariance.

If T is covariant in its type parameter, then T|T<V> is a subtype of T<U|V> for any types U and V.

But furthermore, T<U|V> is also a subtype of any common supertype of T and T<V> produced from the type constructor T. We've successfully eliminated the ambiguity!

So I adjusted the member resolution algorithm to make use of this relationship. Now the problematic code compiles correctly, and infers the correct type for first:

value first = numbers.first; //ok: infers type Float|Integer

Well, that's great! Oh, but what about types with contravariant type parameters? What type should be inferred for the parameter of the consume() method of Consumer<Float>|Consumer<Integer>? Well, I quickly realized that the corresponding relationship for contravariant type parameters is this one:

If T is contravariant in its type parameter, then T|T<V> is a subtype of T<U&V> for any types U and V.

Where U&V is the intersection of the two types. So the type of the parameter of consume() would be Float&Integer, which is intuitively correct. (Of course, since it is impossible for any object to be assignable to both Float and Integer, the compiler could go even further and reduce this type to the bottom type.)

But, ooops, Ceylon doesn't yet have first-class intersection types, except as a todo in the language specification. And our second principle states that the compiler isn't allowed to infer or even think about types which can't be expressed in the language!

Well, really, I was just waiting for the excuse to justify introducing intersection types, and this gave me the ammunition I was waiting for. So yesterday I found a couple of free hours to implement experimental support for intersection types in the typechecker, and, hey, it turned out to be much easier than I expected. It's also a practically useful feature. I've often wanted to write a method which accepts any value which is assignable to two different types, without introducing a new type just to represent the intersection of the types.

I'll leave you with two more interesting relationships, applying to intersection types: