Help

A couple of days ago I spotted an interesting hole in Ceylon's type checker. I started out with something like this:

Sequence<Float>|Sequence<Integer> numbers = .... ;
Sequence<Numbers> numbers2 = numbers;

The Sequence interface is covariant in its type parameter. So, since Float and Integer are both subtypes of Number, both Sequence<Float> and Sequence<Integer> are subtypes of Sequence<Number>. Then the compiler correctly reasons that the union of the two types is also a subtype of Sequence<Number>. Fine. Clever compiler.

Now, here's the hole:

value first = numbers.first; //compiler error: member does not exist
value first2 = numbers2.first; //ok, infers type Number

When it encountered the union type, the member resolution algorithm was looking for a common produced type of the type constructor Sequence in the hierarchies of each member of the union type. But since Sequence<Float> isn't a subtype of Sequence<Integer> and Sequence<Integer> isn't a subtype of Sequence<Float>, it simply wasn't finding a common supertype. This resulted in the totally counterintuitive (and, in my view, pathological) result that the member resolution algorithm could not assign a type to the member first of the type Sequence<Float>|Sequence<Integer>, but it could assign a type after widening to the supertype Sequence<Number>.

Of course, there might be many potential common supertypes of a union type. There's no justification for the member resolution algorithm to pick Sequence<Number> in preference to a sequence of any other common supertype of Float and Integer. We've got an ambiguity.

So I quickly realized that I had an example which was breaking two of the basic principles of Ceylon's type system:

  • It is possible to assign a unique type to any expression without cheating and looking at where the expression appears and how it is used.
  • All types used or inferred internally by the compiler are denoteable. That is, they can all be expressed within the language itself.

These principles are specific to Ceylon, and other languages with generics and type inference don't follow them. But failing to adhere to them - and especially to the second principle - results in extremely confusing error messages (Java's horrid capture-of type errors, for example). These two principles are a major reason why I say that Ceylon's type system is simpler than some other languages with sophisticated static type systems: when you get a typing error, we promise you that it's an error that humans can understand.

Fortunately I was able to discover a useful relationship between union types and covariance.

If T is covariant in its type parameter, then T<U>|T<V> is a subtype of T<U|V> for any types U and V.

But furthermore, T<U|V> is also a subtype of any common supertype of T<U> and T<V> produced from the type constructor T. We've successfully eliminated the ambiguity!

So I adjusted the member resolution algorithm to make use of this relationship. Now the problematic code compiles correctly, and infers the correct type for first:

value first = numbers.first; //ok: infers type Float|Integer

Well, that's great! Oh, but what about types with contravariant type parameters? What type should be inferred for the parameter of the consume() method of Consumer<Float>|Consumer<Integer>? Well, I quickly realized that the corresponding relationship for contravariant type parameters is this one:

If T is contravariant in its type parameter, then T<U>|T<V> is a subtype of T<U&V> for any types U and V.

Where U&V is the intersection of the two types. So the type of the parameter of consume() would be Float&Integer, which is intuitively correct. (Of course, since it is impossible for any object to be assignable to both Float and Integer, the compiler could go even further and reduce this type to the bottom type.)

But, ooops, Ceylon doesn't yet have first-class intersection types, except as a todo in the language specification. And our second principle states that the compiler isn't allowed to infer or even think about types which can't be expressed in the language!

Well, really, I was just waiting for the excuse to justify introducing intersection types, and this gave me the ammunition I was waiting for. So yesterday I found a couple of free hours to implement experimental support for intersection types in the typechecker, and, hey, it turned out to be much easier than I expected. It's also a practically useful feature. I've often wanted to write a method which accepts any value which is assignable to two different types, without introducing a new type just to represent the intersection of the types.

I'll leave you with two more interesting relationships, applying to intersection types:

If T is covariant in its type parameter, then T<U>&T<V> is a supertype of T<U&V> for any types U and V.
If T is contravariant in its type parameter, then T<U>&T<V> is a supertype of T<U|V> for any types U and V.

Nice symmetries here.

28 comments:
 
07. Aug 2011, 20:51 CET | Link
Simon

I'm following ceylon for a few days now and I've asked myself precisely the question you're answering here.

I guess that means ceylon's going to be my favourite cup of tea.

Keep it up!

ReplyQuote
 
07. Aug 2011, 22:49 CET | Link
Quintesse

Nice article Gavin! Although a little more code examples would be nice. :)

 
08. Aug 2011, 08:42 CET | Link

Thanks guys!

Yeah, I know this was a bit short on concrete examples. I kinda wrote it in a rush...

 
08. Aug 2011, 10:52 CET | Link
Ali

how do you want to translate this in bytecode?

 
08. Aug 2011, 11:34 CET | Link
Quintesse
@Ali: most (if not all) of this does not result in any bytecode at all, it's all, what I like to call, "type-system trickery". It's the compiler being able to reason better about your code and to be able to tell when certain code is legal or not. So a Union type or Intersection type will never actually exist, you can't instantiate a Foo&Bar type.
 
08. Aug 2011, 19:10 CET | Link
Simon
Actually I have one conceptual issue here: I just don't see how T<U>|T<V> is a subtype of T<U|V>,
or what is so super about T<U|V>.

To me, they are equivalent types which are normalized differently.

Let's say T<U|V> is the condensed form of the diffused equivalent T<U>|T<V>. In certain situations, such as the first example, the
condensed form is superior. The obvious question in that thinking is whether the diffused normalization is useful at all, or if it isn't better to drop it.

Dropping would mean you could denote types which aren't used internally, an interesting twist to the basic principles you laid out ;)

I don't think this has any impact, but super and sub have topological implications that I don't see in place here.
 
08. Aug 2011, 19:30 CET | Link
To me, they are equivalent types which are normalized differently.

Consider this:

Sequence<Float|Natural> seq = { 0, 1, 0.0, 1.0 };

This Sequence is clearly a Sequence<Float|Natural>, since every element is a Float|Natural. So, in fact, that is the type the compiler will infer if I just write:

value seq = { 0, 1, 0.0, 1.0 };  //infers type Sequence<Float|Natural>

But now, clearly seq is not a Sequence<Float>, nor a Sequence<Natural>, so nor is it a Sequence<Float>|Sequence<Natural>. The following code is definitely erroneous:

switch (seq)    //error: switch is not exhaustive
case (is Sequence<Float>) { ... }
case (is Sequence<Natural>) { ... }

So the two types are not exactly the same type.

 
08. Aug 2011, 20:55 CET | Link
Simon
I see. Thanks for shedding light on this.

There's another issue I'm sure I could read up but perhaps we can clarify it here for the sake of ceylon:

U&V denotes a type which is assignable to both U AND V. "AND" sounds like union, it actually is - in the intension. We can access the union of U's and V's members as a result. In the extension (all types assignable to U and all types assignable to V) it is the intersection, i.e. we can assign only the intersection of both sets to U&V.

U|V denotes a type which is assignable to either U OR V (or both). In the intension, it looks more like an intersection - we can access the intersection of U's and V's members. In the extension (all types assignable to U and all types assignable to V) it is the union - we can assign them all to U|V.

I guess that is just what you'd expect when taking this perspective. Sooo...

If intersection vs. union depends so much on what I look at, shouldn't the terms be abandoned?

Or, doesn't than mean when you talk about a union type you really refer to an "intersection TYPE" equivalent to a "union TYPE SET" and vice versa?

Probably the useage of the terms is long-established, but I'm really wondering if there are less ambiguous terms.

 
08. Aug 2011, 21:00 CET | Link
Simon

Crap. I meant to write:

... denotes a type which is assignable FROM ...

in the two cases.

 
08. Aug 2011, 21:05 CET | Link

The terms come from thinking about types as sets. The set of values of U&V is the intersection of the set of values of U with the set of values of V. The set of values of U|V is the union of the set of values of U with the set of values of V.

I agree the names a little inapt if you think of types in terms of their members, rather than their instances. But I don't know anything better to call them.

 
08. Aug 2011, 21:06 CET | Link

Disjunctive type and conjunctive type would actually be reasonable terms ;-)

 
08. Aug 2011, 21:20 CET | Link
Simon
I tend to agree with that. Let's see:

U|V is a disjunctive type that lets me assign the union of the two type sets.

U&V is a conjunctive type that lets me assign the intersection of the two type sets.

To me that's far more understandable.
 
08. Aug 2011, 21:30 CET | Link

Note that because of the intimate relationship between disjunction and union, and between conjunction and intersection, Ceylon uses the same notation for set union and intersection.

 
09. Aug 2011, 08:31 CET | Link
Simon
Note that because of the intimate relationship between disjunction and union, and between conjunction and intersection, Ceylon uses the same notation for set union and intersection.

I hadn't expected it would fit so nicely.

Though the slots interface is a bit strange to me; when applied to sets, the set of false slots is infinite but the set of true slots is not - perhaps the abstraction can be improved.

 
09. Aug 2011, 23:23 CET | Link
Simon wrote on Aug 09, 2011 02:31:
Though the slots interface is a bit strange to me; when applied to sets, the set of false slots is infinite but the set of true slots is not - perhaps the abstraction can be improved.

I'm not sure I understand exactly what you mean ... but certainly the set of true slots could be infinite ... there's no reason why an infinite set could not be represented as an instance of Slots.

 
10. Aug 2011, 21:38 CET | Link
Simon
Admittedly that was vague and quite unrelated. What I had in mind is that when I see my interface implementations go across qualitative differences
(as opposed to just quantitative ones) I take it as a warning sign that things may not be abstracted adequately.

In the slots example, how do you define (or enumerate) ~Set<Person>()?

My point was that the set of slots which are false is always infinitely large. That may be representable,
but enumeration is a problem. I'm unsure what one is to do, but perhaps the type system is helpful here again. Just a thought.
 
10. Aug 2011, 22:01 CET | Link
Simon

Reading again, I see the infix . That's probably nice for sets but less so for bytes. Anyway, seems the to be the outlier to me.

 
10. Aug 2011, 22:03 CET | Link
Simon
Again:

Reading again, I see the "infix" ~ (tilde). That's probably nice for sets but less so for bytes. Anyway, ~ seems the to be the outlier to me.
 
10. Aug 2011, 22:03 CET | Link

The unary prefix ~ operator is defined for a subtype called FixedSlots. It's not defined for Slots.

 
10. Aug 2011, 22:04 CET | Link
Gavin King wrote on Aug 10, 2011 16:03:
The unary prefix ~ operator is defined for a subtype called FixedSlots. It's not defined for Slots.

Of course, it's my fault that the tutorial doesn't mention this little wrinkle. You've been reading more closely than I was really expecting ;-)

 
11. Aug 2011, 23:18 CET | Link
Simon

Sounds reasonable. I'm not a big fan of the fixed but even I have to stop muttering at some point and let you do your work.

 
12. Aug 2011, 01:36 CET | Link
Simon wrote on Aug 11, 2011 17:18:
Sounds reasonable. I'm not a big fan of the fixed but even I have to stop muttering at some point and let you do your work.

:-)

Cheers mate.

 
17. Nov 2011, 04:31 CET | Link

Any progress report on Ceylon?

Getting anxious ;-)

 
20. Nov 2011, 01:33 CET | Link
Quintesse

@Steve: I guess by know you're aware of the announcement that Ceylon has gone public! http://www.ceylon-lang.org

 
21. Dec 2011, 23:55 CET | Link
Can you give examples for types U, V, none of them being a subtype of the other, such that U&V is not Bottom (i.e. "empty")?

If there is no such case then the following should hold:

U&V is:
V if V subtype of U,
U if U subtype of V,
Bottom in all other cases

If this is true, then the usefulness of the intersection is limited and you could just substitute the above explanation for U&V in your rules for T<U>|T<V>, not needing to introduce the intersection into the language.

So there seems to be a good reason that some languages define the union type but not the intersection type.
 
29. Dec 2011, 13:14 CET | Link
Stephane Epardaud
Stefan Springer wrote on Dec 21, 2011 17:55:
Can you give examples for types U, V, none of them being a subtype of the other, such that U AND V is not Bottom (i.e. empty)?

(forgive me I haven't found out how to type the ampersand so I'll use AND instead).

Fish AND Chips is such an example. Intersections can be two interfaces with no inheritance relation with one another. Edible AND Drinkable is another example. They represent something that is both Edible and Drinkable, no matter how they are implemented.

 
29. Jan 2012, 04:51 CET | Link
lvilnis
Here's something I've been wondering... what's the difference between tuple types and intersection types?

Seems to me for any value of an intersection type A&B&..., i can convert it to a value of tuple type A*B*... by upcasting:
i: A&B... => (i as A, i as B, ...): A*B*...

And for any value of a tuple type A*B*..., i can convert it to a value of an intersection type A&B&... by defining:
tup: A*B*... =>
tup as A => tup.1
tup as B => tup.2
...
 
29. Jan 2012, 05:37 CET | Link
lvilnis
Ah well, not quite - the conversion only works one way since tuples can have multiple elements with the same type.
Post Comment