This is the sixth installment in a series of articles introducing the Ceylon language. Note that some features of the language may change before the final release.
Defining generic types
We've seen plenty of parameterized types in this series of articles, but now let's explore a few more details.
Programming with generic types is one of the most difficult parts of Java. That's still true, to some extent, in Ceylon. But because the Ceylon language and SDK were designed for generics from the ground up, Ceylon is able to alleviate the most painful aspects of Java's bolted-on-later model.
Just like in Java, only types and methods may declare type parameters. Also just like in Java, type parameters are listed before ordinary parameters, enclosed in angle brackets.
shared interface Iterator<out Element> { ... }
class Array<Element>(Element... elements) satisfies Sequence<Element> { ... }
shared Entries<Natural,Value> entries<Value>(Value... sequence) { ... }
As you can see, the convention in Ceylon is to use meaningful names for type parameters.
Unlike in Java, we always need to specify type arguments in a type declaration (there are no raw types in Ceylon). The following will not compile:
Iterator it = ...; //error: missing type argument to parameter Element of Iterator
We always have to specify a type argument in a type declaration:
Iterator<String> it = ...;
On the other hand, we shouldn't need to explicitly specify type arguments in most method invocations or class instantiations. In principle it's very often possible to infer the type arguments from the ordinary arguments. The following code should be possible, just like it is in Java:
Array<String> strings = Array("Hello", "World");
Entries<Natural,String> entries = entries(strings);
But we haven't yet figured out what exactly the type inference algorithm will be (probably something involving union types!) and so the Ceylon compiler currently requires that all type arguments be explicitly specified like this:
Array<String> strings = Array<String>("Hello", "World");
Entries<Natural,String> entries = entries<Natural,String>(strings);
On the other hand, the following code does already compile:
local strings = Array<String>("Hello", "World");
local entries = entries<Natural,String>(strings);
The root cause of very many problems when working with generic types in Java is type erasure. Generic type parameters and arguments are discarded by the compiler, and simply aren't available at runtime. So the Java equivalents of the following, perfectly sensible, code fragments just won't compile:
if (is List<Person> list) { ... }
if (is Element obj) { ... }
(Where Element is a generic type parameter.)
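To make the erasure problem concrete, here is a minimal Java sketch. The class and method names (ErasureDemo, sameRuntimeClass, isSomeList) are made up for illustration; only the java.util types are real.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Both lists share one runtime class: the type arguments are erased.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        return strings.getClass() == numbers.getClass();
    }

    // The closest Java gets to "is List<Person> list": only the raw shape
    // of the object can be tested, using an unbounded wildcard.
    static boolean isSomeList(Object obj) {
        // "obj instanceof List<String>" would be rejected by the compiler.
        return obj instanceof List<?>;
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());                   // prints "true"
        System.out.println(isSomeList(new ArrayList<Integer>())); // prints "true"
    }
}
```

At runtime a list of strings and a list of integers are indistinguishable, which is exactly the information reified generics would keep around.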
A major goal of Ceylon's type system is support for reified generics. Like Java, the Ceylon compiler performs erasure, discarding type parameters from the schema of the generic type. But unlike Java, type arguments are supposed to be reified (available at runtime). Of course, generic type arguments won't be checked for typesafety by the underlying virtual machine at runtime, but type arguments are at least available at runtime to code that wants to make use of them explicitly. So the code fragments above are supposed to compile and function as expected. You will even be able to use reflection to discover the type arguments of an instance of a generic type.
The bad news is we haven't implemented this yet ;-)
Finally, Ceylon eliminates one of the bits of Java generics that's really hard to get your head around: wildcard types. Wildcard types were Java's solution to the problem of covariance in a generic type system. Let's first explore the idea of covariance, and then see how covariance in Ceylon works.
Covariance and contravariance
It all starts with the intuitive expectation that a collection of Geeks is a collection of Persons. That's a reasonable intuition, but especially in non-functional languages, where collections can be mutable, it turns out to be incorrect. Consider the following possible definition of Collection:
shared interface Collection<Element> {
shared formal Iterator<Element> iterator();
shared formal void add(Element x);
}
And let's suppose that Geek is a subtype of Person. Reasonable.
The intuitive expectation is that the following code should work:
Collection<Geek> geeks = ... ;
Collection<Person> people = geeks; //compiler error
for (Person person in people) { ... }
This code is, frankly, perfectly reasonable taken at face value. Yet in both Java and Ceylon, this code results in a compiler error at the second line, where the Collection<Geek> is assigned to a Collection<Person>. Why? Well, because if we let the assignment through, the following code would also compile:
Collection<Geek> geeks = ... ;
Collection<Person> people = geeks; //compiler error
people.add( Person("Fonzie") );
We can't let that code by — Fonzie isn't a Geek!
Using big words, we say that Collection is nonvariant in Element. Or, when we're not trying to impress people with opaque terminology, we say that Collection both produces — via the iterator() method — and consumes — via the add() method — the type Element.
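Java's arrays show what happens when this kind of assignment is allowed. Arrays in Java are covariant, so the Fonzie problem is only caught at runtime. The sketch below uses hypothetical Person and Geek classes that mirror the ones in our example.

```java
class Person {
    final String name;
    Person(String name) { this.name = name; }
}

class Geek extends Person {
    Geek(String name) { super(name); }
}

class ArrayCovarianceDemo {
    // Returns true if storing a plain Person through a Person[] view
    // of a Geek[] is rejected at runtime.
    static boolean storeFails() {
        Geek[] geeks = { new Geek("James") };
        Person[] people = geeks;              // allowed: Java arrays are covariant
        try {
            people[0] = new Person("Fonzie"); // Fonzie isn't a Geek
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFails()); // prints "true"
    }
}
```

The compiler accepts the assignment and the virtual machine has to throw an ArrayStoreException instead. Rejecting the assignment at compile time, as both Java generics and Ceylon do, is the safer choice.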
Here's where Java goes off and dives down a rabbit hole, successfully using wildcards to squeeze a covariant or contravariant type out of a nonvariant type, but also succeeding in thoroughly confusing everybody. We're not going to follow Java down the hole.
Instead, we're going to refactor Collection into a pure producer interface and a pure consumer interface:
shared interface Producer<out Output> {
shared formal Iterator<Output> iterator();
}
shared interface Consumer<in Input> {
shared formal void add(Input x);
}
Notice that we've annotated the type parameters of these interfaces.
- The out annotation specifies that Producer is covariant in Output; that it produces instances of Output, but never consumes instances of Output.
- The in annotation specifies that Consumer is contravariant in Input; that it consumes instances of Input, but never produces instances of Input.
The Ceylon compiler validates the schema of the type declaration to ensure that the variance annotations are satisfied. If you try to declare an add() method on Producer, a compilation error results. If you try to declare an iterator() method on Consumer, you get a similar compilation error.
Now, let's see what that buys us:
- Since Producer is covariant in its type parameter Output, and since Geek is a subtype of Person, Ceylon lets you assign Producer<Geek> to Producer<Person>.
- Furthermore, since Consumer is contravariant in its type parameter Input, and since Geek is a subtype of Person, Ceylon lets you assign Consumer<Person> to Consumer<Geek>.
We can define our Collection interface as a mixin of Producer with Consumer.
shared interface Collection<Element>
satisfies Producer<Element> & Consumer<Element> {}
Notice that Collection remains nonvariant in Element. If we tried to add a variance annotation to Element in Collection, a compile time error would result.
Now, the following code finally compiles:
Collection<Geek> geeks = ... ;
Producer<Person> people = geeks;
for (Person person in people) { ... }
Which matches our original intuition.
The following code also compiles:
Collection<Person> people = ... ;
Consumer<Geek> geekConsumer = people;
geekConsumer.add( Geek("James") );
Which is also intuitively correct — James is most certainly a Person!
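For comparison, here is roughly how the same two assignments look in Java, where variance has to be requested with a wildcard at every use site. This is only a sketch: SimpleCollection, ListCollection and WildcardDemo are hypothetical names, and Person and Geek are stand-ins for the classes in our example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class Person {
    final String name;
    Person(String name) { this.name = name; }
}

class Geek extends Person {
    Geek(String name) { super(name); }
}

// Java has no "in"/"out" annotations, so these declarations are nonvariant.
interface Producer<T> { Iterator<T> iterator(); }
interface Consumer<T> { void add(T x); }
interface SimpleCollection<T> extends Producer<T>, Consumer<T> {}

// Hypothetical list-backed implementation, just for the demo.
class ListCollection<T> implements SimpleCollection<T> {
    private final List<T> items = new ArrayList<T>();
    public Iterator<T> iterator() { return items.iterator(); }
    public void add(T x) { items.add(x); }
}

class WildcardDemo {
    // "? extends Person" plays the role of Producer<Person> with "out".
    static List<String> names(Producer<? extends Person> people) {
        List<String> result = new ArrayList<String>();
        Iterator<? extends Person> it = people.iterator();
        while (it.hasNext()) {
            result.add(it.next().name);
        }
        return result;
    }

    // "? super Geek" plays the role of Consumer<Geek> with "in".
    static void addJames(Consumer<? super Geek> geekConsumer) {
        geekConsumer.add(new Geek("James"));
    }

    public static void main(String[] args) {
        SimpleCollection<Geek> geeks = new ListCollection<Geek>();
        geeks.add(new Geek("Gavin"));
        System.out.println(names(geeks));  // prints "[Gavin]"

        SimpleCollection<Person> people = new ListCollection<Person>();
        addJames(people);
        System.out.println(names(people)); // prints "[James]"
    }
}
```

The difference is where the burden falls. In Java every method that accepts a Producer or Consumer has to remember the wildcard, whereas in Ceylon the author of the interface states the variance once.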
There are two additional things that follow from the definition of covariance and contravariance:
- Producer<Void> is a supertype of Producer<T> for any type T, and
- Consumer<Bottom> is a supertype of Consumer<T> for any type T.
These invariants can be very helpful if you need to abstract over all Producers or all Consumers. (Note, however, that if Producer declared upper bound type constraints on Output, then Producer<Void> would not be a legal type.)
You're unlikely to spend much time writing your own collection classes, since the Ceylon SDK has a powerful collections framework built in. But you'll still appreciate Ceylon's approach to covariance as a user of the built-in collection types. The collections framework defines two interfaces for each basic kind of collection. For example, there's an interface List<Element> which represents a read-only view of a list, and is covariant in Element, and OpenList<Element>, which represents a mutable list, and is nonvariant in Element.
Generic type constraints
Very commonly, when we write a parameterized type, we want to be able to invoke methods or evaluate attributes upon instances of the type parameter. For example, if we were writing a parameterized type Set<Element>, we would need to be able to compare instances of Element using == to see if a certain instance of Element is contained in the Set. Since == is only defined for expressions of type Equality, we need some way to assert that Element is a subtype of Equality. This is an example of a type constraint — in fact, it's an example of the most common kind of type constraint, an upper bound.
shared class Set<out Element>(Element... elements)
given Element satisfies Equality {
...
shared Boolean contains(Object obj) {
if (is Element obj) {
return obj in bucket(obj.hash);
}
else {
return false;
}
}
}
A type argument to Element must be a subtype of Equality.
Set<String> set = Set("C", "Java", "Ceylon"); //ok
Set<String?> set = Set("C", "Java", "Ceylon", null); //compile error
In Ceylon, a generic type parameter is considered a proper type, so a type constraint looks a lot like a class or interface declaration. This is another way in which Ceylon is more regular than some other C-like languages.
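Readers coming from Java will recognize the upper bound as the counterpart of an extends clause on a type parameter. The sketch below is only an analogy: in Java equals() lives on Object, so we use Comparable as the bound instead of Equality, and the names UpperBoundDemo and largest are invented for the example.

```java
class UpperBoundDemo {
    // The upper bound is what allows compareTo() to be called on Element.
    static <Element extends Comparable<Element>> Element largest(Element first, Element second) {
        return first.compareTo(second) >= 0 ? first : second;
    }

    public static void main(String[] args) {
        System.out.println(largest("Java", "Ceylon")); // prints "Java"
        System.out.println(largest(3, 7));             // prints "7"
        // largest(new Object(), new Object()) would not compile,
        // because Object does not satisfy the bound.
    }
}
```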
An upper bound lets us call methods and attributes of the bound, but it doesn't let us instantiate new instances of Element. Once we implement reified generics, we'll be able to add a new kind of type constraint to Ceylon. An initialization parameter specification lets us actually instantiate the type parameter.
shared class Factory<out Result>()
given Result(String s) {
shared Result produce(String string) {
return Result(string);
}
}
A type argument to Result of Factory must be a class with a single initialization parameter of type String.
Factory<Hello> factory = Factory<PersonalizedHello>(); //ok
Factory<Hello> factory = Factory<DefaultHello>(); //compile error
A third kind of type constraint is an enumerated type bound, which constrains the type argument to be one of an enumerated list of types. It lets us write an exhaustive switch on the type parameter:
Value sqrt<Value>(Value x)
given Value of Float | Decimal {
switch (Value)
case (satisfies Float) {
return sqrtFloat(x);
}
case (satisfies Decimal) {
return sqrtDecimal(x);
}
}
This is one of the workarounds we mentioned earlier for Ceylon's lack of overloading.
Finally, the fourth kind of type constraint, which is much less common, and which most people find much more confusing, is a lower bound. A lower bound is the opposite of an upper bound. It says that a type parameter is a supertype of some other type. There's only really one situation where this is useful. Consider adding a union() operation to our Set interface. We might try the following:
shared class Set<out Element>(Element... elements)
given Element satisfies Equality {
...
shared Set<Element> union(Set<Element> set) { //compile error
return ....
}
}
This doesn't compile because we can't use the covariant type parameter Element in the type declaration of a method parameter. The following declaration would compile:
shared class Set<out Element>(Element... elements)
given Element satisfies Equality {
...
shared Set<Object> union(Set<Object> set) {
return ....
}
}
But, unfortunately, we get back a Set<Object> no matter what kind of set we pass in. A lower bound is the solution to our dilemma:
shared class Set<out Element>(Element... elements)
given Element satisfies Equality {
...
shared Set<UnionElement> union(Set<UnionElement> set)
given UnionElement abstracts Element {
return ...
}
}
With type inference, the compiler chooses an appropriate type argument to UnionElement for the given argument to union():
Set<String> strings = Set("abc", "xyz") ;
Set<String> moreStrings = Set("foo", "bar", "baz");
Set<String> allTheStrings = strings.union(moreStrings);
Set<Decimal> decimals = Set(1.2.decimal, 3.67.decimal);
Set<Float> floats = Set(0.33, 22.0, 6.4);
Set<Number> allTheNumbers = decimals.union(floats);
Set<Hello> hellos = Set( DefaultHello(), PersonalizedHello(name) );
Set<Object> objects = Set("Gavin", 12, true);
Set<Object> allTheObjects = hellos.union(objects);
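Java has no direct equivalent here, since super is only allowed on wildcards and not on a method's type parameter. A common workaround, sketched below, is a static method whose result type is a common supertype of both element types. UnionDemo, union and numberUnionSize are hypothetical names, and the sets are ordinary java.util sets rather than our Set class.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class UnionDemo {
    // U is inferred as a common supertype of both element types.
    static <U> Set<U> union(Set<? extends U> first, Set<? extends U> second) {
        Set<U> result = new LinkedHashSet<U>(first);
        result.addAll(second);
        return result;
    }

    // Small helper for the demo: unions integers with doubles as Numbers.
    static int numberUnionSize() {
        Set<Integer> integers = new LinkedHashSet<Integer>(Arrays.asList(1, 2));
        Set<Double> doubles = new LinkedHashSet<Double>(Arrays.asList(0.5, 1.5));
        Set<Number> allTheNumbers = union(integers, doubles);
        return allTheNumbers.size();
    }

    public static void main(String[] args) {
        System.out.println(numberUnionSize()); // prints "4"
    }
}
```

The price of the workaround is that union() can no longer be an ordinary instance method on the set, which is what the lower bound gives us in Ceylon.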
There's more...
I was about to start talking about sequenced type parameters, the foundation of Ceylon's typesafe metamodel. But I realize I already hit my word limit. If you're really impatient, you can skip forward to Part 8.
In Part 7 we're going to back up a bit and cover a couple of topics that got kinda glossed over.
I'd love to hear a very high-level overview of what you think the initial release of the SDK (is that what you call it?) should contain. E.g. what will the I/O libraries look like?
The initial release, i.e. the alpha version of the compiler, will probably only contain the language module.
But here's what I'm thinking are the highest priority modules:
We really have not put much thought into that yet.
Thanks for the info. Interesting that it has HTTP/HTML that close to the but I guess web apps is what people write a lot nowadays and it should be an easy entry point. I'm looking forward to seeing the io/nio/nio2 cleaned up behind a Ceylon interface. And the collections with guava-like features.
HTTP is probably the most important communication protocol on the application layer. Once you are able to talk to a database system, you'd most likely want to talk to other systems using HTTP. The WWW is an important aspect of this but not the only relevant case.
Also consider how bad the experience is with HTTP in Java (with the JDK and external libs). There is not one good and complete URI or URL class. So even addressing resources is a pain. The HTTPURLConnection etc. stuff in the JDK is one of the worst pieces of code ever written. Servlets/WARs are ridiculously complex considering the trivial tasks they have to do. JAX-RS is another half-done specification, and we can only hope that they add the missing pieces in the future. Do you like the Apache HTTP Components API? What about advanced HTTP protocols like Web/Card/CalDAV, which everyone is using on their iOS and Android devices without even realizing it. I think that having great support for HTTP is a major selling point for any application programmer who is looking to migrate to a new environment.
As for HTML, I'd say there should be support for working with XHTML. The more I use it, the more I like it.
I must say I'm really liking what I read so far! Great work.
I have 2 questions (only one directly related to Ceylon):
- I see many SDK modules mentioned. How is the integration with Java handled? What is it going to look like? Any thoughts about making integration with native code (JNI) easier?
- You mention JBoss modules and I've been trying to find out how it works, but there doesn't seem to be any documentation?
Very perceptive question. I love that you've been paying that much attention!
As of right now, no. I considered that possibility, but I'm thinking it's going too far to have typesafe empty strings, though perhaps that's just a lack of faith on my part.
Currently the (very sketchy) declaration of String is:
shared class String(Character[] chars)
        extends Object()
        satisfies Comparable<String> & Iterable<Character> &
                  Correspondence<Natural,Character> & Sized &
                  Summable<String> & Castable<String> {
    shared Character[] characters;
    if (nonempty chars) {
        characters = chars.clone;
    }
    else {
        characters = {};
    }
    ...
}
i.e. String is not a sequence of Characters, it just has a sequence of Characters.
I do not have complete confidence in this design, and it might need to change.
Modules based on existing Java stuff are going to be Ceylon-language wrappers over the underlying Java code.
The Ceylon compiler is being designed to integrate with OpenJDK. The idea behind this is that you should be able to inter-compile Java and Ceylon code from the same compiler.
But since not every language construct in Java has a matching construct in Ceylon (primitives, arrays, primitive nulls, overloading and wildcards), and since not every language construct in Ceylon has a matching construct in Java (first-class functions, member class refinement, default parameters, concrete members of interfaces, variance annotations), there needs to be a well-defined (but necessarily incomplete) mapping between the two languages. We have figured out some but not all of this.
Just one example. Suppose I have a Java method like this:
public Foo foo(Bar bar) { ... }
The compiler needs to present this method to Ceylon as the following Ceylon type schema.
shared Foo foo(Bar? bar) { ... }
Note something here: we don't make Foo? the return type. Instead, the compiler knows that this is a special case. If the calling Ceylon code looks like:
Then the compiler will automatically insert a null check and throw a NullPointerException if foo() returns null. On the other hand, if the calling Ceylon code looks like:
Then no null check will be inserted.
This gets around the lack of information about whether the return type of foo() is really supposed to be Foo or Foo?, the information that's missing in the Java code, but at the cost of a potential NPE if we make a mistake. That's the right thing I think. The boundary to Java can have non-typesafe null handling, since Java itself has non-typesafe null handling. But as soon as we've crossed the boundary from Java, and we're back in Ceylon code, no NPEs, ever.
No, not something that has been a priority at this stage.
Not sure. Ales has been working on the integration for JBoss Modules, and I'm lucky enough to be able to call Jason Green on the phone when I have questions :-)
Ok, thanks for your answers!
But seeing you mention I remembered something else I wanted to ask.
Years ago I read an article (which I of course haven't been able to find anymore) about somebody who was working on one of the more well-known Java XML parsers and who was ranting that it was difficult to write a really performant parser (that integrates nicely with the rest of Java) because all standard Java SDK APIs use String which makes it almost impossible to make a parser that doesn't have to do a lot of allocating and copying of character buffers. And he said .
Now that you're planning to make an entirely new SDK (although possibly wrapping standard Java APIs) is this something that you have thought about? Or are you saying: please leave me alone, I have enough to do already! ;)
That is something I find very annoying about Scala, when doing anything that involves processing text. It's not that uncommon to want to use collection-like operations on a string, like or .
The Java approach is that Strings aren't a collection of characters (CharSequence isn't a real collection class and doesn't have many of the useful methods on it). This means you can't pass a string to a method that wants a collection without manually converting backwards and forwards.
The Scala approach of using Java's String class and having implicit conversions to the collection classes is one of the big uses for higher-kinded types and some of the pain it brings when you want to add a new collection implementation.
If you're willing to let Ceylon Strings not be java.lang.String, then I think making them proper SequenceChars would be quite nice, as you get all the abilities of a proper sequence without Scala pain. One of the big downsides though would be that you don't get some of the optimisations that HotSpot does on Strings.
Well, as you can see from the declaration of String I posted above, a String is still Iterable and it is still a Correspondence<Natural,Character>. So, according to this definition, you can do things like:
for (local char in string) { ... }
And:
And:
Really the only thing is that I haven't had the courage to say that Ceylon's typesafe handling of the Empty sequence also applies to Strings.
Yes, the problem is that we're going to want to be able to write functions that operate polymorphically over both sequences and Strings. I feel like that's a pretty strong requirement.
I don't quite see how type constructor parametrization (higher kinds) helps you here, but I may be missing something quite Scala-specific. I mean, type constructor parametrization lets you operate polymorphically over container types, for example, letting you write a method that returns a Set<Character> when given a Set<Character>, and returns a List<Character> when given a List<Character>. I don't see how type constructor parametrization helps you write the method so that it returns a String when given a String. (Well, excuse me, I actually do know how to do things like that with a combination of type constructor parametrization and GADT support, but that doesn't apply to the problem we're talking about here.)
Again, I may be missing something very specific to Scala.
Additionally, I regard adding to the language as an absolute last resort. We've put a lot of work into not having implicit type conversions (and definitely not implicit parameters), and searching for alternate solutions. This programme seems to have succeeded, and I don't think Ceylon will need implicit type conversions, not even for numeric types.
So is there a solution to the problem of writing functions that operate polymorphically over both sequences and Strings? I think so. I think you can solve this problem elegantly using type classes, and we'll probably go down this path, but that definitely won't be in the first release of the language.
Note that the Ceylon String class will probably get erased by the compiler, leaving java.lang.Strings at the bytecode level.
And I think you guys can see how type classes are much more in the spirit of this language than implicit type conversions are :-)
But it would be nice if String was more abstract, an interface that could be implemented by others instead of always having to rely on the default implementation.
Well, that would probably make it harder to erase String to java.lang.String.
For me, it is just a way of bypassing the limitation of first-order typing. Since 1995, it has been proved that the true typing for OOP is second-order typing (cf. F-Bound theory), so why not use or insert second-order typing in Ceylon? In that typing, polymorphism is inherently provided without playing with overriding with covariant or contravariant declarations, and second-order typing correctly answers the problem of recursive types.
In the second-order typing, classes are type generators (thus the name of type class).
So, for example with numbers we can define the type class Number as Number<T satisfies Number<T>> and Integer as a type of this type class.
Or for the collections in your example:
Collection<T satisfies Collection<T>> is a type class of collections of objects of type T. So that Persons can be a collection of persons and Geeks a collection of geeks. We can also define a type class for the type Persons so that Persons is a fix-point of Persons<T satisfies Persons<T>> (that in our case is a subclass of Collection<T satisfies Collection<T>>), so that Geeks becomes a type of this new type class. (ok, the syntax can be improved to be not so complex)
The problem we're trying to solve is to have collection types which are covariant in their type parameter, without resorting to existential quantification. i.e. we don't want to have to write:
to get covariance, or:
to get contravariance.
The existential type Collection<P> given P satisfies Person is verbose, and often just plain confusing.
Nevertheless the emphasis shouldn't be on the syntax, but more on the proposition to support second-order typing instead of the usual and very limited first-order one. With second-order typing, we avoid playing explicitly with covariance or contravariance (for me it is just a way to bypass a limitation of first-order typing).
For the syntax, such a declaration can be proposed to express a type class:
typeclass Collection<P> (meaning Collection<P> given P satisfies Collection<P>)
I admit second-order typing appears, at first glance, less interesting with collections, as usually they are just a way to handle several objects in one shot (F-Script does it in an interesting way). It becomes very interesting when polymorphism is needed between objects of non-related types (as polymorphism is inherently a consequence of second-order typing).
To improve usability with second-order typing, the language can also automatically generate a type class for each type (so that the type is the bounded value of the generated type family), as Smalltalk does with metaclasses (for each class, a metaclass is generated).
is T obj, should be is Element obj
Hello,
Have you heard of Kotlin by JetBrains? They are using a different way (probably not in line with Ceylon's design) to solve this very same issue: http://confluence.jetbrains.net/display/Kotlin/Generics#Generics-Declarationsitevariance
Instead of splitting an interface between producing and consuming methods (I can see it being quite annoying in the long run) like you suggest here, they are using keywords In and Out to express whether the type is a consumer (In) or a producer (Out). It seems to me that using Ceylon annotations, it should be possible to do much the same without compromising your design.
What are your thoughts about this approach?
Cheers, Jean-Noel
He he, I like this declaration:
shared class String(Character[] chars) extends Object() satisfies Comparable<String> & Iterable<Character> & Correspondence<Natural,Character> & Sized & Summable<String> & Castable<String> {
It reminds me of my Java code:
public class IsoDate implements Identifiable, IdentifiableWithParentId, Statusable, IsoType, Visitable {
Oh I wish I could write in Java the equivalent of Ceylon's:
shared void method(Identifiable&IdentifiableWithParentId&Statusable obj) {}
The wording of this sentence,
makes me think Ceylon's (intended) support for reified generics is partial. But I can't see what is lacking. What am I missing? Thanks.