Red Hat

In Relation To Gavin King

Union types and covariance, or why we need intersections

Tagged as Ceylon, Java EE

A couple of days ago I spotted an interesting hole in Ceylon's type checker. I started out with something like this:

Sequence<Float>|Sequence<Integer> numbers = .... ;
Sequence<Number> numbers2 = numbers;

The Sequence interface is covariant in its type parameter. So, since Float and Integer are both subtypes of Number, both Sequence<Float> and Sequence<Integer> are subtypes of Sequence<Number>. Then the compiler correctly reasons that the union of the two types is also a subtype of Sequence<Number>. Fine. Clever compiler.
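For readers coming from Java, a rough analogue: Java has no declaration-site covariance, but a wildcard gives the same subtyping at the use site. This is only a sketch, and the class and method names are made up for illustration.

```java
import java.util.List;

// Hypothetical example class, not part of any library.
class UseSiteCovarianceDemo {

    // List<? extends Number> is a common supertype of
    // List<Float> and List<Integer>, so both can be passed here.
    static Number first(List<? extends Number> numbers) {
        // The most precise static type Java can give the element is Number.
        return numbers.get(0);
    }
}
```

Java has no way to write the union of List<Float> and List<Integer> as a type, so widening to the wildcard type is the only option there.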

Now, here's the hole:

value first = numbers.first; //compiler error: member does not exist
value first2 = numbers2.first; //ok, infers type Number

When it encountered the union type, the member resolution algorithm was looking for a common produced type of the type constructor Sequence in the hierarchies of each member of the union type. But since Sequence<Float> isn't a subtype of Sequence<Integer> and Sequence<Integer> isn't a subtype of Sequence<Float>, it simply wasn't finding a common supertype. This resulted in the totally counterintuitive (and, in my view, pathological) result that the member resolution algorithm could not assign a type to the member first of the type Sequence<Float>|Sequence<Integer>, but it could assign a type after widening to the supertype Sequence<Number>.

Of course, there might be many potential common supertypes of a union type. There's no justification for the member resolution algorithm to pick Sequence<Number> in preference to a sequence of any other common supertype of Float and Integer. We've got an ambiguity.

So I quickly realized that I had an example which was breaking two of the basic principles of Ceylon's type system:

  • It is possible to assign a unique type to any expression without cheating and looking at where the expression appears and how it is used.
  • All types used or inferred internally by the compiler are denotable. That is, they can all be expressed within the language itself.

These principles are specific to Ceylon, and other languages with generics and type inference don't follow them. But failing to adhere to them - and especially to the second principle - results in extremely confusing error messages (Java's horrid capture-of type errors, for example). These two principles are a major reason why I say that Ceylon's type system is simpler than some other languages with sophisticated static type systems: when you get a typing error, we promise you that it's an error that humans can understand.

Fortunately I was able to discover a useful relationship between union types and covariance.

If T is covariant in its type parameter, then T<U>|T<V> is a subtype of T<U|V> for any types U and V.

But furthermore, T<U|V> is also a subtype of any common supertype of T<U> and T<V> produced from the type constructor T. We've successfully eliminated the ambiguity!

So I adjusted the member resolution algorithm to make use of this relationship. Now the problematic code compiles correctly, and infers the correct type for first:

value first = numbers.first; //ok: infers type Float|Integer

Well, that's great! Oh, but what about types with contravariant type parameters? What type should be inferred for the parameter of the consume() method of Consumer<Float>|Consumer<Integer>? Well, I quickly realized that the corresponding relationship for contravariant type parameters is this one:

If T is contravariant in its type parameter, then T<U>|T<V> is a subtype of T<U&V> for any types U and V.

Where U&V is the intersection of the two types. So the type of the parameter of consume() would be Float&Integer, which is intuitively correct. (Of course, since it is impossible for any object to be assignable to both Float and Integer, the compiler could go even further and reduce this type to the bottom type.)

But, ooops, Ceylon doesn't yet have first-class intersection types, except as a todo in the language specification. And our second principle states that the compiler isn't allowed to infer or even think about types which can't be expressed in the language!

Well, really, I was just waiting for the excuse to justify introducing intersection types, and this gave me the ammunition I was waiting for. So yesterday I found a couple of free hours to implement experimental support for intersection types in the typechecker, and, hey, it turned out to be much easier than I expected. It's also a practically useful feature. I've often wanted to write a method which accepts any value which is assignable to two different types, without introducing a new type just to represent the intersection of the types.
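For comparison, Java can express "assignable to two different types" only as a bound on a type parameter, not as a first-class type. A minimal sketch, with a hypothetical class and method name:

```java
// Hypothetical example class, for illustration only.
class IntersectionBoundDemo {

    // T must be assignable to BOTH CharSequence and Comparable<T>.
    // Java permits this intersection only in a type parameter bound;
    // it cannot be written as the declared type of a variable.
    static <T extends CharSequence & Comparable<T>> String describeMax(T a, T b) {
        T max = a.compareTo(b) >= 0 ? a : b;           // uses Comparable<T>
        return max + " (" + max.length() + " chars)";  // uses CharSequence
    }
}
```

Calling describeMax("apple", "pear") works because String satisfies both bounds, and no new named type had to be introduced to say so.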

I'll leave you with two more interesting relationships, applying to intersection types:

If T is covariant in its type parameter, then T<U>&T<V> is a supertype of T<U&V> for any types U and V.
If T is contravariant in its type parameter, then T<U>&T<V> is a supertype of T<U|V> for any types U and V.

Nice symmetries here.

Raw types != type erasure

In connection with this discussion it's worth making explicit what I guess everybody knows, but that sometimes seems to get a bit mixed up in conversation: that Java's support for raw types (necessary for backward compatibility with pre-generics code) doesn't really have anything much to do with type argument erasure. In a hypothetical language:

  • you could have raw types without type argument erasure, or
  • you could have type argument erasure without raw types.

Raw types have their own problems, of course (they're a designed-in hole in the type system). But a defense of the existence of raw types does not amount to a defense of partially reified types.
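A minimal Java sketch of the hole that raw types open up (the class and method names are hypothetical). Note that the failure shows up where the value is read, not where the wrong value was inserted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, for illustration only.
class RawTypeHoleDemo {

    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawTypeBreaksTypeSafety() {
        List<String> strings = new ArrayList<>();
        List raw = strings;  // raw type: compiles with only a warning
        raw.add(42);         // an Integer slips into a List<String>
        try {
            String s = strings.get(0);  // compiler-inserted cast fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```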

Three arguments for reified generics

Tagged as Ceylon, Java EE

Cedric recently brought up the topic of type erasure, concluding:

All in all, I am pretty happy with erasure and I’m hoping that the future versions of Java will choose to prioritize different features that are more urgently needed

Well, I suppose erasure isn't the thing I hate most about Java, but it's certainly up there. Java's system of partially reified types actually adds a surprising amount of complexity and unintuitive behavior to the type system.

From a pure language-design point of view, I think a partially reified type system is one of the worst decisions you could possibly make. Either reify all types, like C#, or reify none of them, like ML. And look, there's certain language features that simply don't play nicely with type erasure. A case in point: overloading. You can have type erasure, or you can have overloading (or, like Ceylon, you can have neither). You can't have both type erasure and overloading. No, Java is not a counter-example to this! In terms of language design, Java's approach to reification is almost impossible to justify except as a totally half-baked and misconceived workaround for simply not having time to Do It Right.
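To make "partially reified" concrete, here is a small Java sketch (hypothetical class and method names). Array element types survive at runtime, type arguments do not, and overloads that differ only in type arguments are rejected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, for illustration only.
class ErasureDemo {

    // Type arguments are erased: both lists have the same runtime class.
    static boolean listsShareRuntimeClass() {
        Class<?> strings = new ArrayList<String>().getClass();
        Class<?> integers = new ArrayList<Integer>().getClass();
        return strings == integers;
    }

    // Array element types are reified: the runtime classes differ.
    static boolean arraysHaveDistinctRuntimeClasses() {
        Class<?> strings = new String[0].getClass();
        Class<?> integers = new Integer[0].getClass();
        return strings != integers;
    }

    static int size(List<String> strings) {
        return strings.size();
    }

    // Uncommenting this overload makes javac report a name clash,
    // because both methods have the same erasure, size(List):
    // static int size(List<Integer> integers) { return integers.size(); }
}
```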

But Cedric's coming from a purely practical point of view, saying the problems don't actually bite him much when he's doing real work. Well, OK, I can see that. So here's the practical reasons why I think reified generics are needed, and why they should be added to Java if that could be done without messing up Java's type system even further.


Many frameworks depend upon having reified types. Type argument erasure cripples frameworks that work with generic types, and results in horrid workarounds like this one in CDI.
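One widely used workaround of this kind (not necessarily the one meant here) is the "super type token" idiom: an anonymous subclass records its type argument in class-file metadata, which survives erasure. The sketch below is an assumption-laden illustration; TypeToken is a hypothetical helper written for this example, not any framework's actual class.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Hypothetical helper, named TypeToken here purely for illustration.
abstract class TypeToken<T> {
    private final Type type;

    protected TypeToken() {
        // The generic superclass of the anonymous subclass still
        // carries the actual type argument supplied for T.
        ParameterizedType superType =
                (ParameterizedType) getClass().getGenericSuperclass();
        this.type = superType.getActualTypeArguments()[0];
    }

    Type getType() {
        return type;
    }
}

class TypeTokenDemo {
    static Type listOfString() {
        // The trailing {} creates the anonymous subclass that captures
        // List<String>; a plain class literal could only say List.class.
        return new TypeToken<List<String>>() {}.getType();
    }
}
```

The extra ceremony exists only because there is no way to write a class literal for List<String> in Java.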

Typesafe narrowing

Instead of a Java-style instanceof operator, and C-style typecasts, Ceylon provides a construct for narrowing the type of a reference in a totally statically typesafe way. You just can't get ClassCastExceptions in Ceylon.

But this functionality depends upon having reified generics. Until we implement reified type arguments, we can't provide any mechanism to narrow to a parameterized type. Right now, you simply can't narrow an Object to a List<String>.

You might think that this is a problem with Ceylon, but really, the situation isn't much better in Java. The instanceof operator doesn't support types with type arguments, and casting to a type with type arguments is a totally unsafe operation! I just don't think that's acceptable behavior in a language with static typing.
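A short Java sketch of that unsafe cast (hypothetical class and method names). The cast itself is not checked at runtime; the error only surfaces later, when an element is used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, for illustration only.
class UncheckedCastDemo {

    @SuppressWarnings("unchecked")
    static boolean castSucceedsButUseFails() {
        List<Integer> ints = new ArrayList<>();
        ints.add(1);
        Object obj = ints;

        // For an Object operand, instanceof can only test the erased type:
        //   obj instanceof List<String>   // does not compile
        if (!(obj instanceof List<?>)) {
            return false;
        }

        // The cast "succeeds" even though obj is really a List<Integer>.
        List<String> strings = (List<String>) obj;
        try {
            String first = strings.get(0);  // fails here, far from the cast
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```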

Inter-language interoperability

Interoperability between statically-typed JVM languages is going to get really messy when some of the languages support reified generics and some don't. Especially since it's easy to imagine that those languages which do support reified generics won't support them in an interoperable way. This could turn out to be a real problem for the vision of a multi-language JVM.

Some questions about the design of Ceylon

Tagged as Ceylon

Over the last couple of days, I've exchanged a few emails with Stephen Colebourne regarding Ceylon, and some of the decisions we made in designing the syntax of Ceylon.

I believe that syntax is an extremely important part of language design. Developers work in teams. We spend all day reading each other's code. We spend much more time reading code than we spend writing code. Therefore, languages should be designed to optimize the process of reading and understanding someone else's code. (Indeed, since I have such an atrocious memory, when I read code that I wrote more than a month ago, I may as well be reading someone else's code!)

Well, I thought some of Stephen's questions/criticisms, and my responses to them, might be of interest to a wider audience, so I asked his permission to clean up some of our exchange and publish it here. He's kindly agreed. Note that what follows is mostly my own words, and doesn't purport to do justice to Stephen's side of the argument, or by any means completely represent his views of the language. I've included mainly just the items where there is a clear choice between alternatives, and I can clearly express the reasons for taking the path we took.

I hope that this helps you guys understand why we made certain decisions, and some of the forces that operate on the language design process, forces that aren't always completely obvious when you look at the final shape of a language.

String interpolation

Why does Ceylon use "Hello, " name.uppercase "!" for string interpolation instead of the more familiar "Hello, ${name.uppercase}!"?

We originally wanted to use the ${...} escape syntax, but it turns out that this can't be lexed using the regular expression-based lexer technology in ANTLR. We took a look at how Groovy handles this, and it seems like they wind up using a hand-coded lexer. We wanted to be sure that our language was easy to lex and parse, since that helps the compiler give meaningful feedback to the user about syntax errors.

The other thing we had in mind was that our primary motivation for having string interpolation in the language was not for writing everyday procedural code, but rather for use in defining templates (for example web pages) using the declarative object builder syntax. We think that the syntax we ended up settling on for string interpolation works out much better for this application, even if it can be slightly harder to read in typical procedural code.

Out, out, damn semicolon!

Why require semicolons at line end?

A number of recent languages use significant whitespace to eliminate the need for a ; statement terminator. Typically, I believe, this is implemented as some kind of auto-semicolon-insertion that happens in the lexer. So the actual formal grammar of the language, which produces the parser, still features required semicolons. You just don't have to actually type them.

Unfortunately, auto-semicolon-insertion doesn't play well with the annotation syntax we wanted to use. An annotation in Ceylon looks syntactically like an expression statement (because, in fact, semantically it is an expression). So there's no way for the parser, let alone the lexer, to distinguish an annotation sitting on its own line from an expression statement.

So languages which have both annotations and auto-semicolon-insertion need to introduce some ugly characters to distinguish annotations. The two things I've seen are @Annotation @OtherAnnotation, following Java, and [Annotation OtherAnnotation], following C#. But then the designers of these languages find this syntax so offensive that they can't actually bear to use it for their own annotations! So they find themselves having to introduce reserved-word modifiers like public, abstract, virtual, etc.

In Ceylon, modifiers are just ordinary annotations. They aren't keywords. You can have an attribute called shared or abstract or default. Basically, given the choice between requiring @ on all annotations plus a bunch of extra reserved words, or requiring ; on all statements, we chose the ; as the lesser of two evils.

Syntax for control structures

The null checking syntax if (exists name) seems verbose. Wouldn't a symbol be better? What's wrong with if (name==null) like in Java?

Well, we wanted to avoid the extremely thorny problem of trying to define equality for null. In Java, the expression person.address==org.address evaluates to true if both person.address and org.address are null. This is almost certainly not what the programmer intends. In SQL, the equivalent expression evaluates to null, resulting in the whole somewhat doubtful machinery of ternary logic (which has some pretty unintuitive consequences).

Ceylon sidesteps this whole problem by simply not defining equality for the null value. The compiler won't let you write person.address==org.address if either person.address or org.address might be null.
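The Java behavior described above can be sketched as follows. Party, naiveSameAddress and sameKnownAddress are hypothetical names, and the second method is just one guess at what a programmer might actually intend.

```java
// Hypothetical example class, for illustration only.
class NullEqualityDemo {

    static class Party {
        String address;  // null means "no address recorded"
    }

    // Evaluates to true when BOTH addresses are null, which is
    // almost certainly not what the caller wanted to know.
    static boolean naiveSameAddress(Party person, Party org) {
        return person.address == org.address;
    }

    // One possible intent: two parties share an address only if
    // an address is actually known and the values are equal.
    static boolean sameKnownAddress(Party person, Party org) {
        return person.address != null && person.address.equals(org.address);
    }
}
```

In Java nothing forces the programmer to write the second form; the first one compiles without complaint.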

Anyway, exists is actually less verbose than ==null. Yes, it's the same number of characters, but the more relevant measure from a readability perspective is the number of tokens. (Your eyes read tokens, not characters.)

Sure, I suppose we could have gone with something like if (name?), but then we would have had to come up with symbols for nonempty and is, which begins to make the language cryptic to people who don't use it every day.

I don't like if (nonempty seq) either. Sure, it's clear, but it's over-verbose.

Again, neither if (!seq.empty) nor if (!seq is Empty) is significantly less verbose than if (nonempty seq). And I simply don't want to introduce some cryptic symbol, like, say, ?? to mean non-empty.

The for loop is fine and readable. And the collections hierarchy and integration with the type system looks good. But for (Natural i -> String op in entries(operators)) seems quite complex. As does if (exists String op = operators[i]).

Actually we recently decided to let you simply eliminate type annotations from control structures, letting you write, for example:

for (name in names) { ... }
for (i->op in entries(operators)) { ... }
if (exists op = operators[i]) { ... }

This turns out to be more readable in real code. The tutorial doesn't yet reflect this change.

I'm struggling with if (is Hello obj). I think because the keyword is in the wrong place.

Yes, all these constructs look backwards. The reason we did it this way is to be regular with the full form which introduces a variable name:

if (exists first = seq.first) { ... }
if (is Ngo org = person.employer) { ... }
if (nonempty tags = form.tags.value.tokens) { ... }

But I definitely don't love the backwardness of it.

shared vs public/protected/private

I use protected all the time, and losing that seems dubious.

I disagree. To me, there is no objective software-engineering justification for ever choosing Java's protected over public. Visibility modifiers exist to control dependencies between independent units of software. It doesn't matter if a dependency comes from a subclass or from a client class. What matters is in what unit the subclass or client class is defined.

Ceylon's visibility model can be used to localize the visibility of a program element to a unit of any of the following levels of granularity:

  • the containing scope
  • the package
  • the module
  • all modules.

That's already more expressive than Java, since Java doesn't have the notion of modules, or module-private visibility. And Ceylon can express all this with just one annotation, instead of three!


Attributes look like properties. So why the different name of the feature?

Well, to me the word property implies a getter or getter/setter pair. In Ceylon, not all attributes are getters. Attribute is a collective name to describe both property-style attributes and simple attributes. You can override a simple attribute with a property-style attribute and vice versa.

Type inference

Why not use val instead of local for type inference?

Actually, a few weeks ago, we finally made the decision to go with value for attributes/locals, and function for methods. That change isn't yet reflected in the tutorial. One reason (out of several) for doing this was that we might someday change our minds and let you use type inference for shared declarations. (Only if we can figure out how to do this without too much negative impact upon compiler performance.)

Syntax for inheritance

Why not use a colon instead of extends?

Well, it's partly a matter of taste. But the objective reasoning is that if you use : for extends, you then need to come up with punctuation that means satisfies, abstracts, adapts, of, etc, and you wind up in a rabbit hole of cryptic symbols like :>, <:, <%, etc. Imagine what it looks like to combine these together in the same type definition!

Using satisfies also seems another arbitrary change from Java.

Lots of people think that when they first see it, but actually it's not an arbitrary change. We did it this way for two reasons:

  1. So that upper bound type constraints can use a syntax that is regular with class and interface declarations. The keywords extends and implements simply don't sound right for defining an upper bound type constraint.
  2. Because the type that appears in the extends clause in a class definition comes with arguments to the superclass constructor. If we reused extends for interface extension like Java does, the grammar for the extends clause would be extremely irregular. And, on the other hand, implements doesn't sound right for interface inheritance.

The keyword satisfies reads correctly for all three cases: interface extension by another interface, interface implementation by a class, and upper bound constraints on a type parameter.


Why actual instead of the more common override?

The word override is a verb, and so it usually doesn't read well in a list of several annotations. Annotations read best together when they are all adjectives.

And the formal/abstract stuff gives the impression of being complex. I think it's partly because it is a different model of working and partly because the keywords are unfamiliar.

It's really just the excellent overriding model used in C#, but we're using different annotation names.

We had to separate formal from abstract because they actually perform semantically different roles. Java and C# can get away with overloading the meaning of abstract, because an inner class can't be overridden. In Ceylon, where we have member class overriding, a formal class and an abstract class are semantically quite different creatures, and we need to be able to distinguish between them.

We could have used virtual like in C++ or C#, instead of default, but since Java doesn't have a modifier for this, and since we had already had to introduce the new terms actual and formal, I felt okay about choosing a name that I subjectively thought was clearer.

Union types

I'm not so enthusiastic about the use of union types underlying the typesafety of null values and empty sequences, but so long as they are rarely seen it is no big deal.

So it turns out that union types can be more transparent here than using an algebraic type like in functional languages. For example, you never need to instantiate wrappers. In Haskell, for example, you need to write Just 1 if you want to assign 1 to Maybe Int.
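The same wrapper cost shows up in Java's java.util.Optional, which plays a role similar to Maybe. A tiny sketch with hypothetical class and method names:

```java
import java.util.Optional;

// Hypothetical example class, for illustration only.
class WrapperDemo {

    static Optional<Integer> wrapped() {
        // Optional<Integer> maybe = 1;  // does not compile: a wrapper is required
        return Optional.of(1);
    }

    static int unwrapOrZero(Optional<Integer> maybe) {
        // The value has to be unwrapped again before it can be used.
        return maybe.orElse(0);
    }
}
```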

We do hide the union type under some syntax sugar, but that's definitely a leaky abstraction. Union types are an integral part of Ceylon's type system that anyone will need to get comfortable with if they want to be effective in the language. I think most people will come around to liking them.

The type inference section is where the union types start to appear useful.

Yeah, they even work nicely for generic type argument inference, which is not yet covered in the tutorial. One of the problems in Java's generics is that the compiler often infers types that are non-denotable, i.e. not representable within the Java language. This results in really confusing error messages. That never happens in Ceylon, since union types are denotable and the compiler never needs to infer any kind of existential type. This really is one of the nicest properties of our type system. An invisible cost of a more complex/powerful type system is more subtle/confusing error messages.


Introductions look interesting, but I don't have an immediate similarity to map them to. As you rightly point out, implicits are a poor feature.

Introductions are like a compromise between two features you'll find in other languages. Extension methods are best known from C#, but actually have a long prior history. Implicit type conversions are featured in several languages including C++ and Scala.

Extension methods are a safe, convenient feature that let you add new members to a pre-existing type. Unfortunately, they don't give you the ability to introduce a new supertype to the type.

Implicit type conversions are a dangerous feature that, although they seem simple to understand, actually screw up several useful properties of the type system (including transitivity of assignability), introducing complexity into mechanisms like member resolution and type argument inference, and can easily be abused.

Introduction is a disciplined way to introduce a new supertype to an existing type, using a mechanism akin to extension methods, without the downsides of implicit type conversions.

Member classes and member objects

Member classes look quite clever and easy to understand at first glance. I can't quickly grasp why I'd want nested object instances.

Member objects turn out to be super-convenient once you actually start writing Ceylon code. I often use them instead of a nested class if the nested class would have no initializer parameters. They're a bit like a Java-style anonymous inner class, in that they give you a quick way to extend an existing type, while giving you access to state in the containing scope, but the syntax is a little cleaner, and an object may have multiple supertypes.

For example, a member object is a convenient way to implement the Iterator interface for an Iterable object.
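For comparison, this is roughly what the Java-style anonymous inner class version of that pattern looks like. Countdown is a hypothetical class invented for this sketch.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical Iterable that counts down from start to 1.
class Countdown implements Iterable<Integer> {
    private final int start;

    Countdown(int start) {
        this.start = start;
    }

    @Override
    public Iterator<Integer> iterator() {
        // The anonymous inner class reads state (start) from the
        // containing instance, much as a member object would.
        return new Iterator<Integer>() {
            private int remaining = start;

            @Override
            public boolean hasNext() {
                return remaining > 0;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return remaining--;
            }
        };
    }
}
```

Unlike a member object, the anonymous class can only name a single supertype.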


So, my thoughts on the generics part is that there is a lot of power there with the type system performing better than Java. But the syntax is very verbose.

It's intentionally verbose. Generic declarations are something most developers write much less often than they read. Certain kinds of verbosity can make code more readable, not less.

There's some languages I've seen - not naming names - which, at least to the outside observer, seem to operate on the principle that the more advanced a language feature is, the more cryptic the syntax can be. I think that's exactly backwards. Cryptic syntax is perhaps acceptable for optimizing very common things, but it should be avoided for expressing things that are uncommon or otherwise difficult to understand.

Perhaps that will work out. But I think that deep down I feel SQL and wordy languages are from the past not the future.

The thing is that as you start getting to languages that are as feature-rich as the ones we're looking at as potential Java replacements, you simply run out of punctuation characters on the keyboard. ;-)

So you have the choice between:

  • arbitrary cryptic combinations of symbols and reuse of the same punctuation to mean totally different things in different contexts, or
  • a slightly more wordy style which I think ultimately is much easier for your brain to parse, especially if you're new to the language, or if it isn't the language you use every day.

What I'm hoping is that when people who are not Ceylon programmers see Ceylon code in a blog, they'll be able to get the gist of what it is doing. (Even if they don't immediately understand all the details.) There are languages like this (Python is a shining example), and languages which are not like this. That's not really a value judgement, just a statement about the goals of this particular language that we're working on.

More generally, and with Fantom in my mind, I'm not sure that trying to use the type system to this degree actually makes sense. Using the type system to prevent bugs is good, but using it to tie you down to absolutely precise inputs is increasingly something I see as unnecessary.

I've written Java code before Java had generics and it wasn't a nice experience. I'll never go back to a language without parametric polymorphism. That's not to say that Java's generics system is perfect. On the contrary, I get the feeling that they kinda panicked and rushed something ever so slightly half-baked into the language when they saw C# come out with this feature. But the addition of generics certainly improved the language, even given the imperfections.

Now, I suppose that my point of view on this mainly comes from the fact that I'm a totally IDE-oriented guy. I have never even once in my career written Java code without the help of an IDE. If you asked me to run javac from the command line I would go racing off to google to figure out how to do that. And so I don't want my IDE to be crippled by not being able to properly analyze the types of things. Really, the whole reason we even have statically typed languages is to enable this kind of sophisticated tooling. A statically typed language without generics cripples that tooling.


The distinctions between immutable and mutable, the variable annotation, and = and := look like a lot of rules to remember.

The rules are:

  • If you want to be able to assign a value to something more than once, you need to annotate it variable. It's the precise opposite of Java where you need to annotate something final if you don't want to be able to assign to it.
  • To assign to a variable, you use :=. Otherwise, you use =.

Like in ML, this is to warn you that the code is doing something side-effecty.


Also, concurrency does seem to be missing from the tutorial.

That's a job for the SDK and other libraries.


On one issue that Stephen and I do completely agree on, Stephen's latest post puts the case for prefix type annotations far more comprehensively than I did. (So as not to reopen the flamewar here in this site, I'll be deleting comments relating to this issue.)

My experience of returning to Java

Tagged as Ceylon

So my recent return to writing code in Java has been interesting. Most of my Java programming experience has been in web apps, where there is a lot of UI/declarative code, and state-holding classes, or in framework development where I need a lot of interception and reflective code, and in those domains I have often found that Java gets in the way. But now I'm writing a compiler (well, a type checker/analyzer to be precise), and I don't have much use for declarative code, interception, or reflection. And there is a lot more code that does stuff rather than represents state or data. Java is honestly a quite different experience in this domain. My overall reaction is that Java is simply very reasonable and non-annoying for this kind of work. It just doesn't get in my way much. And in an IDE like Eclipse, Java's static typing saves me enormous gobs of time.

Nevertheless, there are definitely some moments where I find myself wishing I had Ceylon already. Here are the things I really miss:

  • Typesafe null. One of the things about a compiler is that it needs to be able to accept input that is partly rubbish and partly meaningful and just keep going and do the best it can with what it has. That means that there are a lot of null checks in my code, and, frankly, catching all the places an NPE could occur is very tough work. I wish the compiler would help me here.
  • Algebraic types. There's lots of instanceof in my code. Indeed, the whole thing is a bunch of tree Visitors. Algebraic types, typesafe narrowing, and built-in support for the visitor pattern would help enormously.
  • Mixin inheritance. The metamodel classes which represent Ceylon types would turn out much cleaner if I had concrete methods on interfaces. As it is, I'm stuck with some code duplication.

These are the three things I really miss. Sure, there are other bits of Java that I don't love, but for now they're just not bothering me much.

Modules in Ceylon

Tagged as Ceylon

Built-in support for modularity is a major goal of the Ceylon project, but what am I really talking about when I use this word? Well, I suppose there's multiple layers to this:

  1. Language-level support for a unit of visibility that is bigger than a package, but smaller than all packages.
  2. A module descriptor format that expresses dependencies between specific versions of modules.
  3. A built-in module archive format and module repository layout that is understood by all tools written for the language, from the compiler, to the IDE, to the runtime.
  4. A runtime that features peer-to-peer classloading (one classloader per module) and the ability to manage multiple versions of the same module.
  5. An ecosystem of remote module repositories where folks can share code with others.

I'm not going to get into a whole lot of fine detail of this, partly because what I have written down in the language spec today will probably change by the time you actually get to use any of this stuff, but let me give you a taste of the overall architecture proposed.

Module-level visibility

A package in Ceylon may be shared or unshared. An unshared package (the default) is visible only to the module which contains the package. We can make the package shared by providing a package descriptor:

Package package {
    name = 'org.hibernate.query';
    shared = true;
    doc = "The typesafe query API.";
}

(Alert readers will notice that this is just a snippet of Ceylon code, using the declarative object builder syntax.)

A shared package defines part of the public API of the module. Other modules can directly access shared declarations in a shared package.

Module descriptors

A module must explicitly specify the other modules on which it depends. This is accomplished via a module descriptor:

Module module {
    name = 'org.hibernate';
    version = '3.0.0.beta';
    doc = "The best-ever ORM solution!";
    license = '';
    Import {
        name = 'ceylon.language';
        version = '1.0.1';
        export = true;
    }
    Import {
        name = 'java.sql';
        version = '4.0';
    }
}

A module may be runnable. A runnable module must specify a run() method in the module descriptor:

Module module {
    name = 'org.hibernate.test';
    version = '3.0.0.beta';
    doc = "The test suite for Hibernate";
    license = '';
    void run() { ... }
    Import {
        name = 'org.hibernate'; version = '3.0.0.beta';
    }
}

Module archives and module repositories

A module archive packages together compiled .class files, package descriptors, and module descriptors into a Java-style jar archive with the extension car. The Ceylon compiler doesn't usually produce individual .class files in a directory. Instead, it directly produces module archives.

Module archives live in module repositories. A module repository is a well-defined directory structure with a well-defined location for each module. A module repository may be either local (on the filesystem) or remote (on the Internet). Given a list of module repositories, the Ceylon compiler can automatically locate dependencies mentioned in the module descriptor of the module it is compiling. And when it finishes compiling the module, it puts the resulting module archive in the right place in a local module repository.
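
To make the lookup idea concrete, here is a minimal Java sketch of searching a list of local repositories for a module archive. The directory layout it uses (module name split into directories, then the version, then name-version.car) is purely an assumption for illustration, as are the names ModuleLocator, archivePath and locate; nothing here is taken from the spec, and remote repositories are not handled at all.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class ModuleLocator {
    // ASSUMED layout (hypothetical, not from the spec):
    //   <repo>/<name with dots as directories>/<version>/<name>-<version>.car
    static Path archivePath(Path repo, String name, String version) {
        return repo.resolve(name.replace('.', '/'))
                   .resolve(version)
                   .resolve(name + "-" + version + ".car");
    }

    // Searches the repositories in order and returns the first archive found.
    static Optional<Path> locate(List<Path> repos, String name, String version) {
        for (Path repo : repos) {
            Path candidate = archivePath(repo, name, version);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Path> repos = Arrays.asList(Paths.get("modules"), Paths.get("more-modules"));
        System.out.println(archivePath(repos.get(0), "org.hibernate", "3.0.0.beta"));
        System.out.println(locate(repos, "org.hibernate", "3.0.0.beta").isPresent());
    }
}
```

The point of a well-defined location is that the lookup needs no index or configuration: the module name and version alone determine where to look in each repository.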

(The architecture also includes support for source directories, source archives, and module documentation directories, but I'm not going to cover all that today.)

Module runtime

Ceylon's module runtime is based on JBoss Modules, a technology that also exists at the very core of JBoss AS 7. Given a list of module repositories, the runtime automatically locates a module archive and its versioned dependencies in the repositories, even downloading module archives from remote repositories if necessary.

Normally, the Ceylon runtime is invoked by specifying the name of a runnable module at the command line.

Module repository ecosystem

One of the nice advantages of this architecture is that it's possible to run a module straight off the internet, just by typing, for example:

ceylon org.jboss.ceylon.demo -rep

And all required dependencies get automatically downloaded as needed.

Red Hat will maintain a central public module repository where the community can contribute reusable modules. Of course, the module repository format will be an open standard, so any organization can maintain its own public module repository.

Ceylon progress report

Posted by    |       |    Tagged as Ceylon

Hrm, I notice it's been just over three months since I semi-accidentally announced the existence of the Ceylon project, and I guess I feel like you folks deserve some kind of progress report! At the time, I very much regretted the fact that the project became public knowledge before I was really prepared to socialize it, but in retrospect it was the best thing ever for us. That's where we got Stef and Tako and Sergej and Ben from, along with the other folks who are signing up to get involved in development. Unfortunately, we're still working in a private GitHub repo, which is certainly not ideal, but it's helping keep us focused on getting actual code written.

So here's what we have so far:

  • a 125-page language specification (with some open issues and vague sections),
  • a parser and typesafe syntax tree for the whole language,
  • a compiler frontend (type checker/analyzer) for about 85% of the language,
  • a compiler backend that integrates the frontend with javac's bytecode generation for perhaps 40% of the language, and
  • the skeleton of a model loader that builds a metamodel of precompiled .class files (essential for incremental compilation and interoperation with Java).

The frontend of the compiler (the bit that analyzes the semantics of the code, assigns types to things, and reports programming errors) is basically done already. Certainly all the hard bits are finished, including stuff like generics, covariance, subtyping, refinement, member types, union types, definite assignment and definite return checking, type argument inference, etc. The few missing features could be finished off pretty quickly if there were any real urgency, but we may as well wait for the backend to catch up.

Development of the backend and model loader is now completely in the hands of volunteers from the community, and, frankly, it's going really well so far. We'll do an initial alpha-quality release of the compiler when we're happy that the backend is in a usable form.

I'm not going to promise any exact date for the first release, nor even the exact feature set - but I'm guessing it will happen within the next three months, and that it will include a really decent slab of the language defined by the specification. At that point, we'll hopefully be able to start putting some resources into the SDK and IDE.

The case against do-while

Posted by    |       |    Tagged as Ceylon

So the comment thread of my previous post got me thinking again about the do/while statement. Frankly, it's difficult to see why we really need this as a first-class construct in modern programming languages. Here's my list of reasons for saying that.

First, at least in C-like languages, the syntax is pretty irregular. It's the only control statement that doesn't follow this syntax pattern:

( keyword ("(" Something ")")? Block )+

Instead its syntax features two glorious keywords and a stray - totally extraneous - semicolon. Weird.

('Cos, like, even with the help of two keywords and a pair of parens, the parser still needs that extra juicy semicolon to know when to stop looking for more bits of the do/while statement. Sure, that makes sense.)

Anyway, I figure that it's this irregularity that apparently leads different people to have different intuitions about the scope of declarations contained in the body of the do/while.

Second, do/while demands that we reserve as a keyword one of the most useful verbs in the English language. I could have found all kinds of uses for the word do if Java would just let me use it as an identifier!

Third, do/while is easily emulated using a while loop. The following:

do {
    ...
}
while (!finished);

is only somewhat less readable when written like this:

while (true) {
    ...
    if (finished) break;
}

And this second formulation doesn't open up any debate about the scope of things.
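
Here's a minimal Java sketch of that equivalence, with made-up names (LoopForms, withDoWhile, withWhileTrue) used only for illustration. Both methods run the body at least once and collect the same values; in the second form the termination flag is declared inside the body and is still in scope where it is tested.

```java
import java.util.ArrayList;
import java.util.List;

class LoopForms {
    // Classic do/while: the body runs once before the condition is tested.
    static List<Integer> withDoWhile(int limit) {
        List<Integer> seen = new ArrayList<>();
        int i = 0;
        do {
            seen.add(i);
            i++;
        }
        while (i < limit);
        return seen;
    }

    // The same loop written as while (true) with a break at the bottom.
    static List<Integer> withWhileTrue(int limit) {
        List<Integer> seen = new ArrayList<>();
        int i = 0;
        while (true) {
            seen.add(i);
            i++;
            boolean finished = i >= limit; // declared in the body, visible to the test below
            if (finished) break;
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(withDoWhile(3));
        System.out.println(withWhileTrue(3));
    }
}
```

Both calls in main collect 0, 1 and 2, and with a limit of 0 both forms still run the body exactly once.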

Fourth, the current crop of programming languages all have support for higher-order functions, allowing libraries to introduce new kinds of flow control, and taking pressure off the language itself to provide every possible kind of loop baked into its basic syntax.
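
As a rough illustration of that point, a library could offer a run-then-test loop as an ordinary function. The sketch below uses Java lambdas; the helper name doWhile and the class Loops are hypothetical, not part of any real library.

```java
import java.util.function.BooleanSupplier;

class Loops {
    // Hypothetical library helper: run the body once, then repeat while the condition holds.
    static void doWhile(Runnable body, BooleanSupplier condition) {
        while (true) {
            body.run();
            if (!condition.getAsBoolean()) break;
        }
    }

    // Counts how many times the body runs when counting down from start.
    static int countdownSteps(int start) {
        int[] remaining = {start}; // one-element arrays so the lambdas can mutate the values
        int[] steps = {0};
        doWhile(() -> { remaining[0]--; steps[0]++; }, () -> remaining[0] > 0);
        return steps[0];
    }

    public static void main(String[] args) {
        System.out.println(countdownSteps(3));
    }
}
```

The one-element arrays are a Java workaround for captured locals being effectively final; a language with proper closures over mutable locals would not need them.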

Finally, even though I've written a reasonably significant amount of Java code over more than a decade of familiarity with the language, I can barely recall ever having found a use for this construct. Unless my memory is even worse than I think it is, I can't possibly have used do/while more than two or three times in my entire career!

So I'm strongly considering simply dropping do/while from Ceylon. It's simply not pulling its weight.

A wrinkle in Java's do-while

Posted by    |       |    Tagged as Ceylon

Today I tried to write (approximately) this code in Java:

Scope scope = declaration.getContainer();
do {
    if (scope.getDirectMemberOrParameter(model.getName())!=null) {
        node.addError("duplicate declaration name: " + declaration.getName());
    }
    boolean isControlBlock = scope instanceof ControlBlock;
    scope = scope.getContainer();
}
while (isControlBlock);  //compile error

This didn't work, since Java doesn't consider the while condition expression to belong to the do block. I think this is wrong. Surely the loop should be allowed to compute its own termination condition? I realize that do/while is a fairly uncommon construct, but when it is used, I imagine it's pretty common that it would be used like this.

The only reasonable way to fix the above code is to move the declaration of isControlBlock outside the loop:

boolean isControlBlock;
Scope scope = declaration.getContainer();
do {
    if (scope.getDirectMemberOrParameter(model.getName())!=null) {
        node.addError("duplicate declaration name: " + declaration.getName());
    }
    isControlBlock = scope instanceof ControlBlock;
    scope = scope.getContainer();
}
while (isControlBlock);

Note that isControlBlock is now a variable. I would not be able to declare it final.
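
For anyone who wants to try the workaround in isolation, here is a self-contained Java sketch. The Scope class is a hypothetical stand-in for the compiler's real scope model (a container reference plus a flag instead of an instanceof test), and scopesVisited is an invented method that just counts loop iterations.

```java
class ScopeWalk {
    // Hypothetical stand-in for the compiler's scope model.
    static class Scope {
        final Scope container;
        final boolean controlBlock;

        Scope(Scope container, boolean controlBlock) {
            this.container = container;
            this.controlBlock = controlBlock;
        }
    }

    // Visits the starting scope, then keeps moving outward for as long as
    // the scope just visited was a control block.
    static int scopesVisited(Scope start) {
        boolean isControlBlock; // has to be declared outside the do block
        Scope scope = start;
        int visited = 0;
        do {
            visited++;
            isControlBlock = scope.controlBlock;
            scope = scope.container;
        }
        while (isControlBlock);
        return visited;
    }

    public static void main(String[] args) {
        Scope method = new Scope(null, false);
        Scope ifBlock = new Scope(method, true);
        Scope forBlock = new Scope(ifBlock, true);
        System.out.println(scopesVisited(forBlock));
    }
}
```

Moving the declaration of isControlBlock back inside the braces makes the while condition fail to compile, which is exactly the wrinkle described above.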

The Ceylon type analyzer accepts the equivalent code:

variable local scope = declaration.container;
do {
    if (scope.directMemberOrParameter(model.name) exists) {
        node.addError("duplicate declaration name: " + declaration.name);
    }
    local isControlBlock = scope is ControlBlock;
    scope = scope.container;
}
while (isControlBlock);

I'm going to clarify that this is allowed in the language specification.

Sequences and sequenced parameters

Posted by    |       |    Tagged as Ceylon

I've been thinking about the problem of passing a Sequence of values to a sequenced parameter in Ceylon (a varargs parameter in Java terminology). Consider:

void print(Object... objects) { ... }
String[] words = {"hello", "world"};
print(words);   //what does this do?

Does the last line mean that we're passing a single Sequence<String> to print(), or two Strings? Java behaves very strangely in this situation:

print(new String[]{"hello", "world"}); //passes two Strings with a compiler warning asking for an explicit cast to Object[]
print(new Object[]{"hello", "world"}); //passes two Objects with no compiler warning
print(new String[]{"hello", "world"}, new String[]{"hello", "world"}); //passes two String[] arrays as varargs!
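
A small runnable Java sketch makes the difference observable by counting what actually arrives in the varargs array. The class and method names (VarargsDemo, count) are invented for the example, and the explicit casts are there to select each interpretation without relying on the compiler warning.

```java
class VarargsDemo {
    // Reports how many elements the varargs array ends up holding.
    static int count(Object... objects) {
        return objects.length;
    }

    public static void main(String[] args) {
        String[] words = {"hello", "world"};
        System.out.println(count((Object[]) words)); // the array itself becomes the varargs array
        System.out.println(count((Object) words));   // the array is wrapped as a single element
        System.out.println(count(words, words));     // two arguments, each of them a String[]
    }
}
```

The three calls print 2, 1 and 2 respectively, so the same array counts as two arguments or as one depending only on its static type at the call site.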


Things get even a little more complicated when you have a generic method like this:

T? first<T>(T... objects) { ... }
String[] words = {"hello", "world"};
first(words);    //what type should be inferred for T?

This really starts to screw up my beautiful clean type argument inference algorithm! Which is why this issue is coming up now - it's a corner case that I only noticed once I actually implemented generic type argument inference in the type analyzer.
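
For comparison, here is how the same question plays out in Java, as a minimal sketch with invented names (FirstDemo, first). Java quietly prefers the interpretation where the array is the argument list, and you have to supply the type argument explicitly to get the other one.

```java
class FirstDemo {
    // Returns the first argument, or null if there are none.
    @SafeVarargs
    static <T> T first(T... objects) {
        return objects.length == 0 ? null : objects[0];
    }

    public static void main(String[] args) {
        String[] words = {"hello", "world"};
        String word = first(words);                       // T inferred as String
        String[] same = FirstDemo.<String[]>first(words); // T given explicitly as String[]
        System.out.println(word);
        System.out.println(same == words);
    }
}
```

With T inferred, the call returns the first String; with T given as String[], the array is wrapped as a single argument and comes back unchanged.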

So I think we need to make you explicitly specify what you mean when you pass a sequence of values to a sequenced parameter. I have a couple of ideas about how to do this.

Solution 1

First solution, kinda indirectly inspired by Groovy, would be a special syntax in the positional parameter method invocation protocol.

String[] words = {"hello", "world"};

print(words);      //pass a single String[]
print(words...);   //pass two Strings

String[]? words2 = first(words);   //infers T = String[];
String? word = first(words...);    //infers T = String

I think this reads fairly naturally. The downside is it's a special-purpose kind of punctuation that needs to be specially explained in the specification.

Solution 2

Second solution is to introduce a special type (a subtype of Sequence) to represent a package of sequenced arguments. Call it SequencedArguments. Then, with a little helper method spread() that wraps up a Sequence as a SequencedArguments, the syntax would look like:

String[] words = {"hello", "world"};

print(words);           //compiler automatically produces a SequencedArguments<String[]>
print(spread(words));   //explicitly pass a SequencedArguments<String>

String[]? words2 = first(words);            //infers T = String[];
String? word = first(spread(words));        //infers T = String

This is a little more verbose, but reasonable. It also makes the specification easier to write.
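
To see why a wrapper type removes the ambiguity, here is a rough Java model of the idea. Everything in it is hypothetical: Java has no implicit wrapping, so the single() factory stands in for what the compiler would do automatically for a plain argument, and first() takes the wrapper directly instead of a real sequenced parameter.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical Java model of the proposed SequencedArguments type.
final class SequencedArguments<T> {
    final List<T> items;

    private SequencedArguments(List<T> items) {
        this.items = items;
    }

    // Stand-in for the compiler's implicit wrapping: one argument, one element.
    static <T> SequencedArguments<T> single(T value) {
        return new SequencedArguments<>(Collections.singletonList(value));
    }

    // Explicit spread: every element of the sequence becomes its own argument.
    static <T> SequencedArguments<T> spread(T[] sequence) {
        return new SequencedArguments<>(Arrays.asList(sequence));
    }

    // Models a method with a sequenced parameter; T comes from the wrapper's type argument.
    static <T> T first(SequencedArguments<T> objects) {
        return objects.items.isEmpty() ? null : objects.items.get(0);
    }

    public static void main(String[] args) {
        String[] words = {"hello", "world"};
        String[] words2 = first(single(words)); // T = String[]
        String word = first(spread(words));     // T = String
        System.out.println(words2.length + " " + word);
    }
}
```

Because T is read off the wrapper's type argument, inference never has to guess whether an array was meant as one argument or as many.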

Solution 3

Solutions 1 and 2 can be combined very elegantly. We can define:

  • T... means SequencedArguments<T> for any type T
  • e... means SequencedArguments(e) for any expression e

So T... is just a type name abbreviation like T[] and T?, and e... is just an operator expression. We end up with exactly the same syntax as Solution 1, but with the semantics of Solution 2.

I think this works out, and is very much in the spirit of the language. On the other hand, if T... is just an ordinary type declaration, I don't know how we can go about enforcing that a sequenced parameter must be the last parameter in a parameter list. I kinda like the fact that this is an error:

void print(Object... objects, OutputStream stream) { ... } //compile error?

WDYT? Does print(words...) read well to you guys, or does it feel arbitrary?


Let's not confuse this too much with the idea of applying an operation to a tuple of arguments like what you can do in functional languages and dynamic languages. This is superficially similar, but not quite the same.


A reasonable syntactic variation that perhaps reads somewhat better would use all as a keyword:

void print(Object all objects) { ... }
print(words);       //pass a single String[]
print(all words);   //pass two Strings

I could probably get into this if I didn't just hate the idea of keywordizing the very useful word all.
