Some questions about the design of Ceylon

Over the last couple of days, I've exchanged a few emails with Stephen Colebourne regarding Ceylon, and some of the decisions we made in designing the syntax of Ceylon.

I believe that syntax is an extremely important part of language design. Developers work in teams. We spend all day reading each other's code. We spend much more time reading code than we spend writing code. Therefore, languages should be designed to optimize the process of reading and understanding someone else's code. (Indeed, since I have such an atrocious memory, when I read code that I wrote more than a month ago, I may as well be reading someone else's code!)

Well, I thought some of Stephen's questions/criticisms, and my responses to them, might be of interest to a wider audience, so I asked his permission to clean up some of our exchange and publish it here. He's kindly agreed. Note that what follows is mostly my own words, and doesn't purport to do justice to Stephen's side of the argument, or by any means completely represent his views of the language. I've included mainly just the items where there is a clear choice between alternatives, and I can clearly express the reasons for taking the path we took.

I hope that this helps you guys understand why we made certain decisions, and some of the forces that operate on the language design process, forces that aren't always completely obvious when you look at the final shape of a language.

String interpolation

Why does Ceylon use "Hello, " name.uppercase "!" for string interpolation instead of the more familiar "Hello, ${name.uppercase}!"?

We originally wanted to use the ${...} escape syntax, but it turns out that this can't be lexed using the regular expression-based lexer technology in ANTLR. We took a look at how Groovy handles this, and it seems like they wind up using a hand-coded lexer. We wanted to be sure that our language was easy to lex and parse, since that helps the compiler give meaningful feedback to the user about syntax errors.
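The reason a regular-expression-based lexer struggles here isn't spelled out above. One plausible reason, assuming the interpolated expression may itself contain braces, is that finding the end of ${...} means counting nesting depth, which a regular expression can't do. A minimal sketch in Java (the language this post keeps comparing against; closingBrace is a made-up helper, and string literals inside the expression are ignored for brevity):

```java
class InterpolationScan {
    // Hypothetical helper, not taken from any real lexer.
    // `open` is the index of the '$' in "${". Returns the index of the
    // matching '}', or -1 if the interpolation is never closed.
    static int closingBrace(String src, int open) {
        int depth = 0;
        for (int i = open + 1; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '{') {
                depth++;              // entering a nested brace pair
            } else if (c == '}') {
                depth--;              // leaving a brace pair
                if (depth == 0) {
                    return i;         // this one closes the "${"
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String src = "a ${x{y}z} b";
        // Stopping at the first '}' ends the interpolation too early.
        System.out.println(src.indexOf('}'));     // 7
        System.out.println(closingBrace(src, 2)); // 9
    }
}
```

The counter is the part that falls outside what a plain regular expression can express, which fits the observation that Groovy ended up with a hand-coded lexer.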

The other thing we had in mind was that our primary motivation for having string interpolation in the language was not for writing everyday procedural code, but rather for use in defining templates (for example web pages) using the declarative object builder syntax. We think that the syntax we ended up settling on for string interpolation works out much better for this application, even if it can be slightly harder to read in typical procedural code.

Out, out, damn semicolon!

Why require semicolons at line end?

A number of recent languages use significant whitespace to eliminate the need for a ; statement terminator. Typically, I believe, this is implemented as some kind of auto-semicolon-insertion that happens in the lexer. So the actual formal grammar of the language, which produces the parser, still features required semicolons. You just don't have to actually type them.

Unfortunately, auto-semicolon-insertion doesn't play well with the annotation syntax we wanted to use. An annotation in Ceylon looks syntactically like an expression statement (because, in fact, semantically it is an expression). So there's no way for the parser, let alone the lexer, to distinguish an annotation sitting on its own line from an expression statement.

So languages which have both annotations and auto-semicolon-insertion need to introduce some ugly characters to distinguish annotations. The two things I've seen are @Annotation @OtherAnnotation, following Java, and [Annotation OtherAnnotation], following C#. But then the designers of these languages find this syntax so offensive that they can't actually bear to use it for their own annotations! So they find themselves having to introduce reserved-word modifiers like public, abstract, virtual, etc.

In Ceylon, modifiers are just ordinary annotations. They aren't keywords. You can have an attribute called shared or abstract or default. Basically, given the choice between requiring @ on all annotations plus a bunch of extra reserved words, or ; on all statements, we chose the ; as the lesser of two evils.

Syntax for control structures

The null checking syntax if (exists name) seems verbose. Wouldn't a symbol be better? What's wrong with if (name==null) like in Java?

Well, we wanted to avoid the extremely thorny problem of trying to define equality for null. In Java, the expression person.address==org.address evaluates to true if both person.address and org.address are null. This is almost certainly not what the programmer intends. In SQL, the equivalent expression evaluates to null, resulting in the whole somewhat doubtful machinery of ternary logic (which has some pretty unintuitive consequences).

Ceylon sidesteps this whole problem by simply not defining equality for the null value. The compiler won't let you write person.address==org.address if either person.address or org.address might be null.
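To make the Java behavior concrete, here's a minimal sketch. The Person, Org and Address classes are invented for illustration; only the comparison itself comes from the discussion above.

```java
class NullEquality {
    static class Address {
        final String city;
        Address(String city) { this.city = city; }
    }

    static class Person {
        Address address; // null means "no address on record"
    }

    static class Org {
        Address address; // may also be null
    }

    // The comparison discussed above: a reference comparison with ==.
    static boolean sameAddress(Person person, Org org) {
        return person.address == org.address;
    }

    public static void main(String[] args) {
        Person person = new Person();
        Org org = new Org();
        // Neither has an address, yet Java reports them as "equal".
        System.out.println(sameAddress(person, org)); // true
    }
}
```

Two missing addresses compare as equal, which is rarely the question the programmer meant to ask.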

Anyway, exists is actually less verbose than ==null. Yes, it's the same number of characters, but the more relevant measure from a readability perspective is the number of tokens. (Your eyes read tokens, not characters.)

Sure, I suppose we could have gone with something like if (name?), but then we would have had to come up with symbols for nonempty and is, which begins to make the language cryptic to people who don't use it every day.

I don't like if (nonempty seq) either. Sure, it's clear, but it's over-verbose.

Again, neither if (!seq.empty) nor if (!seq is Empty) is significantly less verbose than if (nonempty seq). And I simply don't want to introduce some cryptic symbol, like, say ?? to mean non-empty.

The for loop is fine and readable. And the collections hierarchy and integration with the type system looks good. But for (Natural i -> String op in entries(operators)) seems quite complex. As does if (exists String op = operators[i]).

Actually we recently decided to let you simply eliminate type annotations from control structures, letting you write, for example:

for (name in names) { ... }

for (i->op in entries(operators)) { ... }

if (exists op = operators[i]) { ... }

This turns out to be more readable in real code. The tutorial doesn't yet reflect this change.

I'm struggling with if (is Hello obj). I think because the keyword is in the wrong place.

Yes, all these constructs look backwards. The reason we did it this way is to be regular with the full form which introduces a variable name:

if (exists first = seq.first) { ... }

if (is Ngo org = person.employer) { ... }

if (nonempty tags = form.tags.value.tokens) { ... }

But I definitely don't love the backwardness of it.

`shared` vs `public/protected/private`

I use protected all the time, and losing that seems dubious.

I disagree. To me, there is no objective software-engineering justification for ever choosing Java's protected over public. Visibility modifiers exist to control dependencies between independent units of software. It doesn't matter if a dependency comes from a subclass or from a client class. What matters is in what unit the subclass or client class is defined.

Ceylon's visibility model can be used to localize the visibility of a program element to a unit of any of the following levels of granularity:

the containing scope
the package
the module
all modules.

That's already more expressive than Java, since Java doesn't have the notion of modules, or module-private visibility. And Ceylon can express all this with just one annotation, instead of three!

Attributes

Attributes look like properties. So why the different name of the feature?

Well, to me the word property implies a getter or getter/setter pair. In Ceylon, not all attributes are getters. Attribute is a collective name to describe both property-style attributes and simple attributes. You can override a simple attribute with a property-style attribute and vice versa.

Type inference

Why not use val instead of local for type inference?

Actually, a few weeks ago, we finally made the decision to go with value for attributes/locals, and function for methods. That change isn't yet reflected in the tutorial. One reason (out of several) for doing this was that we might someday change our minds and let you use type inference for shared declarations. (Only if we can figure out how to do this without too much negative impact upon compiler performance.)

Syntax for inheritance

Why not use a colon instead of extends?

Well, it's partly a matter of taste. But the objective reasoning is that if you use : for extends, you then need to come up with punctuation that means satisfies, abstracts, adapts, of, etc, and you wind up in a rabbit hole of cryptic symbols like :>, <:, <%, etc. Imagine what it looks like to combine these together in the same type definition!

Using satisfies also seems another arbitrary change from Java.

Lots of people think that when they first see it, but actually it's not an arbitrary change. We did it this way for two reasons:

So that upper bound type constraints can use a syntax that is regular with class and interface declarations. The keywords extends and implements simply don't sound right for defining an upper bound type constraint.
Because the type that appears in the extends clause in a class definition comes with arguments to the superclass constructor. If we reused extends for interface extension like Java does, the grammar for the extends clause would be extremely irregular. And, on the other hand, implements doesn't sound right for interface inheritance.

The keyword satisfies reads correctly for all three cases: interface extension by another interface, interface implementation by a class, and upper bound constraints on a type parameter.
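For comparison, here's how the same three cases look in Java, where the keyword changes from case to case. This is only a sketch; Named, Greeter, World and describe are made-up names.

```java
class KeywordsDemo {
    interface Named {
        String name();
    }

    // Interface inheriting from an interface: Java says "extends".
    interface Greeter extends Named {
        default String greet() {
            return "Hello, " + name() + "!";
        }
    }

    // Class inheriting from an interface: Java says "implements".
    static class World implements Greeter {
        @Override
        public String name() {
            return "World";
        }
    }

    // Upper bound on a type parameter: "extends" again, even though
    // the bound is an interface.
    static <T extends Named> String describe(T value) {
        return "named " + value.name();
    }

    public static void main(String[] args) {
        World world = new World();
        System.out.println(world.greet());
        System.out.println(describe(world));
    }
}
```

In Ceylon, each of these three positions would use satisfies.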

`abstract/default/formal/actual`

Why actual instead of the more common override?

The word override is a verb, and so it usually doesn't read well in a list of several annotations. Annotations read best together when they are all adjectives.

And the formal/abstract stuff gives the impression of being complex. I think it's partly because it is a different model of working and partly because the keywords are unfamiliar.

It's really just the excellent overriding model used in C#, but we're using different annotation names.

We had to separate formal from abstract because they actually perform semantically different roles. Java and C# can get away with overloading the meaning of abstract, because an inner class can't be overridden. In Ceylon, where we have member class overriding, a formal class and an abstract class are semantically quite different creatures, and we need to be able to distinguish between them.

We could have used virtual like in C++ or C#, instead of default, but since Java doesn't have a modifier for this, and since we had already had to introduce the new terms actual and formal, I felt okay about choosing a name that I subjectively thought was clearer.

Union types

I'm not so enthusiastic about the use of union types underlying the typesafety of null values and empty sequences, but so long as they are rarely seen it is no big deal.

So it turns out that union types can be more transparent here than using an algebraic type like in functional languages. For example, you never need to instantiate wrappers. In Haskell, for example, you need to write Just 1 if you want to assign 1 to Maybe Int.
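The wrapper cost is easy to see in Java as well. java.util.Optional is a library class rather than a true algebraic type, so treat this as a rough analogue of Maybe, not an equivalent; the method names are invented for the example.

```java
import java.util.Optional;

class WrapperDemo {
    // A present value has to be wrapped explicitly,
    // much like writing `Just 1` in Haskell.
    static Optional<Integer> wrapped() {
        return Optional.of(1);
    }

    // The absent case is a distinct wrapper value, like `Nothing`.
    static Optional<Integer> absent() {
        return Optional.empty();
    }

    // Consumers must unwrap before they can use the value.
    static int orZero(Optional<Integer> value) {
        return value.orElse(0);
    }

    public static void main(String[] args) {
        System.out.println(orZero(wrapped()));
        System.out.println(orZero(absent()));
    }
}
```

With a union type there is nothing to wrap or unwrap: a plain 1 is already a legal value of the optional type.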

We do hide the union type under some syntax sugar, but that's definitely a leaky abstraction. Union types are an integral part of Ceylon's type system that anyone will need to get comfortable with if they want to be effective in the language. I think most people will come around to liking them.

The type inference section is where the union types start to appear useful.

Yeah, they even work nicely for generic type argument inference, which is not yet covered in the tutorial. One of the problems in Java's generics is that the compiler often infers types that are non-denotable, i.e. not representable within the Java language. This results in really confusing error messages. That never happens in Ceylon, since union types are denotable and the compiler never needs to infer any kind of existential type. This really is one of the nicest properties of our type system. An invisible cost of a more complex/powerful type system is more subtle/confusing error messages.
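Here's a small Java sketch of a non-denotable inferred type. The pick method is hypothetical; the point is only what the compiler infers for T.

```java
import java.io.Serializable;

class GenericInference {
    // Both arguments must share the single type T.
    static <T> T pick(T first, T second) {
        return first;
    }

    public static void main(String[] args) {
        // For an Integer and a String, T is inferred as a common
        // supertype of both: something that is Serializable and
        // Comparable at once. That intersection cannot be written
        // as the declared type of a variable, so each assignment
        // below can name only one facet of it.
        Serializable s = pick(1, "two");
        Comparable<?> c = pick(1, "two");
        System.out.println(s + " " + c);
    }
}
```

When a call like this fails to type-check, the inferred intersection shows up in the error message even though it can't be written in source code. With denotable union types, the corresponding inferred type in Ceylon would simply be the union of the two argument types.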

Introductions

Introductions look interesting, but I don't have an immediate similarity to map them to. As you rightly point out, implicits are a poor feature.

Introductions are like a compromise between two features you'll find in other languages. Extension methods are best known from C#, but actually have a long prior history. Implicit type conversions are featured in several languages including C++ and Scala.

Extension methods are a safe, convenient feature that let you add new members to a pre-existing type. Unfortunately, they don't give you the ability to introduce a new supertype to the type.

Implicit type conversions are a dangerous feature that, although they seem simple to understand, actually screw up several useful properties of the type system (including transitivity of assignability), introducing complexity into mechanisms like member resolution and type argument inference, and can easily be abused.

Introduction is a disciplined way to introduce a new supertype to an existing type, using a mechanism akin to extension methods, without the downsides of implicit type conversions.

Member classes and member `object`s

Member classes look quite clever and easy to understand at first glance. I can't quickly grasp why I'd want nested object instances.

Member objects turn out to be super-convenient once you actually start writing Ceylon code. I often use them instead of a nested class if the nested class would have no initializer parameters. They're a bit like a Java-style anonymous inner class, in that they give you a quick way to extend an existing type, while giving you access to state in the containing scope, but the syntax is a little cleaner, and an object may have multiple supertypes.

For example, a member object is a convenient way to implement the Iterator interface for an Iterable object.
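For readers coming from Java, this is roughly the anonymous-inner-class version of that pattern. Countdown is a made-up class; the Ceylon member object would fill the role of the anonymous class here.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class Countdown implements Iterable<Integer> {
    private final int start;

    Countdown(int start) {
        this.start = start;
    }

    @Override
    public Iterator<Integer> iterator() {
        // Anonymous inner class: extends an existing type and can
        // read `start` from the containing instance.
        return new Iterator<Integer>() {
            private int next = start;

            @Override
            public boolean hasNext() {
                return next > 0;
            }

            @Override
            public Integer next() {
                if (next <= 0) {
                    throw new NoSuchElementException();
                }
                return next--;
            }
        };
    }

    public static void main(String[] args) {
        for (int n : new Countdown(3)) {
            System.out.println(n);
        }
    }
}
```

Unlike this Java version, the member object may declare more than one supertype.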

Generics

So, my thoughts on the generics part are that there is a lot of power there with the type system performing better than Java. But the syntax is very verbose.

It's intentionally verbose. Generic declarations are something most developers write much less often than they read. Certain kinds of verbosity can make code more readable, not less.

There are some languages I've seen - not naming names - which, at least to the outside observer, seem to operate on the principle that the more advanced a language feature is, the more cryptic the syntax can be. I think that's exactly backwards. Cryptic syntax is perhaps acceptable for optimizing very common things, but it should be avoided for expressing things that are uncommon or otherwise difficult to understand.

Perhaps that will work out. But I think that deep down I feel SQL and wordy languages are from the past not the future.

The thing is that as you start getting to languages that are as feature-rich as the ones we're looking at as potential Java replacements, you simply run out of punctuation characters on the keyboard. ;-)

So you have the choice between:

arbitrary cryptic combinations of symbols and reuse of the same punctuation to mean totally different things in different contexts, or
a slightly more wordy style which I think ultimately is much easier for your brain to parse, especially if you're new to the language, or if it isn't the language you use every day.

What I'm hoping is that when people who are not Ceylon programmers see Ceylon code in a blog, they'll be able to get the gist of what it is doing. (Even if they don't immediately understand all the details.) There are languages like this (Python is a shining example), and languages which are not like this. That's not really a value judgement, just a statement about the goals of this particular language that we're working on.

More generally, and with Fantom in my mind, I'm not sure that trying to use the type system to this degree actually makes sense. Using the type system to prevent bugs is good, but using it to tie you down to absolutely precise inputs is increasingly something I see as unnecessary.

I've written Java code before Java had generics and it wasn't a nice experience. I'll never go back to a language without parametric polymorphism. That's not to say that Java's generics system is perfect. On the contrary, I get the feeling that they kinda panicked and rushed something ever so slightly half-baked into the language when they saw C# come out with this feature. But the addition of generics certainly improved the language, even given the imperfections.

Now, I suppose that my point of view on this mainly comes from the fact that I'm a totally IDE-oriented guy. I have never even once in my career written Java code without the help of an IDE. If you asked me to run javac from the command line I would go racing off to Google to figure out how to do that. And so I don't want my IDE to be crippled by not being able to properly analyze the types of things. Really, the whole reason we even have statically typed languages is to enable this kind of sophisticated tooling. A statically typed language without generics cripples that tooling.

Immutability

The distinctions between immutable and mutable, the variable annotation, and = and := look like a lot of rules to remember.

The rules are:

If you want to be able to assign a value to something more than once, you need to annotate it variable. It's the precise opposite of Java where you need to annotate something final if you don't want to be able to assign to it.
To assign to a variable, you use :=. Otherwise, you use =.

Like in ML, this is to warn you that the code is doing something side-effecty.
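For contrast, here's the Java default that these rules invert, as a small sketch (sumTo is an invented example). Reassignment is allowed unless you opt out with final, and initialization and reassignment share the same = family of operators.

```java
class AssignmentDemo {
    static int sumTo(int n) {
        final int limit = n; // opt-in: final forbids reassignment
        int total = 0;       // default: freely reassignable
        for (int i = 1; i <= limit; i++) {
            total += i;      // side effect, with nothing to flag it
        }
        // limit = 0;        // would be rejected by the compiler
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(4));
    }
}
```

In Ceylon, total would need the variable annotation, and each update to it would be written with :=.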

Concurrency

Also, concurrency does seem to be missing from the tutorial.

That's a job for the SDK and other libraries.

UPDATE

On one issue that Stephen and I do completely agree on, Stephen's latest post puts the case for prefix type annotations far more comprehensively than I did. (So as not to reopen the flamewar here in this site, I'll be deleting comments relating to this issue.)
