I'm the creator of Hibernate, a popular object/relational persistence solution for Java, and Seam, an application framework for enterprise Java. I've also contributed to the Java Community Process standards as Red Hat representative for the EJB and JPA specifications and as spec lead of the CDI specification. At Red Hat, I'm currently working on Ceylon, a new programming language for Java and JavaScript VMs.

I now blog at the Ceylon blog.

I also post stuff on G+.

Location: Barcelona, Spain
Occupation: Fellow at JBoss, a Division of Red Hat

This is a gorgeous, perfect example of precisely how not to engage in debate over technical issues. I won't respond point-by-point, because the post is mainly self-refuting, but it does amuse me to highlight the following:

  • Vague accusations of dishonesty in the title, which are never actually supported in the post itself.
  • The claim that projects which were announced to the community approximately three months ago are vaporware. (That's the fastest progression from announcement to vapor in history!)
  • The absurd accusation that people working on alternatives to Java have a secret ulterior motive to actually reinforce the status quo. Yes, seriously.
  • Further unsupported accusations of intellectual dishonesty, though it's not precisely clear on whose part.
  • Misattribution / invention of straw man arguments. (To the best of my knowledge none of the teams advocating alternative JVM languages have ever claimed that Scala is too academic or too functional or not functional enough.)
  • Insults directed at the entire Java programmer community.
  • No actual substantive technical arguments for or against any particular language.

The most ironic thing about this amazingly uncivil flamebait post is that, after throwing around accusations of dishonesty and secret agendas, the poster (can't be bothered checking his name) writes:

If you plan to comment, keep in mind that your comment should be constructive and civilized.

Wow. I guess he at least deserves credit for chutzpah. :-)

(I'm closing comments since I can't be bothered responding to the inevitable trolls.)

P.S. Accusations of dishonesty, or of misinformation (which, in case English is your second language, means deliberate lying) have been a recurring theme lately. You guys need to put that one to rest. It's not reasonable to call someone a liar because you disagree with them.

A couple of days ago I spotted an interesting hole in Ceylon's type checker. I started out with something like this:

Sequence<Float>|Sequence<Integer> numbers = .... ;
Sequence<Number> numbers2 = numbers;

The Sequence interface is covariant in its type parameter. So, since Float and Integer are both subtypes of Number, both Sequence<Float> and Sequence<Integer> are subtypes of Sequence<Number>. Then the compiler correctly reasons that the union of the two types is also a subtype of Sequence<Number>. Fine. Clever compiler.
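For readers coming from Java, the widening step described above can be roughly approximated with a use-site wildcard. This is only a sketch under assumptions: Java has no union types, so Sequence<Float>|Sequence<Integer> itself cannot be written, and List<? extends Number> stands in here for the covariant Sequence<Number>. The class and method names are invented for illustration.

```java
import java.util.List;

class CovarianceSketch {
    // List<? extends Number> is Java's use-site spelling of a covariant view:
    // it accepts a List<Float> or a List<Integer>, and elements read back as Number.
    static Number first(List<? extends Number> numbers) {
        return numbers.get(0);
    }

    public static void main(String[] args) {
        List<Float> floats = List.of(1.5f, 2.5f);
        List<Integer> integers = List.of(1, 2);
        // In both calls the static type of the result is Number,
        // much like numbers2.first after widening to Sequence<Number>.
        System.out.println(first(floats));
        System.out.println(first(integers));
    }
}
```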

Now, here's the hole:

value first = numbers.first; //compiler error: member does not exist
value first2 = numbers2.first; //ok, infers type Number

When it encountered the union type, the member resolution algorithm was looking for a common produced type of the type constructor Sequence in the hierarchies of each member of the union type. But since Sequence<Float> isn't a subtype of Sequence<Integer> and Sequence<Integer> isn't a subtype of Sequence<Float>, it simply wasn't finding a common supertype. This resulted in the totally counterintuitive (and, in my view, pathological) result that the member resolution algorithm could not assign a type to the member first of the type Sequence<Float>|Sequence<Integer>, but it could assign a type after widening to the supertype Sequence<Number>.

Of course, there might be many potential common supertypes of a union type. There's no justification for the member resolution algorithm to pick Sequence<Number> in preference to a sequence of any other common supertype of Float and Integer. We've got an ambiguity.

So I quickly realized that I had an example which was breaking two of the basic principles of Ceylon's type system:

  • It is possible to assign a unique type to any expression without cheating and looking at where the expression appears and how it is used.
  • All types used or inferred internally by the compiler are denotable. That is, they can all be expressed within the language itself.

These principles are specific to Ceylon, and other languages with generics and type inference don't follow them. But failing to adhere to them - and especially to the second principle - results in extremely confusing error messages (Java's horrid capture-of type errors, for example). These two principles are a major reason why I say that Ceylon's type system is simpler than some other languages with sophisticated static type systems: when you get a typing error, we promise you that it's an error that humans can understand.

Fortunately I was able to discover a useful relationship between union types and covariance.

If T is covariant in its type parameter, then T<U>|T<V> is a subtype of T<U|V> for any types U and V.

But furthermore, T<U|V> is also a subtype of any common supertype of T<U> and T<V> produced from the type constructor T. We've successfully eliminated the ambiguity!

So I adjusted the member resolution algorithm to make use of this relationship. Now the problematic code compiles correctly, and infers the correct type for first:

value first = numbers.first; //ok: infers type Float|Integer

Well, that's great! Oh, but what about types with contravariant type parameters? What type should be inferred for the parameter of the consume() method of Consumer<Float>|Consumer<Integer>? Well, I quickly realized that the corresponding relationship for contravariant type parameters is this one:

If T is contravariant in its type parameter, then T<U>|T<V> is a subtype of T<U&V> for any types U and V.

Where U&V is the intersection of the two types. So the type of the parameter of consume() would be Float&Integer, which is intuitively correct. (Of course, since it is impossible for any object to be assignable to both Float and Integer, the compiler could go even further and reduce this type to the bottom type.)

But, ooops, Ceylon doesn't yet have first-class intersection types, except as a todo in the language specification. And our second principle states that the compiler isn't allowed to infer or even think about types which can't be expressed in the language!

Well, really, I was just waiting for the excuse to justify introducing intersection types, and this gave me the ammunition I was waiting for. So yesterday I found a couple of free hours to implement experimental support for intersection types in the typechecker, and, hey, it turned out to be much easier than I expected. It's also a practically useful feature. I've often wanted to write a method which accepts any value which is assignable to two different types, without introducing a new type just to represent the intersection of the types.
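As a point of comparison, Java can express "assignable to two different types" only as a bound on a type parameter, not as a first-class type such as U&V. The following is a minimal Java sketch of that idea, not Ceylon syntax; the class and method names are hypothetical.

```java
class IntersectionSketch {
    // T must be assignable to BOTH CharSequence and Comparable<T>.
    // No named type representing the intersection is declared anywhere.
    static <T extends CharSequence & Comparable<T>> T longerOrGreater(T a, T b) {
        if (a.length() != b.length()) {
            return a.length() > b.length() ? a : b;   // uses the CharSequence side
        }
        return a.compareTo(b) >= 0 ? a : b;           // uses the Comparable side
    }

    public static void main(String[] args) {
        // String satisfies both bounds, so it can be passed directly.
        System.out.println(longerOrGreater("union", "intersection"));
        System.out.println(longerOrGreater("abc", "abd"));
    }
}
```

The limitation is visible in the signature: the intersection exists only inside the bound, so it cannot be used as the type of a local variable or a field.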

I'll leave you with two more interesting relationships, applying to intersection types:

If T is covariant in its type parameter, then T<U>&T<V> is a supertype of T<U&V> for any types U and V.
If T is contravariant in its type parameter, then T<U>&T<V> is a supertype of T<U|V> for any types U and V.

Nice symmetries here.

02. Aug 2011, 19:26 CET, by Gavin King

In connection with this discussion it's worth making explicit what I guess everybody knows, but that sometimes seems to get a bit mixed up in conversation: that Java's support for raw types (necessary for backward compatibility with pre-generics code) doesn't really have anything much to do with type argument erasure. In a hypothetical language:

  • you could have raw types without type argument erasure, or
  • you could have type argument erasure without raw types.

Raw types have their own problems, of course (they're a designed-in hole in the type system). But a defense of the existence of raw types does not amount to a defense of partially reified types.
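The hole that raw types open up can be sketched in a few lines of Java. This is an illustrative example with made-up names; it shows only the raw-type problem, and does not by itself say anything about whether type arguments should be reified.

```java
import java.util.ArrayList;
import java.util.List;

class RawTypeHole {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean pollutes() {
        List<String> strings = new ArrayList<>();
        List raw = strings;   // raw type: permitted for pre-generics compatibility
        raw.add(42);          // only an unchecked warning, not a compile error
        try {
            String s = strings.get(0);   // the compiler-inserted cast fails here
            return false;
        } catch (ClassCastException e) {
            return true;                 // an Integer was sitting in a List<String>
        }
    }

    public static void main(String[] args) {
        System.out.println(pollutes());
    }
}
```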

01. Aug 2011, 23:22 CET, by Gavin King

Cedric recently brought up the topic of type erasure, concluding:

All in all, I am pretty happy with erasure and I’m hoping that the future versions of Java will choose to prioritize different features that are more urgently needed

Well, I suppose erasure isn't the thing I hate most about Java, but it's certainly up there. Java's system of partially reified types actually adds a surprising amount of complexity and unintuitive behavior to the type system.

From a pure language-design point of view, I think a partially reified type system is one of the worst decisions you could possibly make. Either reify all types, like C#, or reify none of them, like ML. And look, there's certain language features that simply don't play nicely with type erasure. A case in point: overloading. You can have type erasure, or you can have overloading (or, like Ceylon, you can have neither). You can't have both type erasure and overloading. No, Java is not a counter-example to this! In terms of language design, Java's approach to reification is almost impossible to justify except as a totally half-baked and misconceived workaround for simply not having time to Do It Right.
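A minimal Java sketch of how erasure and overloading get in each other's way. The class and method names are invented for illustration, and the rejected overload is shown only as a comment so that the example still compiles.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureSketch {
    // Declaring a second overload
    //     static String describe(List<Integer> list) { ... }
    // next to this one is rejected by javac with a
    // "name clash ... have the same erasure" error.
    static String describe(List<String> list) {
        return "a list of " + list.size() + " element(s)";
    }

    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> integers = new ArrayList<>();
        // The type arguments are erased: both objects have the runtime class ArrayList.
        return strings.getClass() == integers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(describe(List.of("a", "b")));
        System.out.println(sameRuntimeClass());
    }
}
```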

But Cedric's coming from a purely practical point of view, saying the problems don't actually bite him much when he's doing real work. Well, OK, I can see that. So here's the practical reasons why I think reified generics are needed, and why they should be added to Java if that could be done without messing up Java's type system even further.

Frameworks

Many frameworks depend upon having reified types. Type argument erasure cripples frameworks that work with generic types, and results in horrid workarounds like this one in CDI.
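One widely used idiom of this kind is the "super type token", which is the technique behind CDI's TypeLiteral. Whether that is the exact workaround linked above is an assumption. The sketch below is not the CDI class; TypeToken and TypeTokenDemo are hypothetical names that show only the general trick of recovering a type argument from an anonymous subclass.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

class TypeTokenDemo {
    static String captured() {
        // The trailing {} creates an anonymous subclass, whose generic
        // superclass (TypeToken<List<String>>) is recorded in the class file.
        return new TypeToken<List<String>>() {}.getType().getTypeName();
    }

    public static void main(String[] args) {
        System.out.println(captured());
    }
}

// Hypothetical helper, not a real library class.
abstract class TypeToken<T> {
    private final Type type;

    protected TypeToken() {
        // Works only when instantiated through a direct (anonymous) subclass.
        ParameterizedType superclass =
                (ParameterizedType) getClass().getGenericSuperclass();
        this.type = superclass.getActualTypeArguments()[0];
    }

    Type getType() {
        return type;
    }
}
```

The client has to write an empty class body just to smuggle a type argument past erasure, which is the sort of ceremony a framework would not need if generic types were reified.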

Typesafe narrowing

Instead of a Java-style instanceof operator, and C-style typecasts, Ceylon provides a construct for narrowing the type of a reference in a totally statically typesafe way. You just can't get ClassCastExceptions in Ceylon.

But this functionality depends upon having reified generics. Until we implement reified type arguments, we can't provide any mechanism to narrow to a parameterized type. Right now, you simply can't narrow an Object to a List<String>.

You might think that this is a problem with Ceylon, but really, the situation isn't much better in Java. The instanceof operator doesn't support types with type arguments, and casting to a type with type arguments is a totally unsafe operation! I just don't think that's acceptable behavior in a language with static typing.
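The unsafe cast can be demonstrated with a short Java sketch (names are made up for illustration). The cast to List<String> succeeds at runtime even though the list holds integers, and the failure only shows up later, at the point where an element is used.

```java
import java.util.List;

class UncheckedCastSketch {
    @SuppressWarnings("unchecked")
    static String demo() {
        Object value = List.of(1, 2, 3);
        // "value instanceof List<String>" would not compile here.
        List<String> strings = (List<String>) value;   // unchecked cast: no runtime check
        try {
            String first = strings.get(0);             // fails here, far from the cast
            return "no error: " + first;
        } catch (ClassCastException e) {
            return "ClassCastException at the point of use";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```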

Inter-language interoperability

Interoperability between statically-typed JVM languages is going to get really messy when some of the languages support reified generics and some don't. Especially since it's easy to imagine that those languages which do support reified generics won't support them in an interoperable way. This could turn out to be a real problem for the vision of a multi-language JVM.

25. Jul 2011, 01:08 CET, by Gavin King

Over the last couple of days, I've exchanged a few emails with Stephen Colebourne regarding Ceylon, and some of the decisions we made in designing the syntax of Ceylon.

I believe that syntax is an extremely important part of language design. Developers work in teams. We spend all day reading each other's code. We spend much more time reading code than we spend writing code. Therefore, languages should be designed to optimize the process of reading and understanding someone else's code. (Indeed, since I have such an atrocious memory, when I read code that I wrote more than a month ago, I may as well be reading someone else's code!)

Well, I thought some of Stephen's questions/criticisms, and my responses to them, might be of interest to a wider audience, so I asked his permission to clean up some of our exchange and publish it here. He's kindly agreed. Note that what follows is mostly my own words, and doesn't purport to do justice to Stephen's side of the argument, or by any means completely represent his views of the language. I've included mainly just the items where there is a clear choice between alternatives, and I can clearly express the reasons for taking the path we took.

I hope that this helps you guys understand why we made certain decisions, and some of the forces that operate on the language design process, forces that aren't always completely obvious when you look at the final shape of a language.

String interpolation

Why does Ceylon use "Hello, " name.uppercase "!" for string interpolation instead of the more familiar "Hello, ${name.uppercase}!"?

We originally wanted to use the ${...} escape syntax, but it turns out that this can't be lexed using the regular expression-based lexer technology in ANTLR. We took a look at how Groovy handles this, and it seems like they wind up using a hand-coded lexer. We wanted to be sure that our language was easy to lex and parse, since that helps the compiler give meaningful feedback to the user about syntax errors.

The other thing we had in mind was that our primary motivation for having string interpolation in the language was not for writing everyday procedural code, but rather for use in defining templates (for example web pages) using the declarative object builder syntax. We think that the syntax we ended up settling on for string interpolation works out much better for this application, even if it can be slightly harder to read in typical procedural code.

Out, out, damn semicolon!

Why require semicolons at line end?

A number of recent languages use significant whitespace to eliminate the need for a ; statement terminator. Typically, I believe, this is implemented as some kind of auto-semicolon-insertion that happens in the lexer. So the actual formal grammar of the language, which produces the parser, still features required semicolons. You just don't have to actually type them.

Unfortunately, auto-semicolon-insertion doesn't play well with the annotation syntax we wanted to use. An annotation in Ceylon looks syntactically like an expression statement (because, in fact, semantically it is an expression). So there's no way for the parser, let alone the lexer, to distinguish an annotation sitting on its own line from an expression statement.

So languages which have both annotations and auto-semicolon-insertion need to introduce some ugly characters to distinguish annotations. The two things I've seen are @Annotation @OtherAnnotation, following Java, and [Annotation OtherAnnotation], following C#. But then the designers of these languages find this syntax so offensive that they can't actually bear to use it for their own annotation! So they find themselves having to introduce reserved-word modifiers like public, abstract, virtual, etc.

In Ceylon, modifiers are just ordinary annotations. They aren't keywords. You can have an attribute called shared or abstract or default. Basically, given the choice between requiring ; on all statements, or requiring @ on all annotations plus a bunch of extra reserved words, we chose the ; as the lesser of two evils.

Syntax for control structures

The null checking syntax if (exists name) seems verbose. Wouldn't a symbol be better? What's wrong with if (name==null) like in Java?

Well, we wanted to avoid the extremely thorny problem of trying to define equality for null. In Java, the expression person.address==org.address evaluates to true if both person.address and org.address are null. This is almost certainly not what the programmer intends. In SQL, the equivalent expression evaluates to null, resulting in the whole somewhat doubtful machinery of ternary logic (which has some pretty unintuitive consequences).
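The Java behaviour described here is easy to reproduce. In this sketch, Person, Org and Address are hypothetical classes standing in for whatever domain model the expression came from.

```java
class NullEqualitySketch {
    static class Address { }
    static class Person { Address address; }   // address defaults to null
    static class Org { Address address; }      // address defaults to null

    static boolean sameAddress(Person person, Org org) {
        // Evaluates to true when both addresses are null, which is
        // rarely what "these two have the same address" is meant to say.
        return person.address == org.address;
    }

    public static void main(String[] args) {
        System.out.println(sameAddress(new Person(), new Org()));
    }
}
```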

Ceylon sidesteps this whole problem by simply not defining equality for the null value. The compiler won't let you write person.address==org.address if either person.address or org.address might be null.

Anyway, exists is actually less verbose than ==null. Yes, it's the same number of characters, but the more relevant measure from a readability perspective is the number of tokens. (Your eyes read tokens, not characters.)

Sure, I suppose we could have gone with something like if (name?), but then we would have had to come up with symbols for nonempty and is, which begins to make the language cryptic to people who don't use it every day.

I don't like if (nonempty seq) either. Sure it's clear, but it's over-verbose.

Again, neither if (!seq.empty) nor if (!seq is Empty) is significantly less verbose than if (nonempty seq). And I simply don't want to introduce some cryptic symbol like, say, ?? to mean non-empty.

The for loop is fine and readable. And the collections hierarchy and integration with the type system looks good. But for (Natural i -> String op in entries(operators)) seems quite complex. As does if (exists String op = operators[i]).

Actually we recently decided to let you simply eliminate type annotations from control structures, letting you write, for example:

for (name in names) { ... }
for (i->op in entries(operators)) { ... }
if (exists op = operators[i]) { ... }

This turns out to be more readable in real code. The tutorial doesn't yet reflect this change.

I'm struggling with if (is Hello obj). I think because the keyword is in the wrong place.

Yes, all these constructs look backwards. The reason we did it this way is to be regular with the full form which introduces a variable name:

if (exists first = seq.first) { ... }
if (is Ngo org = person.employer) { ... }
if (nonempty tags = form.tags.value.tokens) { ... }

But I definitely don't love the backwardness of it.

shared vs public/protected/private

I use protected all the time, and losing that seems dubious.

I disagree. To me, there is no objective software-engineering justification for ever choosing Java's protected over public. Visibility modifiers exist to control dependencies between independent units of software. It doesn't matter if a dependency comes from a subclass or from a client class. What matters is in what unit the subclass or client class is defined.

Ceylon's visibility model can be used to localize the visibility of a program element to a unit of any of the following levels of granularity:

  • the containing scope
  • the package
  • the module
  • all modules.

That's already more expressive than Java, since Java doesn't have the notion of modules, or module-private visibility. And Ceylon can express all this with just one annotation, instead of three!

Attributes

Attributes look like properties. So why the different name of the feature?

Well, to me the word property implies a getter or getter/setter pair. In Ceylon, not all attributes are getters. Attribute is a collective name to describe both property-style attributes and simple attributes. You can override a simple attribute with a property-style attribute and vice versa.

Type inference

Why not use val instead of local for type inference?

Actually, a few weeks ago, we finally made the decision to go with value for attributes/locals, and function for methods. That change isn't yet reflected in the tutorial. One reason (out of several) for doing this was that we might someday change our minds and let you use type inference for shared declarations. (Only if we can figure out how to do this without too much negative impact upon compiler performance.)

Syntax for inheritance

Why not use a colon instead of extends?

Well, it's partly a matter of taste. But the objective reasoning is that if you use : for extends, you then need to come up with punctuation that means satisfies, abstracts, adapts, of, etc, and you wind up in a rabbit hole of cryptic symbols like :>, <:, <%, etc. Imagine what it looks like to combine these together in the same type definition!

Using satisfies also seems another arbitrary change from Java.

Lots of people think that when they first see it, but actually it's not an arbitrary change. We did it this way for two reasons:

  1. So that upper bound type constraints can use a syntax that is regular with class and interface declarations. The keywords extends and implements simply don't sound right for defining an upper bound type constraint.
  2. Because the type that appears in the extends clause in a class definition comes with arguments to the superclass constructor. If we reused extends for interface extension like Java does, the grammar for the extends clause would be extremely irregular. And, on the other hand, implements doesn't sound right for interface inheritance.

The keyword satisfies reads correctly for all three cases: interface extension by another interface, interface implementation by a class, and upper bound constraints on a type parameter.

abstract/default/formal/actual

Why actual instead of the more common override?

The word override is a verb, and so it usually doesn't read well in a list of several annotations. Annotations read best together when they are all adjectives.

And the formal/abstract stuff gives the impression of being complex. I think it's partly because it is a different model of working and partly because the keywords are unfamiliar.

It's really just the excellent overriding model used in C#, but we're using different annotation names.

We had to separate formal from abstract because they actually perform semantically different roles. Java and C# can get away with overloading the meaning of abstract, because an inner class can't be overridden. In Ceylon, where we have member class overriding, a formal class and an abstract class are semantically quite different creatures, and we need to be able to distinguish between them.

We could have used virtual like in C++ or C#, instead of default, but since Java doesn't have a modifier for this, and since we had already had to introduce the new terms actual and formal, I felt okay about choosing a name that I subjectively thought was clearer.

Union types

I'm not so enthusiastic about the use of union types underlying the typesafety of null values and empty sequences, but so long as they are rarely seen it is no big deal.

So it turns out that union types can be more transparent here than using an algebraic type like in functional languages. For example, you never need to instantiate wrappers. In Haskell, for example, you need to write Just 1 if you want to assign 1 to Maybe Int.
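For Java readers, the closest analogue of the wrapper approach is probably the library class java.util.Optional. That is an assumption of this sketch (Optional arrived in Java 8 and is not part of the original discussion), and the names below are invented for illustration.

```java
import java.util.Optional;

class WrapperSketch {
    static Optional<Integer> wrapped() {
        // Optional<Integer> value = 1;   // does not compile: 1 must be wrapped first
        return Optional.of(1);            // plays the role of Haskell's "Just 1"
    }

    static int unwrap(Optional<Integer> maybe) {
        // The wrapper has to be taken apart again before the value can be used.
        return maybe.orElse(0);
    }

    public static void main(String[] args) {
        System.out.println(unwrap(wrapped()));
    }
}
```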

We do hide the union type under some syntax sugar, but that's definitely a leaky abstraction. Union types are an integral part of Ceylon's type system that anyone will need to get comfortable with if they want to be effective in the language. I think most people will come around to liking them.

The type inference section is where the union types start to appear useful.

Yeah, they even work nicely for generic type argument inference, which is not yet covered in the tutorial. One of the problems in Java's generics is that the compiler often infers types that are non-denotable, i.e. not representable within the Java language. This results in really confusing error messages. That never happens in Ceylon, since union types are denotable and the compiler never needs to infer any kind of existential type. This really is one of the nicest properties of our type system. An invisible cost of a more complex/powerful type system is more subtle/confusing error messages.
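A small Java sketch of a non-denotable inferred type. It assumes Java 10 or later, since it relies on var to hold a type that cannot be written out; the class name is made up for illustration.

```java
import java.util.Arrays;

class NonDenotableSketch {
    static double demo() {
        // The element type inferred for a Float and an Integer is an intersection
        // along the lines of Number & Comparable<...>, which cannot be written
        // as a declared type in Java source. Only "var" can hold it.
        var mixed = Arrays.asList(1.5f, 2);
        Number asNumber = mixed.get(0);            // assignable to Number...
        Comparable<?> asComparable = mixed.get(0); // ...and also to Comparable
        return asNumber.doubleValue() + mixed.get(1).intValue();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

When such a type shows up in a compiler error, it is printed in a form the programmer could never have typed, which is where the confusion comes from.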

Introductions

Introductions look interesting, but I don't have an immediate similarity to map them to. As you rightly point out, implicits are a poor feature.

Introductions are like a compromise between two features you'll find in other languages. Extension methods are best known from C#, but actually have a long prior history. Implicit type conversions are featured in several languages including C++ and Scala.

Extension methods are a safe, convenient feature that let you add new members to a pre-existing type. Unfortunately, they don't give you the ability to introduce a new supertype to the type.

Implicit type conversions are a dangerous feature that, although they seem simple to understand, actually screw up several useful properties of the type system (including transitivity of assignability), introducing complexity into mechanisms like member resolution and type argument inference, and can easily be abused.

Introduction is a disciplined way to introduce a new supertype to an existing type, using a mechanism akin to extension methods, without the downsides of implicit type conversions.

Member classes and member objects

Member classes look quite clever and easy to understand at first glance. I can't quickly grasp why I'd want nested object instances.

Member objects turn out to be super-convenient once you actually start writing Ceylon code. I often use them instead of a nested class if the nested class would have no initializer parameters. They're a bit like a Java-style anonymous inner class, in that they give you a quick way to extend an existing type, while giving you access to state in the containing scope, but the syntax is a little cleaner, and an object may have multiple supertypes.

For example, a member object is a convenient way to implement the Iterator interface for an Iterable object.
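For comparison, here is the Java-style anonymous inner class version of that pattern, which is the construct member objects are being likened to. This is Java, not Ceylon, and Countdown is a hypothetical class invented for illustration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class Countdown implements Iterable<Integer> {
    private final int start;

    Countdown(int start) {
        this.start = start;
    }

    @Override
    public Iterator<Integer> iterator() {
        // Anonymous inner class: extends exactly one type and
        // reads state (start) from the containing instance.
        return new Iterator<Integer>() {
            private int current = start;

            @Override
            public boolean hasNext() {
                return current > 0;
            }

            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current--;
            }
        };
    }

    static int sum(int start) {
        int total = 0;
        for (int n : new Countdown(start)) total += n;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(3));
    }
}
```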

Generics

So, my thoughts on the generics part is that there is a lot of power there with the type system performing better than Java. But the syntax is very verbose.

It's intentionally verbose. Generic declarations are something most developers write much less often than they read. Certain kinds of verbosity can make code more readable, not less.

There's some languages I've seen - not naming names - which, at least to the outside observer, seem to operate on the principle that the more advanced a language feature is, the more cryptic the syntax can be. I think that's exactly backwards. Cryptic syntax is perhaps acceptable for optimizing very common things, but it should be avoided for expressing things that are uncommon or otherwise difficult to understand.

Perhaps that will work out. But I think that deep down I feel SQL and wordy languages are from the past not the future.

The thing is that as you start getting to languages that are as feature-rich as the ones we're looking at as potential Java replacements, you simply run out of punctuation characters on the keyboard. ;-)

So you have the choice between:

  • arbitrary cryptic combinations of symbols and reuse of the same punctuation to mean totally different things in different contexts, or
  • a slightly more wordy style which I think ultimately is much easier for your brain to parse, especially if you're new to the language, or if it isn't the language you use every day.

What I'm hoping is that when people who are not Ceylon programmers see Ceylon code in a blog, they'll be able to get the gist of what it is doing. (Even if they don't immediately understand all the details.) There are languages like this (Python is a shining example), and languages which are not like this. That's not really a value judgement, just a statement about the goals of this particular language that we're working on.

More generally, and with Fantom in my mind, I'm not sure that trying to use the type system to this degree actually makes sense. Using the type system to prevent bugs is good, but using it to tie you down to absolutely precise inputs is increasingly something I see as unnecessary.

I've written Java code before Java had generics and it wasn't a nice experience. I'll never go back to a language without parametric polymorphism. That's not to say that Java's generics system is perfect. On the contrary, I get the feeling that they kinda panicked and rushed something ever so slightly half-baked into the language when they saw C# come out with this feature. But the addition of generics certainly improved the language, even given the imperfections.

Now, I suppose that my point of view on this mainly comes from the fact that I'm a totally IDE-oriented guy. I have never even once in my career written Java code without the help of an IDE. If you asked me to run javac from the commandline I would go racing off to google to figure out how to do that. And so I don't want my IDE to be crippled by not being able to properly analyze the types of things. Really, the whole reason we even have statically typed languages is to enable this kind of sophisticated tooling. A statically typed language without generics cripples that tooling.

Immutability

The distinctions between immutable and mutable, the variable annotation, and = and := look like a lot of rules to remember.

The rules are:

  • If you want to be able to assign a value to something more than once, you need to annotate it variable. It's the precise opposite of Java where you need to annotate something final if you don't want to be able to assign to it.
  • To assign to a variable, you use :=. Otherwise, you use =.

Like in ML, this is to warn you that the code is doing something side-effecty.

Concurrency

Also, concurrency does seem to be missing from the tutorial.

That's a job for the SDK and other libraries.

UPDATE

On one issue that Stephen and I do completely agree on, Stephen's latest post puts the case for prefix type annotations far more comprehensively than I did. (So as not to reopen the flamewar here in this site, I'll be deleting comments relating to this issue.)
