Plugin literal formats


Ceylon is a language for defining structured data as well as regular procedural code. One of the first things you run into when defining data formats is the need for micro-languages: small syntaxes, each with its own validation rules, for character strings that represent literal values of some data type. For example:

  • email addresses
  • phone numbers
  • dates, times, and durations
  • regular expressions
  • cron expressions
  • URLs and URIs
  • hexadecimal numbers

We would like to be able to write things like:

Date date = '25/03/2005';
Time time = '12:00 AM PST';
Boolean isEmail = '^\w+@(\w+\.)+\w+$'.matches(email);
Cron schedule = '0 0 23 ? * MON-FRI';
Color color = 'FF3B66';
Url url = 'http://jboss.org/ceylon';
mail.to:='gavin@hibernate.org';
PhoneNumber ph = '+1 (404) 129 3456';
Duration duration = '1h 30m';

And we want the compiler to be able to perform some kind of syntactic validation on the format of these character strings. Sometimes, this validation might be as simple as a regular expression. But in other cases, more complex syntactic validations are conceivable.
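To make the difference concrete, here is a minimal sketch in Python (not Ceylon, since the literal syntax above is still only a proposal). The function names and the exact formats are assumptions made for illustration. The color check is purely regular, while the date check needs to know that the day actually exists in the given month.

```python
import re
from datetime import datetime

# Assumed format: exactly six upper-case hexadecimal digits.
COLOR_PATTERN = re.compile(r"[0-9A-F]{6}")


def is_valid_color(literal: str) -> bool:
    """A check that a regular expression can express completely."""
    return COLOR_PATTERN.fullmatch(literal) is not None


def is_valid_date(literal: str) -> bool:
    """A check that goes beyond shape: the day must exist in that month.

    Assumes the day/month/year layout used in the examples above.
    """
    try:
        datetime.strptime(literal, "%d/%m/%Y")
    except ValueError:
        return False
    return True
```

With these definitions, 'FF3B66' and '25/03/2005' are accepted, while '31/02/2005' has the right shape for a date but is rejected because February has no 31st day.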

So in Ceylon we've reserved single-quoted character strings for this use case. What we have not yet figured out is how to determine what particular format a single-quoted literal adheres to (what type of literal it represents), and how to validate the literal against that format at compile time. Ceylon doesn't do left-to-right type inference (the declared type on the left is not used to infer the type of the expression on the right), so we might end up needing to make you specify the type explicitly, for example:

Date date = Date '25/03/2005';
Time time = Time '12:00 AM PST';
Boolean isEmail = Regex '^\w+@(\w+\.)+\w+$'.matches(email);
Cron schedule = Cron '0 0 23 ? * MON-FRI';
Color color = Color 'FF3B66';
Url url = Url 'http://jboss.org/ceylon';
mail.to:=Email 'gavin@hibernate.org';
PhoneNumber ph = PhoneNumber '+1 (404) 129 3456';
Duration duration = Duration '1h 30m';

I don't think that's ideal, but it's probably the safest thing. As for validation, I can see two possibilities:

  • allow the application to supply a plugin validator, a Ceylon object that gets called at compile time, or
  • allow a type to specify its literal format using an annotation (which might specify a regex, or perhaps even some more powerful BNF).
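The two possibilities can be contrasted with a small Python sketch. Everything in it is hypothetical: the registry names, the `check_literal` function, and the duration and color formats are assumptions for illustration, not part of any Ceylon design. The first option runs arbitrary application code for each literal, whereas the second stores the format as plain data that a compiler could inspect without executing anything from the application.

```python
import re
from typing import Callable, Dict

# Option 1 (hypothetical): the application registers a validator that the
# compiler would call for every literal of the given type.
plugin_validators: Dict[str, Callable[[str], bool]] = {}


def register_validator(type_name: str, validator: Callable[[str], bool]) -> None:
    plugin_validators[type_name] = validator


def valid_duration(literal: str) -> bool:
    # Arbitrary code: one or more parts such as "1h", "30m" or "15s".
    parts = literal.split()
    return bool(parts) and all(re.fullmatch(r"\d+[hms]", p) for p in parts)


register_validator("Duration", valid_duration)

# Option 2 (hypothetical): the type declares its format as data, here a regex.
declared_formats: Dict[str, str] = {"Color": r"[0-9A-F]{6}"}


def check_literal(type_name: str, literal: str) -> bool:
    """Validate a literal using whichever mechanism the type provides."""
    if type_name in declared_formats:
        return re.fullmatch(declared_formats[type_name], literal) is not None
    if type_name in plugin_validators:
        return plugin_validators[type_name](literal)
    raise KeyError(f"no literal format known for {type_name}")
```

In this sketch `check_literal("Duration", "1h 30m")` goes through the plugin, and `check_literal("Color", "FF3B66")` goes through the declared format.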

These days, I'm leaning towards the second option:

class Color(format(Bnf '(`0`..`9`|`A`..`F`){6}') Quoted quoted) { ... }
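The notation inside the Bnf literal is not specified anywhere yet, so the following Python sketch is only one guess at its meaning: a backtick-quoted character stands for itself, two of them joined by `..` form a character range, and `|`, parentheses and `{6}` behave as they do in regular expressions. Under those assumptions the format can be translated mechanically into a regex. The function name `format_to_regex` is made up for this example.

```python
import re

# `a`..`z` is assumed to mean a character range, `a` a single character.
RANGE = re.compile(r"`(.)`\.\.`(.)`")
SINGLE = re.compile(r"`(.)`")


def format_to_regex(fmt: str) -> re.Pattern:
    """Translate the assumed Bnf notation into a compiled Python regex."""
    # Handle ranges first so the single-character rule does not consume
    # their endpoints.
    body = RANGE.sub(
        lambda m: f"[{re.escape(m.group(1))}-{re.escape(m.group(2))}]", fmt
    )
    # Any remaining backtick-quoted character is a literal character.
    body = SINGLE.sub(lambda m: re.escape(m.group(1)), body)
    return re.compile(body)


color_format = format_to_regex("(`0`..`9`|`A`..`F`){6}")
```

Here `color_format.fullmatch("FF3B66")` succeeds, while "GG0000" and "FF3B6" are rejected. A real BNF would need a proper parser, since it could describe formats that are not regular.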

One consequence of the support for quoted literals is that we might end up using backticks to quote single-character literals, for example: `A` or `\n`.

The truth is, some more thinking and experimentation is needed in this area.

