JDK 16 is out!

Though JDK 16 was released on 16 March, a week ago, I only became aware of that a few days later, on March 20. Once I saw that news item, I figured I would immediately add support for the new JDK 16 language features to JavaCC. It didn't take me very long, maybe a few hours. You see, though there are a lot of new features in Java 16, it turns out that only two of them are pertinent from a pure language syntax perspective: pattern matching for instanceof and records.

The first turned out to be quite trivial. That is implemented here.

The new record feature was a little trickier to implement because the word "record", like "yield" or "var", is a restricted keyword, i.e. a keyword in certain contexts. Outside of those contexts, it is just a regular identifier.

Well, the long and short of it is that these restricted keywords can be variable names, but not type names. Of course, the whole concept of a restricted keyword exists only for the sake of backward compatibility. Surely, if all of this were designed from scratch, record would just be a regular keyword, like class or interface. But all of this has led to some rather strange corners of the syntax. For example, the following is a perfectly valid statement:

yield yield;

No biggie (inside a switch expression, that is). The first yield is the keyword and the second one is a regular identifier. Similarly, this statement is valid:

var var = var();

However, interestingly enough, the following is not permissible:

record record ...

Since record is a restricted keyword, it cannot be used as a type name. So class record... or interface record... is also a no-no.
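To try these corner cases against a real compiler, here is a small self-contained program that uses var, yield and record as ordinary identifiers. It is only a sketch: the class and method names are made up for illustration, and it assumes JDK 16 or later. Note that yield acts as a keyword only inside a switch expression, which is why the yield yield; statement lives inside one.

```java
// Illustrative program: var, yield and record used as plain identifiers (assumes JDK 16+).
public class RestrictedKeywords {

    // 'var' is not a reserved keyword, so it can name a method.
    static int var() {
        return 42;
    }

    // 'record' can name a method too; it just cannot name a type.
    static String record() {
        return "record";
    }

    static int yieldDemo(int selector) {
        int yield = selector * 10;  // 'yield' as an ordinary local variable
        return switch (selector) {
            case 0 -> 0;
            default -> {
                yield yield;        // the yield statement, yielding the variable named yield
            }
        };
    }

    public static void main(String[] args) {
        var var = var();            // inferred type, variable name and method name, all spelled "var"
        int record = 7;             // 'record' as a variable name
        System.out.println(var + " " + record + " " + record() + " " + yieldDemo(3));
    }
}
```

Compiled and run on JDK 16 or later, it should print 42 7 record 30.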

Well, all of this is quite funky, really, but it is actually a real blessing for JavaCC development, since these corners of the syntax make a somewhat challenging testbed for the capabilities of a parser generator. I described earlier how I went about supporting the yield statement. Since writing that post, I have changed the approach a bit, and this is how it is implemented now:

YieldStatement : 
   SCAN {getToken(1).getImage().equals("yield") 
         && isInProduction("SwitchExpression")}#
   => <IDENTIFIER> Expression ";"
;
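The SCAN condition in that production boils down to a boolean check on the parser state. Here is a plain-Java sketch of that check. The production stack, the token list and all the method names are hypothetical stand-ins for what the generated parser tracks internally, and it assumes isInProduction means "the named production is somewhere on the stack of productions currently being parsed".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the parser state that the YieldStatement SCAN condition consults.
public class YieldLookaheadSketch {

    private final Deque<String> productions = new ArrayDeque<>(); // innermost production on top
    private final List<String> tokenImages;                       // images of the upcoming tokens
    private int position = 0;                                     // index of the next unconsumed token

    YieldLookaheadSketch(String... tokenImages) {
        this.tokenImages = List.of(tokenImages);
    }

    void enterProduction(String name) { productions.push(name); }

    void exitProduction() { productions.pop(); }

    void consume() { position++; }

    // True if the named production is anywhere on the stack (an assumed reading of isInProduction).
    boolean isInProduction(String name) { return productions.contains(name); }

    // Image of the k-th upcoming token, counting from 1 as getToken(1) does.
    String tokenImage(int k) {
        int index = position + k - 1;
        return index < tokenImages.size() ? tokenImages.get(index) : "<EOF>";
    }

    // Same test as the SCAN expression: next token spelled "yield", inside a switch expression.
    boolean scanYieldStatement() {
        return tokenImage(1).equals("yield") && isInProduction("SwitchExpression");
    }

    public static void main(String[] args) {
        YieldLookaheadSketch parser = new YieldLookaheadSketch("yield", "x", ";");
        System.out.println(parser.scanYieldStatement()); // false: not inside a switch expression yet
        parser.enterProduction("SwitchExpression");
        parser.enterProduction("Block");
        System.out.println(parser.scanYieldStatement()); // true: now nested inside one
    }
}
```

Because the check is purely semantic, the lexer never needs a dedicated yield token: the word arrives as an ordinary identifier and the predicate decides what it means.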

Basically, defining a separate yield keyword seemed like more bother than it was worth, so I handled the yield statement (part of the newer switch expressions, a stable feature since JDK 14) simply with semantic lookahead. The upshot is that there is no yield token in the lexical grammar; I deal with it ad hoc, as you see above. With record, though, I took a different approach: I defined it as a keyword, but then wrote a TOKEN_HOOK method that replaces the keyword with an identifier whenever we are not in a context where record is a keyword. You can see that here, but I'll reproduce the key part:

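INJECT PARSER_CLASS : {
  // Token hook: turns "record" into RECORD or IDENTIFIER depending on context
  private Token TOKEN_HOOK(Token tok) {
    TokenType type = tok.getType();
    // Leave alone anything that is neither a RECORD token nor an identifier spelled "record"
    if (type != RECORD && (type != IDENTIFIER || !tok.getImage().equals("record"))) {
       return tok;
    }
    // "record" is a keyword only inside a type declaration
    TokenType desiredType = inTypeDeclaration() ? RECORD : IDENTIFIER;
    if (type == desiredType) return tok;
    // Otherwise substitute a token of the desired type, keeping the source location
    Token result = Token.newToken(desiredType, "record", tok.getInputSource());
    result.copyLocationInfo(tok);
    return result;
  }
}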
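To see the token hook's decision logic in isolation, here is a standalone Java sketch. The TokenType enum, the Token record and the inTypeDeclaration flag are simplified, hypothetical stand-ins for the real JavaCC 21 classes and for whatever inTypeDeclaration() actually checks; only the rewriting rule itself is taken from the method above.

```java
// Standalone sketch of the "record" token-hook rule, using hypothetical simplified types.
public class RecordTokenHookSketch {

    enum TokenType { RECORD, IDENTIFIER, CLASS, INT }

    // Simplified token: type, image, and a line number standing in for full location info.
    record Token(TokenType type, String image, int line) { }

    // Stand-in for the parser state that inTypeDeclaration() inspects in the real grammar.
    private boolean inTypeDeclaration;

    void setInTypeDeclaration(boolean value) { inTypeDeclaration = value; }

    Token tokenHook(Token tok) {
        TokenType type = tok.type();
        // Only a RECORD token or an identifier spelled "record" is a candidate for rewriting.
        if (type != TokenType.RECORD
                && (type != TokenType.IDENTIFIER || !tok.image().equals("record"))) {
            return tok;
        }
        // "record" is a keyword only inside a type declaration.
        TokenType desiredType = inTypeDeclaration ? TokenType.RECORD : TokenType.IDENTIFIER;
        if (type == desiredType) {
            return tok;
        }
        // Replace it with a token of the desired type, keeping the original location.
        return new Token(desiredType, "record", tok.line());
    }

    public static void main(String[] args) {
        RecordTokenHookSketch hook = new RecordTokenHookSketch();
        Token keyword = new Token(TokenType.RECORD, "record", 1);
        System.out.println(hook.tokenHook(keyword).type()); // IDENTIFIER, as in: int record = 7;
        hook.setInTypeDeclaration(true);
        System.out.println(hook.tokenHook(keyword).type()); // RECORD: stays a keyword
    }
}
```

The design choice is the same as in the grammar: the syntactic rules can assume a clean RECORD/IDENTIFIER distinction, because the context-sensitive decision has already been made at the token level.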

Basically, the token hook method above expresses the idea that if we're not in a type declaration, the record token should really be a plain identifier, so the hook replaces it with one (and, conversely, inside a type declaration it turns an identifier spelled "record" into the RECORD keyword). A side effect of implementing things this way is that something like:

 class record {...}

does not parse: we are in a type declaration, so record is treated as a keyword, but the grammar rule specifies that the token after class must be an identifier. However, the statement:

 int record = 7;

parses fine, because we are not directly in a type declaration, so the record token is replaced by an identifier. My point is that the generated parser does the right thing with these constructs (record can be a variable or method name, but not a class name!) thanks to the little touch of magic provided by the relatively terse TOKEN_HOOK method above. This means there is no need to muddy up the syntactic grammar with these gnarly details. All of this does rely on some enhancements added to JavaCC 21 over the last few months; such an elegant solution to these problems is impossible with legacy JavaCC.
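One way to double-check the behavior the parser should reproduce is to ask javac itself. The sketch below (class and method names are illustrative) uses the standard javax.tools compiler API to compile small source strings in memory. It assumes a full JDK 16 or later, since ToolProvider.getSystemJavaCompiler() returns null on a bare runtime.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Illustrative helper: asks the JDK's own compiler whether a snippet is legal Java.
public class RecordNameCheck {

    // Wraps a source string as an in-memory compilation unit.
    static JavaFileObject source(String fileName, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + fileName + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Returns true if javac accepts the code; class files go to a throwaway temp directory.
    static boolean compiles(String fileName, String code) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler: run this on a JDK, not a JRE");
        }
        try {
            String outDir = Files.createTempDirectory("record-check").toString();
            // The empty diagnostic listener keeps the expected compile errors quiet.
            return javac.getTask(null, null, diagnostic -> { }, List.of("-d", outDir), null,
                    List.of(source(fileName, code))).call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 'record' as a local variable name: accepted.
        System.out.println(compiles("A", "class A { void m() { int record = 7; } }"));
        // 'record' as a method name: accepted.
        System.out.println(compiles("B", "class B { int record() { return 1; } }"));
        // 'record' as a class name: rejected, since it cannot be a type identifier.
        System.out.println(compiles("C", "class record { }"));
    }
}
```

On such a JDK, the first two snippets should compile and the class record declaration should be rejected, which is exactly the split the token hook is meant to reproduce.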

It appears that JDK 17 is due out in September of this year. New Java releases now come out every six months. I don't actually know what new language features are in store for JDK 17, but judging from the past few cycles, committing to stay up to date with the evolution of the Java language does not seem like much work. The last three releases, 14, 15 and 16, brought us switch expressions, multi-line string literals (text blocks), and the two new features mentioned above, and I don't believe I spent much time adding support for those four features. Actually, it should only get easier over time, since JavaCC 21 (like Java itself!) is not standing still but keeps getting more powerful and expressive.

In closing, I'll repeat the key points: first of all, the Java grammar I describe is the one used internally by JavaCC 21, so all of these newer syntactical elements can be used in code actions and injections within your grammars. Also, the grammar is free to use in your own projects. So, yeah, go shout it from the rooftops. Actually, the more modern thing to do apparently is to tweet it or like it on Facebook. I recently added these social media buttons for you to do that easily, and I would greatly appreciate the help getting the word out.
