JDK 14 Now Fully Supported! New Switch Syntax

Originally published at: https://javacc.com/2020/10/25/jdk-14-now-fully-supported-new-switch-syntax/

When I say "fully supported", I mean all the new features that are categorized as "stable". In fact, the only new language feature that has been promoted to stable in JDK 14 is the new syntax for the Switch statement. This also allows a Switch statement to be used as an expression.
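For anyone who has not tried it yet, here is a minimal sketch of the new syntax in both roles, as a statement and as an expression. The class, enum, and method names are made up for illustration, and it needs JDK 14 or later to compile.

```java
public class SwitchDemo {
    // Hypothetical enum, used only for this illustration.
    enum Day { MON, TUE, WED, THU, FRI, SAT, SUN }

    // New-style switch used as a statement: arrow labels, several
    // constants per label, and no fall-through between cases.
    static String describe(Day day) {
        StringBuilder sb = new StringBuilder();
        switch (day) {
            case SAT, SUN -> sb.append("weekend");
            default -> sb.append("weekday");
        }
        return sb.toString();
    }

    // The same construct used as an expression that produces a value.
    static int hoursWorked(Day day) {
        return switch (day) {
            case SAT, SUN -> 0;
            case FRI -> 6;
            default -> 8;
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(Day.SAT));
        System.out.println(hoursWorked(Day.FRI));
    }
}
```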

In fact, the built-in Java parser can now parse all the Java source in the src.zip that is included in both JDK 14 and JDK 15. But that is because none of those files actually use the new text block feature (multi-line string literals delimited by triple quote characters) that was promoted to a stable feature in JDK 15. I'll add support for that when I get round to it. Maybe when there hasn't been much to announce for a while, I'll nail the triple-quote string literals and announce that we now support Java 15!

Context Sensitive Parsing Redux: The Yield Statement

The main way that the new Switch syntax allows a Switch construct to return a value is by means of a new yield statement. The syntax of the yield statement is dead simple. In BNF notation, it is simply:

 "yield" Expression ";"

There is a novel twist though because "yield" is not actually a new keyword introduced into Java. In most contexts, "yield" is still just a regular Java identifier, so you can write:

 int yield = 4;

or something like that. An interesting twist on this is that:

 yield yield;

is actually a valid statement -- at least, assuming you are at the right spot in a switch expression. The first "yield" is the "yield" instruction and the second one is just a plain identifier! (Can you say Dmitry Dmitriyevich?)
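Here is a small sketch of both uses side by side, again assuming JDK 14 or later. The class and method names are hypothetical; the point is only that "yield" works as a statement inside a switch expression and as an ordinary variable name everywhere else.

```java
public class YieldDemo {
    // "yield" as a statement: it supplies the value of the enclosing
    // switch expression from inside a block.
    static int score(String grade) {
        return switch (grade) {
            case "A" -> 4;
            case "B" -> 3;
            default -> {
                int fallback = grade.isEmpty() ? -1 : 0;
                yield fallback;
            }
        };
    }

    // "yield yield;": the first word is the yield statement, the second
    // is just a local variable that happens to be named yield.
    static int yieldYield(int n) {
        int yield = n * 2;
        return switch (n) {
            case 0 -> 0;
            default -> {
                yield yield;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(score("A"));
        System.out.println(yieldYield(3));
    }
}
```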

To be absolutely honest, I'm not a great fan of this new language feature, but it is part of the standard Java language, and I feel I have to support it. (It was not theirs to question why...)

But the other aspect of this is that this sort of context-sensitive parsing problem ("yield" being a keyword, kinda, sorta, some of the time, but being a regular identifier most of the time) is a pretty nice showcase for some of the new features in JavaCC21. It turns out that the whole thing can be expressed quite elegantly. Here is the definition of the yield statement in the Java grammar that is now used internally:

YieldStatement :
    <IDENTIFIER>
    [
       SCAN {!getToken(0).getImage().equals("yield")}#
             ~\...\SwitchExpression
       => FAIL 
    ]
    Expression
    =>||
    ";" 
;

Interestingly enough, given the above definition, the YieldStatement production can just be used like all the other various XXXStatement productions without being qualified by any lookahead specification. There are two key points to understand here. The first is that the up-to-here marker (the =>|| just before the closing ";") says that when this YieldStatement occurs at a choice point, we scan ahead to that point to decide whether to enter the production.

The second key point to understand is that if we hit the FAIL statement in a lookahead routine, this is interpreted as a failure (duh!) and means effectively that if the current token (the one that lexically matched IDENTIFIER) is not the string "yield", then the lookahead has failed. Also, if we are not in a SwitchExpression, the lookahead fails. (By the way, the # after the semantic lookahead indicates that the predicate applies globally, i.e. both in a lookahead and in a regular parsing routine.)
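To make that logic concrete, here is roughly what the decision amounts to, written out as plain Java. This is only a sketch of the prose above and not the code that JavaCC21 generates: the class, the method name, and the idea of passing the call stack as a list of production names are all made up for illustration, and "in a SwitchExpression" is simplified to "SwitchExpression appears somewhere in that list".

```java
import java.util.List;

public class YieldLookaheadSketch {
    // Hypothetical helper: decides whether an identifier token should be
    // treated as the start of a yield statement.
    //   currentTokenImage - text of the token that matched IDENTIFIER
    //   callStack         - names of the productions currently being
    //                       parsed, outermost first (an assumption here)
    static boolean looksLikeYieldStatement(String currentTokenImage,
                                           List<String> callStack) {
        // The identifier must literally be the string "yield".
        if (!"yield".equals(currentTokenImage)) {
            return false; // corresponds to hitting FAIL
        }
        // We must be somewhere inside a switch expression.
        if (!callStack.contains("SwitchExpression")) {
            return false; // corresponds to hitting FAIL
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> inSwitch = List.of("CompilationUnit", "SwitchExpression", "Block");
        List<String> elsewhere = List.of("CompilationUnit", "Block");
        System.out.println(looksLikeYieldStatement("yield", inSwitch));
        System.out.println(looksLikeYieldStatement("yield", elsewhere));
    }
}
```

In the real grammar the scan then continues through Expression up to the =>|| marker, which this sketch leaves out.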

In short, some of these recently introduced features fall together such that we can express certain things in a fairly clean and robust way.

As for the rest of the new switch statement syntax, I implemented it in a rather straightforward, even bloody-minded way. One rather annoying thing I ran into was that, the way I had written the grammar, in a construct like:

  case FOO, BAR -> {...}

the machinery wanted to interpret BAR->... as a Lambda. However, simply adding a lookbehind predicate at a key point solved the problem. The production UnaryExpressionNotPlusMinus is now:

UnaryExpressionNotPlusMinus :
   ( "~" | "!" ) UnaryExpression
   |
   SCAN ~\...\NewSwitchLabel
   => LambdaExpression 
   |
   => CastExpression
   |
   PostfixExpression
   |
   SwitchExpression
;

Of course, there is an extra choice (SwitchExpression) added at the end, but the key change is the SCAN ~\...\NewSwitchLabel lookbehind predicate prefixed to the LambdaExpression choice, i.e. if we are in a new-style switch label, we do not parse BAR->... as a lambda! And that little tweak seems to have solved that problem. Once I had that, the Java grammar could parse all the JDK 14 and 15 source code. Hurrah!
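Here is a small Java sketch of the two things the parser has to tell apart, with hypothetical class, enum, and method names (JDK 14 or later). In the label, "BAR -> {" belongs to the switch rule. Inside the block, "n -> n * 2" is a genuine lambda.

```java
import java.util.function.IntUnaryOperator;

public class CaseArrowVsLambda {
    // Hypothetical enum, used only for this illustration.
    enum Kind { FOO, BAR, BAZ }

    static int apply(Kind kind, int x) {
        int result;
        switch (kind) {
            // "BAR -> {" is part of the switch label, not a lambda.
            case FOO, BAR -> {
                // This one really is a lambda expression.
                IntUnaryOperator twice = n -> n * 2;
                result = twice.applyAsInt(x);
            }
            default -> result = x;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(apply(Kind.BAR, 5));
        System.out.println(apply(Kind.BAZ, 5));
    }
}
```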

Again, as I've said in previous announcements of this nature, the included Java grammar can be freely used in your own projects. I guess it would be nice if you put in an acknowledgment in your product that you use it, but I'm not demanding even that. Take it, use it, be happy...