Some Minor Enhancements

Originally published at:

Over the last few days, I announced some new features but then realized that they really needed a few key enhancements. I am referring to the new FAIL statement and the up-to-here notation, the latter of which I announced only three days ago!

The FAIL statement

As first implemented, the FAIL statement allowed you to write something like:

 FAIL "some message"

And this would be a shorthand for:

{throw new ParseException("some message");}

Well, except that the ParseException constructor likely takes some other parameters that allow the machinery to generate a better error trace.

However, the other important aspect of this new construct is that it is treated as a failure in a scanahead routine. So, if you have a production like:

FooBar : 
    FAIL "Was expecting a foo or bar here!"

Then, if you have a lookahead like:

 SCAN FooBar => 

the result depends on which tool you are using. In legacy JavaCC, this sort of construct is always considered to be "successful". In JavaCC 21, the lookahead returns true if the next token is a "foo" or a "bar". Otherwise, i.e. if you hit the FAIL statement, the lookahead routine returns false.

This finally struck me as such a key concept that I extended the FAIL statement to take an arbitrary block of Java code, so you can write:

 FAIL {some kludge maybe here}

But again, a LOOKAHEAD(FooBar) would return true if you hit "foo" or "bar" and false otherwise. In all other respects, the semantics would be the same as:

 {some kludge here}

which in either legacy JavaCC or JavaCC 21 is always considered "successful".
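
To make the parse-versus-scan distinction concrete, here is a rough sketch in plain Java. This is not the code that JavaCC 21 actually generates. The class FailSketch, its two methods, and the stand-in ParseException are all hypothetical names, and I am assuming that FooBar tries a "foo" or a "bar" before falling through to the FAIL:

```java
import java.util.List;

public class FailSketch {
    // Hypothetical stand-in for the generated ParseException (the real one likely takes more parameters).
    static class ParseException extends Exception {
        ParseException(String message) { super(message); }
    }

    // Regular parsing: hitting the FAIL statement throws.
    static String parseFooBar(List<String> tokens, int pos) throws ParseException {
        String next = pos < tokens.size() ? tokens.get(pos) : "<EOF>";
        if (next.equals("foo") || next.equals("bar")) {
            return next;
        }
        // This branch plays the role of: FAIL "Was expecting a foo or bar here!"
        throw new ParseException("Was expecting a foo or bar here!");
    }

    // Lookahead: hitting the FAIL statement simply means the scan did not succeed.
    static boolean scanFooBar(List<String> tokens, int pos) {
        if (pos >= tokens.size()) {
            return false;
        }
        String next = tokens.get(pos);
        return next.equals("foo") || next.equals("bar");
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("foo", "baz");
        System.out.println(scanFooBar(tokens, 0)); // true
        System.out.println(scanFooBar(tokens, 1)); // false
        try {
            parseFooBar(tokens, 1);
        } catch (ParseException e) {
            System.out.println(e.getMessage()); // Was expecting a foo or bar here!
        }
    }
}
```

The point is only that the same FAIL has two outcomes: an exception when you are really parsing, and a plain false when you are just scanning ahead.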

Extending the up-to-here feature

On trying to use it, I realized that the new up-to-here feature had a limitation that made it less useful than it might otherwise be, and I have now remedied that. (I think...)

Suppose you have a production like:

MethodDeclaration :
     ReturnType MethodName Parameters Block;

where the Block is a block of Java code. It would be fairly typical that you want to scan ahead past the first three non-terminals AND check that the next token is the "{" that begins the Block.

There is now a notation to handle this case.

 MethodDeclaration :
      ReturnType MethodName Parameters =>|+1 Block;

You can write => followed by | followed by a + and a single digit. So, for example,

     Foo Bar =>|+3 Baz

would scan past Foo and Bar and then, at that point, limit the remaining lookahead to at most 3 more tokens. In most cases, this will probably be used with productions where the final non-terminal is arbitrarily long and you really don't want to scan through it all.

Note, of course, that

  Foo Bar Baz =>|+0

is exactly the same as:

  Foo Bar Baz =>||

I thought of disallowing the +0 above and just accepting 1-9 after the plus sign. But then I figured I might as well let people write the +0 if they want. As of the current implementation, you can only have a single digit after the plus sign, so the furthest you can specify is 9 more tokens. That limit is artificial and I may allow two digits after the plus sign later, though I wonder if there is much real-world use case for more than 9 tokens. As a practical matter, most uses of this will be with a +1. You have some arbitrarily long construct that starts with an opening brace, let's say, so you scan up to where that starts and then +1.
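
Here is a rough Java sketch of the idea behind =>|+N. Again, this is not what the generated lookahead code looks like. UpToHereSketch and its scan method are hypothetical, and to keep things short I am pretending that everything before the marker and everything after it is just a fixed run of tokens:

```java
import java.util.List;

public class UpToHereSketch {
    // prefix: tokens expected before the =>| marker (always scanned in full)
    // rest:   tokens expected after the marker
    // extra:  the digit after the plus sign, i.e. how many tokens of rest to check
    static boolean scan(List<String> tokens, List<String> prefix, List<String> rest, int extra) {
        if (tokens.size() < prefix.size()) {
            return false;
        }
        for (int i = 0; i < prefix.size(); i++) {
            if (!tokens.get(i).equals(prefix.get(i))) {
                return false;
            }
        }
        // Past the marker, look at no more than extra tokens and then stop.
        int limit = Math.min(extra, rest.size());
        for (int i = 0; i < limit; i++) {
            int pos = prefix.size() + i;
            if (pos >= tokens.size() || !tokens.get(pos).equals(rest.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> prefix = List.of("void", "run", "(", ")");
        List<String> rest = List.of("{", "return", ";", "}");
        List<String> method = List.of("void", "run", "(", ")", "{", "x", "=", "1", ";", "}");
        List<String> abstractMethod = List.of("void", "run", "(", ")", ";");

        System.out.println(scan(method, prefix, rest, 1));         // true: only the "{" is checked
        System.out.println(scan(abstractMethod, prefix, rest, 1)); // false: ";" is not "{"
        System.out.println(scan(abstractMethod, prefix, rest, 0)); // true: +0 stops at the marker
    }
}
```

With extra set to 1, the body of the block is never examined beyond its opening brace, which is the whole point of the +1 usage described above.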

I realized the other day that I was really feeling an urge to actually sit down and write some grammars -- like maybe for C# or JavaScript or PHP.... I dunno....

With all these new features that enhance usability, I think that writing grammars could be fun. (Or almost...)

Well, just as a well-prepared and presented dish makes you want to eat it, the latest JavaCC is increasingly something that makes you want to use it! (At least I feel that way!)