Over the last few days, I announced certain new features but then realized that they really needed some key enhancements. I refer to the new FAIL statement and the up-to-here notation, the latter of which I only announced three days ago!
The FAIL statement
As first implemented, the FAIL statement allowed you to write something like:
FAIL "some message"
And this was shorthand for:
{throw new ParseException("some message");}
Well, except that the ParseException constructor likely takes some other parameters that allow the machinery to generate a better error trace.
However, the other important aspect of this new construct is that it is treated as a failure in a scanahead routine. So, if you have a production like:
FooBar :
"foo"
|
"bar"
|
FAIL "Was expecting a foo or bar here!"
;
Then, if you have a lookahead like:
SCAN FooBar =>
In legacy JavaCC, the equivalent construct (with a plain Java block in place of FAIL) is always considered to be "successful". In JavaCC 21, the above lookahead returns true if the next token is a "foo" or a "bar", but otherwise, i.e. if you hit the FAIL statement, the lookahead routine returns false.
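To make the two behaviors concrete, here is a hand-written Java sketch of what a parser for FooBar might do in each mode. It is not the code JavaCC 21 actually generates: the class names FooBarParser and ParseException, the list-of-strings token stream, and the one-argument exception constructor are all simplifying assumptions.

```java
import java.util.List;

// Hypothetical hand-written sketch of FooBar; not JavaCC 21's generated code.
class FooBarParser {
    private final List<String> tokens;
    private int pos = 0;

    FooBarParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // Returns the next token without consuming it, or "<EOF>" past the end.
    private String next() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private boolean nextIsFooOrBar() {
        return next().equals("foo") || next().equals("bar");
    }

    // Parsing: the FAIL alternative acts like {throw new ParseException(...);}
    void fooBar() throws ParseException {
        if (nextIsFooOrBar()) {
            pos++;
        } else {
            throw new ParseException("Was expecting a foo or bar here!");
        }
    }

    // Scanahead (SCAN FooBar =>): reaching FAIL makes the lookahead fail.
    // A plain Java block in that position would instead make it succeed.
    boolean scanFooBar() {
        return nextIsFooOrBar();
    }

    public static void main(String[] args) {
        System.out.println(new FooBarParser(List.of("bar")).scanFooBar()); // true
        System.out.println(new FooBarParser(List.of("baz")).scanFooBar()); // false
        try {
            new FooBarParser(List.of("baz")).fooBar();
        } catch (ParseException e) {
            System.out.println(e.getMessage()); // Was expecting a foo or bar here!
        }
    }
}

// Minimal stand-in for the parser's exception type (assumption: the real
// generated ParseException takes extra constructor arguments).
class ParseException extends Exception {
    ParseException(String message) {
        super(message);
    }
}
```

Note that scanFooBar never consumes a token: a scanahead only looks.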
This finally struck me as such a key concept that I extended the FAIL statement to take an arbitrary block of Java code, so you can write:
"foo"
|
"bar"
|
FAIL {some kludge maybe here}
But again, a LOOKAHEAD(FooBar) would return true if you hit "foo" or "bar", and false otherwise. In all other respects, the semantics would be the same as:
"foo"
|
"bar"
|
{some kludge here}
which, in either legacy JavaCC or JavaCC 21, is always considered "successful" in a lookahead.
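The difference between FAIL {...} and a plain {...} alternative can be sketched the same way. This is again a hypothetical hand-written parser, not generated code; kludge() stands in for whatever Java code the block contains, and the sketch assumes, as in legacy JavaCC's syntactic lookahead, that Java blocks are not executed while scanning ahead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch contrasting  FAIL {kludge}  with a plain  {kludge}
// alternative; not JavaCC 21's generated code.
class FailBlockDemo {
    private final List<String> tokens;
    private int pos = 0;
    final List<String> log = new ArrayList<>(); // records what the kludge did

    FailBlockDemo(List<String> tokens) {
        this.tokens = tokens;
    }

    private boolean nextIsFooOrBar() {
        if (pos >= tokens.size()) return false;
        String t = tokens.get(pos);
        return t.equals("foo") || t.equals("bar");
    }

    // Stand-in for {some kludge here}: any Java code, e.g. error recovery.
    private void kludge() {
        log.add("kludge ran at token " + pos);
    }

    // Parsing is identical for both variants: match foo/bar, else run the block.
    void fooBar() {
        if (nextIsFooOrBar()) {
            pos++;
        } else {
            kludge();
        }
    }

    // Scanahead with  FAIL {kludge}:  the FAIL alternative means "no match".
    boolean scanWithFailBlock() {
        return nextIsFooOrBar();
    }

    // Scanahead with a plain {kludge} alternative: the block is skipped and
    // the otherwise empty alternative always matches, so the scan succeeds.
    boolean scanWithPlainBlock() {
        return true;
    }

    public static void main(String[] args) {
        FailBlockDemo d = new FailBlockDemo(List.of("baz"));
        System.out.println(d.scanWithFailBlock());  // false
        System.out.println(d.scanWithPlainBlock()); // true
        d.fooBar();
        System.out.println(d.log); // [kludge ran at token 0]
    }
}
```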
Extending the up-to-here feature
On trying to use it, I realized that the new up-to-here feature has a certain limitation that makes it less useful than it might otherwise be, and I have now remedied that. (I think...)
Suppose you have a production like:
MethodDeclaration :
ReturnType MethodName Parameters Block;
where the Block is a block of Java code. Typically, you would want to scan ahead past the first three non-terminals AND check that the next token is the "{" that begins the Block.
There is now a notation to handle this case.
MethodDeclaration :
ReturnType MethodName Parameters =>|+1 Block;
You can write => followed by | followed by a + and a single digit. So, for example,
Foo Bar =>|+3 Baz
would scan past Foo and Bar and then, at that point, limit the remaining lookahead to at most 3 more tokens. Probably, in most cases, this will be used with productions where the final non-terminal can be arbitrarily long and you really don't want to scan through all of it.
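Here is a rough Java sketch of the =>|+N semantics, applied to the MethodDeclaration example above. The token kinds (TYPE, IDENT, STMT), the simplified Parameters and Block rules, and the class UpToHereScan are all hypothetical. The sketch assumes the usual JavaCC convention that a lookahead which uses up its token limit without a mismatch counts as successful.

```java
import java.util.List;

// Hypothetical sketch of  ReturnType MethodName Parameters =>|+N Block
// over a toy token stream; not JavaCC 21's generated code.
class UpToHereScan {
    static final int UNLIMITED = Integer.MAX_VALUE;

    private final List<String> tokens;
    private int pos;
    private int budget;       // tokens still allowed after the =>| marker
    private boolean limitHit; // set once the budget has run out

    UpToHereScan(List<String> tokens) {
        this.tokens = tokens;
    }

    private boolean atLimit() {
        return limitHit || budget == 0;
    }

    // Matches one token. Past the limit nothing is checked, so it "succeeds".
    private boolean expect(String kind) {
        if (atLimit()) {
            limitHit = true;
            return true;
        }
        if (pos >= tokens.size() || !tokens.get(pos).equals(kind)) return false;
        pos++;
        if (budget != UNLIMITED) budget--;
        return true;
    }

    private boolean nextIs(String kind) {
        return !atLimit() && pos < tokens.size() && tokens.get(pos).equals(kind);
    }

    private boolean returnType() { return expect("TYPE"); }
    private boolean methodName() { return expect("IDENT"); }
    private boolean parameters() { return expect("(") && expect(")"); }

    // Block : "{" (STMT)* "}", which can be arbitrarily long.
    private boolean block() {
        if (!expect("{")) return false;
        while (nextIs("STMT")) expect("STMT");
        return expect("}");
    }

    // Scans the prefix with no limit, then at most n more tokens.
    boolean scanMethodDeclaration(int n) {
        pos = 0;
        budget = UNLIMITED;
        limitHit = false;
        if (!(returnType() && methodName() && parameters())) return false;
        budget = n; // the =>|+n marker
        return block();
    }

    public static void main(String[] args) {
        UpToHereScan s = new UpToHereScan(List.of("TYPE", "IDENT", "(", ")", "{", "STMT"));
        System.out.println(s.scanMethodDeclaration(1));         // true: only "{" is checked
        System.out.println(s.scanMethodDeclaration(UNLIMITED)); // false: no closing "}"
    }
}
```

In this model, passing 0 gives the same result as =>||: only the part before the marker is checked.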
Note, of course, that
Foo Bar Baz =>|+0
is exactly the same as:
Foo Bar Baz =>||
I thought of disallowing the +0 above and just accepting 1-9 after the plus sign. But then I figured I might as well let people write the +0 if they want. As of the current implementation, you can only have a single digit after the plus sign, so the furthest you can specify is 9 more tokens. That limit is artificial and I may allow two digits after the plus sign later, though I wonder whether there is much real-world use for more than 9 tokens. As a practical matter, most uses of this will be with a +1: say you have some arbitrarily long construct that starts with an opening brace; you scan up to where it starts and then +1.
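The single-digit rule can be illustrated with a regular expression that accepts exactly =>|| or =>|+ followed by one digit. This is only a hypothetical sketch of the rule as described here (the class UpToHereMarker and method extraTokens are made up), not how JavaCC 21's lexer actually handles the marker.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical recognizer for the up-to-here marker; not JavaCC 21's lexer.
class UpToHereMarker {
    // "=>||" or "=>|+" followed by exactly one digit.
    private static final Pattern MARKER = Pattern.compile("=>\\|(?:\\||\\+([0-9]))");

    // Returns the number of extra tokens to scan, or empty if not a marker.
    static OptionalInt extraTokens(String text) {
        Matcher m = MARKER.matcher(text);
        if (!m.matches()) return OptionalInt.empty();
        if (m.group(1) == null) return OptionalInt.of(0); // "=>||" means the same as "=>|+0"
        return OptionalInt.of(Integer.parseInt(m.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(extraTokens("=>||"));   // OptionalInt[0]
        System.out.println(extraTokens("=>|+1"));  // OptionalInt[1]
        System.out.println(extraTokens("=>|+10")); // OptionalInt.empty
    }
}
```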
I realized the other day that I really was feeling an urge to actually sit down and write some grammars, maybe for C# or JavaScript or PHP... I dunno...
With all these new features that enhance usability, I think that writing grammars could be fun. (Or almost...)
Well, just as a well-prepared and presented dish makes you want to eat it, the latest JavaCC is increasingly something that makes you want to use it! (At least I feel that way!)