Straightforward LOOKAHEAD Enhancements

Originally published at: https://javacc.com/2020/04/23/straightforward-lookahead-enhancements/

A couple of weeks ago, I implemented a solution to a longstanding Dmitry Dmitriyevich problem with LOOKAHEAD specifications in JavaCC. Dmitry Dmitriyevich (alternatively Ivan Ivanovich or Vladimir Vladimirovich) is my own personal terminology for situations where you are required to write highly repetitive code. (Patriotic Irishmen are welcome to call this the Patrick Fitzpatrick or Gerald Fitzgerald issue. By all means! Be my guest!) Anyway, getting back to JavaCC, or legacy JavaCC to be more precise: its grammars are typically full of statements like:

LOOKAHEAD(Foo()) Foo()

and even:

LOOKAHEAD(Foo() [Bar()] Baz()) Foo() [Bar()] Baz()

In other words, we scan ahead to see whether an expansion would succeed and then, assuming that is the case, we parse that very same expansion, which therefore has to be written out twice. In JavaCC 21 the above can be written simply as:

LOOKAHEAD Foo()

and

LOOKAHEAD Foo() [Bar()] Baz()

respectively.

In fact, a shorter synonym is available, which is SCAN, so you can just write:

SCAN Foo()

and the internal machinery deduces that it should insert a lookahead routine that checks whether the Foo() expansion succeeds.
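To make the scan-then-parse idea concrete, here is a minimal hand-written Java sketch of what a lookahead routine for Foo() [Bar()] Baz() does conceptually. It is not the code that JavaCC generates. The class name, the method names, and the use of the plain strings "foo", "bar" and "baz" as stand-ins for the three productions are all assumptions made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hand-written illustration only; this is NOT what JavaCC generates.
class ScanThenParseSketch {
    private final List<String> tokens;
    private int pos = 0; // index of the next unconsumed token

    ScanThenParseSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    private boolean tokenIs(int i, String image) {
        return i < tokens.size() && tokens.get(i).equals(image);
    }

    // Scan routine for the expansion Foo() [Bar()] Baz().
    // Returns the index just past the match, or -1 if there is no match.
    // It never modifies pos, so scanning consumes nothing.
    private int scanFooBarBaz(int from) {
        int i = from;
        if (!tokenIs(i, "foo")) return -1;
        i++;
        if (tokenIs(i, "bar")) i++; // the optional [Bar()]
        if (!tokenIs(i, "baz")) return -1;
        return i + 1;
    }

    // Mirrors the intent of: SCAN Foo() [Bar()] Baz()
    // First scan ahead; only if the scan succeeds, consume the tokens.
    boolean tryFooBarBaz() {
        int end = scanFooBarBaz(pos);
        if (end < 0) return false; // lookahead failed, input untouched
        pos = end;                 // the "parse" step
        return true;
    }

    int position() {
        return pos;
    }

    public static void main(String[] args) {
        ScanThenParseSketch ok = new ScanThenParseSketch(Arrays.asList("foo", "bar", "baz", "x"));
        System.out.println(ok.tryFooBarBaz() + " " + ok.position());   // prints "true 3"
        ScanThenParseSketch bad = new ScanThenParseSketch(Arrays.asList("foo", "bar", "x"));
        System.out.println(bad.tryFooBarBaz() + " " + bad.position()); // prints "false 0"
    }
}
```

The point of the sketch is that the expansion is described once, in the scan routine, and the decision to parse follows from it. That is the repetition the shorter syntax removes from the grammar file.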

Actually, some of these minor enhancements to LOOKAHEAD were so tempting to use that I could not resist rebootstrapping the JavaCC build, i.e. switching the internal build to a newer jarfile so that this feature could then be used internally. Thus, for example, you see that I have this code internally where before it looked like this. Actually those verbose Gerald Fitzgerald constructs are now replaced in a dozen different places in the included Java grammar, and I think these kinds of things, even if minor, are a clear win for readability.

Now, another enhancement is negative LOOKAHEAD. The motivation is that, broadly speaking, there are occasions where it is easier to describe something by what it is not than by what it is. (If it ain’t yellow, it’s brown.) Thus, the following construct is now available:

LOOKAHEAD(~Foo()) Bar()

This means that we scan ahead for a Foo() production and, if it is not there, we go on to parse a Bar(). The negation can be applied to any expansion you could place in a LOOKAHEAD. You can see this in action here.

So, on this line we currently have:

LOOKAHEAD(~(<BIT_OR>|<COMMA>|<RPAREN>|<RBRACE>|<RBRACKET>))

where what was there before was:

LOOKAHEAD(0, {notTailOfExpansionUnit()})

where notTailOfExpansionUnit() is a Java method defined elsewhere in the file. The first version is much better for a couple of reasons. First of all, it does not resort to Java code, so any future version of the tool that generates code in other languages would presumably still be able to generate code for this construct. Also, anybody reading the grammar can see exactly what the LOOKAHEAD does without having to go off to some other place to find where the method is defined.
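Read declaratively, the negated token set amounts to a membership test on the next token: the lookahead succeeds when that token is none of the listed kinds. The following Java sketch spells that reading out. It makes no claim about how JavaCC implements the construct or about what the body of notTailOfExpansionUnit() looked like. The enum, the OTHER constant and the method name are hypothetical, and only the five token names are taken from the grammar line above.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustration only: the meaning of
//   LOOKAHEAD(~(<BIT_OR>|<COMMA>|<RPAREN>|<RBRACE>|<RBRACKET>))
// expressed as a plain set-membership check.
class TokenSetLookaheadSketch {
    // OTHER is a hypothetical catch-all for every remaining token kind.
    enum Kind { BIT_OR, COMMA, RPAREN, RBRACE, RBRACKET, OTHER }

    // The token kinds named inside the negated lookahead.
    static final Set<Kind> EXCLUDED = EnumSet.of(
            Kind.BIT_OR, Kind.COMMA, Kind.RPAREN, Kind.RBRACE, Kind.RBRACKET);

    // The negative lookahead succeeds exactly when the next token
    // is NOT one of the excluded kinds.
    static boolean lookaheadSucceeds(Kind next) {
        return !EXCLUDED.contains(next);
    }
}
```

With this reading, lookaheadSucceeds(Kind.COMMA) is false and lookaheadSucceeds(Kind.OTHER) is true, and everything a reader needs in order to know that sits on the one grammar line.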

In closing, these are just minor enhancements, a bit of low-hanging fruit that was plucked, that should have the result of making JavaCC grammars more pleasant to write, and certainly clearer to read.