The new SCAN construct

Originally published at: https://javacc.com/2020/07/24/the-new-scan-construct/

A few months ago, I started re-examining some of the legacy JavaCC syntax. I realized the pure absurdity of having to write:

LOOKAHEAD(Foo() Bar() Baz()) Foo() Bar() Baz()

In short order, I allowed people to write the above as:

LOOKAHEAD Foo() Bar() Baz()

But this started me thinking… I reasoned that there were doubtless plenty of other places to streamline the syntax. I also reasoned that this would already, in itself, make the tool far more amenable to users. The result (in its current state) is described (mostly) here.

Again (and I anticipate saying it many times) the legacy syntax still works! However, I do not anticipate many people opting to use it after they get used to the newer syntax. Also, it is entirely possible that, once I have a reliable utility to convert the older syntax, I will deprecate the legacy syntax and then eventually (giving people lots of time to convert their grammars) simply remove support for it. But that is a long way off…

So, here are the basic differences between SCAN and the legacy LOOKAHEAD:

The parameters of SCAN (assuming there are any) are not enclosed in parentheses.

Thus, where you previously wrote:

LOOKAHEAD(3) Foo()

you would now write:

SCAN 3 Foo

Where you previously wrote things like:

LOOKAHEAD(Foo()) Foo() Bar()

you now write:

SCAN Foo => Foo Bar

Note that the arrow is necessary in the above to separate the lookahead expansion from the expansion to be parsed. The first example above could also be written:

SCAN 3 => Foo

but the arrow is optional there because nothing needs to be disambiguated. People might still want to put in the arrow in those cases.
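The purely numeric rewrite is mechanical enough to sketch in code. The following Java snippet is a toy illustration only, not the conversion utility mentioned earlier: the class name NumericLookaheadRewriter and its rewrite method are hypothetical, and it deliberately handles nothing but the LOOKAHEAD(n) Name() shape.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only. It rewrites the single legacy
// shape LOOKAHEAD(n) Name() into SCAN n Name and leaves everything else alone.
final class NumericLookaheadRewriter {

    // Group 1: the numeric limit. Group 2: the non-terminal that follows.
    private static final Pattern NUMERIC_LOOKAHEAD = Pattern.compile(
            "LOOKAHEAD\\s*\\(\\s*(\\d+)\\s*\\)\\s*(\\w+)\\s*\\(\\s*\\)");

    static String rewrite(String legacy) {
        return NUMERIC_LOOKAHEAD.matcher(legacy).replaceAll("SCAN $1 $2");
    }

    public static void main(String[] args) {
        // prints: SCAN 3 Foo
        System.out.println(rewrite("LOOKAHEAD(3) Foo()"));
        // Syntactic lookahead is outside this sketch, so the input comes back unchanged.
        System.out.println(rewrite("LOOKAHEAD(Foo()) Foo() Bar()"));
    }
}
```

Anything involving a lookahead expansion or a semantic condition needs a real parser rather than a regular expression, which is why this sketch stops at the numeric case.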

The parameters of SCAN (assuming there are more than one) are not separated by a comma.

Thus, where you previously wrote:

LOOKAHEAD(Foo(), {someCondition()}) FooBar()

you now write:

SCAN {someCondition()} Foo => FooBar

(Note that the order was changed because, in actual code, the semantic lookahead is evaluated first. At least in my implementation!)

Where you would previously write:

LOOKAHEAD(1, {someCondition}) Foo()

you now write:

SCAN 1 {someCondition} => Foo

A little point to note is that the new SCAN construct assumes indefinite scanahead by default. Thus, with the legacy LOOKAHEAD,

LOOKAHEAD({someCondition}) Foo()

is the equivalent of writing:

SCAN 0 {someCondition} => Foo

In the legacy lookahead, if you only have semantic lookahead, the number of tokens to be scanned is assumed to be zero, i.e. if the condition succeeds, you automatically go into the following expansion.

On consideration, I didn’t like that, and decided that, with an explicit SCAN, the lookahead limit is always infinite, unless you specify otherwise. This just strikes me as more logically consistent, following the principle of least surprise. But also, considerations like this impelled me to create a new SCAN construct, since I could change the semantics to be more intuitive. (At least IMHO.)
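To make the difference in defaults concrete, here is a small Java model of the two behaviours. It is only a sketch under simplifying assumptions and has nothing to do with the code JavaCC actually generates: an expansion is modelled as a flat list of token names, and the class ScanLimitSketch, its scan method and the UNLIMITED constant are all hypothetical names made up for this illustration.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Toy model only: an "expansion" is a fixed list of expected token names.
final class ScanLimitSketch {

    static final int UNLIMITED = Integer.MAX_VALUE;

    // The semantic condition is checked first. Then at most `limit` tokens of
    // `expansion` are compared against `input`, starting at index `pos`.
    static boolean scan(BooleanSupplier condition, int limit,
                        List<String> expansion, List<String> input, int pos) {
        if (!condition.getAsBoolean()) {
            return false;
        }
        int tokensToCheck = Math.min(limit, expansion.size());
        for (int i = 0; i < tokensToCheck; i++) {
            int at = pos + i;
            if (at >= input.size() || !input.get(at).equals(expansion.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> foo = List.of("A", "B", "C");   // stand-in for the expansion of Foo
        List<String> input = List.of("A", "B", "X"); // diverges from Foo at the third token

        // Legacy LOOKAHEAD({cond}) Foo(), i.e. SCAN 0 {cond} => Foo:
        // only the condition is consulted, so this prints true.
        System.out.println(scan(() -> true, 0, foo, input, 0));

        // SCAN {cond} => Foo with no limit given:
        // the whole expansion is scanned, so this prints false.
        System.out.println(scan(() -> true, UNLIMITED, foo, input, 0));
    }
}
```

With a limit of zero the parser would commit to Foo on the strength of the condition alone and only fail later, at the third token. With the unlimited default the mismatch is caught during the scan.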

The SCAN construct also allows the newer LOOKBEHIND construct.

This is outlined separately.

Addendum

Even the newer SCAN construct has some redundancies. There is no obvious reason to oblige anybody to write even:

SCAN 3 Foobar

We could permit:

3 Foobar

It is trivial to allow this, but I have refrained for now, since I think there is a balance to be drawn between overly verbose constructs and the overly cryptic. I think that legacy JavaCC syntax, by and large, is overly verbose, and there has been a clear need to streamline it, but I don’t want to go too far. It is quite likely that I will go for an in-between option of allowing:

3 => Foobar

But again, that is not currently available. (But I’m definitely thinking about it!) As of this writing, you can write:

SCAN => Foobar

or:

SCAN Foobar

In general, even though “SCAN” is a lot shorter and less of a mouthful than the older “LOOKAHEAD”, the possibility of making even the word SCAN optional is quite clear in some cases, and worth considering. What was previously written as:

LOOKAHEAD(Foo()) Foo()

can now be written as:

SCAN => Foo

or:

SCAN Foo

Again, it is quite feasible to allow simply:

=> Foo

In this case, because I think it is so common, I decided to allow this. And, in fact, you can see that I use it in internal development, for example here.

In general, I have now converted internal JavaCC development to use the newer streamlined syntax, including the new SCAN construct throughout. So, you can see what this looks like in a couple of big (and sometimes crufty) real-life grammars here or here.

This, by the way, is my approach to testing new features. I simply use them!