Fixed a Longstanding Bug (er, Known Issue) in JavaCC: Nested syntactic lookahead now works!
There is a quite serious bug (let’s just call it what it is, shall we?) of very long standing in JavaCC. Basically, syntactic lookaheads do not nest. This bug is documented in Theodore Norvell’s JavaCC FAQ, which, by the way, is the best documentation for JavaCC, at least among what is available for free. It is question 4.8 in Professor Norvell’s FAQ:
Are nested syntactic lookahead specifications evaluated during syntactic lookahead?
In a word, the answer to the question is no, and Theo gives a concrete example that serves as a nice little test case for our purposes here. I’ve simplified it a little bit, since there is no need for the trivial productions w(), x(), and y(). Here it is:
PARSER_BEGIN(TestParser)
import java.io.*;
public class TestParser {
    static public void main(String[] args) throws Exception {
        TestParser parser = new TestParser(new StringReader("wxy"));
        parser.start();
    }
}
PARSER_END(TestParser)
void start( ) : { }
{
    LOOKAHEAD ( a() )
    a() {System.out.println("Nested lookahead successful!");}
    < EOF >
  |
    "w"
    <EOF>
}
void a( ) : { }
{
    (
        LOOKAHEAD ( "w" "y" ) "w"
      |
        "w" "x"
    )
    "y"
}
Since the above example uses entirely legacy syntax, it builds with either legacy JavaCC or JavaCC21. Just drop the file (Test.jj or whatever you want to call it) in an empty directory and execute the following commands:
$ javacc Test.jj
$ javac *.java
$ java TestParser
If you use legacy JavaCC in the above, it fails just as Theo describes in his FAQ. I just tried it with JavaCC 7.0.10 and it results in this:
Exception in thread "main" ParseException: Encountered " "x" "x "" at line 1, column 2.
Was expecting:
<EOF>
at TestParser.generateParseException(TestParser.java:377)
at TestParser.jj_consume_token(TestParser.java:243)
at TestParser.start(TestParser.java:19)
at TestParser.main(TestParser.java:7)
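For the curious, here is a rough sketch in plain Java of why the legacy parser lands on that particular error. To be clear, this is not the code JavaCC generates. The class NestedLookaheadSketch and its methods scanA and parseStart are names made up for illustration, and the whole thing is a hand-written simulation of the little grammar above. It assumes that the only relevant difference between the two tools is whether the inner LOOKAHEAD ( "w" "y" ) is consulted while the outer LOOKAHEAD ( a() ) is being scanned.

```java
public class NestedLookaheadSketch {

    // Scans production a() from position pos.
    // Returns the position just past a(), or -1 if the scan fails.
    // honorNested == false mimics the legacy behavior described in FAQ 4.8:
    // the inner LOOKAHEAD("w" "y") is simply skipped during the scan.
    static int scanA(String in, int pos, boolean honorNested) {
        // The first alternative is guarded by LOOKAHEAD("w" "y"),
        // but only if nested lookahead is actually evaluated.
        boolean firstAltAllowed = !honorNested || in.startsWith("wy", pos);
        int afterChoice;
        if (firstAltAllowed && in.startsWith("w", pos)) {
            afterChoice = pos + 1;        // first alternative: "w"
        } else if (in.startsWith("wx", pos)) {
            afterChoice = pos + 2;        // second alternative: "w" "x"
        } else {
            return -1;
        }
        // Once an alternative has matched, the scan does not go back and
        // try the other one; it just requires the trailing "y".
        return in.startsWith("y", afterChoice) ? afterChoice + 1 : -1;
    }

    // Simulates start(): try LOOKAHEAD(a()), else fall through to "w" <EOF>.
    static String parseStart(String in, boolean honorNested) {
        if (scanA(in, 0, honorNested) >= 0) {
            // Outer lookahead succeeded. The real parse of a() is not nested
            // inside any lookahead, so the inner LOOKAHEAD is always honored here.
            int end = scanA(in, 0, true);
            if (end == in.length()) {
                return "Nested lookahead successful!";
            }
            return "ParseException: was expecting <EOF> after a()";
        }
        // Second alternative of start(): "w" <EOF>
        if (!in.startsWith("w")) {
            return "ParseException: was expecting \"w\" at column 1";
        }
        if (in.length() > 1) {
            return "ParseException: Encountered \"" + in.charAt(1)
                    + "\" at column 2, was expecting <EOF>";
        }
        return "Parsed \"w\" <EOF>";
    }

    public static void main(String[] args) {
        System.out.println("nested lookahead skipped:   " + parseStart("wxy", false));
        System.out.println("nested lookahead evaluated: " + parseStart("wxy", true));
    }
}
```

With the inner lookahead skipped, the scan of a() commits to the first alternative, consumes "w", then wants "y" and finds "x". So the outer lookahead fails, the parser falls through to "w" followed by EOF, and it chokes on the "x" at column 2, which is the exception shown above.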
If you do the above steps with the latest version of JavaCC21, it gives you this output:
Nested lookahead successful!
Yes, JavaCC 21 evaluates nested syntactic lookahead! So, I guess that Theo, or whoever is maintaining the FAQ nowadays, really ought to replace the answer to question 4.8. The answer should change from:
No.
to:
If you are using *legacy JavaCC*, no, nested
syntactic lookahead is not evaluated. BUT if
you are using the updated version, JavaCC 21,
nested syntactic lookahead is evaluated.
(*Hallelujah!*)
Tangent: When does a bug become a known issue?
I reckon that, once a bug reaches a certain age in a well-known software tool, it graduates from being a bug to a known issue. This particular bug (er, known issue) in JavaCC is about 24 years old. It always worked this way and nobody ever fixed it. Granted, when Professor Norvell wrote the FAQ entry on it, the bug was perhaps (just guessing…) only 10 years old. It was still an issue that had existed from the very beginning of JavaCC, and there was absolutely zero prospect (or so it seemed) of anybody ever fixing it.
What I found noteworthy about all of this is the way Professor Norvell approaches it. At no point in the FAQ entry does he say straightforwardly: "This is a pretty major bug and somebody really ought to fix this!"
No, he simply documents the behavior (as if it were completely normal!) and provides some possible workarounds for various cases where this is a problem. But he certainly does not use the dreaded B-word in describing the situation. Actually, I noted that the word “bug” only occurs twice in the JavaCC FAQ, once in answer to FAQ 1.7 where he says:
If you found a bug in JavaCC, please open an issue.
If you found a bug…
Well, gee whiz Theo, what about the fact that nested syntactic lookahead doesn’t work? Duhhh…
Well, granted, if he (or anybody) did “open an issue”, the ostensible maintainers of the legacy JavaCC project would surely respond:
You see.... this is a known issue.
(Right? Been there… Done that, eh?)
You see, this is actually a key rhetorical trick, a feature of the sociocultural phenomenon that I call nothingburgerism. Once you call something that is obviously a bug a known issue, and that “issue” is documented, then it’s no longer a bug, since the software is behaving exactly as documented.
See the sleight of hand there?
Well, over the last few months, I’ve thought a lot about nothingburgerism and the way people seem to enable and foster it. The above-mentioned FAQ entry is an example of this. The FAQ maintainer, Professor Norvell, surely knows that this is a severe bug in JavaCC, but he makes the decision (consciously or not, but I suspect it’s not even conscious) never to refer to it simply as a bug that needs to be fixed. He simply documents this screwy behavior, offers some convoluted workarounds to what is obviously a bug, but you can be sure that he would never call out the ostensible maintainers of the project for not having fixed such an obvious bug for so many years! So...
If you find a bug in JavaCC, report it to Homeland Security!
If you see anything suspicious at the airport, report it to the JavaCC devs...
Well, this is not about Professor Norvell in particular, mind you. It is really part of a much much larger cultural phenomenon, where people won’t tell the truth straightforwardly about things. To say that this is just a bug that ought to be fixed would just be too raw and it might hurt somebody’s feelings…
This is understandable, and we have all (including me even) declined to tell the full truth about certain things because it might offend somebody. However, I would point out that this is precisely what makes a phenomenon like nothingburgerism possible.
Food for thought…