JAVACODE Productions Redux

Originally published at: https://javacc.com/2020/10/18/javacode-productions-redux/

About half a year ago, I wrote a post about ripping out so-called JAVACODE productions. I have to admit that I had almost forgotten about that whole issue (or maybe ersatz issue…), but just now Jan Monterrubio of the JSqlParser project asked me about it. I figured it would be better to answer the question here, since it is bound to come up again, and I might as well have an answer that I can point people to or, even better, one that they can find easily on their own.

Legacy JavaCC has this thing called a JAVACODE "production", and (to answer Jan's question) JavaCC21 does not have it, because yes, I did rip out this "feature". However, this requires a bit more explanation, starting with the obvious question that is probably on the minds of most people who have come across this and don't even know what we're talking about. Namely:

WTF is a JAVACODE production?

A JAVACODE production is just a plain old Java method (hereafter POJM) that you can put anywhere in your grammar where you would normally put a BNF production -- a.k.a. a real grammatical production. You can also refer to it in your grammar as if it was a regular grammatical production. But, again, it is just a POJM.

In legacy JavaCC, you define a "JAVACODE production" by simply putting a Java method where you would normally put a regular grammar production and prefacing the POJM with the keyword JAVACODE.

Something like:

JAVACODE
void Bar() {...}

So, then you can use this Java method Bar elsewhere in your grammar as if it was a regular grammatical production, like:

Foo() Bar() Baz()

But, in reality, the above is exactly the same as writing:

Foo() {Bar();} Baz()

You see, again, Bar is just a POJM and in JavaCC (and in JavaCC21 as well, of course) you can put any block of Java code in an expansion in your grammar. So, one characterization of this state of affairs is that it is just syntactic sugar. The first snippet above is the same as the second one, but is a whole 3 characters shorter.

That, however, comes at a certain cost. It now means that anybody reading the grammar has to jump to where Bar is defined to see that it is actually a POJM masquerading as a grammatical production. With the second snippet above, it is readily obvious that Foo and Baz are regular grammatical productions and Bar is a Java method.
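To make the "just syntactic sugar" point concrete, here is a small hand-written sketch in plain Java. It is not the code that JavaCC or JavaCC21 actually generates; the class ToyParser, its string tokens, the consume helper, and the trace list are all hypothetical. It relies only on the general fact that, in a generated parser, each grammatical production becomes a method of the parser class, so referencing Bar() as if it were a production and calling Bar() from a code block both boil down to the same method call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written parser, NOT actual JavaCC/JavaCC21 output.
// Method names mirror the grammar names Foo, Bar and Baz on purpose.
class ToyParser {
    private final List<String> tokens;
    private int pos = 0;
    // Records which methods ran, in order, so the two sequences can be compared.
    final List<String> trace = new ArrayList<>();

    ToyParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // Consumes the next token, failing if it is not the expected one.
    private void consume(String expected) {
        if (pos >= tokens.size() || !tokens.get(pos).equals(expected)) {
            throw new IllegalStateException("expected " + expected + " at position " + pos);
        }
        pos++;
    }

    // A real grammatical production, roughly: Foo ::= "foo"
    void Foo() {
        consume("foo");
        trace.add("Foo");
    }

    // A real grammatical production, roughly: Baz ::= "baz"
    void Baz() {
        consume("baz");
        trace.add("Baz");
    }

    // The would-be "JAVACODE production": a plain old Java method
    // that consumes no tokens at all.
    void Bar() {
        trace.add("Bar");
    }

    // Roughly what the expansion  Foo() Bar() Baz()  amounts to.
    void sequenceWithJavacode() {
        Foo();
        Bar();
        Baz();
    }

    // Roughly what the expansion  Foo() {Bar();} Baz()  amounts to:
    // the code block is simply inlined as-is.
    void sequenceWithCodeBlock() {
        Foo();
        { Bar(); }
        Baz();
    }
}
```

Parsing the tokens foo baz with either sequence method leaves the same trace, Foo, Bar, Baz, and Bar never touches the token stream, which is exactly why it is a POJM rather than a grammatical production.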

Now, in the interests of absolute fairness, I would point out that the first snippet above does potentially offer another advantage besides the saving of 3 characters. However, it is really only an advantage in the case of legacy JavaCC. You see, if you simply made Bar() a regular Java method and defined it in the PARSER_BEGIN...PARSER_END section up top in your grammar, it might be defined quite far from where it is used. If it is defined as a JAVACODE production, on the other hand, it can be placed right next to the part of the grammar where it is used.

However, this advantage does not apply in JavaCC21, since in JavaCC21, you can place a code injection wherever you want in the grammar. Thus, for example, you can place an injection like the following:

INJECT PARSER_CLASS : {
    void Bar() {...}
}

anywhere you want in the grammar -- top, bottom, middle... -- while in legacy JavaCC, all such code must be in the PARSER_BEGIN...PARSER_END section right at the top of your grammar.

In general, this is a significant usability improvement in JavaCC21, because you can put code injections near where they are used in the grammar. So the one (albeit slight) advantage that JAVACODE productions offered in legacy JavaCC is no longer even relevant in JavaCC21.

Now, getting back to Señor Monterrubio's question... he points to one place where the JSqlParser grammar uses this feature. And actually, the getOracleHint method is not even being used as a production, but just as a POJM. It is used once, on line 1477 of that file, where it is called from a Java code block as if it were just any old Java method. (Which it is!) Another somewhat odd aspect of this example is that the JAVACODE "production", aside from never being used as a production, is defined 1000 lines away from the only place it is used, so the other (albeit slight) advantage of the JAVACODE "feature" is not present either.

Regardless, the only change needed to make this work is to replace:

 JAVACODE
 OracleHint getOracleHint() {
     ....
 }

by:

 INJECT PARSER_CLASS : {
     OracleHint getOracleHint() {
        ...
     }
 }

and it should work as before. Actually, there are two other JAVACODE productions in the grammar Jan points to. One is a Java method called error_skipto, which, like getOracleHint, is only ever used as a POJM. The other JAVACODE production in the file is called captureRest, and it actually is used as a (fake) grammar production on line 5101, like so:

 tokens=captureRest()

However, the only change needed to get this working with JavaCC21 (once captureRest itself has been moved into an INJECT PARSER_CLASS block, as with getOracleHint) is to rewrite it as:

 {tokens=captureRest();}
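For illustration, here is roughly what captureRest and error_skipto look like once they are ordinary methods of the parser class. The post does not show their real JSqlParser bodies, so these are hypothetical sketches: the Tok class, its token kinds, and SketchParser are stand-ins, and getToken/getNextToken only loosely mimic the helpers of the same names that a JavaCC-generated parser provides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated token class.
class Tok {
    static final int EOF = 0;
    static final int SEMICOLON = 1;
    static final int WORD = 2;

    final int kind;
    final String image;

    Tok(int kind, String image) {
        this.kind = kind;
        this.image = image;
    }
}

// Hypothetical parser skeleton (not JSqlParser's code). getToken and
// getNextToken loosely mimic the lookahead helpers of a generated parser.
class SketchParser {
    private final List<Tok> input;
    private int pos = 0;

    SketchParser(List<Tok> input) {
        this.input = input;
    }

    // Peeks i tokens ahead (i >= 1) without consuming; past the end it yields EOF.
    Tok getToken(int i) {
        int idx = pos + i - 1;
        return idx < input.size() ? input.get(idx) : new Tok(Tok.EOF, "");
    }

    // Consumes and returns the next token (EOF once the input is exhausted).
    Tok getNextToken() {
        Tok t = getToken(1);
        if (pos < input.size()) {
            pos++;
        }
        return t;
    }

    // Formerly a JAVACODE "production": collects the images of all remaining tokens.
    List<String> captureRest() {
        List<String> images = new ArrayList<>();
        while (getToken(1).kind != Tok.EOF) {
            images.add(getNextToken().image);
        }
        return images;
    }

    // Also just a method: skips tokens up to and including the first one of the
    // given kind, stopping at EOF so it cannot loop forever.
    void error_skipto(int kind) {
        Tok t;
        do {
            t = getNextToken();
        } while (t.kind != kind && t.kind != Tok.EOF);
    }
}
```

In the grammar, the call site is then just the ordinary code block {tokens=captureRest();} shown above; nothing in the method itself needs to know that it was ever a "production".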

Concluding Remarks

I hope the above explains what this question of ripping out the JAVACODE productions is about. The whole "issue" is really much ado about nothing because this whole JAVACODE "feature" is not much of a feature. (To put it in more vulgar terms: the whole thing is about as useful as a nun's fill-in-the-blank.)

Not only is the whole JAVACODE concept not useful, it is also confusing: Jan is not the first person who seems to think that this is an actual feature that... like... does something...

Well, maybe that is my fault. I did use this violent language of ripping out, which might make one think that something of real substance was being extirpated. But no. Again, a JAVACODE production is just a Java method -- a POJM that can masquerade as a grammatical production, i.e. you can reference it as if it were a regular grammatical production. What this means concretely is that, at some points in your grammar, you can write:

 Foo()

instead of:

 {Foo();}

a slight syntactical convenience at the cost of making it unclear to anybody reading the code that Foo is just a POJM. (So, unlike the nun's you-know-what, it is not just useless, but arguably harmful!)

In any case, given that JavaCC21 allows you to place a code injection anywhere in a grammar that you want, the only advantage that the whole JAVACODE production concept ever had is no longer present. And since it creates confusion, is not that widely used anyway, and, where it is used, is not hard to replace...

It is now gone!

I hope the above explanation is clear. But if it isn't, please do not hesitate to post a question or comment on the forum. That is what it is there for!