Include source code grammar fragments or call sub-parsers?

About the INCLUDE feature. Well, that’s better than nothing, but some people (who will recognise themselves…) sometimes prefer nothing to something…
Being able to include grammar fragments can bring some benefits. But IMHO this is not a 21st century must have.

Java designers decided not to provide an include feature (not even a preprocessor) in the Java language and explained why.
Sonar Cobol rules flag copy books containing code as weaknesses, because source code that compiles into multiple binaries causes maintenance problems.
A grammar fragment F is included by two different people in two parsers, G and H. They write two tools with G and H, say I and J, and each writes different additional classes / methods related to F in I and J. When one of them wants to update F for G and I, H and J get problems.
Including a Java grammar in the JavaCC grammar, OK, but this Java grammar will probably not be reused elsewhere except by duplicating it.
Let’s take another case: Oracle SQLPlus, SQL and PL/SQL. In PL/SQL you can write pure SQL code. In SQLPlus you can write SQL code and PL/SQL code, along with the SQLPlus code. When you develop tools for these grammars, one good way is to develop visitors on the generated AST (JJTree or JTB); if you include the SQL grammar in the 3 tools, you need to take care of the 3 sets of visitors. Same for the PL/SQL in the 2 tools.
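
To make the visitor point concrete, here is a small sketch in plain Java. Every name in it (SqlNode, SqlVisitor, Column, Select, ColumnCounter, PlSqlTool, SqlPlusTool) is made up for illustration; none of it is JJTree or JTB output. It assumes there is a single set of SQL node classes, in which case one visitor written against them can serve every tool that embeds SQL. With the SQL grammar included in three grammars, each would presumably get its own copy of these classes and its own copy of the visitor.

```java
import java.util.List;

// Hypothetical node types, standing in for what a single SQL grammar would produce.
interface SqlNode {
    <R> R accept(SqlVisitor<R> visitor);
}

interface SqlVisitor<R> {
    R visitColumn(Column column);
    R visitSelect(Select select);
}

final class Column implements SqlNode {
    final String name;
    Column(String name) { this.name = name; }
    public <R> R accept(SqlVisitor<R> visitor) { return visitor.visitColumn(this); }
}

final class Select implements SqlNode {
    final List<Column> columns;
    final String table;
    Select(List<Column> columns, String table) {
        this.columns = columns;
        this.table = table;
    }
    public <R> R accept(SqlVisitor<R> visitor) { return visitor.visitSelect(this); }
}

// One visitor, written once against the single set of SQL node classes.
final class ColumnCounter implements SqlVisitor<Integer> {
    public Integer visitColumn(Column column) { return 1; }
    public Integer visitSelect(Select select) {
        int count = 0;
        for (Column c : select.columns) {
            count += c.accept(this);
        }
        return count;
    }
}

// Two hypothetical tools that embed SQL; both reuse the same visitor unchanged.
final class PlSqlTool {
    static int countColumns(SqlNode embeddedSql) { return embeddedSql.accept(new ColumnCounter()); }
}

final class SqlPlusTool {
    static int countColumns(SqlNode embeddedSql) { return embeddedSql.accept(new ColumnCounter()); }
}
```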

So a 21st century feature should be the ability to simply call a parser from within a grammar, that is, to call a method of a class. Something like SQLParser.Expression(), usable anywhere in the grammar where a BNF production can be referenced, to reference the Expression() production in the SQLParser grammar. Or something like Parser#1.Expression(), where Parser#1 is a symbolic name defined in an option.
Note that SQLParser.Expression() should generate the call to this named method (not include the source code), but should also generate the glue for linking the tokenmanagers and the node trees. The tokenmanagers could be either shared by the calling and called parser, or different ones. The (single) tree, built initially by the caller, must have a branch that has been built by the called parser. Exceptions must also be managed.
Same, on returning from the called parser, the tokenmanager and the node tree should resume correctly.
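
A rough idea of what such generated glue might amount to, written as hand-made Java rather than as anything a parser generator actually emits. All classes here (Node, TokenStream, SqlParser, PlSqlParser) and the toy grammars in the comments are assumptions for illustration. The sketch takes the "shared tokenmanager" option: both parsers advance the same token cursor, the called parser returns a branch that the caller grafts into its own tree, and the caller resumes at the token where the callee stopped. Exceptions are handled only in the crudest way, by letting them propagate to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree node; not a JJTree or JTB class.
final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();
    Node(String label) { this.label = label; }
    Node add(Node child) { children.add(child); return this; }
    @Override public String toString() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder(label).append('(');
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(children.get(i));
        }
        return sb.append(')').toString();
    }
}

// Hypothetical shared token source: whitespace-separated tokens, one cursor.
final class TokenStream {
    private final String[] tokens;
    private int pos = 0;
    TokenStream(String input) {
        String trimmed = input.trim();
        tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
    String peek() { return pos < tokens.length ? tokens[pos] : null; }
    String next() {
        if (pos >= tokens.length) throw new IllegalStateException("unexpected end of input");
        return tokens[pos++];
    }
    int position() { return pos; }
}

// The called parser. Toy grammar: Expression := NUMBER ( "+" NUMBER )*
final class SqlParser {
    private final TokenStream ts;
    SqlParser(TokenStream ts) { this.ts = ts; }
    Node Expression() {
        Node expr = new Node("Expression").add(number());
        while ("+".equals(ts.peek())) {
            ts.next();              // consume "+"
            expr.add(number());
        }
        return expr;
    }
    private Node number() {
        String t = ts.next();
        if (!t.matches("\\d+")) throw new IllegalStateException("expected a number, got " + t);
        return new Node(t);
    }
}

// The calling parser. Toy grammar: Statement := "PRINT" SqlParser.Expression() ";"
final class PlSqlParser {
    private final TokenStream ts;
    private final SqlParser sql;
    PlSqlParser(TokenStream ts) {
        this.ts = ts;
        this.sql = new SqlParser(ts);   // same token stream, so positions stay in sync
    }
    Node Statement() {
        expect("PRINT");
        Node stmt = new Node("Statement");
        stmt.add(sql.Expression());     // branch built by the called parser
        expect(";");                    // the caller resumes where the callee stopped
        return stmt;
    }
    private void expect(String expected) {
        String t = ts.next();
        if (!expected.equals(t)) throw new IllegalStateException("expected " + expected + ", got " + t);
    }
}
```

Parsing `PRINT 1 + 2 ;` with `new PlSqlParser(new TokenStream("PRINT 1 + 2 ;")).Statement()` yields a single tree whose Expression branch came from SqlParser. The harder option, where the two parsers have different tokenmanagers, would need an explicit hand-over of the input position in both directions; that is not attempted here.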

Okay, where to begin? On the one hand, you make a quite valid point, which is that the current INCLUDE directive is quite crude and bloody-minded. As a code re-use mechanism, one could envisage quite a few refinements.

On the other hand, you veer into some self-evident silliness, this stuff about the Java language not having something like INCLUDE.

Marc, I’m sorry. The above is just silliness. You’re implicitly comparing two completely different situations. The Java language is an object-oriented language that has all the typical dispositions of an OOP language. It has inheritance, obviously, and it has composition. So, if you are working on a Foo class, you can reuse the code you defined in a Bar class by having Foo extend Bar. OR alternatively, you can have a Bar object as a member field of your Foo object…
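
For anybody following along who wants the two options spelled out, a minimal sketch. Bar and the two Foo variants are placeholder classes, and the greet method is invented purely for illustration.

```java
// Placeholder class holding the code we want to reuse.
class Bar {
    String greet(String name) {
        return "Hello, " + name;
    }
}

// Reuse by inheritance: this Foo is a Bar and inherits greet().
class FooByInheritance extends Bar {
    String greetLoudly(String name) {
        return greet(name).toUpperCase() + "!";
    }
}

// Reuse by composition: this Foo has a Bar as a member field and delegates to it.
class FooByComposition {
    private final Bar bar = new Bar();

    String greetLoudly(String name) {
        return bar.greet(name).toUpperCase() + "!";
    }
}
```

Either way, the body of greet() exists in exactly one place, which is the whole point.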

So, the lack of an INCLUDE directive in Java is based on the calculation (probably quite valid) that, in just about any real-world usage case, you would be better off using inheritance and/or composition to achieve code reuse rather than some (comparatively crude) INCLUDE mechanism.

Fine, but how does that apply to JavaCC? It doesn’t. For a very simple reason:

Legacy JavaCC has ZERO disposition for code reuse.

NOTHING! Rien… nada… nichts… Pretty much the only way to reuse code in legacy JavaCC is copy-paste. That’s all there ever has been and it looks like that is all there ever will be! I implemented INCLUDE back in 2008 because I just couldn’t fathom maintaining an entire Java grammar inside the JavaCC grammar that was not separately reusable!

If you cannot see how absurd this situation is… The day that they implement, let’s say, Lambda expressions (part of the Java language since JDK 8, released in 2014) they need, first of all, to implement it in two separate files, JJTree.jjt and JavaCC.jj, and even at that point, it is not separately usable anywhere else! As crude a disposition as you might think that JavaCC 21’s INCLUDE is, it really does resolve this basic problem. The Java.javacc file that is INCLUDEd here is separately usable by any other project.

Maybe nobody is reusing it (probably not) but that’s just because they don’t know about it.

If I write an HTML grammar using legacy JavaCC, all the constructs for CSS, say, have to just be in the same file. With JavaCC 21, I write a separately usable (again! key concept…) CSS.jj file and that is, in principle, separately usable by other projects. (Or it’s separately usable by ME! Or YOU! If one is going to be self-centered about things…)

So, I do have to say that your statement that this is not a “must have” is just wrong. It is an absolute “must have”. What you’re saying is based on the classic logical fallacy of confusing necessary with sufficient. Quite arguably, INCLUDE, as it is currently, is not entirely sufficient, okay, but it is necessary.

Granted, too, if there were (contrary to fact, at the moment) much more sophisticated mechanisms for grammar inheritance and/or composition, then quite possibly INCLUDE would no longer be necessary. Okay but that is a rather stretched kind of argument. Again, the only other existing version of JavaCC is the legacy tool, which again, has ZERO disposition for code reuse. NOTHING!

Now, as for composition of grammars, calling sub-parsers, well, yeah, okay. Legacy JavaCC does not have that either (quite obviously!) and neither does JavaCC 21. (JavaCC 21 could well have this in the future. It is safe to say that the legacy project never will.) But composition of grammars, calling subparsers, that is not mutually exclusive with INCLUDE anyway.

Similarly, the new ATTEMPT/RECOVER I implemented is not mutually exclusive with the existing try-catch. In fact, the existing try-catch still works. (For what it’s worth, which ain’t much…) It’s not even mutually exclusive with JAVACODE productions. I could still put those back, but I doubt I will. (As I see it, you can’t just keep burdening yourself carrying things forward forever that almost nobody uses and are not very useful anyway.)

I mean, you’re also talking somewhat, as if things are mutually exclusive that are not mutually exclusive anyway. Having INCLUDE does not prevent us from later having more sophisticated code reuse or modularization. Also, it is possible maybe that INCLUDE could be refined and evolved towards being a better overall solution, since yes, it is a bit crude now.

But again, when you have a situation where the legacy tool has ZERO consideration for the whole problem… the conversation does get a bit frustrating, you know. I implemented INCLUDE in FreeCC 12 years ago or something and just about nobody ever got the benefit of even that.

I agree with your Java language discussion, with the fact that includes can help, and with the point that these things are not mutually exclusive.
I don’t mind keeping include. When things are here, if the code is clean, I tend to keep things (as for javacode, you see).

But you keep referring to what legacy JavaCC lacks, like an obsession. Please stop, at least with me. Let’s build 21 not against 20, but for 21. You point to the important question: code reuse. Let’s build 21 for real code reuse.

My answer to “just about nobody ever got the benefit of even include” is: because most people want to build tools and then choose a parser tool, and write the grammar for the tool, but not for many future tools.
Not many people will share their grammars; open source projects will de facto share their grammars, but they have not written them for reuse by other kinds of tools.

So we have a) to provide real code reuse features, b) to provide tutorials, examples…, not general diatribes, c) to evangelise future users to think first of grammars and then of tools. Think of grammars as grammars that make it easy to build ASTs and to integrate with other grammars and with callers. Think of tools not through grammars but through visitors or injectors or functional programming or other patterns.

I’ve been thinking about this situation over the last few days and I have to tell you very bluntly: No, I’m not going to stop!

Absolutely not!

Now, to be clear, I’m not going to go on endlessly about that to the exclusion of everything else and I am going to focus quite a bit more on what I am doing than what these people have done (i.e. NOTHING).

But I will say with a certain level of frequency that that community, in the 17 years since Sun open-sourced the code, has, to all intents and purposes done NOTHING!!!

I fully intend to say it over and over again for a very simple reason: there is no real choice here.

Anybody in advertising will tell you that the way to get a message across is by repetition. You have your message and you hammer it over and over again. And yes, it has an obnoxious side, I agree, but these advertising campaigns are very repetitious because this is what works.

So there is simply no real choice in the matter. Another aspect that really must be hammered through people’s skulls is that this whole concept that somebody like Sreeni is the “owner” of JavaCC is completely illegitimate. Sun did not open-source the code back in 2003 in order for it to be Sreeni’s personal project. It was supposed to be a public good. That is clear. For him to be referring to himself as the “project’s owner” is completely illegitimate. He was supposed to be the custodian of a public good and it was supposed to be run for the public benefit.

But again, you see, Sreeni repeated this nonsense so many times, that he was the project’s owner, that people like you and this Francis André fellow (and others, I guess…) simply accepted this completely illegitimate idea. So, to dislodge that from people’s minds requires quite a bit of repetition. So, at any relevant juncture in which the topic comes up, I will repeat this point.

It is absolutely necessary to repeat certain things quite mercilessly – in fact, more or less the same way that if you were in a boxing ring and your opponent left himself wide open, you would just keep hitting the person. Repeatedly. Mercilessly. And really, if you don’t have the stomach for that, you really just have to find another pastime. Because, at this juncture, I’ve decided that I’m going to play to win and if I’m playing to win, I can’t have a teammate who is playing to lose, can I?

And, in general, this whole idea that you can be on my forum telling me to shut up… well, do what you want, I guess, but you have to understand that if you want me to stop telling the truth about certain things because it makes you uncomfortable, that it will not have any effect on my behavior. And, more generally, you just have to understand that there is no real choice here.

This is what must be done.