Moving towards offering a "canonical" SQL grammar?

I’m starting this topic here in response to an issue (#46 to be precise) raised by Andreas Reichel.

What I think that conversation naturally leads to is the whole issue of JavaCC21 eventually (but perhaps sooner rather than later) including a “standard” SQL grammar as part of… well… let’s call it the basic package.

Now, a basic point about all of this is that I have come to the conclusion that this whole JavaCC space has been very stuck in a kind of 20th century mind-set. As I stated here:

The whole idea that all of these separate projects need to maintain their own separate Java grammar, which is probably about the same on the 98% level, it seems like something from a much earlier stage of computing. In the old days (I’m thinking late 20th century mostly) developers would frequently have their own implementation of Hashtable or a growable array (like java.util.ArrayList) that they use and maintain separately. Just part of their toolkit, like Clint Eastwood’s six-shooter, say. But nowadays there is the understanding that any modern language would have a standard class library and you just use that. So a Java developer just uses java.util.HashMap or java.util.ArrayList.

I would imagine that the above quote applies pretty much equally if you replace “Java” in the above with “SQL”. (Or with practically any other programming language!)

However, that said, I had been thinking that, when I turned my attention towards providing grammars for languages other than Java, I would first turn to programming languages, like Python or C#. SQL was not really on my radar – perhaps mostly because I personally have no need for an SQL parser and I’m not really a database guy. But then, I was thinking, if somebody needs this and is willing to do at least some of the work…

Now, one problem with regard to SQL, as opposed to Java, for example, is that there are all the vendor-specific variants and extensions out there. But it does seem to me that we could just support a lowest common denominator SQL, and then, if people need some vendor-specific language features, they can use the INCLUDE mechanism and add the necessary parts. That is certainly far better than having to write an entire SQL grammar from scratch!

In fact, something that I have still not documented is that the current version of JavaCC allows you to just load a grammar from the jarfile. You can write:

  INCLUDE JAVA

in your grammar and it includes the Java.javacc grammar from inside the javacc.jar. In fact, that is how it works internally. See here for how this works.

So, obviously, to be able to write:

  INCLUDE SQL

and then just have all the more or less standard SQL constructs already defined in your grammar…

Well, that is the idea basically, and I am just starting this topic to allow some brainstorming. I mean, the basic questions to be resolved are things like:

  • What is the scale of this subproject? How much time and effort is this?
  • If we are going to start by supporting a lowest common denominator of SQL, what is that subset? (N.B. I really don’t know SQL very well at all!)

In closing, I would make the point that this is one thing that ANTLR has right. (Well, partially…) They have this repository of grammars as part of the project. So, certainly, there is the basic understanding that people should not have to re-invent the wheel constantly.

However, the problem with that is that if there is no commitment to keeping the grammar up to date, on anybody’s part, then the basic value proposition that this represents is not really fulfilled. I mean, for example, the latest Java grammar there supports JDK 9, which is nearly 4 years old. It certainly seems that a JavaCC sort of project has a vested interest in maintaining a totally up-to-date version of the Java grammar. But… SQL not so much. So obviously, there should be some commitment… on somebody’s part to “take ownership” of a certain piece like that, that they need in their professional work. (I assume Herr Reichel needs this SQL parser for what he does!) Or, in other words, the basic idea can’t really be that somebody just donates an SQL grammar (or whatever grammar) and then it just gradually gets stale, because nobody does the incremental work of keeping it up to date.

My sense of things at the moment (wooly-minded thoughts…) is that I myself am willing to extend myself quite a bit to get this going, but only if there is some clear understanding or expectation that somebody out there feels some commitment to doing the incremental work to keep the thing up to date afterwards. And presumably, this is not pure philanthropy. The person in question needs this in their own professional activities, so…

But I would also say that, once the initial hump of getting a grammar together is past, the incremental work of maintaining it is probably not so great. And this doesn’t just apply to SQL. Most programming languages do not evolve that quickly. Of all programming languages out there, Java has been evolving fairly quickly, but really, how many new language features (I mean, from a syntactic viewpoint) are typically added in a period of a couple of years? Of course, if you let the whole thing slide for a decade, then the amount of work to get back on track becomes quite overwhelming, so I guess the key thing is not to let oneself fall too far behind, right?

Well, I’ll close this opening post here then…