Tree Building Redux: Nailing another Dmitry Dmitriyevich problem

Originally published at:

Greetings, comrades! My name is Vladimir Vladimirovich Vladimirov!

Hey, whassup, Vlad!

As a follow-on to my blog post of a couple of days ago there were a couple of t's that needed crossing and an 'i' or two that needed a dot. Let's see...

First of all, I misspoke a little bit in that post. I said that the following two constructs were equivalent:

 Foobar #Foobar : ... ;


 Foobar #Foobar(true) : ... ;

That is not quite true. The first production above will put a new Foobar node on the stack as long as there is at least one new sub-node.

The second one really does create a new node unconditionally. In other words, if you have a production like:

 ZeroOrMoreFoos #ZeroOrMoreFoos(true) : ("foo")* ;

This will create a new ZeroOrMoreFoos node (that is empty, i.e. has no child nodes) even if it did not consume even one "foo". The definite node, like the first one above, i.e.

 ZeroOrMoreFoos #ZeroOrMoreFoos : ("foo")* ;

This will only create a new node only if at least one "foo" was consumed. Well, the two things would be equivalent if the expansion inside was ("foo")+ since we would always match at least one foo anyway, but in the general case they are not quite the same.

So, here is a run-down of that state of affairs:

 Foobar #Foobar : ... ;

This is effectively the same as:

 Foobar #Foobar(>0) : ... ;


 Foobar #Foobar(true) : blahblah... ;

is effectively the same as:

 Foobar #Foobar(>-1) : blahblah... ;

Building a node if you match non-negative number of sub-nodes is effectively the same thing as a true condition, i.e. you always create a new Foobar node. (It would be quite a trick to have a negative number of sub-nodes on the tree building stack!)

The default tree-building is such that (unless you override it with SMART_NODE_CREATION=false) if you don't write any tree-building annotation, i,e.

 Foobar : ... ;

This is effectively the same as:

 Foobar #Foobar(>1) : ... ;

Now that I've cleared that up, here is another follow-on to that blog post. I mostly inherited this whole notation for tree-building from legacy JJTree and had not revisited it at all hardly. Just trying to look at it the other day with newbie eyes made me realize what a horrible Dmitry Dmitryevich problem there was in the whole thing.

Like, if the node you are going to create has the exact same name as the production, why should you have to repeat it? So, problem->reaction->RESULT!!!

If the name of the Node is the same as the grammar rule, you no longer need need to repeat it. (Duh...)


  Foobar #Foobar : blahblah... ;

can now be written more tersely as:

  Foobar# : ... ;

The # after the name of the production just means that we create a node if there is at least one sub-node on the stack, which could be written not quite as tersely as:

  Foobar#(>0) : ... ;

Actually, the above is only part of the problem. Frequently you have to repeat the production/node name as a return value, so this leads to...

The Dmitry Dmitryevich Dmitryev problem

So, consider the following production:

  Foobar Foobar #Foobar : 
      "foo" (bar)*
          return CURRENT_NODE;

Here we repeat Foobar three times. The first time specifies the return type of the generated method. The second time is the method name, and the third time is the name of the Node subtype we create (which just happens to be the same as the method's return type.) Note that this is actually a very typical pattern. The above method simply returns the node that it just created. The alias for this is CURRENT_NODE. (Historical note. In legacy JJTree, CURRENT_NODE is called jjtThis. Actually, that still works, but I prefer CURRENT_NODE as the alias. I think the intent is clearer.)

Well, the astute reader will probably anticipate the dénouement of this story. The above production can now be written as:

  #Foobar# : 
     "foo" ("bar")*
      return CURRENT_NODE;

Yes, you guessed it. Now, if the return type is the same as the node type, you can just write # as a shorthand to indicate that.

Another little touch-up

Now, as for this business of having to add in this {return CURRENT_NODE;} at the end of a production if we want the generated method to return the current node, let's see...

First point. The main reason one could want this is that it allows us to express certain things very tersely elsewhere in the grammar. We can write:

  Foobaz :
    Foobar foobar = null;
         ... do something with the foobar object...

We can write foobar=Foobar because the Foobar method returns a Foobar that we can assign from.

Well, this is actually just syntactic sugar. If the Foobar production had no return type, we would have to write:

       foobar = (Foobar) peekNode();
       ... do something with the foobar object ...

Arguably, the whole thing is not worth the candle. But... as things stand, you can now write foobar=Foobar above even if the Foobar method has no return value. The way this is implemented is that if the method actually has no return value, the line:



try {
    foobar = (Foobar) peekNode();
} catch (ClassCastException cce) {
    foobar = null;

So, if the Foobar() method does not have a return value, we just try to grab the last value pushed onto the tree-building stack and use that. Of course, it has to be the right type, but if it's not, we just set it to null.

Final note

The above only works from within a grammar production, where JavaCC generates the above code. When you write your own Java code by hand and you want to be able to write:


then you do need the Foobar production to have the return CURRENT_NODE; at the end.

That's all for now, folks. I should point out that all the features described above are implemented and in the latest version that you can download here. (Well, in general, if I describe a feature it is implemented. If it is something that would be just nice to have, on some road map or wish list, then I'll explicitly say so!)