A Fun Example of Code Injection at Work

Originally published at: https://javacc.com/2020/03/14/a-fun-example-of-code-injection-at-work/

Programmatically generating code is a little world in itself, with its own little quirks and tricks. Consider the generation of the C-style switch-case with fall-through. Your code generator emits all the cases, and each case ends with a break. The trouble starts the moment any of your case statements actually ends by returning a value. For example:

switch (foo) {
   case bar:
      blah blah 
      return foobar;
      break;
   case baz:
      blah blah
      break;
}

Javac refuses to compile the above code because the break statement after a return is unreachable code. Well, true enough. JavaCC's classic solution to this problem was to generate the following in place of the plain return:

 
   if  (true) return foobar;
   break;

Voilà! You insert if (true) in front of the return and it compiles! It’s a funny little trick because the break statement is just as unreachable as before, except the compiler doesn’t realize it! I find such a solution to be in rather bad taste because it is based on tricking the compiler, no? But the other thing is that it just generates ugly code. Recently, I have been spending a lot of time eyeballing the code that JavaCC generates, and I just got so fed up with seeing this. So, finally, I decided to scratch the itch and implement a solution. This is where code injection comes into the story. Have a look here.
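For anyone who wants to try the trick in isolation, here is a minimal sketch. The class and method names are made up for illustration and are not anything JavaCC generates. As far as I understand the reachability rules in the Java Language Specification, the compiler deliberately does not evaluate the condition of an if statement when deciding what is reachable, so the statement after `if (true) return ...;` still counts as reachable.

```java
public class UnreachableBreakDemo {

    // Hypothetical method, only here to show that the kludge compiles.
    static String describe(int foo) {
        switch (foo) {
            case 1:
                // A plain "return" here, followed by "break", would be
                // rejected by javac as an unreachable statement.
                if (true) return "one";
                break; // never executed, but accepted by the compiler
            case 2:
                break;
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(describe(1));
        System.out.println(describe(2));
    }
}
```

Deleting the `if (true)` and recompiling should reproduce the original "unreachable statement" error.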

What I do is simply use code injection to insert a little method that checks for this case when the CaseStatement node is closed. (Note that the close() method is a little hook that is invoked by the tree building machinery when a Node is finalized.) The code looks like:

INJECT(CaseStatement) :
{}
{
    // If the case statement has an unreachable break statement at the
    // end, we remove it. This allows us to get rid of this horrid longstanding kludge
    public void close() {
        if (getChildCount() >= 2) {
            Node last = this.getLastChild();
            Node secondLast = this.getChild(getChildCount() - 2);
            if ((last instanceof BreakStatement) && (secondLast instanceof ReturnStatement)) {
                removeChild(last);
            }
        }
    }
}
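To play with the logic of that hook outside of a generated parser, here is a rough standalone sketch in plain Java. The Node class below is a hypothetical stand-in that models only the handful of methods the injected close() uses. It is not the real generated Node API, and ExpressionStatement and childCountAfterClose are names I made up for the demo.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseHookSketch {

    // Hypothetical, minimal stand-in for the generated Node type.
    static class Node {
        private final List<Node> children = new ArrayList<>();
        void addChild(Node child) { children.add(child); }
        int getChildCount() { return children.size(); }
        Node getChild(int i) { return children.get(i); }
        Node getLastChild() { return children.get(children.size() - 1); }
        boolean removeChild(Node child) { return children.remove(child); }
        void close() {} // default hook: nothing to do
    }

    static class ReturnStatement extends Node {}
    static class BreakStatement extends Node {}
    static class ExpressionStatement extends Node {}

    static class CaseStatement extends Node {
        // Same check as the injected method: drop a break that
        // directly follows a return at the end of the case.
        @Override
        void close() {
            if (getChildCount() >= 2) {
                Node last = getLastChild();
                Node secondLast = getChild(getChildCount() - 2);
                if (last instanceof BreakStatement && secondLast instanceof ReturnStatement) {
                    removeChild(last);
                }
            }
        }
    }

    // Builds a case from the given statements, closes it, and
    // reports how many children are left.
    static int childCountAfterClose(Node... statements) {
        CaseStatement cs = new CaseStatement();
        for (Node s : statements) {
            cs.addChild(s);
        }
        cs.close();
        return cs.getChildCount();
    }

    public static void main(String[] args) {
        // blah; return; break;  -> the break is removed
        System.out.println(childCountAfterClose(
                new ExpressionStatement(), new ReturnStatement(), new BreakStatement()));
        // blah; blah; break;    -> nothing is removed
        System.out.println(childCountAfterClose(
                new ExpressionStatement(), new ExpressionStatement(), new BreakStatement()));
    }
}
```

The first call leaves two children and the second leaves three, since only a break directly preceded by a return gets dropped.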

So now the generated code finally is:

switch (foo) {
   case bar:
      blah blah 
      return foobar;
   case baz:
      blah blah
      break;
}

where before it was:

switch (foo) {
   case bar:
      blah blah 
      if (true) return foobar;
      break;
   case baz:
      blah blah
      break;
}

Granted, some people might say that this did not solve any real problem, since after all, functionally, both of the above snippets are the same. That is true, I guess. Even if the second snippet is ugly, it functions just as well.

Fine. However, to that I would respond that a beautiful woman and an ugly one are also functionally the same. So, as far as I’m concerned, if you are happy with ugly code or with ugly women, you are welcome to them!

Well, politically incorrect joking aside, the above provides an example of why the re-activated JavaCC21 project will inevitably leave the legacy tool in the dust — in terms of features and general usability. It has been refactored to such an extent that it is quite easy to resolve problems that are borderline intractable using the legacy tool, or at the very least, require very annoying, ugly kludges.

Actually, I was looking at some of the older posts from some months back. I just realized that the above has no relevance any more, because, since then, I got rid of all the switch-case code generation in the generated parser. (There is still some in the lexer generation, which is much less cleaned up.) Now it’s all if-elseif-else, which I find much more manageable. There is some theory that switch-case generates faster code, more optimized bytecode, and that may be true (not that I ever tried to verify this!) but I don’t think there is any noticeable difference in practice. What is important for me is that if-else is just easier to read and understand.

Another thing I got rid of was that whole scheme of throwing an exception to jump out of a Lookahead routine. It just uses normal control flow now. When it has scanned ahead the maximum lookahead amount, the various lookahead subroutines just all return true.
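To give a rough idea of what "normal control flow" can look like here, the following is a toy sketch and not the code JavaCC21 actually generates. All the names (LookaheadSketch, scanToken, remainingLookahead and so on) are hypothetical, and tokens are plain strings to keep it short. The point is only that once the lookahead budget is used up, every scan routine returns true, so the whole chain unwinds with ordinary returns and no exception is needed.

```java
import java.util.Arrays;
import java.util.List;

public class LookaheadSketch {
    private final List<String> tokens;
    private int pos;                // current scan position
    private int remainingLookahead; // how many tokens we may still scan

    LookaheadSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Scans one expected token. If the budget is exhausted, it reports
    // success without looking any further.
    private boolean scanToken(String expected) {
        if (remainingLookahead <= 0) {
            return true;
        }
        if (pos >= tokens.size() || !tokens.get(pos).equals(expected)) {
            return false;
        }
        pos++;
        remainingLookahead--;
        return true;
    }

    // Hypothetical lookahead routine for the sequence: "(" ID ")"
    private boolean scanParenthesizedId() {
        return scanToken("(") && scanToken("ID") && scanToken(")");
    }

    // Scans ahead at most maxTokens tokens, then restores the position.
    boolean lookahead(int maxTokens) {
        int saved = pos;
        remainingLookahead = maxTokens;
        boolean result = scanParenthesizedId();
        pos = saved;
        return result;
    }

    public static void main(String[] args) {
        LookaheadSketch parser = new LookaheadSketch(Arrays.asList("(", "ID", "+"));
        System.out.println(parser.lookahead(2)); // budget runs out before the mismatch
        System.out.println(parser.lookahead(3)); // the third token is checked and fails
    }
}
```

With a budget of two tokens the mismatch on the third token is never seen, so the lookahead succeeds. With a budget of three it fails.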