New Experimental Feature: ATTEMPT/RECOVER

Originally published at: https://javacc.com/2020/05/03/new-experimental-feature-attempt-recover/

A bit over a month ago, I wrote about my intention of tearing out support for “JAVACODE” productions. A JAVACODE production is really just a Java method that is like a pretend grammatical production. I could find very few JavaCC grammars in the wild that used this.

I asked whether anybody would miss them and nobody answered. This could be because nobody cares or nobody is paying any attention anyway. Regardless, they are gone. RIP.

Another related legacy JavaCC feature (perhaps using the term “feature” loosely) is try-catch. I don’t mean the try-catch that is part of the Java language. I mean the JavaCC try-catch where you put a grammatical expansion inside the try block. Like so:

try {
  Foo() Bar() Baz()
}
catch (ParseException e) {
   // Some arbitrary Java code
}

So you have a grammar expansion inside the try block, followed by one or more catch blocks (and maybe a finally block, just like Java) in which you put your Java code to handle the error. Except… hold on…

What do you do inside the recovery block?

Beats me. Just as I pointed out in my post about getting rid of JAVACODE productions, JavaCC provides no real disposition for error recovery. With this sort of try-catch, the general situation is that the exception bubbled up from deep in the bowels of our grammar, I mean some deeply nested sub-expansion, right? So what are we supposed to do in the catch block? Or to frame the question more precisely: What do we do in the catch/finally section that is more useful than just letting the exception bubble up to whatever default handler?

Well, I think the cold hard truth of the matter is that there really is not much to be done in this spot. Ergo, this “feature” is simply not very useful. And that could explain why nobody uses it! I scoured the web trying to find real-world usage examples of this try-catch and came up with nothing. Really nothing, even less than for JAVACODE productions. I can’t find a single JavaCC grammar out there that does this.

Of course, the feature being about as useful (as a nun’s… fill-in-the-blank) is only one explanation for nobody using it. Another possible explanation for nobody using the feature is that people don’t even know about it! I tried to think back about whether, in my FreeMarker development days, I even knew that you could put this kind of try/catch in a grammar production. I honestly can’t remember whether I even knew that the feature existed. (Another damned senior moment, eh?) I likely knew at some point, but I wouldn’t be surprised if I was a heavy user of JavaCC for years before happening on this. After all, the main way that people learn JavaCC is by studying and adapting existing grammars, and if the existing grammars simply never use this…

Well, anyway, the existing try/catch really is not useful for a very simple reason: it doesn’t rewind to the state of the parse before the attempted expansion. So I introduced an alternative construct that does do that. And instead of try/catch, it is ATTEMPT/RECOVER. The syntax (and if I get feedback, it could still change) looks like this!

ATTEMPT(Foo() Bar())
RECOVER 
{
   // optional java code block
}
(
    FooBar() Schmoobar()
)

So, you ATTEMPT to parse some expansion and then, after RECOVER, you can have two parts: a block of Java code and another expansion to fall back on. As things stand, you can have both the Java code block and the recovery expansion, or just one of the two. (Though I guess you could effectively have neither by simply putting in an empty Java code block {} and not having any recovery expansion.)

In any case, the idea is that your java code tweaks something or other so that you can recover. Maybe it skips past some invalid goo or it changes to another lexical state before resuming the parse.

Regardless, the key thing to take away from this is that when you hit RECOVER, the state of your world is restored to what it was right before the ATTEMPTed expansion. That includes the state of the tree building machinery and your lexical state and such.
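To make the rewind concrete, here is a minimal Java sketch of the semantics described above. It is not the code that the tool generates. The class, its fields (position, lexicalState, nodes) and the attempt method are all hypothetical stand-ins for the real parser machinery, chosen only to illustrate the idea of saving the state before the attempt and restoring it before recovery runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy parser; none of these names come from JavaCC itself.
class AttemptRecoverSketch {
    static class ParseFailure extends RuntimeException {
        ParseFailure(String message) { super(message); }
    }

    final List<String> tokens;
    int position = 0;                              // index of the next unconsumed token
    String lexicalState = "DEFAULT";               // stand-in for the lexer's state
    final List<String> nodes = new ArrayList<>();  // stand-in for the tree-building stack

    AttemptRecoverSketch(List<String> tokens) { this.tokens = tokens; }

    // Consume one token of the expected kind and record a "node" for it.
    void expect(String expected) {
        if (position >= tokens.size() || !tokens.get(position).equals(expected)) {
            throw new ParseFailure("expected " + expected + " at token " + position);
        }
        nodes.add(expected);
        position++;
    }

    // Run the attempted expansion; on failure, restore the saved state, then run recovery.
    void attempt(Runnable attempted, Runnable recovery) {
        int savedPosition = position;
        String savedLexicalState = lexicalState;
        int savedNodeCount = nodes.size();
        try {
            attempted.run();
        } catch (ParseFailure e) {
            position = savedPosition;
            lexicalState = savedLexicalState;
            nodes.subList(savedNodeCount, nodes.size()).clear(); // drop nodes built by the attempt
            recovery.run();
        }
    }
}
```

With the tokens "foo", "baz", attempting foo then bar fails on the second token. By the time the recovery code runs, position is back at 0 and the node built for "foo" has been discarded, so the recovery expansion starts from a clean slate.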

ATTEMPT/RECOVER semantics?

Now, this is an experimental new feature and I am quite interested in getting feedback about how it should work. For example, I am grappling with the question of how syntactic lookaheads should deal with an ATTEMPT/RECOVER block. The current state of things is as follows. Suppose you write:

void Foobar() : 
{}
{ 
   ATTEMPT(Foo()Bar())RECOVER(Baz())
}

In the above case, a syntactic LOOKAHEAD, like LOOKAHEAD(Foobar()) will create a lookahead routine that scans forward for the ATTEMPTed expansion Foo() Bar() but does not check for Baz().

The idea is that the Foobar() production really only completes normally via Foo() Bar(), not via the recovery expansion Baz(), which is what we fall back to if we can’t do Foo() Bar().

It could be argued that LOOKAHEAD(Foobar()) should check forward for Foo()Bar() OR Baz() since both of them would end up matching the production. I’m really not sure and would be quite happy to discuss with people how this should work. This would be anybody’s chance to have some input at an early stage to how the next generation of this tool will work.
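The two candidate semantics can be put side by side in a small Java sketch. This is only an illustration of the design question, not the lookahead routine the tool emits. The helper names are hypothetical, tokens are modeled as plain strings, and any Java code block in the RECOVER part is ignored for simplicity.

```java
import java.util.List;

// Hypothetical illustration only; these are not JavaCC-generated routines.
class LookaheadSemanticsSketch {
    // True if the tokens starting at `from` begin with the expected sequence.
    static boolean scans(List<String> tokens, int from, String... expected) {
        if (from + expected.length > tokens.size()) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (!tokens.get(from + i).equals(expected[i])) {
                return false;
            }
        }
        return true;
    }

    // Behavior as currently described: only the ATTEMPTed expansion Foo() Bar() is scanned.
    static boolean lookaheadAttemptOnly(List<String> tokens, int from) {
        return scans(tokens, from, "foo", "bar");
    }

    // The alternative: the ATTEMPTed expansion OR the recovery expansion Baz() counts as a match.
    static boolean lookaheadAttemptOrRecovery(List<String> tokens, int from) {
        return scans(tokens, from, "foo", "bar") || scans(tokens, from, "baz");
    }
}
```

The two versions only disagree on input that matches the recovery expansion but not the attempted one. For a token stream starting with "baz", the first returns false and the second returns true.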

So I’ll just close by saying that this new feature is currently experimental and subject to change, and we are very interested in feedback. I would also add that the feature, though already far more useful than the existing try-catch (which is still available, by the way), will become more useful over the coming weeks and months as more error-recovery machinery is introduced, so that there is a clearer answer to what one can do in the recovery block!

Hi
I believe we need to be more precise about the “state” left behind when an exception is encountered. In the TCF (try / catch / finally) case, what exactly is the current behavior if we have try (p1() p2() … pn()) and an exception arises in p1, p2 or pn?
Likewise, what “state” do you intend to restore in attempt (p1() p2() … pn()) recover () in each of these cases?
And does this depend on the exception type (parser / lexical)?
On my side, I have a TCF in a real-world grammar that repositions the token manager to the end of the line and tries to resume parsing.
I can imagine using attempt / recover syntax for cases like this:
attempt ( “(” p() “)” ) recover ( attempt ( “(” p() – missing RPar – ) recover ( attempt ( p() “)” – missing LPar – ) recover ( giveup) ) ) for IDEs to handle simple cases instead of using ("(")* with the extra lookaheads needed.

The idea of ATTEMPT/RECOVER is that if the ATTEMPT part fails, the parser/lexer machinery is rewound to the state it was in before the ATTEMPT. I believe this is working now, but it is hardly tested at all.

The basic idea is that, in the general case, the RECOVER part has two components, a Java code block and then a grammatical expansion. So, presumably, the Java code block is where you have the opportunity to make some adjustments so that the parse can succeed. One could even imagine code like this:

ATTEMPT(Foo())
RECOVER {…some Java code…} (Foo())

Of course, if the Java code block does nothing, then the whole thing is for nothing: it attempts Foo() and fails, the parsing machinery rewinds, it attempts Foo() again and, well, it will fail again! Guaranteed!

So some adjustment has to happen in the java code block so that when we try to parse Foo() the second time, it will succeed. (Or at least have some chance of succeeding as opposed to definitely failing.)

What adjustment? Well, you could move forward one character in the input stream, or skip forward one token or scan forward for a token of a certain type.

Or you could change lexical state maybe…
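As a rough Java sketch of the "scan forward for a token of a certain type" adjustment, here is a toy version of the retry pattern. Everything in it is an assumption made for illustration: the Foo production is taken to be an "id" token followed by ";", tokens are plain strings, and the method names are hypothetical rather than anything the tool provides.

```java
import java.util.List;

// Hypothetical toy example; not an API that JavaCC generates.
class SkipAheadRecoverySketch {
    // Toy Foo production: an "id" token followed by a ";" token.
    // Returns the position just past the match, or throws if the input does not match.
    static int parseFoo(List<String> tokens, int start) {
        if (start + 1 < tokens.size()
                && tokens.get(start).equals("id")
                && tokens.get(start + 1).equals(";")) {
            return start + 2;
        }
        throw new IllegalStateException("Foo failed at token " + start);
    }

    // Scan forward for the next token of the wanted type; returns tokens.size() if none is left.
    static int scanForwardTo(List<String> tokens, int from, String wanted) {
        int i = from;
        while (i < tokens.size() && !tokens.get(i).equals(wanted)) {
            i++;
        }
        return i;
    }

    // Mimics the shape: ATTEMPT(Foo()) RECOVER { skip ahead } (Foo())
    static int attemptFooWithSkip(List<String> tokens, int start) {
        try {
            return parseFoo(tokens, start);
        } catch (IllegalStateException firstFailure) {
            // The rewind is implicit here: `start` was never modified by the failed attempt.
            // The adjustment: skip at least one token, then look for the next "id".
            int resumeAt = scanForwardTo(tokens, start + 1, "id");
            return parseFoo(tokens, resumeAt); // may still fail; then the exception propagates
        }
    }
}
```

Starting from start + 1 rather than start is what keeps the second try from being the same doomed attempt. If the scan began at the original position, it could land on the very token where the first attempt started.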

So, you are right that it is not really formalized what you can do in the java code part of the RECOVER and this is what needs to get clarified. You have to understand that this is still very much a work in progress!

But you see that this is already laying some basis for attacking the problem. With the existing try-catch machinery, when you enter the catch block, it is very unclear what you can really do. It’s not even clear where you are. With ATTEMPT/RECOVER, the parsing/lexing machinery rewinds to the starting point, so you have some clarity about where you are at least and what can be done.

But you know, the thing about the try-catch, and the JAVACODE productions as well, is that it is very hard to find many examples of usage out there. And this is precisely because these things are not really very useful!

I was suggesting first studying what legacy JavaCC exactly does before making a decision on a new feature: maybe with just a few lines of code you can make it reposition exactly at the “beginning” of the faulty production (token pointers, built nodes, …).

Well, I’m not sure I would want the ATTEMPT to reposition at its previous state; maybe experience would show that it’s more comfortable to reposition at the state before the faulty production, in cases like ATTEMPT ( P1() P2() P3()) where P2() fails.

I’m puzzled by this statement. You seriously think that I didn’t look carefully at what the legacy JavaCC tool does?

All that legacy JavaCC does with a try-catch is translate it into a Java try-catch.
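The practical consequence can be shown with a small Java sketch. This is not actual generated code. The class and method names are hypothetical, and the three productions are reduced to single expected tokens. The only point is that a plain Java try-catch restores nothing, so the catch block sees whatever position the failure left behind.

```java
import java.util.List;

// Hypothetical toy parser, only meant to show that a plain try-catch does not rewind.
class LegacyTryCatchSketch {
    final List<String> tokens;
    int position = 0; // index of the next unconsumed token

    LegacyTryCatchSketch(List<String> tokens) { this.tokens = tokens; }

    // Consume one token of the expected kind or throw.
    void expect(String expected) {
        if (position >= tokens.size() || !tokens.get(position).equals(expected)) {
            throw new IllegalStateException("expected " + expected + " at token " + position);
        }
        position++;
    }

    // Roughly the shape of: try { P1() P2() P3() } catch (...) { ... }
    // Returns the position observed inside the catch block, or -1 if nothing failed.
    int parseWithPlainTryCatch() {
        try {
            expect("p1");
            expect("p2");
            expect("p3");
            return -1;
        } catch (IllegalStateException e) {
            return position; // nothing was rewound; this is wherever the failure happened
        }
    }
}
```

For the tokens "p1", "oops", "p3", the catch block finds the position at 1, after P1 has already consumed its token. The handler has to work out for itself how far the parse got.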

But the thing with this is that, when I say that the legacy try-catch is not something very useful, it’s not just me saying it. There is over twenty years of praxis that demonstrates pretty clearly that it is not very useful. Just try to figure out how many grammars out there in the wild make use of this! Very few. Precisely because it is just not very useful. Again, this is not my judgment. It is the judgment of over 20 years of praxis!

Well, if you have ATTEMPT(P1() P2() P3()) and it fails in P2, it is going to rewind to right before P1().

If you wanted it to rewind to its state right before P2, you would need to write:

ATTEMPT(P2() etc.)

If it fails and rewinds, it rewinds to the state it was in before the attempt. I’m not sure what else is really possible in general.

The other issue is that if you write:

try {P1() P2() P3()} catch{…}

in principle, the exception could come from some deeply nested sub-production. Again, it is just not very clear, in the general case, what useful thing you can do in the catch block with the legacy try-catch semantics. And again, that is almost certainly why the whole thing is so little used out there, precisely because it is not useful!

Admittedly, it remains to be seen how generally useful the new ATTEMPT/RECOVER construct is. But probably, it could not be less useful than the legacy try-catch, which is pretty much completely useless!