Ability to declare that a token can match a more general token

This is a feature I would find very useful.

Right now, when you declare:
TOKEN: { < COUNT : "COUNT" > }
TOKEN: { < IDENTIFIER : ["a"-"z","A"-"Z"] (["a"-"z","A"-"Z","0"-"9"])* > }
JavaCC will never match “count” as an < IDENTIFIER >.
So if your grammar allows “count” as an identifier (say, as a table name, an alias name, or a package name) in addition to being a function name, you have to add extra definitions in each of those places.
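
As a rough sketch of what such an extra definition tends to look like: a helper production that accepts either the plain identifier or the keyword, used wherever a name is allowed. The production name `Name` is an assumption for illustration, and the token names are the ones declared above.

```javacc
// Hypothetical helper production: accepts a plain identifier or any
// keyword that is also allowed as a name in this position.
Token Name() :
{ Token t; }
{
  ( t = <IDENTIFIER> | t = <COUNT> )
  { return t; }
}
```

Every keyword that may double as a name needs its own alternative in a production like this, which is where the maintenance burden comes from.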

When you have tens/hundreds of token definitions like this, the grammar’s readability suffers.
When you add a new token in a newer version of the grammar, you need to work out where it belongs and add it, sometimes in more than one place, otherwise you get regression errors. On top of that, it often means adding LOOKAHEADs.

It would be nice to be able to declare something like
TOKEN: { < IDENTIFIER : ["a"-"z","A"-"Z"] (["a"-"z","A"-"Z","0"-"9"])* > }
and have the lexer handle the fact that “count” can match < IDENTIFIER >.

Well, there have traditionally been two main ways of dealing with this issue. The formal, “approved” way is surely by having different lexical states, so in your example, “COUNT” is a COUNT token in one lexical state, but in another lexical state, is just treated as an IDENTIFIER.
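
To make the lexical-state approach concrete, here is a minimal sketch in legacy JavaCC syntax. The `ALIAS` state, the `AS` token and the `ALIAS_NAME` token are assumptions made up for illustration. They are not taken from any real grammar.

```javacc
// Whitespace has to be skipped in every state that can be active.
<DEFAULT, ALIAS> SKIP : { " " | "\t" | "\n" | "\r" }

// In DEFAULT, the keywords are declared before IDENTIFIER, so they win
// when both match the same text.
<DEFAULT> TOKEN :
{
  < COUNT : "COUNT" >
| < AS : "AS" > : ALIAS        // hypothetical: after AS, switch to the ALIAS state
| < IDENTIFIER : ["a"-"z","A"-"Z"] ( ["a"-"z","A"-"Z","0"-"9"] )* >
}

// No keywords are declared in ALIAS, so "COUNT" is just a name here.
// Token names must be unique, hence ALIAS_NAME rather than IDENTIFIER.
<ALIAS> TOKEN :
{
  < ALIAS_NAME : ["a"-"z","A"-"Z"] ( ["a"-"z","A"-"Z","0"-"9"] )* > : DEFAULT
}
```

This only works cleanly when the lexer itself can tell, from the preceding token, that a name is expected. When that decision depends on the parser's context, the state switch has to come from somewhere else.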

The other way of dealing with the issue, which is basically a kludge, is having a CommonTokenAction method that does something or other, as needed.

(By the way, in JavaCC 21, you can still use CommonTokenAction, but there is also tokenHook(). The difference is that tokenHook has a return value, so you can instantiate a different kind of Token and return it, which is more flexible. But, in principle, tokenHook and CommonTokenAction are kind of the same idea.)
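
For reference, the CommonTokenAction kludge looks roughly like this in legacy JavaCC. This is a sketch under assumptions: the `keywordsAsIdentifiers` flag is hypothetical, something else would have to set it at the right moment, and `STATIC = false` is assumed so that the method and the flag can be instance members.

```javacc
options {
  STATIC = false;               // assumed, so the members below need not be static
  COMMON_TOKEN_ACTION = true;   // the lexer calls CommonTokenAction for every token
}

// (PARSER_BEGIN/PARSER_END and the token definitions are omitted here.)

TOKEN_MGR_DECLS :
{
  // Hypothetical flag, set from outside when a keyword should be read as a name.
  boolean keywordsAsIdentifiers = false;

  void CommonTokenAction(Token t) {
    // Reclassify the keyword as a plain identifier when the flag is on.
    if (keywordsAsIdentifiers && t.kind == COUNT) {
      t.kind = IDENTIFIER;
    }
  }
}
```

The weak point is timing: the hook runs when the token is scanned, and tokens that were scanned earlier as lookahead are not revisited when the flag changes later.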

So, basically, to narrow in on the real issue here, the question would be: what problem do you have that is not well resolved by either of the above approaches, or by a combination of them?

The other aspect of all this is that, as far as I know, the whole idea that a Token matches exactly one type is just the standard logic in all these tools, like Lex/Flex/JLex/JFlex… I don’t think it’s particularly specific to JavaCC.

But that, by the way, is also an aspect of ATTEMPT/RECOVER. One could imagine a situation in which you ATTEMPT the expansion and then, in your RECOVER block, your Java code switches lexical state (or does some other trick) so that the expansion matches this time. Treating the COUNT token as an IDENTIFIER or vice versa might be the needed trick in your RECOVER block, if you see what I mean. And, again, you see why you would need to rewind to the start of the expansion for RECOVER (unlike the existing try-catch semantics…)

Here I am replying to my own comment. (Talking to myself is surely a sign of … I dunno…)

What I say above is actually mostly wrong. JavaCC (certainly legacy JavaCC, but even JavaCC21) never really had a workable solution for this kind of thing. In particular, CommonTokenAction just doesn’t really work in the general case, for reasons I outline here.

This is basically fixed in JavaCC21 now, at least insofar as the basic infrastructure is there to deal with these sorts of problems sensibly. (I do anticipate further refinements but the basic machinery is there to handle this, when it wasn’t before.)