OK, I guess I misunderstood. I thought that the continuation sequence (backslash + newline) only applied in the preprocessor. Also, I am surprised that the continuation character can actually split a token! That is rather funky. When would anybody ever want to do that?
Now, first of all, are we sure that this is the spec, and not just the way it happens to be implemented somewhere or other? The part where this splits a token is particularly strange. Somebody actually wrote a specification in which you can split a token this way? What is the use case for that?
Yes, implementing what you describe is probably impractical in the lexical grammar. Offhand, I think it would be best handled in the core, pre-lexically, in roughly the same spot that handles PRESERVE_NEW_LINES and TABS_TO_SPACES. I guess it would be a question of patching the mungeContent routine here: javacc21/Lexer.java.ftl at master · javacc21/javacc21 · GitHub
Or possibly you have a separate routine that runs over the content and flags each backslash+newline as something to be ignored. Either way, I think you want to handle this pre-lexically rather than in the lexical grammar.
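To make the "separate routine" idea concrete, here is a rough sketch of what I mean. It is standalone Java, not the actual mungeContent code; the class and method names (LineSplice, removeLineContinuations) are made up for illustration, and I am assuming that a continuation is a backslash immediately followed by \n or \r\n.

```java
// Hypothetical pre-lexical pass: not part of JavaCC 21, names are illustrative only.
final class LineSplice {
    private LineSplice() {}

    /**
     * Returns a copy of the content with every backslash+newline pair removed.
     * Assumption: a continuation is '\' immediately followed by "\n" or "\r\n".
     */
    static String removeLineContinuations(CharSequence content) {
        int n = content.length();
        StringBuilder out = new StringBuilder(n);
        int i = 0;
        while (i < n) {
            char c = content.charAt(i);
            if (c == '\\' && i + 1 < n) {
                char next = content.charAt(i + 1);
                if (next == '\n') {
                    i += 2; // skip the backslash and the LF
                    continue;
                }
                if (next == '\r' && i + 2 < n && content.charAt(i + 2) == '\n') {
                    i += 3; // skip the backslash and the CRLF
                    continue;
                }
            }
            out.append(c); // anything else is copied through unchanged
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A token ("total") split across two lines by a continuation.
        String src = "int to\\\ntal = 1;\n";
        System.out.print(removeLineContinuations(src));
    }
}
```

Since this runs before the lexer sees anything, the split token comes back together and the lexical grammar never has to know about it. The obvious catch is that deleting characters shifts the line and column positions, so a real version would presumably have to keep some record of what was removed so that token locations still point at the right place in the original file.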