Code Injection
The classes CongoCC generates — the parser, the lexer, the node types — are
often most useful with some of your own code added to them: a helper method on
a node, a field on the parser, an extra interface. The INJECT statement adds
that code from the grammar, so it lands in the generated files without your
ever editing them by hand and survives every regeneration.
A first injection
This grammar gives its KeyValue node class a getKey() method:
Config : ( Pair )+ <EOF> ;
Pair #KeyValue : <ID> "=" <NUM> ;

INJECT KeyValue :
{
    public String getKey() {
        return firstChildOfType(ID.class).getSource();
    }
}
After generation, KeyValue has the method, so consuming code can call it
directly:
for (KeyValue kv : parser.rootNode().descendantsOfType(KeyValue.class))
    System.out.println("key=" + kv.getKey()); // key=x, key=y
Anatomy of an injection
The full form names a target and may supply imports, a superclass, interfaces, and a body of members:
INJECT TargetName :
    import java.util.List;
    extends SomeBaseClass
    implements SomeInterface
{
    private List<Foo> foos;
    public List<Foo> getFoos() { return foos; }
}
Every part is optional. A short injection that only sets a superclass needs no body at all:
INJECT MyNode : extends AbstractBaseNode
Injection targets
The target name is the class to inject into. It may be:
- a node type, named by a production’s node name (KeyValue above), to add behavior to one kind of tree node;
- the parser or lexer, referred to by the magic names PARSER_CLASS and LEXER_CLASS, which resolve to whatever those classes are actually named;
- the base node or token class, BASE_NODE_CLASS and BASE_TOKEN_CLASS, to add behavior shared by every node or token;
- the Node interface, to add default methods to all nodes.
Because #abstract and #interface node descriptors (see
Tree Building) let you introduce shared supertypes in the tree,
injection into those supertypes is the idiomatic way to give a family of nodes
common behavior.
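To make the idea of shared behavior concrete, here is a minimal plain-Java sketch of what a family of node classes looks like once a common supertype carries an extra member. It is hand-written for illustration and is not CongoCC output: the names SharedBehaviorSketch, Literal, IntLiteral, StringLiteral, and describe() are all hypothetical.

```java
import java.util.List;

// Hand-written model, not CongoCC output. Every name here is hypothetical.
class SharedBehaviorSketch {

    // Stands in for a supertype shared by a family of node classes.
    interface Literal {
        String getSource();

        // The kind of member an injection into the supertype would add:
        // written once, inherited by every implementing class.
        default String describe() {
            return getClass().getSimpleName() + "(" + getSource() + ")";
        }
    }

    // Two concrete node types that share the supertype.
    static final class IntLiteral implements Literal {
        private final String source;
        IntLiteral(String source) { this.source = source; }
        @Override public String getSource() { return source; }
    }

    static final class StringLiteral implements Literal {
        private final String source;
        StringLiteral(String source) { this.source = source; }
        @Override public String getSource() { return source; }
    }

    public static void main(String[] args) {
        List<Literal> nodes = List.of(new IntLiteral("42"), new StringLiteral("hi"));
        for (Literal node : nodes) {
            System.out.println(node.describe()); // IntLiteral(42), then StringLiteral(hi)
        }
    }
}
```

Neither concrete class mentions describe(); both get it from the supertype, which is the effect an injection into a shared supertype is meant to achieve.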
Hooks
A hook is a method with a special name that, when you inject it into the parser, CongoCC wires into the generated code at the right place. Hooks are how you run your own logic during lexing and tree building without a configuration setting. The recognized hooks are:
TOKEN_HOOK
Called for each token as it is produced, receiving the token and returning a token, possibly a different or modified one. This is the mechanism behind context-sensitive tokenization and synthetic tokens (see Advanced Tokenization). Its signature takes and returns the base token type:
INJECT PARSER_CLASS :
{
    BASE_TOKEN_CLASS TOKEN_HOOK(BASE_TOKEN_CLASS t) {
        // inspect or transform t, then ...
        return t;
    }
}

OPEN_NODE_HOOK / CLOSE_NODE_HOOK
Called as each tree node is opened and closed, for code that needs to run on entry to and exit from a production’s node scope.

RESET_TOKEN_HOOK
Called when token processing is reset.
Defining a method with one of these names is all that is required; there is no
separate setting to enable it. (These hooks subsume the legacy
COMMON_TOKEN_ACTION and NODE_SCOPE_HOOK options; see
Appendix: Legacy Mapping.)
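The contract of TOKEN_HOOK, a method that receives each token and returns the token to use in its place, can be modeled in plain Java. The sketch below is self-contained and hypothetical: TokenHookSketch, Token, Kind, tokenHook, and the lex loop are stand-ins invented for illustration, not CongoCC's generated classes. It assumes a toy language in which the word record counts as a keyword only at the start of a statement.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written model of the hook contract. Every name here is hypothetical
// and none of it is CongoCC's generated code.
class TokenHookSketch {

    enum Kind { ID, KEYWORD, SEMI }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + ":" + text; }
    }

    private Token previous; // state the hook keeps between calls

    // Same shape as the hook: takes the token just produced and returns
    // the token to use, which may be a replacement.
    Token tokenHook(Token t) {
        boolean atStatementStart = previous == null || previous.kind == Kind.SEMI;
        Token result = t;
        if (t.kind == Kind.ID && t.text.equals("record") && atStatementStart) {
            result = new Token(Kind.KEYWORD, t.text); // context-sensitive retyping
        }
        previous = result;
        return result;
    }

    // Stand-in for a lexer loop: every token passes through the hook.
    List<Token> lex(String input) {
        List<Token> out = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            Kind kind = word.equals(";") ? Kind.SEMI : Kind.ID;
            out.add(tokenHook(new Token(kind, word)));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [KEYWORD:record, ID:x, SEMI:;, ID:y, ID:record, SEMI:;]
        System.out.println(new TokenHookSketch().lex("record x ; y record ;"));
    }
}
```

The first record opens a statement and is retyped as a keyword; the second follows an identifier and stays an ID. The hook can make that distinction because it sees every token in order and can keep state between calls.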
Injected code and target languages
Injected code is written in the target language, so an injection ties the grammar to that language. To keep a grammar usable for several targets, guard language-specific injections with the preprocessor:
#if __java__
INJECT PARSER_CLASS : { /* Java-specific members */ }
#endif
Alternatively, keep injected code out of the grammar entirely. The Target Language Guide covers how injected code differs across Java, Python, C#, and Rust, including the generated stub files that mark where handwritten Rust belongs.