How To: Shape and Use the Tree
CongoCC builds a syntax tree for you automatically. This guide is about the next step: turning that default tree into something convenient to consume, and getting your data out of it. For the precise meaning of each construct used here, see the Tree Building reference.
The running example is a tiny configuration language:
PARSER_PACKAGE = "tree.test";
SKIP : " " | "\t" | "\r" | "\n" ;
TOKEN : <ID : (["a"-"z"])+ > | <NUM : (["0"-"9"])+ > ;
Config : ( Pair )+ <EOF> ;
Pair : <ID> "=" Value ;
Value : <NUM> ;
Start by looking at the tree
Before shaping anything, generate the parser and dump a tree so you can see
what you actually get. Node.dump() prints the subtree:
TreeParser parser = new TreeParser("a=1 b=22");
parser.Config();
parser.rootNode().dump();
Working from the real output — rather than from what you imagine the tree looks like — is the single most useful habit when shaping a grammar’s tree.
Decide what is noise, and remove it
Two mechanisms keep the tree tidy. By default, productions that wrap a single
child do not get their own node (smart node creation), and you can drop any
pass-through production from the tree with #void. If Pair were a thin wrapper
you did not care about, Pair #void : … would lift its children up into Config.
Conversely, if you find a useful production has been optimized away by smart
node creation and you want it back, give it an explicit name (next section) or
turn SMART_NODE_CREATION off for the whole grammar.
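Both adjustments can be sketched against the running grammar. This is an illustration, not a recommendation: it assumes SMART_NODE_CREATION is set with the same NAME = value; form as PARSER_PACKAGE above, and you would normally apply only one of the two changes.

```
// Option 1: turn smart node creation off, so single-child productions keep their node.
SMART_NODE_CREATION = false;

// Option 2: drop the Pair wrapper, so its children attach to Config directly.
Pair #void : <ID> "=" Value ;
```

The rest of this guide keeps the Pair node, so treat option 2 as a side sketch; dump the tree again after either change to confirm the shape you actually get.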
Tip
You rarely need to remove token nodes such as "=" from the tree — it is
usually easier to simply ignore them when you traverse, using the typed
accessors below, than to reshape the grammar around them.
Name nodes for the consumer
Give nodes the names the consumer of the tree wants to see, not necessarily
the names that were convenient in the grammar. Renaming Pair to
KeyValue makes the downstream code read well:
Pair #KeyValue : <ID> "=" Value ;
Naming a family of tokens with a shared class — TOKEN #Keyword : … — is the
token-level equivalent; see Lexical Specification.
Pull data out with the Node API
Most consumers do not need a visitor at all — the typed accessors on Node
are enough. To collect every key/value pair, find the KeyValue nodes and,
within each, the ID and NUM children:
import tree.test.ast.*;
TreeParser p = new TreeParser("a=1 b=22");
p.Config();
for (KeyValue kv : p.rootNode().descendantsOfType(KeyValue.class)) {
    Node key = kv.firstChildOfType(ID.class);
    Node val = kv.firstChildOfType(NUM.class);
    System.out.println(key.getSource() + " => " + val.getSource());
}
This prints:
a => 1
b => 22
descendantsOfType and firstChildOfType accept either a node class (as
here) or a NodeType value, and getSource() returns the matched text.
The full set of accessors is in Generated API.
Use a visitor for type-dispatched work
When processing varies by node type — an interpreter or a code generator, say —
a visitor is cleaner than a cascade of type checks. Extend Node.Visitor,
write one visit method per node type, and call recurse to descend:
class Printer extends Node.Visitor {
    public void visit(KeyValue kv) {
        System.out.println("pair -> " + kv.getSource());
        recurse(kv);
    }
}
new Printer().visit(parser.rootNode());
Because dispatch follows the class hierarchy, a visit method for a base
class or an #interface node type handles all of its subtypes at once — a
good reason to give related productions a common node supertype.
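To see why a handler for a supertype covers its subtypes, here is a self-contained sketch. Every class in it (SketchNode, Entry, Flag, SketchVisitor, EntryCollector, VisitorDemo, and the stand-in KeyValue) is hypothetical and written only for this illustration; it imitates the dispatch idea with plain reflection and is not CongoCC's generated code.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for generated node classes (not CongoCC's real types).
class SketchNode {
    final String text;
    final List<SketchNode> children = new ArrayList<>();
    SketchNode(String text) { this.text = text; }
    SketchNode add(SketchNode child) { children.add(child); return this; }
}
class Entry extends SketchNode { Entry(String t) { super(t); } }   // shared supertype
class KeyValue extends Entry { KeyValue(String t) { super(t); } }
class Flag extends Entry { Flag(String t) { super(t); } }

// Reflective visitor: walks up from the node's runtime class and calls the
// first visit(...) handler the concrete visitor declares for that class.
abstract class SketchVisitor {
    public void visit(SketchNode node) {
        for (Class<?> c = node.getClass(); c != null && c != SketchNode.class; c = c.getSuperclass()) {
            try {
                Method handler = getClass().getDeclaredMethod("visit", c);
                handler.setAccessible(true);
                handler.invoke(this, node);
                return;
            } catch (NoSuchMethodException e) {
                // No handler for this class; try its superclass next.
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
        recurse(node); // no specific handler: just descend
    }

    public void recurse(SketchNode node) {
        for (SketchNode child : node.children) {
            visit(child);
        }
    }
}

// One handler for the supertype receives both KeyValue and Flag nodes.
class EntryCollector extends SketchVisitor {
    final List<String> seen = new ArrayList<>();
    public void visit(Entry e) {
        seen.add(e.getClass().getSimpleName() + ":" + e.text);
        recurse(e);
    }
}

class VisitorDemo {
    static List<String> run() {
        SketchNode root = new SketchNode("root")
            .add(new KeyValue("a=1"))
            .add(new Flag("verbose"));
        EntryCollector collector = new EntryCollector();
        collector.visit(root);
        return collector.seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch the root has no handler, so the visitor simply descends, and the single visit(Entry) method is invoked once for each of the two subtypes.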
Put behavior on the nodes themselves
For anything beyond reading, it is often cleanest to add methods or fields
directly to the generated node classes rather than computing over them from
outside. The INJECT statement does this without your having to edit
generated code — for example, giving KeyValue a getKey() method. See
Code Injection.
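A minimal sketch of such an injection, placed in the grammar file, might look like the following. It assumes the INJECT ClassName : { … } form covered in Code Injection and reuses only the accessors shown earlier; the body of getKey() is an illustrative guess at what you would want, not a prescribed implementation.

```
INJECT KeyValue :
{
    // Hypothetical convenience accessor: the text of this pair's ID token.
    public String getKey() {
        return firstChildOfType(ID.class).getSource();
    }
}
```

With that in place, consumer code could call kv.getKey() instead of locating the ID child itself.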
Where to go next
Tree Building: the full reference for #descriptors, smart node creation, and tree settings.
Code Injection: adding behavior to node classes.
Generated API: the complete Node API.