Generated API¶
This chapter describes the contract the generated code presents to your application — the parser, the lexer, tokens, nodes, and the parse exception. The contract is the same across all four target languages; what differs is naming and idiom. The names and signatures here follow the Java target; for the per-language equivalents see the Target Language Guide.
Classes and packages¶
For a grammar with base name Foo, CongoCC generates a FooParser and a
FooLexer in the PARSER_PACKAGE, and the node classes in the
NODE_PACKAGE (by default <PARSER_PACKAGE>.ast). All of these names are
configurable; see Settings Reference.
The parser¶
Construction. The parser offers constructors for the common input sources —
an in-memory CharSequence or a file Path — each optionally taking a
name for the input source (used in error messages), and one taking a
pre-built lexer:
new FooParser(CharSequence content)
new FooParser(String inputSource, CharSequence content)
new FooParser(Path path) // throws IOException
new FooParser(String inputSource, Path path)
new FooParser(FooLexer lexer)
Parsing. Each production becomes a method of the same name. Call the one that is your start symbol to parse; a production declared with a return type returns its value:
parser.Document(); // parse, building the tree
int n = parser.Sum(); // a production with a return type
The result. After parsing, rootNode() returns the root Node of the syntax tree.
Run-time controls. A handful of methods adjust behavior at run time:
- setBuildTree(boolean) / getBuildTree() and isTreeBuildingEnabled(): turn tree building on or off for a parse.
- setTokensAreNodes(boolean) and setUnparsedTokensAreNodes(boolean): the run-time counterparts of the tree settings.
- setParserTolerant(boolean) / isParserTolerant(): switch fault-tolerant parsing on or off.
- cancel() / isCancelled(): cooperatively cancel a long-running parse.
- getNextToken() and getToken(int index): direct access to the token stream, mainly for use inside grammar actions.
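Cooperative cancellation generally means that the running work polls a flag and stops at the next safe point, rather than being interrupted from outside. This chapter does not say where the generated parser checks its flag, so the following is only a self-contained sketch of the general idea. The class CancelSketch and its run method are hypothetical and are not part of the generated API.

```java
import java.util.function.IntConsumer;

// Hypothetical illustration of cooperative cancellation; not CongoCC code.
final class CancelSketch {
    // volatile so that a cancel() issued from another thread is seen by the worker.
    private volatile boolean cancelled;

    void cancel() { cancelled = true; }

    boolean isCancelled() { return cancelled; }

    // Performs up to `units` units of work, checking the flag before each one.
    // Returns the number of units actually completed.
    int run(int units, IntConsumer afterEach) {
        int done = 0;
        while (done < units && !cancelled) {
            done++;                  // one unit of (pretend) work
            afterEach.accept(done);  // callback, used here to request cancellation
        }
        return done;
    }

    public static void main(String[] args) {
        CancelSketch work = new CancelSketch();
        // Request cancellation once the third unit has finished.
        int done = work.run(10, i -> { if (i == 3) work.cancel(); });
        System.out.println("completed " + done + " units, cancelled=" + work.isCancelled());
    }
}
```

The point of the pattern is that the work stops only between units, so it never leaves a unit half done. With a real parser the flag would typically be set from another thread, for example by an editor that no longer needs the result.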
The lexer¶
FooLexer produces the token stream from the input. It is usually driven by
the parser and not used directly, but it can be constructed on its own and
passed to the parser’s lexer constructor when you need to tokenize without
parsing.
Tokens¶
Every token is an instance of the token class (Token by default, settable
with BASE_TOKEN_CLASS). A token is also a Node, so it carries position information and
fits in the tree. Its members include:
- getType(): the token’s TokenType.
- getSource() / getImage(): the matched text.
- getBeginLine(), getBeginColumn(), getEndLine(), getEndColumn() and the offset forms getBeginOffset() / getEndOffset(): the token’s position in the input.
- getNext() / getPrevious(): the adjacent tokens in the stream.
- isUnparsed(): whether this is an unparsed (special) token such as a comment.
TokenType is a generated enumeration with one value per declared token
type, plus EOF. Using an enum rather than integer constants means token
comparisons are type-checked at compile time.
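To make the compile-time benefit concrete, here is a minimal sketch that uses a hand-written stand-in for the generated enumeration. Only EOF is named in the text above; IDENTIFIER, NUMBER, and PLUS are hypothetical, because the real constants come from the token declarations in your grammar.

```java
import java.util.EnumSet;
import java.util.Set;

final class TokenTypeSketch {
    // Stand-in for the generated enum; every constant except EOF is hypothetical.
    enum TokenType { IDENTIFIER, NUMBER, PLUS, EOF }

    // EnumSet gives a compact, type-safe way to group token types.
    static final Set<TokenType> OPERANDS =
            EnumSet.of(TokenType.IDENTIFIER, TokenType.NUMBER);

    static boolean isOperand(TokenType type) {
        return OPERANDS.contains(type);
    }

    static String describe(TokenType type) {
        switch (type) {
            case IDENTIFIER: return "identifier";
            case NUMBER:     return "number";
            case PLUS:       return "operator";
            case EOF:        return "end of input";
            default: throw new IllegalArgumentException("unknown type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(isOperand(TokenType.NUMBER));
        System.out.println(describe(TokenType.EOF));
        // isOperand(3) would be rejected by the compiler: an int is not a TokenType.
    }
}
```

With integer constants, passing an unrelated number to isOperand would compile and fail silently at run time; with the enum the mistake cannot be written at all.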
Nodes¶
Every syntax-tree node implements the Node interface. Its traversal and
position members are the working surface for consuming a tree; the most used
are summarized in Tree Building (children(), descendants(),
firstChildOfType(...), getType(), getParent(), getSource(),
dump()). Three nested types complete the model:
- Node.NodeType: the common supertype of TokenType and the node-type enumeration, so a node’s getType() covers both tokens and productions.
- Node.Visitor: the reflective visitor base class (see Tree Building).
- Node.CodeLang: the enumeration of target languages (JAVA, PYTHON, CSHARP, RUST).
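The shared supertype can be pictured with a small self-contained sketch. Everything below is a stand-in written for illustration: the marker interface mimics the role of Node.NodeType, and the enum name ProductionType and all constants are hypothetical rather than taken from generated code.

```java
final class NodeTypeSketch {
    // Stand-in for Node.NodeType: one supertype shared by both enumerations.
    interface NodeType {}

    // Hypothetical token types.
    enum TokenType implements NodeType { NUMBER, PLUS, EOF }

    // Hypothetical name and constants for the node-type enumeration.
    enum ProductionType implements NodeType { Sum, Document }

    // Code that receives a type can accept either kind through the supertype.
    static boolean isToken(NodeType type) {
        return type instanceof TokenType;
    }

    static String label(NodeType type) {
        return (isToken(type) ? "token " : "production ") + type;
    }

    public static void main(String[] args) {
        System.out.println(label(TokenType.PLUS));
        System.out.println(label(ProductionType.Sum));
    }
}
```

Because both enumerations share one supertype, tree-walking code can compare a node's type against either a token type or a production type without separate code paths.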
The parse exception¶
When the input does not match the grammar, the parser throws a
ParseException. By default it is an unchecked exception (it extends the
language’s runtime-exception type), so callers are not forced to declare or
catch it; set USE_CHECKED_EXCEPTION to make it checked instead. It carries:
- getMessage(): a human-readable description, including the position and what was expected; the stack trace includes locations in the grammar, not just the generated code.
- getLocation() / getToken(): the node/token where parsing failed.
- hitEOF(): whether the failure was an unexpected end of input.
When fault-tolerant parsing is enabled, errors are recovered from and recorded rather than thrown, and the returned tree may contain nodes flagged as incomplete.
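On the caller's side, the default (unchecked) configuration can be handled as in the sketch below. It compiles without generated code, so it relies on stand-ins: the nested ParseException copies only the getMessage() and hitEOF() members described above, and parseDigit is a hypothetical toy that plays the part of a start-production call.

```java
final class ParseErrorSketch {
    // Stand-in for the generated ParseException: unchecked, as in the default setup.
    static final class ParseException extends RuntimeException {
        private final boolean hitEOF;

        ParseException(String message, boolean hitEOF) {
            super(message);
            this.hitEOF = hitEOF;
        }

        boolean hitEOF() { return hitEOF; }
    }

    // Hypothetical toy "parser": accepts input whose first character is a digit.
    static void parseDigit(String input) {
        if (input.isEmpty()) {
            throw new ParseException("unexpected end of input", true);
        }
        if (!Character.isDigit(input.charAt(0))) {
            throw new ParseException("expected a digit at offset 0", false);
        }
    }

    // Callers need no throws clause, but can still catch and inspect the failure.
    static String tryParse(String input) {
        try {
            parseDigit(input);
            return "ok";
        } catch (ParseException e) {
            return (e.hitEOF() ? "incomplete: " : "error: ") + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("7"));
        System.out.println(tryParse(""));
        System.out.println(tryParse("x"));
    }
}
```

Distinguishing hitEOF() from other failures is useful in interactive tools, where an unexpected end of input usually means the user has not finished typing rather than that the input is wrong.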