Appendix: Glossary

abstract syntax tree
AST

The tree of nodes a parser produces, representing the structure of the parsed input. See Tree Building.

choice point

A place in a grammar where the parser must decide between alternatives — a | alternation, an optional [ ], or a loop ( )* / ( )+. See Disambiguation.

contextual predicate

A look-behind condition that selects an alternative based on which productions are currently on the parse stack, rather than on the upcoming tokens. See Disambiguation.

contextual token

A token, declared with the CONTEXTUAL kind, that the lexer produces only where the parser expects it. See Advanced Tokenization.
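As a rough illustration of the idea, the Python sketch below lets the parser pass in the set of token types it expects at the current position. The word `yield` comes back as the contextual token `YIELD` only when that set contains it, and as an ordinary `IDENTIFIER` otherwise. The function `next_token` and the token names are hypothetical. This is not CongoCC's generated lexer.

```python
import re

WORD = re.compile(r"[A-Za-z_]\w*")

def next_token(text, pos, expected):
    """Lex one word at `pos`, given the token types the parser expects there.

    Hypothetical sketch: YIELD is treated as a contextual token, so the
    word "yield" is produced as YIELD only where the parser expects YIELD.
    """
    m = WORD.match(text, pos)
    if m is None:
        raise ValueError(f"no word at offset {pos}")
    image = m.group()
    if image == "yield" and "YIELD" in expected:
        return ("YIELD", image, m.end())
    # Everywhere else the same text is just an identifier.
    return ("IDENTIFIER", image, m.end())

# Same input, different parser expectations:
print(next_token("yield x", 0, {"YIELD", "IDENTIFIER"}))  # ('YIELD', 'yield', 5)
print(next_token("yield x", 0, {"IDENTIFIER"}))           # ('IDENTIFIER', 'yield', 5)
```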

expansion

The right-hand side of a production — the pattern of tokens, non-terminals, and operators it matches. See Productions and Expansions.

fault-tolerant parsing

A mode in which the parser recovers from input that does not match the grammar and still produces a tree, rather than stopping at the first error. See Fault-Tolerant Parsing.

FIRST set

The set of tokens that can legally begin a given expansion; the parser uses it to choose alternatives by the next token. See Disambiguation.
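The textbook way to compute FIRST sets is a fixpoint iteration. You keep adding tokens until no set grows, and you also track which non-terminals can match empty input. The Python sketch below does this for a small hypothetical grammar. It then picks an alternative by checking whether the next token is in that alternative's FIRST set. It only illustrates the concept and is not CongoCC's implementation. CongoCC also offers other kinds of lookahead (see Disambiguation).

```python
# Hypothetical grammar: non-terminal -> list of alternatives (expansions).
# Any symbol that is not a key of GRAMMAR is a terminal (token type).
GRAMMAR = {
    "Statement": [["IfStatement"], ["Assignment"]],
    "IfStatement": [["IF", "LPAREN", "Expression", "RPAREN", "Statement"]],
    "Assignment": [["IDENTIFIER", "EQUALS", "Expression", "SEMICOLON"]],
    "Expression": [["IDENTIFIER"], ["NUMBER"]],
}

def first_of(seq, grammar, first, nullable):
    """FIRST set of a sequence of symbols, given current per-non-terminal sets."""
    result = set()
    for sym in seq:
        if sym not in grammar:      # a terminal begins the sequence
            result.add(sym)
            return result
        result |= first[sym]
        if sym not in nullable:     # sym always consumes input, so stop here
            return result
    return result                   # every symbol can match empty input

def first_sets(grammar):
    """Compute FIRST sets and nullable non-terminals by fixpoint iteration."""
    first = {nt: set() for nt in grammar}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of(alt, grammar, first, nullable)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
                if nt not in nullable and all(s in nullable for s in alt):
                    nullable.add(nt)
                    changed = True
    return first, nullable

def choose_alternative(nt, next_token, grammar, first, nullable):
    """Index of the first alternative whose FIRST set contains next_token.

    (A nullable alternative would also need FOLLOW information, which this
    sketch omits.)
    """
    for i, alt in enumerate(grammar[nt]):
        if next_token in first_of(alt, grammar, first, nullable):
            return i
    return None

FIRST, NULLABLE = first_sets(GRAMMAR)
print(sorted(FIRST["Statement"]))  # ['IDENTIFIER', 'IF']
```

With these sets, a next token of `IF` selects the `IfStatement` alternative of `Statement`, and `IDENTIFIER` selects `Assignment`.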

grammar

The complete description of a language, written in a .ccc file, from which CongoCC generates a parser.

hook

A specially named method (such as TOKEN_HOOK) that CongoCC wires into the generated code once you inject it into the parser. See Code Injection.

injection

Adding your own code — methods, fields, supertypes — to the classes CongoCC generates, by means of the INJECT statement. See Code Injection.

lazy token

A token, marked with ?, that matches the shortest text satisfying its pattern instead of the longest. See Advanced Tokenization.
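To show the difference between longest and shortest match, the sketch below models a token as a Python regular expression. From all prefixes of the input that the pattern matches in full, it picks either the longest (the default) or the shortest (lazy). The `COMMENT` pattern and the `match_token` helper are hypothetical. CongoCC's lexer is generated differently, but the effect that marking a token lazy is meant to have corresponds to the `lazy=True` case.

```python
import re

# Hypothetical C-style block comment token.
COMMENT = r"/\*.*\*/"

def match_token(pattern, text, lazy=False):
    """Return the longest prefix of `text` that `pattern` matches in full,
    or the shortest such prefix if `lazy` is true; None if there is none."""
    regex = re.compile(pattern, re.DOTALL)  # let '.' cross newlines
    lengths = range(1, len(text) + 1)
    for n in (lengths if lazy else reversed(lengths)):
        if regex.fullmatch(text, 0, n):
            return text[:n]
    return None

source = "/* a */ x = 1; /* b */"
print(match_token(COMMENT, source))             # /* a */ x = 1; /* b */  (runs past the first */)
print(match_token(COMMENT, source, lazy=True))  # /* a */
```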

lexer

The component that turns input characters into a stream of tokens; also called the tokenizer or scanner. See Lexical Specification.

lexical state

A mode the lexer is in, determining which token types it can currently match. The starting state is DEFAULT. See Lexical Specification.
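You can think of a lexical state as a separate token table that the lexer consults according to its current mode, with some tokens switching the mode. The Python sketch below is a simplified model, not CongoCC's generated code. In it, a double quote moves the lexer from DEFAULT into a hypothetical IN_STRING state, where words are no longer identifiers. To keep it short, it takes the first rule that matches rather than the longest match.

```python
import re

# Hypothetical token table: lexical state -> (token type, pattern, next state).
RULES = {
    "DEFAULT": [
        ("WHITESPACE", r"\s+", "DEFAULT"),
        ("IDENTIFIER", r"[A-Za-z_]\w*", "DEFAULT"),
        ("OPEN_QUOTE", r'"', "IN_STRING"),
    ],
    "IN_STRING": [
        ("STRING_TEXT", r'[^"]+', "IN_STRING"),
        ("CLOSE_QUOTE", r'"', "DEFAULT"),
    ],
}

def tokenize(text):
    state, pos, tokens = "DEFAULT", 0, []  # lexing starts in DEFAULT
    while pos < len(text):
        # Only the rules of the current state are tried (first match wins).
        for kind, pattern, next_state in RULES[state]:
            m = re.compile(pattern).match(text, pos)
            if m:
                break
        else:
            raise ValueError(f"no {state} token matches at offset {pos}")
        if kind != "WHITESPACE":
            tokens.append((kind, m.group()))
        pos, state = m.end(), next_state
    return tokens
```

For the input `say "hello world" now`, the words outside the quotes come back as IDENTIFIER tokens. The text between the quotes comes back as a single STRING_TEXT token, because the IDENTIFIER rule does not exist in IN_STRING.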

lookahead

Information about the upcoming input that the parser uses to choose between alternatives at a choice point. See Disambiguation.

node

An element of the syntax tree. Both productions and tokens can produce nodes. See Tree Building.

non-terminal

A reference to a production within an expansion.

parser

The component that consumes the token stream according to the grammar’s productions and builds the syntax tree.

private regular expression

A named pattern, declared with <#NAME : >, that can be referenced from other patterns but is not itself a token type. See Lexical Specification.
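Plain string substitution can mimic the effect of a private pattern. Named fragments are inlined into the token patterns that mention them, and only the token patterns are ever used to classify input. In the Python sketch below, the `{NAME}` reference syntax, the DIGIT and LETTER fragments, and the helper functions are all hypothetical and are not CongoCC syntax.

```python
import re

# Private patterns: named building blocks that are never token types.
PRIVATE = {
    "DIGIT": r"[0-9]",
    "LETTER": r"[A-Za-z_]",
}

# Token types; {NAME} refers to a private pattern.
TOKENS = {
    "NUMBER": r"{DIGIT}+",
    "IDENTIFIER": r"{LETTER}({LETTER}|{DIGIT})*",
}

def expand(pattern):
    """Inline each {NAME} reference as a non-capturing group."""
    return re.sub(r"\{([A-Z_]+)\}",
                  lambda m: "(?:" + PRIVATE[m.group(1)] + ")",
                  pattern)

# Only token types are compiled; DIGIT and LETTER never classify input.
COMPILED = {name: re.compile(expand(p)) for name, p in TOKENS.items()}

def token_type(image):
    """Return the token type whose pattern matches `image` in full, if any."""
    for name, regex in COMPILED.items():
        if regex.fullmatch(image):
            return name
    return None
```

A lone digit such as `7` is therefore classified as NUMBER, never as DIGIT.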

production

A named grammar rule. CongoCC generates one parser method per production. See Productions and Expansions.

recursive descent

The parsing technique CongoCC uses, in which each production is realized as a function that calls the functions for the productions it references.
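A hand-written Python example shows the shape of the technique. Each production of a small hypothetical expression grammar becomes one method, and `factor` calls back into `expression` for parenthesized input. It is a minimal sketch, not the code CongoCC generates. The grammar, the tokenizer, and the tuple-based tree are assumptions made for illustration.

```python
import re

class Parser:
    """Recursive descent parser for a hypothetical expression grammar:

        Expression : Term (("+" | "-") Term)*
        Term       : Factor (("*" | "/") Factor)*
        Factor     : NUMBER | "(" Expression ")"

    One method per production; results are nested tuples.
    """

    def __init__(self, text):
        # Crude tokenizer: integers and operator characters.
        # Anything else (including spaces) is silently skipped.
        self.tokens = re.findall(r"\d+|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def consume(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, found {tok!r}")
        self.pos += 1
        return tok

    def parse(self):
        tree = self.expression()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return tree

    def expression(self):  # Expression : Term (("+" | "-") Term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.consume()
            node = (op, node, self.term())
        return node

    def term(self):  # Term : Factor (("*" | "/") Factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.consume()
            node = (op, node, self.factor())
        return node

    def factor(self):  # Factor : NUMBER | "(" Expression ")"
        if self.peek() == "(":
            self.consume("(")
            node = self.expression()  # recursion back into a higher production
            self.consume(")")
            return node
        tok = self.consume()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, found {tok!r}")
        return tok

print(Parser("2 * (3 + 4)").parse())  # ('*', '2', ('+', '3', '4'))
```

Operator precedence falls out of the call structure. Because `expression` calls `term`, and `term` calls `factor`, multiplication binds tighter than addition without any extra bookkeeping.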

smart node creation

The default behavior whereby a production builds a node only when it would have more than one child, suppressing trivial one-child wrappers. See Tree Building.
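You can picture this rule with a node stack. Each production marks the stack when it starts. When it finishes, everything pushed since the mark is wrapped in a new node, but only if there is more than one item. The Python sketch below is a hypothetical model of that rule, not CongoCC's tree-building code. `TreeBuilder`, `open_node`, and `close_node` are illustrative names.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)

class TreeBuilder:
    def __init__(self):
        self.stack = []  # finished nodes and tokens
        self.marks = []  # stack heights at which open productions began

    def open_node(self):
        self.marks.append(len(self.stack))

    def push(self, token):
        self.stack.append(token)

    def close_node(self, kind):
        mark = self.marks.pop()
        children = self.stack[mark:]
        if len(children) > 1:
            del self.stack[mark:]
            self.stack.append(Node(kind, children))
        # Otherwise leave the lone child (or nothing) on the stack:
        # no trivial wrapper node is created.

# "x" parsed as Expression -> Term -> token: no wrappers survive.
b = TreeBuilder()
b.open_node()       # Expression
b.open_node()       # Term
b.push("x")
b.close_node("Term")
b.close_node("Expression")
print(b.stack)  # ['x']
```

Parsing `x + y` the same way leaves a single `Expression` node with three children, because that production ends up with more than one child.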

target language

The programming language CongoCC generates code in: Java, Python, C#, or Rust. See the Target Language Guide.

terminal

A token as it appears in an expansion — a string literal or a <NAME> reference.

token

An indivisible lexical unit produced by the lexer, such as a number, identifier, or punctuation mark. See Lexical Specification.

token type

The category of a token, declared in a token production and represented at run time by a value of the generated TokenType enumeration. See Generated API.

unparsed token

A token, declared with UNPARSED (or SPECIAL_TOKEN), that is kept but not passed to the parser — typically a comment. See Lexical Specification.
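One common way to keep such tokens without feeding them to the parser is to attach them to the next regular token. The Python sketch below does this for `//` comments. The `Token` class and its `preceding` field are hypothetical. JavaCC, for example, links special tokens through a `specialToken` field, and CongoCC's own representation may differ.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Token:
    kind: str
    image: str
    preceding: list = field(default_factory=list)  # unparsed tokens before this one

TOKEN_RE = re.compile(
    r"(?P<COMMENT>//[^\n]*)|(?P<WS>\s+)|(?P<WORD>\w+)|(?P<PUNCT>[=;])"
)

def tokenize(text):
    """Return only the tokens the parser sees; comments ride along."""
    tokens, pending, pos = [], [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        kind, image = m.lastgroup, m.group()
        if kind == "COMMENT":
            pending.append(Token(kind, image))  # kept, but not passed on
        elif kind != "WS":
            tokens.append(Token(kind, image, pending))
            pending = []
        pos = m.end()
    tokens.append(Token("EOF", "", pending))  # trailing comments attach to EOF
    return tokens
```

A parser that reads this list never encounters a comment. A tool such as a pretty-printer can still recover each comment from the token that follows it.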

up-to-here marker

The =>|| notation that tells the parser to scan the input up to that point when deciding whether to take an alternative. See Disambiguation.