Productions and Expansions¶
A production is a grammar rule. Each production describes how to match one syntactic construct, and CongoCC turns each one into a method in the generated parser. The body of a production is an expansion: a pattern built from token references, calls to other productions, and the choice, grouping, and repetition operators. This chapter describes the syntax of productions and expansions; how the parser chooses between alternatives is the subject of Disambiguation.
A first production¶
In its most common form a production is just a name, a colon, an expansion, and a semicolon:
NumberList : <NUMBER> ( <COMMA> <NUMBER> )* <EOF> ;
This reads: a NumberList is a NUMBER, followed by zero or more occurrences of a COMMA and a NUMBER, followed by the end of input.
Expansion units¶
An expansion is a sequence of units. The units are:
- Terminals
A token to match, written either as a string literal (
"while") or as a token reference in angle brackets (<IDENTIFIER>,<EOF>). A string literal used here implicitly defines a token unlessREQUIRE_TOKEN_DECLARATIONis set; see Lexical Specification.- Non-terminals
The name of another production, optionally with arguments —
ExpressionorArgumentList(true). This calls the named production at that point in the parse.- Sequences
Writing units one after another matches them in order:
<IF> "(" Expression ")" Statement.- Choices
|separates alternatives, matching exactly one of them:Statement : IfStatement | WhileStatement | Block.
The grouping and repetition operators apply to a parenthesized expansion:
Form |
Meaning |
|---|---|
|
grouping |
|
optional — zero or one occurrence |
|
optional — an equivalent spelling of |
|
zero or more occurrences |
|
one or more occurrences |
For example, an optional else clause and an optional list of comma-separated
arguments:
IfStatement : <IF> "(" Expression ")" Statement [ <ELSE> Statement ] ;
ArgumentList : "(" [ Expression ( "," Expression )* ] ")" ;
Embedded code and actions¶
Braces enclose embedded code written in the target language. Two types of code embeddings are defined based on their position: a declaration prologue immediately after the colon, holding local declarations for the production, and actions interspersed in the expansion, which run when the parser reaches that point.
The following grammar parses a sum of integers and actually evaluates it. The
production has a return type (int), a prologue declaring two locals, and
actions that accumulate the total:
PARSER_PACKAGE = "prod.test";
SKIP : " " | "\t" | "\r" | "\n" ;
TOKEN : <NUMBER : (["0"-"9"])+ > ;
int Sum :
{
int total = 0;
Token n;
}
n = <NUMBER> { total += Integer.parseInt(n.toString()); }
( "+" n = <NUMBER> { total += Integer.parseInt(n.toString()); } )*
<EOF>
{ return total; }
;
The assignment n = <NUMBER> captures the matched token into the local
variable n; n.toString() is its matched text. Because Sum declares a
return type, the generated method returns a value, so a caller can write:
int result = new SumParser("1 + 2 + 3").Sum(); // result == 6
Note
Embedded code ties a grammar to one target language. A grammar that uses Java actions like the one above generates a working parser only for Java. To keep a grammar usable across all four target languages, avoid embedding target-language code in productions, or provide per-language variants; see the Target Language Guide.
Full declaration syntax¶
Beyond the common Name : expansion ; form, a production declaration may
include several optional parts, in this order:
[access] [ReturnType] Name [(parameters)] [throws …]
[ #NodeDescriptor ] [ RECOVER_TO expansion ] :
[ LexicalState : ] expansion ;
- access modifier
public,private, orprotectedon the generated method.- return type and parameters
A production may declare a return type and a formal parameter list, exactly as the above
Sumexample returnsint. Parameters let one production pass information to another:ArgumentList(boolean allowEmpty)is called asArgumentList(true).throwslistDeclares checked exceptions the production’s method may throw.
- node descriptor
A
#annotation controls the tree node the production builds — its name, or whether it builds a node at all. Node descriptors are covered in Tree Building.RECOVER_TOand!Markers used by fault-tolerant parsing, described in Fault-Tolerant Parsing.
- starting lexical state
Name : STATE : expansion ;makes the production begin matching in the named lexical state.
Lookahead and assertions in expansions¶
Two further types of descriptors may appear in an expansion, but they are discussed elsewhere because they govern how the parser matches rather than what it matches:
SCAN, the up-to-here markers (=>||), and=>control lookahead at choice points — see Disambiguation.ASSERTandENSUREassert conditions during parsing or lookahead, andFAILforces an error — also in Disambiguation.