Fault-Tolerant Parsing
Warning
This feature is experimental. The syntax and behavior described here may change, and some capabilities are incomplete.
Ordinarily, a parser stops at the first input it cannot match. A fault-tolerant parser instead does its best to recover and keep going, so that even incomplete or incorrect input yields a usable tree. This is what editors, IDEs, and language tooling need: the code being parsed is constantly in a half-finished state, yet the tool must still produce an outline, offer completions, and report more than one error at a time.
Enabling fault tolerance
Fault tolerance is off unless you ask for it, because it changes the generated
code. Turn it on with the FAULT_TOLERANT setting:
FAULT_TOLERANT;
With it set, CongoCC generates the recovery machinery and honors the recovery
markers described below. FAULT_TOLERANT_DEFAULT controls whether a generated
parser starts out in tolerant mode (by default it does), and a parser can be
switched between strict and tolerant parsing at run time with
setParserTolerant(boolean). When FAULT_TOLERANT is not set, the
recovery markers in a grammar are simply ignored, so they can be left in place.
Tolerant points: the ! marker
A ! after a token, a non-terminal, or a parenthesized group marks it as a
tolerant point: if the parser cannot match it, the parser inserts a
placeholder and carries on instead of failing. This grammar tolerates a missing
closing parenthesis:
FAULT_TOLERANT;
TOKEN : <ID : (["a"-"z"])+ > | <LP : "("> | <RP : ")"> ;
Root : <LP> <ID> <RP>! <EOF> ;
Parsing the incomplete input "( a" still produces a tree; the missing ")" is
represented by an incomplete node:
<Root (1, 1)-(1, 3)>
  LP: (1, 1) - (1, 1): (
  ID: (1, 3) - (1, 3): a
  RP: (1, 1) - (1, 1): (incomplete)
  Token: (1, 1) - (1, 1): EOF
A variant, !-> followed by a code block, runs that block to perform custom
recovery instead of simply inserting a placeholder.
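To make the placeholder behavior concrete, here is a minimal hand-written sketch in Java of what a tolerant point amounts to for the Root production above. It is not the code CongoCC generates: the class TolerantPointSketch, its Leaf type, and the tolerant flag are all hypothetical names chosen for illustration, and the flag only loosely mirrors the strict/tolerant switch described earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a tolerant point; not the code CongoCC generates.
final class TolerantPointSketch {

    /** A leaf of the toy tree; incomplete == true marks an inserted placeholder. */
    static final class Leaf {
        final String type;
        final String image;
        final boolean incomplete;

        Leaf(String type, String image, boolean incomplete) {
            this.type = type;
            this.image = image;
            this.incomplete = incomplete;
        }
    }

    private final String input;
    private final boolean tolerant;   // assumed stand-in for strict vs. tolerant mode
    private int pos = 0;
    final List<Leaf> leaves = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    TolerantPointSketch(String input, boolean tolerant) {
        this.input = input;
        this.tolerant = tolerant;
    }

    /** Mirrors: Root : <LP> <ID> <RP>! <EOF> ; */
    List<Leaf> parseRoot() {
        expect("LP", false);
        expect("ID", false);
        expect("RP", true);    // the tolerant point
        expect("EOF", false);
        return leaves;
    }

    private void expect(String type, boolean tolerantPoint) {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;  // skip blanks
        String image = match(type);
        if (image != null) {
            pos += image.length();
            leaves.add(new Leaf(type, image, false));
        } else if (tolerant && tolerantPoint) {
            // Record the problem, insert a placeholder, and keep going.
            errors.add("missing " + type + " at offset " + pos);
            leaves.add(new Leaf(type, "", true));
        } else {
            throw new IllegalStateException("expected " + type + " at offset " + pos);
        }
    }

    /** Returns the matched text, or null if the token type does not match here. */
    private String match(String type) {
        switch (type) {
            case "LP":  return input.startsWith("(", pos) ? "(" : null;
            case "RP":  return input.startsWith(")", pos) ? ")" : null;
            case "EOF": return pos >= input.length() ? "" : null;
            default:    // ID: one or more lowercase letters
                int end = pos;
                while (end < input.length()
                        && input.charAt(end) >= 'a' && input.charAt(end) <= 'z') end++;
                return end > pos ? input.substring(pos, end) : null;
        }
    }

    public static void main(String[] args) {
        TolerantPointSketch p = new TolerantPointSketch("( a", true);
        for (Leaf leaf : p.parseRoot()) {
            System.out.println(leaf.type + (leaf.incomplete ? " (incomplete)" : ""));
        }
    }
}
```

In this sketch, constructing the parser with tolerant set to false makes the same input fail with an exception at the missing parenthesis, which is the strict behavior the marker is meant to avoid.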
ATTEMPT / RECOVER
For recovery that spans more than a single point, ATTEMPT wraps an
expansion and pairs it with a RECOVER clause that runs if the attempted
expansion fails:
ATTEMPT Expression RECOVER ( skipToSemicolon() )
The RECOVER clause is either a parenthesized expansion to parse instead or
an embedded code block to execute.
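The control flow of ATTEMPT / RECOVER can be pictured with a small hand-written Java sketch. It is an illustration under stated assumptions, not CongoCC's generated code: the class AttemptRecoverSketch and its methods are hypothetical, the toy Expression (numbers joined by "+") is made up for the example, and the sketch assumes the parser rewinds to where the attempt began before the recovery runs, which the text above does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of attempt-then-recover; not CongoCC's generated code.
final class AttemptRecoverSketch {
    private final List<String> tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    AttemptRecoverSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Parses statements of the form: (attempted Expression, else recovery) ";" */
    List<String> parseAll() {
        List<String> out = new ArrayList<>();
        while (pos < tokens.size()) out.add(parseStatement());
        return out;
    }

    private String parseStatement() {
        int mark = pos;               // remember where the attempt started
        String result;
        try {
            result = parseExpression();            // the attempted expansion
        } catch (IllegalStateException e) {
            errors.add(e.getMessage());
            pos = mark;               // assumption: rewind before recovering
            skipToSemicolon();        // the recovery action
            result = "<error>";
        }
        expect(";");                  // not part of the attempt; may still throw
        return result;
    }

    /** Toy Expression: NUMBER ( "+" NUMBER )* */
    private String parseExpression() {
        StringBuilder sb = new StringBuilder(expectNumber());
        while (pos < tokens.size() && tokens.get(pos).equals("+")) {
            pos++;
            sb.append('+').append(expectNumber());
        }
        return sb.toString();
    }

    private String expectNumber() {
        if (pos < tokens.size() && tokens.get(pos).matches("[0-9]+")) return tokens.get(pos++);
        throw new IllegalStateException("expected a number at token " + pos);
    }

    private void expect(String text) {
        if (pos < tokens.size() && tokens.get(pos).equals(text)) { pos++; return; }
        throw new IllegalStateException("expected " + text + " at token " + pos);
    }

    /** Advances to the next ";" without consuming it. */
    private void skipToSemicolon() {
        while (pos < tokens.size() && !tokens.get(pos).equals(";")) pos++;
    }

    public static void main(String[] args) {
        AttemptRecoverSketch p = new AttemptRecoverSketch(
                Arrays.asList("1", "+", "2", ";", "3", "+", "+", ";", "4", ";"));
        System.out.println(p.parseAll());
        System.out.println(p.errors);
    }
}
```

Only the expression is guarded here. A failure outside the attempted expansion, such as a missing ";", still propagates as an ordinary error in this sketch.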
Production-level recovery with RECOVER_TO
A production may declare a RECOVER_TO expansion before its colon. If an
error occurs while the production is being parsed, the parser skips ahead to
that recovery expansion, which is a natural way to resynchronize at a
statement or declaration boundary:
Statement RECOVER_TO ";" : … ;
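Resynchronizing at a boundary token is the classic panic-mode technique, and a short hand-written Java sketch shows its shape. Everything in it is hypothetical: the class RecoverToSketch, the Stmt type, and the toy statement body (an identifier, "=", a number, ";") are invented for the example, since the production body above is elided. The sketch also assumes the recovery token itself is consumed so the next statement starts cleanly; the actual CongoCC behavior on that point may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of skipping to a recovery token; not CongoCC's generated code.
final class RecoverToSketch {

    /** One parsed statement; incomplete == true means recovery was needed. */
    static final class Stmt {
        final String text;
        final boolean incomplete;

        Stmt(String text, boolean incomplete) {
            this.text = text;
            this.incomplete = incomplete;
        }
    }

    private final List<String> tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    RecoverToSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    List<Stmt> parseProgram() {
        List<Stmt> out = new ArrayList<>();
        while (pos < tokens.size()) out.add(parseStatement());
        return out;
    }

    /** Toy statement: IDENT "=" NUMBER ";" with ";" as the recovery token. */
    private Stmt parseStatement() {
        int start = pos;
        try {
            expect("[a-z]+");
            expect("=");
            expect("[0-9]+");
            expect(";");
            return new Stmt(join(start, pos), false);
        } catch (IllegalStateException e) {
            errors.add(e.getMessage());
            // Skip ahead to the recovery token...
            while (pos < tokens.size() && !tokens.get(pos).equals(";")) pos++;
            // ...and consume it (an assumption of this sketch).
            if (pos < tokens.size()) pos++;
            return new Stmt(join(start, pos), true);
        }
    }

    private void expect(String regex) {
        if (pos < tokens.size() && tokens.get(pos).matches(regex)) { pos++; return; }
        throw new IllegalStateException("expected " + regex + " at token " + pos);
    }

    private String join(int from, int to) {
        return String.join(" ", tokens.subList(from, to));
    }

    public static void main(String[] args) {
        RecoverToSketch p = new RecoverToSketch(Arrays.asList(
                "a", "=", "1", ";", "b", "=", "=", "2", ";", "c", "=", "3", ";"));
        for (Stmt s : p.parseProgram()) {
            System.out.println(s.text + (s.incomplete ? "  (incomplete)" : ""));
        }
    }
}
```

Because the damaged statement is absorbed up to its terminator, the statement that follows it parses normally and only one error is recorded for the whole input.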
Incomplete nodes in the tree
Recovery leaves marks in the tree: nodes that were inserted or left unfinished
are flagged as incomplete (the (incomplete) annotation in the dump above).
Consumers of a fault-tolerantly parsed tree should expect such nodes and can
test for them through the node API (see Generated API). Likewise, the errors
the parser recovered from are collected in a list rather than thrown.
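A consumer typically walks the tree and treats flagged nodes specially. The Java sketch below shows that pattern on a stand-in tree type. The Node class, its incomplete field, and the helper findIncomplete are hypothetical and exist only for this example; the real accessors are the ones documented under Generated API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parse tree; the real node API is described under Generated API.
final class IncompleteNodeSketch {

    static final class Node {
        final String type;
        final boolean incomplete;
        final List<Node> children;

        Node(String type, boolean incomplete, Node... children) {
            this.type = type;
            this.incomplete = incomplete;
            this.children = Arrays.asList(children);
        }
    }

    /** Returns every node flagged incomplete, in depth-first document order. */
    static List<Node> findIncomplete(Node root) {
        List<Node> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(Node node, List<Node> found) {
        if (node.incomplete) found.add(node);
        for (Node child : node.children) collect(child, found);
    }

    /** A tree shaped like the dump for the input "( a" shown earlier. */
    static Node sampleTree() {
        return new Node("Root", false,
                new Node("LP", false),
                new Node("ID", false),
                new Node("RP", true),     // the inserted placeholder
                new Node("EOF", false));
    }

    public static void main(String[] args) {
        for (Node n : findIncomplete(sampleTree())) {
            System.out.println("incomplete node: " + n.type);
        }
    }
}
```

An editor feature built this way could, for instance, still produce an outline from the complete nodes while attaching diagnostics to the incomplete ones.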
Status
Fault tolerance is usable but, as noted, experimental and not fully polished.
The bundled grammars carry ! markers (ignored unless FAULT_TOLERANT is
set) and are a good source of worked examples. Task-oriented guidance is in
How To: Parse Resiliently.