How To: Parse Resiliently

Warning

This feature is experimental. The syntax and behavior described here may change, and some capabilities are incomplete.

Tools that parse code as it is typed (editors, IDEs, linters) must cope with input that is constantly incomplete or wrong, and still produce something useful. This guide shows how to use CongoCC’s fault-tolerant parsing for that purpose; for full details, see the Fault-Tolerant Parsing reference.

Turn it on

Set FAULT_TOLERANT in the grammar. This generates the recovery machinery and makes the recovery markers active (without it, they are ignored). A generated parser can also be switched between strict and tolerant at run time with setParserTolerant(boolean) — strict for a final, authoritative parse; tolerant for the live, in-progress one.

Mark the points where recovery makes sense

Put a ! on the tokens whose absence should not derail the parse — typically closing delimiters and statement terminators. A grammar that tolerates a missing closing parenthesis:

FAULT_TOLERANT;
Root : <LP> <ID> <RP>! <EOF> ;

Parsing the incomplete input "( a" still yields a tree; the missing ")" becomes an incomplete node instead of causing an exception to be thrown:

<Root (1, 1)-(1, 3)>
  LP: (1, 1) - (1, 1): (
  ID: (1, 3) - (1, 3): a
  RP: (1, 1) - (1, 1):  (incomplete)
  Token: (1, 1) - (1, 1): EOF
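
To make this behavior concrete, here is a minimal hand-written Java sketch of the same idea: when the closing parenthesis is absent, the parser records the problem and inserts an empty token flagged as incomplete. It is an illustration under stated assumptions and does not use CongoCC's generated classes; the names TolerantParenSketch, Tok, Result and parse are hypothetical, and the input is assumed to be already split into tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hand-written stand-in, NOT code generated by CongoCC.
class TolerantParenSketch {

    // Hypothetical token node; 'incomplete' marks a token the parser had to invent.
    static final class Tok {
        final String type;
        final String image;
        final boolean incomplete;

        Tok(String type, String image, boolean incomplete) {
            this.type = type;
            this.image = image;
            this.incomplete = incomplete;
        }
    }

    // Hypothetical parse result: tokens in tree order plus the recovered errors.
    static final class Result {
        final List<Tok> tokens = new ArrayList<>();
        final List<String> errors = new ArrayList<>();
    }

    // Parses the shape  LP ID RP  from pre-split input; only RP is recoverable,
    // mirroring the single ! marker in the grammar above.
    static Result parse(List<String> input, boolean tolerant) {
        Result result = new Result();
        int pos = 0;

        // LP and ID carry no recovery marker, so a mismatch always throws.
        if (pos >= input.size() || !input.get(pos).equals("(")) {
            throw new IllegalStateException("expected ( at token index " + pos);
        }
        result.tokens.add(new Tok("LP", input.get(pos++), false));

        if (pos >= input.size() || !input.get(pos).matches("[A-Za-z]+")) {
            throw new IllegalStateException("expected ID at token index " + pos);
        }
        result.tokens.add(new Tok("ID", input.get(pos++), false));

        if (pos < input.size() && input.get(pos).equals(")")) {
            result.tokens.add(new Tok("RP", input.get(pos), false));
        } else if (tolerant) {
            // Recovery: record the problem and insert an empty, incomplete RP.
            result.errors.add("missing ) at token index " + pos);
            result.tokens.add(new Tok("RP", "", true));
        } else {
            // Strict mode: the same situation is a hard failure.
            throw new IllegalStateException("expected ) at token index " + pos);
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = parse(List.of("(", "a"), true);
        for (Tok t : r.tokens) {
            System.out.println(t.type + ": " + t.image + (t.incomplete ? " (incomplete)" : ""));
        }
        System.out.println("errors: " + r.errors);
    }
}
```

The tolerant flag in this sketch plays the role that the strict/tolerant switch plays in a generated parser: the same missing token is either recorded and patched over, or reported by throwing.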

For larger recovery scopes, wrap a construct in ATTEMPT RECOVER or give a production a RECOVER_TO target so the parser resynchronizes at the next statement or declaration boundary.

Consume the result defensively

Code that walks a fault-tolerantly parsed tree must expect nodes flagged as incomplete and handle them gracefully — skipping them, or offering completions where they sit. The errors the parser recovered from are collected and available rather than thrown, so a tool can surface several problems at once instead of stopping at the first.
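
A defensive walk might look like the following Java sketch. It assumes a hypothetical Node class with a type, a text, an incomplete flag and a list of children; a generated parser supplies its own node classes, so treat every name here as a placeholder. The walker collects identifiers from complete nodes and remembers where incomplete nodes sit, so that a tool could offer completions there.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: Node is a hypothetical stand-in for a parser's node classes.
class DefensiveWalkSketch {

    static final class Node {
        final String type;
        final String text;
        final boolean incomplete;
        final List<Node> children = new ArrayList<>();

        Node(String type, String text, boolean incomplete) {
            this.type = type;
            this.text = text;
            this.incomplete = incomplete;
        }

        // Appends a child and returns this node so trees can be built fluently.
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Collects identifier texts; incomplete nodes are not read, only noted in 'gaps'.
    static void walk(Node node, List<String> identifiers, List<String> gaps) {
        if (node.incomplete) {
            gaps.add(node.type); // a place where a completion could be offered
            return;              // never rely on the contents of an incomplete node
        }
        if (node.type.equals("ID")) {
            identifiers.add(node.text);
        }
        for (Node child : node.children) {
            walk(child, identifiers, gaps);
        }
    }

    // Rebuilds by hand the tree shown earlier for the input "( a".
    static Node sampleTree() {
        return new Node("Root", "", false)
                .add(new Node("LP", "(", false))
                .add(new Node("ID", "a", false))
                .add(new Node("RP", "", true));
    }

    public static void main(String[] args) {
        List<String> identifiers = new ArrayList<>();
        List<String> gaps = new ArrayList<>();
        walk(sampleTree(), identifiers, gaps);
        System.out.println("identifiers: " + identifiers);
        System.out.println("incomplete nodes: " + gaps);
    }
}
```

Checking the incomplete flag before anything else keeps the rest of the walker free of special cases: code further down can assume that every node it inspects was actually present in the input.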

A good way to learn the feature is to generate a parser from one of the bundled grammars (they already carry ! markers) with -p FT or a FAULT_TOLERANT setting, then feed it deliberately broken input.