How To: Test Grammars and Parsers
A grammar is software, and like any software it benefits from tests: a corpus of inputs that must keep parsing, and checks that bad input is rejected. This guide covers the built-in corpus harness and how to write your own tests.
The built-in corpus harness
Set TEST_PRODUCTION to a start production and TEST_EXTENSION to a file
extension, and CongoCC generates a ready-to-run harness that parses every
matching file under the paths you give it:
TEST_PRODUCTION = Root;
TEST_EXTENSION = json;
For the bundled JSON grammar this generates a ParseFiles class. Point it at
a file or a directory tree (it even descends into .zip and .jar
archives) and it reports what parsed and what did not:
The Java impl parsed sample.json.
Parsed 1 files successfully
Failed on 0 files
Duration: 13 milliseconds
This is the quickest way to run a grammar against a large corpus and catch regressions: keep a directory of known-good inputs and fail the build if any stops parsing.
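If you would rather run a known-good corpus from your own build tool, the same check is easy to sketch in plain Java. This is a minimal sketch, not the generated ParseFiles class. It does not look inside .zip or .jar archives. It assumes the parser can be wrapped as a `Consumer<String>` that throws an unchecked exception on bad input, for example a lambda around your start production. `CorpusCheck`, `findFailures` and the `testdata` directory are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CorpusCheck {

    /**
     * Parses every regular file under root whose name ends in "." + extension
     * and returns the files on which the parser threw, in sorted path order.
     */
    public static List<Path> findFailures(Path root, String extension,
                                          Consumer<String> parser) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile)
                         .filter(p -> p.getFileName().toString().endsWith("." + extension))
                         .sorted()
                         .collect(Collectors.toList());
        }
        List<Path> failures = new ArrayList<>();
        for (Path file : files) {
            String text = Files.readString(file);
            try {
                parser.accept(text);
            } catch (RuntimeException e) {
                // Any unchecked exception (ParseException included) counts as a failure.
                failures.add(file);
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        // "testdata" is a hypothetical default corpus directory.
        Path root = Path.of(args.length > 0 ? args[0] : "testdata");
        // Hypothetical wiring: replace this toy check with a call to your
        // generated parser's start production.
        Consumer<String> parser = text -> {
            if (text.isBlank()) {
                throw new IllegalArgumentException("empty input");
            }
        };
        List<Path> failures = findFailures(root, "json", parser);
        failures.forEach(f -> System.err.println("Failed: " + f));
        // A non-zero exit status is what fails the build in most CI setups.
        System.exit(failures.isEmpty() ? 0 : 1);
    }
}
```

Returning the failing paths, rather than only a count, makes the build log point straight at the inputs that regressed.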
Writing your own tests
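For finer-grained tests, drive the parser directly from a unit test. A positive test parses an input and asserts something about the result: either a value the production returns or the shape of the resulting tree, which you inspect through the node API (see Generated API). Here the Calc production returns the value of the expression: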
var parser = new CalcParser("2 + 3 * 4");
assertEquals(14.0, parser.Calc());
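The CalcParser example checks a value the production computes. To pin down tree shape instead, one approach is to render the tree as an indented outline and compare it with an expected string, so any structural change shows up as a readable difference. The sketch below uses a hypothetical minimal `Node` record so it runs on its own. CongoCC's generated node classes are not reproduced here, so adapt `dump` to the child and type accessors your generated API actually provides. The node type names are made up.

```java
import java.util.List;

public class TreeShape {

    /** Hypothetical stand-in for a parse-tree node: a type name plus children. */
    record Node(String type, List<Node> children) {
        Node(String type, Node... children) {
            this(type, List.of(children));
        }
    }

    /** Renders a tree as an indented outline, one node type per line. */
    static String dump(Node node) {
        StringBuilder out = new StringBuilder();
        dump(node, 0, out);
        return out.toString();
    }

    private static void dump(Node node, int depth, StringBuilder out) {
        out.append("  ".repeat(depth)).append(node.type()).append('\n');
        for (Node child : node.children()) {
            dump(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Illustrative tree for "2 + 3 * 4"; the node type names are made up.
        Node tree = new Node("AddExpr",
                new Node("Number"),
                new Node("MulExpr", new Node("Number"), new Node("Number")));
        String expected = String.join("\n",
                "AddExpr",
                "  Number",
                "  MulExpr",
                "    Number",
                "    Number") + "\n";
        // prints "shape ok"
        System.out.println(dump(tree).equals(expected) ? "shape ok" : "shape changed:\n" + dump(tree));
    }
}
```

In a unit test the comparison becomes `assertEquals(expected, dump(tree))`. Keeping the expected outlines in files alongside the corpus makes them easy to review when the grammar changes.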
A negative test asserts that malformed input is rejected. By default, the
parser throws a ParseException, which is unchecked:
assertThrows(ParseException.class, () -> new CalcParser("2 +").Calc());
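When there are many inputs that must fail, a list is easier to maintain than one assertion per case. Here is a minimal sketch. It assumes the parser is wrapped as a `Consumer<String>` that throws on bad input. `NegativeCases` and `wronglyAccepted` are hypothetical names, and the toy parser in `main` exists only so the example runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class NegativeCases {

    /**
     * Runs the parser on each input that must be rejected and returns the
     * inputs it accepted by mistake. An empty result means every bad input
     * was rejected.
     */
    public static List<String> wronglyAccepted(Consumer<String> parser, List<String> badInputs) {
        List<String> accepted = new ArrayList<>();
        for (String input : badInputs) {
            try {
                parser.accept(input);
                accepted.add(input); // no exception: the grammar accepted nonsense
            } catch (RuntimeException expected) {
                // Rejected, as it should be (ParseException is unchecked).
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Hypothetical wiring would be: input -> new CalcParser(input).Calc()
        // Toy stand-in so the sketch runs: rejects input ending in an operator.
        Consumer<String> parser = input -> {
            String t = input.strip();
            if (t.isEmpty() || "+-*/".indexOf(t.charAt(t.length() - 1)) >= 0) {
                throw new IllegalArgumentException("incomplete expression: " + input);
            }
        };
        List<String> bad = List.of("2 +", "", "3 *", "(1 + 2");
        // prints "Wrongly accepted: [(1 + 2]"
        System.out.println("Wrongly accepted: " + wronglyAccepted(parser, bad));
    }
}
```

The toy accepts "(1 + 2" because it only checks the last character. That is exactly the kind of slip the list is meant to catch. In JUnit 5 the same inputs can drive a `@ParameterizedTest` with `@ValueSource(strings = ...)` that calls `assertThrows`.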
What to test
A corpus of real-world inputs that must keep parsing — the harness above.
Boundary cases: empty input, the largest constructs you support, deep nesting.
Negative cases: inputs that must fail, so a grammar change does not silently start accepting nonsense.
Tree shape, where it matters to consumers, so a refactor of the grammar does not quietly change the tree your application depends on.
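Boundary cases such as deep nesting are easier to generate than to write by hand. A minimal sketch, assuming JSON-style brackets. `nested` and `firstFailingDepth` are hypothetical helpers, and `parseValue` is a toy recursive recognizer standing in for a generated parser. StackOverflowError is caught as well, on the assumption that a recursive-descent parser fed very deep input can exhaust the Java stack rather than throw a ParseException.

```java
import java.util.function.Consumer;

public class DeepNesting {

    /** Wraps leaf in depth copies of open/close, e.g. nested("[", "1", "]", 2) is "[[1]]". */
    static String nested(String open, String leaf, String close, int depth) {
        return open.repeat(depth) + leaf + close.repeat(depth);
    }

    /**
     * Tries depths 1, 2, 4, ... up to maxDepth and returns the first one the
     * parser rejects, or -1 if every tried depth parses.
     */
    static int firstFailingDepth(Consumer<String> parser, int maxDepth) {
        // depth > 0 stops the loop if doubling ever overflows int.
        for (int depth = 1; depth > 0 && depth <= maxDepth; depth *= 2) {
            try {
                parser.accept(nested("[", "1", "]", depth));
            } catch (RuntimeException | StackOverflowError e) {
                return depth;
            }
        }
        return -1;
    }

    /** Toy recursive recognizer for value := "[" value "]" | "1"; returns the end offset. */
    static int parseValue(String s, int pos) {
        if (s.startsWith("[", pos)) {
            int end = parseValue(s, pos + 1);
            if (!s.startsWith("]", end)) {
                throw new IllegalArgumentException("expected ] at offset " + end);
            }
            return end + 1;
        }
        if (s.startsWith("1", pos)) {
            return pos + 1;
        }
        throw new IllegalArgumentException("unexpected input at offset " + pos);
    }

    public static void main(String[] args) {
        Consumer<String> toy = s -> {
            if (parseValue(s, 0) != s.length()) {
                throw new IllegalArgumentException("trailing input");
            }
        };
        // The result depends on the JVM's thread stack size, so there is no fixed output.
        System.out.println("first failing depth: " + firstFailingDepth(toy, 1 << 20));
    }
}
```

Against a real grammar, pass a lambda that calls the start production. The test can then assert that `firstFailingDepth` returns -1 for the deepest nesting you promise to support.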