How Parsing Works ================= This chapter explains the parsing model behind CongoCC, to help you reason about why a grammar behaves as it does. It is background, not a how-to. Recursive descent ----------------- CongoCC generates a **recursive-descent** parser. Each production becomes a method, and a production that references another production compiles to a method that calls the other's method. Parsing the grammar is thus, quite literally, running a set of mutually recursive functions whose call structure mirrors the grammar's structure. This is why the generated code is readable, why a stack trace points at recognizable productions, and why embedding target-language actions in a production is natural — they are just statements in a method. The cost of this model is that the grammar must not be left-recursive: a production cannot call itself as its very first action, or the parser would recurse forever. Left recursion is expressed instead with the repetition operators, which is exactly what the :doc:`calculator <../tutorial/calculator>` grammar does to make its operators left-associative. Deciding at choice points ------------------------- At each :term:`choice point` the parser must pick an alternative. By default it decides on the next single token: if that token can begin an alternative, the parser commits to it. This is fast and, for most of a grammar, sufficient. When one token is not enough, the parser does **scan-ahead**: it tentatively scans the upcoming tokens far enough to decide, then rolls back and parses for real. The ``=>||`` marker and the ``SCAN`` statement (:doc:`../../reference/disambiguation`) are how you tell it where and how far to scan. A long-standing limitation of the original JavaCC — that nested scan-ahead did not work — has been fixed, so a scan may itself enter productions that scan. What "deterministic" means here ------------------------------- The generated parser is deterministic: at every point it makes one choice and follows it, using lookahead to choose well rather than trying alternatives in parallel. Scan-ahead is bounded backtracking *for the decision only* — the parser looks, decides, and then parses once. Understanding that the decision and the parse are separate steps is the key to reasoning about lookahead.