Rust¶
CongoCC generates complete, self-contained Rust parser crates. Rust is a first-class target: the generated crate includes a lexer, a recursive-descent parser, an arena-based AST, a visitor trait, an AST mapper, and a pretty-printer, with no external dependencies.
Enabling Rust generation¶
Rust support is gated behind a build flag, off by default, so the congocc.jar
you use must have been built with it enabled:
$ ant build -Drust.enabled=true
Then generate as usual, giving -d the crate directory:
$ java -jar congocc.jar -n -lang rust -d my-parser MyGrammar.ccc
$ cd my-parser && cargo test
Output layout¶
Unlike the other targets, the crate files are written flat into the -d
directory:
Cargo.toml crate metadata (no dependencies required)
lib.rs crate root: module declarations and re-exports
tokens.rs token type enum and token struct
lexer.rs NFA-based lexer with multi-state support
parser.rs recursive-descent parser with scan/lookahead
ast.rs arena AST: NodeId, Ast, AstBuilder
error.rs ParseError with location and expected-set info
visitor.rs Visitor trait and AstMapper
pretty.rs Wadler-Lindig pretty-printer
tests/parse_files.rs integration harness for a test-data directory
inject.rs, FIXME.md injected-code translation and its status
Using the parser¶
A single entry point parses the input and returns an owned AST; nodes are
navigated through NodeId handles rather than references:
use my_parser::parser::Parser;
let ast = Parser::parse(input, Some("filename.ext"))?;
let root = ast.root().expect("AST has no root");
println!("{:?} = {}", ast.kind(root), ast.text(root));
The AST is arena-based — a flat Vec of nodes addressed by
NodeId(u32) — which avoids recursive allocations and lets you navigate with
ast.parent(id), ast.children(id), and ast.next_sibling(id) without
lifetime juggling. ast.dump(id) prints a subtree for debugging.
Traversing and transforming¶
The generated crate offers three levels of AST processing:
Read-only: implement the
Visitortrait (overridevisit_*methods, callvisit_childrento recurse), useast.dump(), or render with thePrettyPrinter(default width 120 columns).Value rewriting: implement
AstMapper, a bottom-up functional transform that returns a new AST with nodes replaced, spliced, or removed.Structural editing: mutate
&mut Astin place with the synthetic-token API —new_synthetic_token,new_node,append_child,insert_after,detach— thenunparse()to regenerate text orreparse()to validate the edited tree by running it back through the parser.
See README_RUST for more information on these techniques and pointers to example code.
Injected code¶
Because the arena AST has no per-type node structs, INJECT blocks cannot be
inserted into node classes the way they are for the other targets. CongoCC
translates them as far as it can into inject.rs and records what needs
hand-finishing in FIXME.md; see Injected Code Across Languages.
CongoCC ships with a number of Rust example grammars. Pure-syntax grammars (JSON, Lua, CICS, SqlExpr, the arithmetic examples) generate fully working crates with no manual step; the large injection-heavy grammars (Java, C#, Python) generate but need their injected logic ported to Rust by hand. See README_RUST.
Optional serialization (serde)¶
Generated crates have no dependencies, but the Cargo.toml is set up so you
can opt into serde serialization of the AST. Add the
serde dependency and feature as described in the project’s Rust README, then
build with it on:
$ cargo build --features serde
$ cargo build --release --features serde
Runtime¶
Generated crates require Rust 1.89 or later (2024 edition) and, by default,
nothing else. Build and test with cargo; the bundled examples also provide
ant rust-gen and ant test-rust targets that regenerate the crates and run
their cargo test suites.