Walkthrough: JSON¶

The earlier tutorials wrote small grammars from scratch. This one reads a complete, real grammar — the JSON grammar bundled with CongoCC — to see how the pieces covered so far fit together in practice. The grammar lives at examples/json/JSON.ccc in the source distribution and is an exact implementation of the JSON specification.

The grammar, part by part¶

It opens with settings:

PARSER_PACKAGE = "org.parsers.json";
NODE_PACKAGE   = "org.parsers.json.ast";
DEFAULT_LEXICAL_STATE = JSON;
TEST_PRODUCTION = Root;
TEST_EXTENSION  = json;

TEST_PRODUCTION and TEST_EXTENSION ask CongoCC to generate a small test harness that parses .json files with the Root production — see How To: Test Grammars and Parsers.

Whitespace is skipped, and the punctuation tokens are grouped under a single Delimiter node class with a # annotation:

SKIP : <WHITESPACE : (" "| "\t"| "\n"| "\r")+>;

TOKEN #Delimiter :
    <COLON : ':'> | <COMMA : ','>
  | <OPEN_BRACKET : '['> | <CLOSE_BRACKET : ']'>
  | <OPEN_BRACE : "{" > | <CLOSE_BRACE : "}">
;

The literals show two more techniques: per-alternative node classes, and private regular expressions used as building blocks. Each kind of literal gets its own node type (#BooleanLiteral, #NumberLiteral, …), and the <#…> patterns are private — reusable inside other patterns but not tokens themselves:

TOKEN #Literal :
    <TRUE: 'true'> #BooleanLiteral
    | <FALSE: "false"> #BooleanLiteral
    | <NULL: "null"> #NullLiteral
    | <#ESCAPE1 : '\\' (['\\', '"', '/',"b","f","n","r","t"])>
    | <#ESCAPE2 : "\\u" (["0"-"9", "a"-"f", "A"-"F"]) {4}>
    | <#REGULAR_CHAR : ~["\u0000"-"\u001F",'"',"\\"]>
    | <STRING_LITERAL : '"' (<REGULAR_CHAR>|<ESCAPE2>|<ESCAPE1>)* '"'> #StringLiteral
    | <#ZERO : "0"> | <#NON_ZERO : (['1'-'9'])(["0"-"9"])*>
    | <#FRACTION : "." (["0"-"9"])+>
    | <#EXPONENT : ["E","e"]["+","-"](["1"-"9"])+>
    | <NUMBER : ("-")?(<ZERO>|<NON_ZERO>)(<FRACTION>)?(<EXPONENT>)?> #NumberLiteral
;

The productions are short and mutually recursive — a Value may be an Array or a JSONObject, each of which contains Values:

Root : Value! <EOF>! ;
Value : <TRUE> | <FALSE> | <NULL> | <STRING_LITERAL> | <NUMBER> | Array | JSONObject ;
Array : <OPEN_BRACKET> [ Value (<COMMA> Value)*! ] <CLOSE_BRACKET> ;
KeyValuePair : <STRING_LITERAL> <COLON> Value;
JSONObject : <OPEN_BRACE>! [ KeyValuePair ("," KeyValuePair)*! ] <CLOSE_BRACE>! ;

Note

The ! markers are for Fault-Tolerant Parsing and are ignored unless the grammar is generated with FAULT_TOLERANT set, so they can be left in place for ordinary use.

The resulting tree¶

Parsing {"a": 1, "b": [true, null]} produces a tree in which every value has a precisely typed node — the per-alternative # annotations paying off:

<Root (1, 1)-(1, 28)>
  <JSONObject (1, 1)-(1, 27)>
    Delimiter: (1, 1) - (1, 1): {
    <KeyValuePair (1, 2)-(1, 7)>
      StringLiteral: (1, 2) - (1, 4): "a"
      Delimiter: (1, 5) - (1, 5): :
      NumberLiteral: (1, 7) - (1, 7): 1
    Delimiter: (1, 8) - (1, 8): ,
    <KeyValuePair (1, 10)-(1, 26)>
      StringLiteral: (1, 10) - (1, 12): "b"
      Delimiter: (1, 13) - (1, 13): :
      <Array (1, 15)-(1, 26)>
        Delimiter: (1, 15) - (1, 15): [
        BooleanLiteral: (1, 16) - (1, 19): true
        Delimiter: (1, 20) - (1, 20): ,
        NullLiteral: (1, 22) - (1, 25): null
        Delimiter: (1, 26) - (1, 26): ]
    Delimiter: (1, 27) - (1, 27): }
  Token: (2, 1) - (2, 1): EOF

Notice true is a BooleanLiteral and null a NullLiteral — the node types come straight from the per-alternative annotations in the Literal token production.

Next¶

Where to Go Next points to the larger bundled grammars to study next, and the How To: Shape and Use the Tree guide shows how to walk a tree like this one.

Walkthrough: JSON¶

The grammar, part by part¶

The resulting tree¶

Next¶

CongoCC

Navigation

Related Topics