Context-Sensitive Tokenization, Next Installment, Activating and De-activating Tokens

Sometimes, when you complete a major code cleanup, features that were previously pie in the sky become low-hanging fruit to pluck. The new feature that I describe here, the ability to activate and deactivate tokens, is such a case. It resulted from my rewriting of the lexical code generation that I describe here. In an …

Major Milestone: The Lexical Code Generation is completely rewritten

Dear Readers… This blog has been dark for about two months now, but not because I was inactive in the project. Quite the opposite, actually. What happened over the last couple of months is that I went into full-blown obsessive mode and managed to rewrite the remaining part of JavaCC that had been resisting my …

The Dreaded “Code too large” Problem is a Thing of the Past

"It’s too big! It doesn’t fit!" The above does not refer to any particular pornographic feature film, but rather, to a longstanding problem in JavaCC: if you write a very big, complex lexical grammar, the generated XXXTokenManager would fail to compile, with the compiler reporting the error: "Code too large". Well, this has now been …

A Glimpse of the Promised Land: Fault-tolerant parsing

For some time, it has been a goal of JavaCC 21 to provide the ability to generate fault-tolerant parsers. I started working on the problem about a year ago. However, I had not put in a comprehensive solution until now for several reasons. Basically these: The codebase, though already significantly refactored and cleaned up, was …

Context-Sensitive Tokenizing, Part Deux: Lexical States

(To get some prerequisite understanding of this topic, it might be a good idea to read this earlier blog post on context-sensitive tokenization from three months ago.) The Lay of the Land. There are two quite useful ideas that have been in JavaCC from the very beginning: lookahead (particularly syntactic lookahead) and lexical states. Syntactic lookahead …
