Python code generation - performance

Initial testing of parsers generated in Python indicates that the performance, while usually slower than that of the parsers generated in Java, is not too bad if the Python implementation uses a tracing JIT, as Java does. Using pypy3, I got these numbers from a recent test run:

----------------------------------------------------------------------
Testing with JSON grammar
----------------------------------------------------------------------
Working directory: /tmp/javacc-python-test-iv6n33sj
Test files copied to working directory.
Java version of lexer and parser created.
Java lexer and parser compiled.
Java lexer run completed (2.74 secs).
Java parser run completed (4.65 secs).
Python version of lexer and parser created.
Python lexer run completed (0.56 secs).
Python parser run completed (0.59 secs).
Results for Python & Java lexers & parsers are identical - yay!
----------------------------------------------------------------------
Testing with Java grammar
----------------------------------------------------------------------
Working directory: /tmp/javacc-python-test-6alztuia
Test files copied to working directory.
Java version of lexer and parser created.
Java lexer and parser compiled.
Java lexer run completed (3.53 secs).
Java parser run completed (3.48 secs).
Python version of lexer and parser created.
Python lexer run completed (3.91 secs).
Python parser run completed (17.09 secs).
Results for Python & Java lexers & parsers are identical - yay!
----------------------------------------------------------------------
Testing with Python grammar
----------------------------------------------------------------------
Working directory: /tmp/javacc-python-test-1js74vz4
Test files copied to working directory.
Java version of lexer and parser created.
Java lexer and parser compiled.
Java lexer run completed (8.96 secs).
Java parser run completed (5.48 secs).
Python version of lexer and parser created.
Python lexer run completed (6.61 secs).
Python parser run completed (30.47 secs).
Results for Python & Java lexers & parsers are identical - yay!
Working directories deleted.

I also did a quick profiling exercise on the Python-grammar parser generated in Python. Here is where the time is spent:

         106220799 function calls (106185052 primitive calls) in 4.858 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 13054100    0.454    0.000    1.140    0.000 utils.py:46(_check_pos)
 13054100    0.331    0.000    1.574    0.000 utils.py:58(_idx_and_bit)
    54332    0.278    0.000    2.093    0.000 utils.py:142(next_set_bit)
 13054100    0.255    0.000    0.472    0.000 utils.py:33(_ints_needed)
 13020676    0.250    0.000    1.815    0.000 utils.py:88(__getitem__)
 13054100    0.217    0.000    0.217    0.000 {built-in function math.ceil}
     3797    0.216    0.000    2.811    0.001 lexer.py:7635(_next_token)
   122663    0.201    0.000    3.131    0.000 parser.py:380(next_token)
    94959    0.178    0.000    1.656    0.000 parser.py:26449(scan_token)
 13180660    0.119    0.000    0.119    0.000 {built-in function len}

As expected, a lot of this is in the lexer, and specifically in the BitSet implementation that I put in utils.py, which provides the java.util.BitSet features needed. My implementation has no upper limit on the size of the set; what is the best way to get the upper bound, i.e. the highest bit that will need to be set for a given grammar? Using that would potentially open the door to some optimisations while remaining a pure-Python solution - for example, preallocating the array holding the bits, removing or simplifying some run-time checks, etc.

It’s just the size of the nfaFunctions array. javacc21/LexerCode.java.ftl at master · javacc21/javacc21 · GitHub

If you have multiple lexical states, that varies depending on the lexical state you are in. Of course, globally, the maximum is just the number of NFA states (i.e. the size of that function array) for the lexical state with the largest number of NFA states.

Currently, the Python grammar is not using lexical states. If you look at the bottom of PythonNfaData.java, you can see that there are 678 NFA states. If your BitSet implementation is storing the entire bit vector in 64-bit longs, the way the Java BitSet does, that is 11 longs and you could just pre-allocate that, I guess.
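The pre-allocation idea can be sketched as a pure-Python fixed-size bit set. This is only an illustrative sketch, not the actual utils.py code; the class and method names are assumptions:

```python
class FixedBitSet:
    """Illustrative fixed-size bit set, preallocated for a known maximum."""

    def __init__(self, nbits):
        # Preallocate the word array up front: ceil(nbits / 64) words.
        # For the 678 NFA states mentioned above, this is 11 words.
        self._words = [0] * ((nbits + 63) >> 6)

    def set(self, pos):
        # No growth check needed: the array is already big enough.
        self._words[pos >> 6] |= 1 << (pos & 63)

    def get(self, pos):
        return (self._words[pos >> 6] >> (pos & 63)) & 1 == 1

    def next_set_bit(self, start):
        # Return the index of the first set bit >= start, or -1 if none.
        idx = start >> 6
        word = self._words[idx] >> (start & 63)
        if word:
            # (word & -word) isolates the lowest set bit.
            return start + (word & -word).bit_length() - 1
        for i in range(idx + 1, len(self._words)):
            w = self._words[i]
            if w:
                return (i << 6) + (w & -w).bit_length() - 1
        return -1
```

With the size fixed at construction time, `set` and `get` need no resizing or bounds bookkeeping on the hot path, which is where the profile above showed the time going.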

OK, thanks, I guessed that. I’ve updated the templates to either use the one size (one lexical state) or the max of all the sizes (multiple lexical states).

Trust you had a good flight :slight_smile:

After doing a few tweaks to optimize the BitSet implementation, I shaved off some time from the Python runs:

Testing with JSON grammar

Working directory: /tmp/javacc-python-test-h7qfj40o
Test files copied to working directory.
Java version of lexer and parser created.
Java lexer and parser compiled.
Java lexer run completed (2.33 secs).
Java parser run completed (2.41 secs).
Python version of lexer and parser created.
Python lexer run completed (0.25 secs).
Python parser run completed (0.40 secs).
Results for Python & Java lexers & parsers are identical - yay!

Testing with Java grammar

Working directory: /tmp/javacc-python-test-e3w4m0gz
Test files copied to working directory.
Java version of lexer and parser created.
Java lexer and parser compiled.
Java lexer run completed (2.99 secs).
Java parser run completed (3.41 secs).
Python version of lexer and parser created.
Python lexer run completed (2.74 secs).
Python parser run completed (14.63 secs).
Results for Python & Java lexers & parsers are identical - yay!

Testing with Python grammar

Working directory: /tmp/javacc-python-test-cibrzfcq
Test files copied to working directory.
Java version of lexer and parser created.
Java lexer and parser compiled.
Java lexer run completed (3.15 secs).
Java parser run completed (4.06 secs).
Python version of lexer and parser created.
Python lexer run completed (2.88 secs).
Python parser run completed (22.96 secs).
Results for Python & Java lexers & parsers are identical - yay!
Working directories deleted.

Also, running the Python parser under the profiler to parse all the CPython and Django sources shows that the BitSet routines are now not the main place where the time is spent:

         4531486449 function calls (4437705025 primitive calls) in 1430.911 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
215656959  121.434    0.000  381.774    0.000 parser.py:26487(scan_token)
  7602072   92.675    0.000  398.547    0.000 lexer.py:7666(_next_token)
110494510   92.298    0.000  101.416    0.000 utils.py:150(fast_next_set_bit)
 30663300   68.538    0.000  110.118    0.000 parser.py:40174(open_node_scope)
 30663120   62.580    0.000  110.281    0.000 parser.py:40212(close_node_scope)
272810383   52.280    0.000  474.702    0.000 parser.py:418(next_token)
110494510   46.755    0.000   49.036    0.000 utils.py:53(set)
  9884555   25.521    0.000   38.431    0.000 lexer.py:20(NFA_COMPOSITE_PYTHON_0)
     4716   24.401    0.005   24.401    0.005 {method 'read' of '_io.BufferedReader' objects}
  8015950   23.819    0.000   27.395    0.000 tokens.py:379(new_token)
 51152847   23.288    0.000  144.303    0.000 parser.py:452(next_token_type)
  7602070   22.201    0.000   70.363    0.000 lexer.py:7794(instantiate_token)
 10881220   20.700    0.000   24.348    0.000 tokens.py:180(add_child)

Am I interpreting this data correctly? That the Python lexing is a shade faster than Java??? Is that possible? Or maybe I’m misinterpreting the above…

If the Python parser is only about 5x slower, that is really pretty okay. On the ANTLR list, they seem to frequently repeat a kind of ballpark figure that the parsers generated in Python are 30x slower.

In other matters, the flight was okay insofar as it was on time and we got here. It was very uncomfortable because we had the last row of seats at the back that don’t lean back. I was terribly jet-lagged and was probably suffering some mild altitude sickness for the first couple of days after arrival. (Mexico City is at 2200m altitude.) I think that today is the first day I feel fully normal. But all is well.

Seems to be. The purpose of the tests is correctness, not as a benchmark: so for each grammar, starting with an empty work dir, we copy the grammar files and test files to it, generate the Java lexer and parser, run them over the test files to generate output into text files (list of tokens, parse tree). Then generate the Python lexer and parser, run them over the same test files, and generate output into text files and compare with the values from the Java run. If different, the differences are printed by diff, else we print the “yay” message.
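As a rough sketch of that check-for-equality flow (with hypothetical command and file names; the real test script differs):

```python
import filecmp
import subprocess


def compare_runs(java_cmd, python_cmd, workdir):
    # Run both versions; each is assumed to write its token list and
    # parse tree into workdir/java and workdir/python respectively.
    subprocess.run(java_cmd, cwd=workdir, check=True)
    subprocess.run(python_cmd, cwd=workdir, check=True)
    identical = True
    for name in ('tokens.txt', 'tree.txt'):
        java_out = f'{workdir}/java/{name}'
        python_out = f'{workdir}/python/{name}'
        if not filecmp.cmp(java_out, python_out, shallow=False):
            # Show the differences, as the real script does via diff.
            subprocess.run(['diff', java_out, python_out])
            identical = False
    if identical:
        print('Results for Python & Java lexers & parsers are identical - yay!')
    return identical
```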

I’m running this code under pypy3, which uses a tracing JIT that is pretty fast (though I believe it uses a lot more memory) - so I would expect hot loops to be converted to machine code after checking during warm-up that only the same types are ever passed in to given functions/methods. There's no reason per se that the performance wouldn’t be as good as Java’s.

If they’re using just CPython, that wouldn’t be surprising. If they use pypy3, then that might indicate that ANTLR-generated Python parsers are slow, whether due to code generation or the ANTLR runtime.

Good to hear. Just be careful out there:

The date says May but the page has an up-to-date chart as of today.

I just did a run on the same sources (Django and CPython) using the latest ANTLR (4.9.2) and their 2-year-old Python 3 grammar from their repo, using the same test script I was using before, just substituting in the ANTLR parser API. Running under the same version of pypy3 as before, I get slower runs and more errors:

ANTLR-generated parser in Python: 4610 successes, 106 failures, 2763.59 secs.
JavaCC21-generated parser in Python: 4699 successes, 17 failures, 586.09 secs.
JavaCC21-generated parser in Java: 4701 successes, 15 failures, 41.46 secs.

So on this decent-sized data set, the JavaCC21 Python parser runs at ~1/14th of the speed of the Java parser, but almost 5x the speed of the ANTLR Python parser.

The two files that were successfully parsed by the Java parser but not the Python parser were

~/projects/cpython/Lib/test/bad_coding.py
~/projects/cpython/Lib/test/badsyntax_pep3120.py

In the first of these, Python raised an error because of an encoding declaration with a typo - "uft-8":

Failed to parse /home/vinay/projects/cpython/Lib/test/bad_coding.py
Traceback (most recent call last):
  File "parsetest.py", line 30, in process
    parser = Parser(p)
  File "/disk2/projects/scratch/javacc/python/pythonparser/parser.py", line 349, in __init__
    stream_or_lexer = Lexer(input_source, stream_or_lexer)
  File "/disk2/projects/scratch/javacc/python/pythonparser/lexer.py", line 7728, in __init__
    self.input_stream = FileLineMap(input_source, stream, line, column)
  File "/disk2/projects/scratch/javacc/python/pythonparser/utils.py", line 204, in __init__
    stream_or_content = stream_or_content.decode(encoding)
LookupError: unknown encoding: uft-8

In the second, the file itself was badly encoded; I'm not sure why the Java parser accepted it:

Failed to parse /home/vinay/projects/cpython/Lib/test/badsyntax_pep3120.py
Traceback (most recent call last):
  File "parsetest.py", line 30, in process
    parser = Parser(p)
  File "/disk2/projects/scratch/javacc/python/pythonparser/parser.py", line 349, in __init__
    stream_or_lexer = Lexer(input_source, stream_or_lexer)
  File "/disk2/projects/scratch/javacc/python/pythonparser/lexer.py", line 7728, in __init__
    self.input_stream = FileLineMap(input_source, stream, line, column)
  File "/disk2/projects/scratch/javacc/python/pythonparser/utils.py", line 204, in __init__
    stream_or_content = stream_or_content.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 8: invalid start byte

I checked, and the file appears to be encoded in ISO-8859, not utf-8, with no encoding declared in the file:

$ file /home/vinay/projects/cpython/Lib/test/badsyntax_pep3120.py
/home/vinay/projects/cpython/Lib/test/badsyntax_pep3120.py: ISO-8859 text

So, while the Java runtime might be more forgiving of encoding issues, these are valid failures in Python-land.

Hi, sorry to have gone silent for the past few days. To tell the truth, it was only on reading your last message that I understood that the benchmarks you were providing were based on running PyPy rather than the standard python implementation. Actually, I didn’t even know what PyPy was until I looked it up!

And actually, I find it quite interesting that the lexer component runs at about the same speed (with PyPy) as the Java one. After all, even just the lexer component on its own is quite separately useful! But I wonder what the speed is like using standard CPython.

Now, as for these encoding issues, I have to admit that the Python parser I implemented does not even look for the encoding comment up top. I didn’t consider it very important, so I left that for later. What is implemented is that it does look for the BOM (byte order mark), which is a way of indicating the encoding that I think comes from Microsoft. That is at this point: javacc21/FileLineMap.java.ftl at master · javacc21/javacc21 · GitHub
I initially thought that this would be just for C# and maybe other Microsoft-sponsored languages, but then I just made the check for the byte-order-mark something it does for reading in any file. I mean, after all, the likelihood that the byte-order mark is present by accident is basically nil!
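A minimal sketch of that BOM sniffing in Python (the function name and the fallback policy are assumptions, not the actual FileLineMap logic; the standard `codecs` module supplies the BOM byte sequences):

```python
import codecs

# Check the UTF-32 BOMs before the UTF-16 ones: the UTF-32-LE BOM
# (b'\xff\xfe\x00\x00') starts with the UTF-16-LE BOM (b'\xff\xfe').
_BOMS = [
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
]


def decode_with_bom(data, default='utf-8'):
    # If a BOM is present, strip it and decode accordingly;
    # otherwise fall back to the assumed default encoding.
    for bom, enc in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(enc)
    return data.decode(default)
```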

In any case, all this encoding stuff does not seem very important nowadays. AFAICT, modern systems just use UTF-8. Well, okay, for absolute correctness, we should check for the encoding comment when reading in a file as well. So, definitely feel free to implement that! It’s easy enough actually. It appears that the key info is here: PEP 263 -- Defining Python Source Code Encodings | Python.org

the first or second line must match the following regular expression:

 ^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)
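Applying that rule might look like the following sketch, which checks the first two lines of the raw source bytes against the PEP 263 pattern; the helper name and the utf-8 default are assumptions:

```python
import re

# The PEP 263 coding-declaration pattern, applied to raw bytes.
CODING_RE = re.compile(rb'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def detect_source_encoding(source_bytes, default='utf-8'):
    # Per PEP 263, only the first or second line may declare the encoding.
    for line in source_bytes.splitlines()[:2]:
        m = CODING_RE.match(line)
        if m:
            return m.group(1).decode('ascii')
    return default
```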

Well, remember they’re not really designed as benchmarks, and so shouldn’t be taken as such - they’re just a very rough guide to performance.

Well, I haven’t been running those tests under CPython (the standard Python implementation) for a while, assuming they’d run slowly. And I’ve been using profiling runs (separate - just running the Python parser over the CPython and Django sources) to see where the time is being spent, and tweaking here and there to speed up the hotspots. I just changed the scripts slightly to print the version of Python or Jython they’re running (the Jython tests just call into the Java lexer and parser implementations, so it’s unlikely they add much overhead, though they will add a bit).

Current run using pypy3:

$ PY_DEBUG=1 pypy3 python_tests.py --langs java,json,python
----------------------------------------------------------------------
Testing with JSON grammar
----------------------------------------------------------------------
Working directory: /tmp/javacc-python-test-il8l37_c
Test files copied to working directory.
Java version of lexer and parser created.
Java lexer and parser compiled.
Running under 2.7.2 (v2.7.2:925a3cc3b49d, Mar 21 2020, 10:03:58) [OpenJDK 64-Bit Server VM (Ubuntu)]
Java lexer run completed (2.38 secs).
Running under 2.7.2 (v2.7.2:925a3cc3b49d, Mar 21 2020, 10:03:58) [OpenJDK 64-Bit Server VM (Ubuntu)]
Java parser run completed (2.60 secs).
Python version of lexer and parser created.
Running under 3.6.9 (7.3.1+dfsg-4, Apr 22 2020, 05:15:29) [PyPy 7.3.1 with GCC 9.3.0]
Python lexer run completed (0.37 secs).
Running under 3.6.9 (7.3.1+dfsg-4, Apr 22 2020, 05:15:29) [PyPy 7.3.1 with GCC 9.3.0]
Python parser run completed (0.55 secs).
Results for Python & Java lexers & parsers are identical - yay!
----------------------------------------------------------------------
Testing with Java grammar
----------------------------------------------------------------------
Working directory: /tmp/javacc-python-test-wf8z_q0i
Test files copied to working directory.
Java version of lexer and parser created.
Java lexer and parser compiled.
Running under 2.7.2 (v2.7.2:925a3cc3b49d, Mar 21 2020, 10:03:58) [OpenJDK 64-Bit Server VM (Ubuntu)]
Java lexer run completed (5.53 secs).
Running under 2.7.2 (v2.7.2:925a3cc3b49d, Mar 21 2020, 10:03:58) [OpenJDK 64-Bit Server VM (Ubuntu)]
Java parser run completed (6.72 secs).
Python version of lexer and parser created.
Running under 3.6.9 (7.3.1+dfsg-4, Apr 22 2020, 05:15:29) [PyPy 7.3.1 with GCC 9.3.0]
Python lexer run completed (3.12 secs).
Running under 3.6.9 (7.3.1+dfsg-4, Apr 22 2020, 05:15:29) [PyPy 7.3.1 with GCC 9.3.0]
Python parser run completed (16.96 secs).
Results for Python & Java lexers & parsers are identical - yay!
----------------------------------------------------------------------
Testing with Python grammar
----------------------------------------------------------------------
Working directory: /tmp/javacc-python-test-lto47jxj
Test files copied to working directory.
Java version of lexer and parser created.
Java lexer and parser compiled.
Running under 2.7.2 (v2.7.2:925a3cc3b49d, Mar 21 2020, 10:03:58) [OpenJDK 64-Bit Server VM (Ubuntu)]
Java lexer run completed (3.45 secs).
Running under 2.7.2 (v2.7.2:925a3cc3b49d, Mar 21 2020, 10:03:58) [OpenJDK 64-Bit Server VM (Ubuntu)]
Java parser run completed (4.25 secs).
Python version of lexer and parser created.
Running under 3.6.9 (7.3.1+dfsg-4, Apr 22 2020, 05:15:29) [PyPy 7.3.1 with GCC 9.3.0]
Python lexer run completed (2.94 secs).
Running under 3.6.9 (7.3.1+dfsg-4, Apr 22 2020, 05:15:29) [PyPy 7.3.1 with GCC 9.3.0]
Python parser run completed (23.51 secs).
Results for Python & Java lexers & parsers are identical - yay!
Working directories deleted.

Current run under Python3:

$ PY_DEBUG=1 python3 python_tests.py --langs java,json,python
----------------------------------------------------------------------
Testing with JSON grammar
----------------------------------------------------------------------
Working directory: /tmp/javacc-python-test-blbbhrdk
Test files copied to working directory.
Java version of lexer and parser created.
Java lexer and parser compiled.
Running under 2.7.2 (v2.7.2:925a3cc3b49d, Mar 21 2020, 10:03:58) [OpenJDK 64-Bit Server VM (Ubuntu)]
Java lexer run completed (2.40 secs).
Running under 2.7.2 (v2.7.2:925a3cc3b49d, Mar 21 2020, 10:03:58) [OpenJDK 64-Bit Server VM (Ubuntu)]
Java parser run completed (2.53 secs).
Python version of lexer and parser created.
Running under 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0]
Python lexer run completed (0.18 secs).
Running under 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0]
Python parser run completed (0.13 secs).
Results for Python & Java lexers & parsers are identical - yay!
----------------------------------------------------------------------
Testing with Java grammar
----------------------------------------------------------------------
Working directory: /tmp/javacc-python-test-q525d2gx
Test files copied to working directory.
Java version of lexer and parser created.
Java lexer and parser compiled.
Running under 2.7.2 (v2.7.2:925a3cc3b49d, Mar 21 2020, 10:03:58) [OpenJDK 64-Bit Server VM (Ubuntu)]
Java lexer run completed (3.08 secs).
Running under 2.7.2 (v2.7.2:925a3cc3b49d, Mar 21 2020, 10:03:58) [OpenJDK 64-Bit Server VM (Ubuntu)]
Java parser run completed (3.71 secs).
Python version of lexer and parser created.
Running under 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0]
Python lexer run completed (1.92 secs).
Running under 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0]
Python parser run completed (12.87 secs).
Results for Python & Java lexers & parsers are identical - yay!
----------------------------------------------------------------------
Testing with Python grammar
----------------------------------------------------------------------
Working directory: /tmp/javacc-python-test-qmil_xfk
Test files copied to working directory.
Java version of lexer and parser created.
Java lexer and parser compiled.
Running under 2.7.2 (v2.7.2:925a3cc3b49d, Mar 21 2020, 10:03:58) [OpenJDK 64-Bit Server VM (Ubuntu)]
Java lexer run completed (3.17 secs).
Running under 2.7.2 (v2.7.2:925a3cc3b49d, Mar 21 2020, 10:03:58) [OpenJDK 64-Bit Server VM (Ubuntu)]
Java parser run completed (3.89 secs).
Python version of lexer and parser created.
Running under 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0]
Python lexer run completed (3.22 secs).
Running under 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0]
Python parser run completed (28.06 secs).
Results for Python & Java lexers & parsers are identical - yay!
Working directories deleted.

So the 2.7.2 are the Java runs (under Jython), 3.8.10 is CPython and 3.6.9 is PyPy3. It seems odd that PyPy3 runs are sometimes apparently outperformed by CPython - I’ve not looked into that yet, but I haven’t spotted any obvious howlers in my scripts.

I’m doing that already for the files in the CPython/Django runs - if the declared encoding didn’t exist or the file didn’t decode according to it, I don’t even show the file to the generated parser. The only ones that fail there are the “deliberate mistake” test files.

A better comparison of CPython with PyPy might come from the time to run over all the sources in Django and CPython. Here are the corresponding times:

4699 successes, 17 failures, 518.73 secs (PyPy3).
4699 successes, 17 failures, 1767.61 secs (CPython).

Also, I added a slightly modified PyTest.java (which has no Jython overhead) and this gives

4701 successes, 15 failures, 72.97 secs (Java).

The difference in successes/failures is due to how encoding differences are handled in Java vs. Python.

So speed of Python parser in Java ≈ 7 × speed of Python parser with PyPy3,
and speed of Python parser with PyPy3 ≈ 3.5 × speed of Python parser with CPython.

I think Jython must be causing a lot more overhead in my earlier tests than I expected (but those weren’t benchmarks anyway, they were just checking for equal results).