PyPy is a reimplementation of Python in Python, using advanced techniques to try to attain better performance than CPython. Many years of hard work have finally paid off. Our speed results now often beat CPython: they range from slightly slower, to speedups of up to 2x on real application code, to speedups of up to 10x on small benchmarks. This post describes what we did on PyPy during the last year, leading up to those results.
In the spring of 2009 we completed an update of PyPy to support Python version 2.5, with much-appreciated financial support from Google. Most of the work on the update was done by Maciej Fijalkowski and Samuele Pedroni. While this work was in progress, Armin Rigo and Carl Friedrich Bolz were hard at work rebuilding the framework of the Just In Time compiler (JIT). The old framework, which used techniques based on Partial Evaluation, gave good results only in constrained cases and usually generated far too much code for Python. It was time to do more research from scratch. What we discovered was that the techniques typical of Tracing JIT compilers suit the optimization of dynamic languages better than techniques based on Partial Evaluation. However, we still follow our original meta-programming approach and remain convinced that writing a JIT compiler generator is more promising than the typical hand-coding of a JIT compiler for a particular interpreter. In other words, as in the original attempt, we get a JIT compiler that is not tied to Python but is generated from the source code of any interpreter for any dynamic language. If you are interested in the details, there is a very approachable paper about it, along with various blog posts.
During the autumn we applied and refined the JIT framework, added more optimisations and wrote a code generator for the x86 CPU family. In the early stages we could get good speed only at the price of huge memory consumption, and much work went into addressing this problem. We have now reached a point where memory consumption is usually reasonable, although it is the nature of JIT compilers to trade some memory for speed. We began work using the benchmarks from The Great Computer Language Benchmarks Game to identify some problem areas. We have another blog post about our work in this area for the curious. Thanks go to Andrew Mahone, who ported many of the Alioth benchmarks for our use. We also did some work on benchmarks that behave more like actual applications. The Django templating engine is twice as fast, and Twisted benchmarks are up to 2.85 times as fast. For details, see our progress reports from the blog. While the results on a substantial number of benchmarks are really good, there is a lot more to do. We still have spots where performance is fairly bad, for instance our regular expression engine and the handling of generators. We have ideas about how to improve this, and we have a list of further optimisations that could be performed.
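To check how a given workload fares, one option is to time the same script under both CPython and PyPy. The following is only a minimal sketch, not one of the benchmarks mentioned above: the two workload functions (`generator_workload`, `regex_workload`) and the `best_time` helper are hypothetical names chosen for illustration, picked to exercise generators and regular expressions. It assumes a modern Python 3 interpreter and uses only the standard `timeit` and `re` modules.

```python
import re
import timeit


def generator_workload(n=10000):
    """Sum squares through a generator expression (hypothetical workload)."""
    return sum(i * i for i in range(n))


def regex_workload(n=1000):
    """Count regex matches in a synthetic string (hypothetical workload)."""
    text = "abc123 " * n
    return len(re.findall(r"[a-z]+\d+", text))


def best_time(func, repeat=3, number=10):
    """Return the best per-call time in seconds over several repeats."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    # Run this file once with each interpreter and compare the printed times.
    for workload in (generator_workload, regex_workload):
        print(f"{workload.__name__}: {best_time(workload):.6f} s per call")
```

Taking the minimum over several repeats reduces noise from other processes. A JIT also needs some warm-up before it reaches full speed, so very short runs may understate what it can do.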
The largest issue preventing users from adopting PyPy is the lack of extension modules. In addition to his constant efforts in making sure that PyPy runs on the Windows platform, Amaury Forgeot d'Arc has managed to port Oracle bindings to PyPy, while Alexander Schremmer has worked out a way to use the Remote Procedure Call library RPyC to use CPython extension modules with PyPy. Alexander's goal was to get PyQt to run with PyPy, and he was quite successful (apart from some bugs in PyQt itself), which you can read about on our blog. We also had Benjamin Peterson single-handedly rewrite our previously slow and problem-ridden Python parser, which is now much leaner and meaner.
PyPy with the JIT is beta software, in the sense that it may speed up your applications significantly or not at all. We will need your help finding the odd quirks that prevent your Python programs from running, and spotting the places where performance can be improved. More information can be found on the PyPy blog, at our website or on the #pypy IRC channel at freenode.net.