We're pleased to announce PyPy 2.0.1. This is a stable bugfix release over 2.0. You can download it here:
http://pypy.org/download.html
The fixes are mainly about fatal errors or crashes in our stdlib. See below for more details.
What is PyPy?
PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (pypy 2.0 and cpython 2.7.3 performance comparison) due to its integrated tracing JIT compiler.
This release supports x86 machines running Linux 32/64, Mac OS X 64 or Windows 32. Support for ARM is progressing but not bug-free yet.
Highlights
- fix an occasional crash in the JIT that ends in RPython Fatal error: NotImplementedError.
- id(x) is now always a positive number (except on int/float/long/complex). This fixes an issue in _sqlite.py (mostly for 32-bit Linux).
- fix crashes of callback-from-C-functions (with cffi) when used together with Stackless features, on asmgcc (i.e. Linux only). Now gevent should work better.
- work around an eventlet issue with socket._decref_socketios().
Cheers, arigo et al. for the PyPy team
I've been working on NumPyPy since the end of April, and here is a short update:
- I implemented pickling support for ndarrays and dtypes; it will be compatible with numpy's pickling protocol once the "numpypy" module is renamed to "numpy".
- I am now working on subarrays.
We're pleased to announce PyPy 2.0. This is a stable release that brings a swath of bugfixes, small performance improvements and compatibility fixes. PyPy 2.0 is a big step for us and we hope in the future we'll be able to provide stable releases more often.
You can download the PyPy 2.0 release here:
http://pypy.org/download.html
The two biggest changes since PyPy 1.9 are:
- stackless is now supported including greenlets, which means eventlet and gevent should work (but read below about gevent)
- PyPy now contains release 0.6 of cffi as a builtin module, which is the preferred way of calling C from Python that works well on PyPy
If you're using PyPy for anything, it would help us immensely if you filled out the following survey: http://bit.ly/pypysurvey This is for the developers' eyes and we will not make any information public without your agreement.
What is PyPy?
PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (pypy 2.0 and cpython 2.7.3 performance comparison) due to its integrated tracing JIT compiler.
This release supports x86 machines running Linux 32/64, Mac OS X 64 or Windows 32. Windows 64 work is still stalling, we would welcome a volunteer to handle that. ARM support is on the way, as you can see from the recently released alpha for ARM.
Highlights
- Stackless including greenlets should work. For gevent, you need to check out pypycore and use the pypy-hacks branch of gevent.
- cffi is now a module included with PyPy. (cffi also exists for CPython; the two versions should be fully compatible.) It is the preferred way of calling C from Python that works on PyPy.
- Callbacks from C are now JITted, which means XML parsing is much faster.
- A lot of speed improvements in various language corners, most of them small, but speeding up some particular corners a lot.
- The JIT was refactored to emit machine code which manipulates a "frame" that lives on the heap rather than on the stack. This is what makes Stackless work, and it could bring another future speed-up (not done yet).
- A lot of stability issues fixed.
- Refactoring much of the numpypy array classes, which resulted in removal of lazy expression evaluation. On the other hand, we now have more complete dtype support and support more array attributes.
Cheers,
fijal, arigo and the PyPy team
Hello.
We're pleased to announce an alpha release of PyPy 2.0 for ARM. This is mostly a technology preview, as we know the JIT is not yet stable enough for the full release. However, please try your stuff on ARM and report back.
This is the first release that supports a range of ARM devices - anything with ARMv6 (like the Raspberry Pi) or ARMv7 (like Beagleboard, Chromebook, Cubieboard, etc.) that supports VFPv3 should work. We provide builds with support for both ARM EABI variants: hard-float and, for some older operating systems, soft-float.
This release comes with a list of limitations; consider it alpha quality, not suitable for production:
- stackless support is missing.
- assembler produced is not always correct, but we successfully managed to run large parts of our extensive benchmark suite, so most stuff should work.
You can download the PyPy 2.0 alpha ARM release here (including a deb for raspbian):
http://pypy.org/download.html
Part of the work was sponsored by the Raspberry Pi foundation.
What is PyPy?
PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7.3. It's fast due to its integrated tracing JIT compiler.
This release supports ARM machines running Linux 32bit. Both hard-float armhf and soft-float armel builds are provided. armhf builds are created using the Raspberry Pi custom cross-compilation toolchain based on gcc-arm-linux-gnueabihf and should work on ARMv6 and ARMv7 devices running at least debian or ubuntu. armel builds are built using the gcc-arm-linux-gnueabi toolchain provided by ubuntu and currently target ARMv7. If there is interest in other builds, such as gnueabi for ARMv6 or without requiring a VFP, let us know in the comments or on IRC.
Benchmarks
Everybody loves benchmarks. Here is a table of our benchmark suite (for ARM we don't provide it yet on http://speed.pypy.org, unfortunately).
This is a comparison of Cortex A9 processor with 4M cache and Xeon W3580 with 8M of L3 cache. The set of benchmarks is a subset of what we run for http://speed.pypy.org that finishes in reasonable time. The ARM machine was provided by Calxeda. Columns are respectively:
- benchmark name
- PyPy speedup over CPython on ARM (Cortex A9)
- PyPy speedup over CPython on x86 (Xeon)
- speedup on Xeon vs Cortex A9, as measured on PyPy
- speedup on Xeon vs Cortex A9, as measured on CPython
- relative speedup (how much bigger the x86 speedup is than the ARM speedup)
| Benchmark | PyPy vs CPython (arm) | PyPy vs CPython (x86) | x86 vs arm (pypy) | x86 vs arm (cpython) | relative speedup |
| ai | 3.61 | 3.16 | 7.70 | 8.82 | 0.87 |
| bm_mako | 3.41 | 2.11 | 8.56 | 13.82 | 0.62 |
| chaos | 21.82 | 17.80 | 6.93 | 8.50 | 0.82 |
| crypto_pyaes | 22.53 | 19.48 | 6.53 | 7.56 | 0.86 |
| django | 13.43 | 11.16 | 7.90 | 9.51 | 0.83 |
| eparse | 1.43 | 1.17 | 6.61 | 8.12 | 0.81 |
| fannkuch | 6.22 | 5.36 | 6.18 | 7.16 | 0.86 |
| float | 5.22 | 6.00 | 9.68 | 8.43 | 1.15 |
| go | 4.72 | 3.34 | 5.91 | 8.37 | 0.71 |
| hexiom2 | 8.70 | 7.00 | 7.69 | 9.56 | 0.80 |
| html5lib | 2.35 | 2.13 | 6.59 | 7.26 | 0.91 |
| json_bench | 1.12 | 0.93 | 7.19 | 8.68 | 0.83 |
| meteor-contest | 2.13 | 1.68 | 5.95 | 7.54 | 0.79 |
| nbody_modified | 8.19 | 7.78 | 6.08 | 6.40 | 0.95 |
| pidigits | 1.27 | 0.95 | 14.67 | 19.66 | 0.75 |
| pyflate-fast | 3.30 | 3.57 | 10.64 | 9.84 | 1.08 |
| raytrace-simple | 46.41 | 29.00 | 5.14 | 8.23 | 0.62 |
| richards | 31.48 | 28.51 | 6.95 | 7.68 | 0.91 |
| slowspitfire | 1.28 | 1.14 | 5.91 | 6.61 | 0.89 |
| spambayes | 1.93 | 1.27 | 4.15 | 6.30 | 0.66 |
| sphinx | 1.01 | 1.05 | 7.76 | 7.45 | 1.04 |
| spitfire | 1.55 | 1.58 | 5.62 | 5.49 | 1.02 |
| spitfire_cstringio | 9.61 | 5.74 | 5.43 | 9.09 | 0.60 |
| sympy_expand | 1.42 | 0.97 | 3.86 | 5.66 | 0.68 |
| sympy_integrate | 1.60 | 0.95 | 4.24 | 7.12 | 0.60 |
| sympy_str | 0.72 | 0.48 | 3.68 | 5.56 | 0.66 |
| sympy_sum | 1.99 | 1.19 | 3.83 | 6.38 | 0.60 |
| telco | 14.28 | 9.36 | 3.94 | 6.02 | 0.66 |
| twisted_iteration | 11.60 | 7.33 | 6.04 | 9.55 | 0.63 |
| twisted_names | 3.68 | 2.83 | 5.01 | 6.50 | 0.77 |
| twisted_pb | 4.94 | 3.02 | 5.10 | 8.34 | 0.61 |
It seems that the Cortex A9, while significantly slower than the Xeon, suffers a bigger slowdown with a large interpreter (CPython) than with a JIT compiler (PyPy). This comes as a surprise to me, especially since our ARM assembler is not nearly as polished as our x86 assembler. As for the causes, various people mentioned the branch predictor, but I would not like to speculate without actually knowing.
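As a rough check of how the columns relate, the last column can be recomputed from the two PyPy-vs-CPython columns. The sketch below uses three rows copied from the table; the name `relative_speedup` is made up for illustration, and small differences from the published column are expected because the inputs are already rounded.

```python
# Rows copied from the table above:
# name -> (PyPy vs CPython on ARM, PyPy vs CPython on x86, published relative speedup)
ROWS = {
    "ai": (3.61, 3.16, 0.87),
    "bm_mako": (3.41, 2.11, 0.62),
    "float": (5.22, 6.00, 1.15),
}


def relative_speedup(arm_speedup, x86_speedup):
    """How much bigger the x86 speedup is than the ARM speedup."""
    return x86_speedup / arm_speedup


for name, (arm, x86, published) in sorted(ROWS.items()):
    computed = relative_speedup(arm, x86)
    print("%-8s computed=%.3f published=%.2f" % (name, computed, published))
```

A value below 1 means PyPy gains more over CPython on ARM than it does on x86 for that benchmark.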
How to use PyPy?
We suggest using PyPy from a virtualenv. Once you have a virtualenv installed, you can follow instructions from pypy documentation on how to proceed. This document also covers other installation schemes.
We would not recommend using PyPy on ARM in production just yet; however, the day of a stable PyPy ARM release is not far off.
Cheers,
fijal, bivab, arigo and the whole PyPy team
We're pleased to announce the 2.0 beta 2 release of PyPy. This is a major release of PyPy and we're getting very close to 2.0 final, however it includes quite a few new features that require further testing. Please test and report issues, so we can have a rock-solid 2.0 final. It also includes a performance regression of about 5% compared to 2.0 beta 1 that we hope to fix before 2.0 final. The ARM support is not working yet and we're working hard to make it happen before the 2.0 final. The new major features are:
- JIT now supports stackless features, that is greenlets and stacklets. This means that JIT can now optimize the code that switches the context. It enables running eventlet and gevent on PyPy (although gevent requires some special support that's not quite finished, read below).
- This is the first PyPy release that includes cffi as a core library. Version 0.6 comes included in the PyPy library. cffi has seen a lot of adoption among library authors and we believe it's the best way to wrap C libraries. You can see examples of cffi usage in _curses.py and _sqlite3.py in the PyPy source code.
You can download the PyPy 2.0 beta 2 release here:
http://pypy.org/download.html
What is PyPy?
PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7.3. It's fast (pypy 2.0 beta 2 and cpython 2.7.3 performance comparison) due to its integrated tracing JIT compiler.
This release supports x86 machines running Linux 32/64, Mac OS X 64 or Windows 32. It also supports ARM machines running Linux, however this is disabled for the beta 2 release. Windows 64 work is still stalling, we would welcome a volunteer to handle that.
How to use PyPy?
We suggest using PyPy from a virtualenv. Once you have a virtualenv installed, you can follow instructions from pypy documentation on how to proceed. This document also covers other installation schemes.
Highlights
- cffi is officially supported by PyPy. It comes included in the standard library, just use import cffi
- stackless support - eventlet just works and gevent requires pypycore and pypy-hacks branch of gevent (which mostly disables cython-based modules)
- callbacks from C are now much faster. pyexpat is about 3x faster, cffi callbacks around the same
- __length_hint__ is implemented (PEP 424)
- a lot of numpy improvements
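To make the `__length_hint__` item concrete, here is a minimal sketch of an iterator that reports how many items it still expects to produce. It is written for Python 3.4 or later, where `operator.length_hint` is available; the `Countdown` class is invented for this example and is not part of PyPy.

```python
import operator


class Countdown:
    """Iterator that counts down from n and reports how many items remain."""

    def __init__(self, n):
        self.remaining = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining

    def __length_hint__(self):
        # Only an estimate: consumers may use it to presize a container.
        return self.remaining


it = Countdown(5)
print(operator.length_hint(it))  # hint before anything is consumed
next(it)
print(operator.length_hint(it))  # hint after one item has been taken
print(list(it))
```

The hint is advisory: a consumer has to cope with an iterator that yields more or fewer items than it announced.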
Improvements since 1.9
- JIT hooks are now a powerful tool to introspect the JITting process that PyPy performs
- various performance improvements compared to 1.9 and 2.0 beta 1
- operations on long objects are now as fast as in CPython (from roughly 2x slower)
- we now have special strategies for dict/set/list which contain unicode strings, which means that now such collections will be both faster and more compact.
Hello.
During the PyCon trip multiple people asked me how exactly they could run their stuff on PyPy to get the speedups. Now, in an ideal world, you would just swap CPython with PyPy, everything would run tons of times faster and everyone would live happily ever after. However, we don't live in an ideal world and PyPy does not speed up everything you could potentially run. Chances are that you can run your stuff quite a bit faster, but it requires quite a bit more R&D than just that. This blog post is an attempt to explain certain steps that might help. So here we go:
- Download and install PyPy. 2.0 beta 1 or upcoming 2.0 beta 2 would be a good candidate; it's not called a beta for stability reasons.
- Run your tests on PyPy. There is absolutely no need for fast software that does not work. There might be some failures. Usually they're harmless (e.g. you forgot to close the file); either fix them or at least inspect them. In short, make sure stuff works.
- Inspect your stack. In particular, C extensions, while sometimes working, are a potential source of instability and slowness. Fortunately, since the introduction of cffi, the ecosystem of PyPy-compatible software has been growing. Things I know are written with PyPy in mind:
  - the new version of pyOpenSSL will support PyPy via cffi
  - psycopg2cffi is the most actively maintained postgres binding for PyPy, with pg8000 reported working
  - mysql has a ctypes based implementation (although a cffi-based one would be definitely better)
  - PyPy 2.0 beta 2 will come with sqlite-using-cffi
  - lxml-cffi
  - uWSGI, while working, is almost certainly not the best choice. Try tornado, twisted.web, cyclone.io, gunicorn or gevent (note: gevent support for PyPy is not quite finished; will write about it in a separate blog post, but you can't just use the main branch of gevent)
  - consult (and contribute to) pypy compatibility wiki for details (note that it's community maintained, might be out of date)
- Have benchmarks. If you don't have benchmarks, then performance does not matter for you. Since PyPy's warm-up time is bad (and yes, we know, we're working on it), you should leave ample time for warm-ups. Five to ten seconds of continuous computation should be enough.
- Try them. If you get lucky, the next step might be to deploy and be happy. If you're unlucky, profile and try to isolate bottlenecks. They might be in a specific library or they might be in your code. The better you can isolate them, the higher your chances of understanding what's going on.
- Don't take it for granted. PyPy's JIT is very good, but there is a variety of reasons that it might not work how you expect it to. A lot of times it starts off slow, but a little optimization can improve the speed as much as 10x. Since PyPy's runtime is less mature than CPython, there are higher chances of finding an obscure corner of the standard library that might be atrociously slow.
- Most importantly, if you run out of options and you have a reproducible example, please report it. A pypy-dev email, popping into #pypy on irc.freenode.net, or getting hold of me on twitter are good ways. You can also contact me directly at fijall at gmail.com. While it's cool if the example is small, a lot of problems only show up on large and convoluted examples. As long as I can reproduce it on my machine or I can log in somewhere, I am usually happy to help.
- I typically use a combination of jitviewer, valgrind and lsprofcalltree to try to guess what's going on. These tools are all useful, but use them with care. They usually require quite a bit of understanding before being useful. Also sometimes they're just plain useless and you need to write your own analysis.
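The advice about benchmarks and warm-up can be sketched as a small timing harness. This is only an illustration under assumptions: `work` is a placeholder workload, and `benchmark` with its `warmup_seconds` and `runs` parameters is a name invented here, not a PyPy tool. It uses `time.time()` so that it runs on Python 2.7 as well as Python 3.

```python
import time


def work(n):
    """Stand-in workload; replace with the code you actually care about."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def benchmark(func, arg, warmup_seconds=5.0, runs=5):
    """Call func(arg) for warmup_seconds, then time `runs` further calls."""
    deadline = time.time() + warmup_seconds
    while time.time() < deadline:
        func(arg)  # warm-up: results and timings are discarded
    timings = []
    for _ in range(runs):
        start = time.time()
        func(arg)
        timings.append(time.time() - start)
    return min(timings), sum(timings) / len(timings)


if __name__ == "__main__":
    best, mean = benchmark(work, 200000)
    print("best %.4fs, mean %.4fs" % (best, mean))
```

Reporting both the best and the mean timing makes it easier to notice when measurements are still unstable, which usually means the warm-up was too short.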
I hope this summary of steps to take is useful. We hear a lot of stories of people trying PyPy, most of them positive, but some of them negative. If you just post "PyPy didn't work for me" on your blog, that's cool too, but you're missing an opportunity. The reasons may vary from something serious like "this is a bad pattern for PyPy GC" to something completely hilarious like "oh, I left this sys._getframe() somewhere in my hot loops for debugging" or "I used the logging module which uses sys._getframe() all over the place".
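One cheap precaution against the sys._getframe() story is to keep frame introspection behind a flag, so it never runs on the normal path. The helper names below (`caller_name`, `trace`, `hot_loop`, `DEBUG`) are hypothetical and only illustrate the pattern.

```python
import sys

DEBUG = False  # flip to True only while debugging


def caller_name():
    """Name of the function two frames up, i.e. whoever called trace()."""
    return sys._getframe(2).f_code.co_name


def trace(message):
    # Hypothetical debugging helper; the frame lookup is the costly part.
    print("[%s] %s" % (caller_name(), message))


def hot_loop(values):
    total = 0
    for v in values:
        if DEBUG:  # keeps sys._getframe() out of the normal path
            trace("adding %r" % (v,))
        total += v
    return total


print(hot_loop(range(1000)))
```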
Cheers,
fijal
Hello, some good news!
First the update:
- dtype support - NumPy on PyPy now supports non-native storage formats. Due to a lack of true support for longdoubles in rpython, we decided to back out the support of longdouble-as-double which was misleading.
- missing ndarray attributes - progress has been made toward supporting the complete set of attributes on ndarrays. We are progressing alphabetically, and have made it to d. Unsupported attributes and unsupported arguments to attribute calls will raise a NotImplementedError.
- pickling support for numarray - hasn't started yet, but next on the list
- There has been some work on exposing FFI routines in numpypy.
- Brian Kearns has made progress in improving the numpypy namespace. The python numpypy submodules now more closely resemble their numpy counterparts. Also, translated _numpypy submodules are now more properly mapped to the numpy core c-based submodules, furthering the goal of being able to install numpy as a pure-python module with few modifications.
And now the good news:
While our funding drive over 2012 did not reach our goal, we still managed to raise a fair amount of money in donations. So far we only managed to spend around $10 000 of it. We issued a call for additional developers, and are glad to welcome Romain Guillebert and Ronan Lamy to the numpypy team. Hopefully we will be able to report on speedier progress soon.
Cheers,
Matti Picus, Maciej Fijalkowski
The following example is on CPython, not PyPy, but moving a third (after Reflex and CINT) backend into place underneath cppyy is straightforward compared to developing the backend in the first place. Take this snippet of C++11 code (cpp11.C):
constexpr int data_size() { return 5; }
auto N = data_size();
template<class L, class R>
struct MyMath {
    static auto add(L l, R r) -> decltype(l+r) { return l + r; }
};
template class MyMath<int, int>;
As a practical matter, most usage of new C++11 features will live in implementations, not in declarations, and is thus never seen by the bindings. The above example is therefore somewhat contrived, but it will serve to show that these new declarations actually work. The new features used here are constexpr, auto, and decltype. Here is how you could use these from CPython, using the PyROOT package, which has more than a passing resemblance to cppyy, as one is based on the other:
import ROOT as gbl
gbl.gROOT.LoadMacro('cpp11.C')
print 'N =', gbl.N
print '1+1 =', gbl.MyMath(int, int).add(1,1)
which, when entered into a file (cpp11.py) and executed, prints the expected results:
$ python cpp11.py
N = 5
1+1 = 2
In the example, the C++ code is compiled on-the-fly, rather than first generating
a dictionary as is needed with Reflex.
A deployment model that utilizes stored pre-compiled information is foreseen
to work with larger projects, which may have to pull in headers from many places.
Work is going to continue first on C++03 on cling with CPython (about 85% of unit tests currently pass), with a bit of work on C++11 support on the side. Once fully in place, it can be brought into a new backend for cppyy, after which the remaining parts of C++11 can be fleshed out for both interpreters.
Cheers,
Wim Lavrijsen
This is the tenth status update about our work on the py3k branch, which we
can work on thanks to all of the people who donated to the py3k proposal.
There's been significant progress since the last update: the linux x86-32
buildbot now passes 289 out of approximately 354 modules (with 39 skips) of
CPython's regression test suite.
That means there are only 26 test module failures left! The list of major items
remaining for 3.2 compatibility is now short enough to list here, with their
related tests:
- Tokenizer support for non-ascii identifiers
  - test_importlib
  - test_pep263
- memoryview (Manuel Jacob's tackling this on the py3k-memoryview branch)
  - test_memoryview
- multiprocessing module currently deadlocks
  - test_multiprocessing
- Buggy handling of the new extended unpacking syntax by the compiler:
  - test_unpack_ex
- The new Global Interpreter Lock and new thread signal handling
  - test_threading
  - test_threadsignals
  - test_sys
- Upgrade unicodedata to 6.0.0 (requires updates to the actual unicodedata
  generation script)
  - test_ucn
  - test_unicode
  - test_unicodedata
- test_capi (currently crashes)
- Update int's hash code to match CPython's (float's is already updated on the
  py3k-newhash branch; note that PyPy 2.x doesn't even totally match
  CPython's hashing)
  - test_decimal
  - test_fractions
  - test_numeric_tower
- Miscellaneous:
  - test_complex
  - test_float
  - test_peepholer
  - test_range
  - test_sqlite (a new cffi based version seems to be coming)
  - test_ssl
  - test_struct
  - test_subprocess
  - test_sys_settrace
  - test_time
Additionally there are still a number of failures in PyPy's internal test
suite. These tests are usually run against untranslated versions of PyPy during
development. However we've now begun running them against a fully translated
version of PyPy on the buildbot too (thanks to Amaury for setting this
up). This further ensures that our tests and implementation are sane.
We're getting closer to producing an initial alpha release. Before that happens
we'd like to see:
- further test fixes
- the results of test runs on other major platforms (e.g. linux x86-64 and osx
  seem to have some additional failures as of now)
- some basic real world testing
Finally I'd like to thank Manuel Jacob for his various contributions over the
past month, including fixing the array and ctypes modules among other things,
and also Amaury Forgeot d'Arc for his ongoing excellent contributions.
cheers,
Phil
From a software engineering perspective, 10 years is indistinguishable from infinity, so I don't care what happens 10 years from now -- as long as you don't blame me. :-)
- Guido van Rossum, Python creator.
10 years is indeed a long time. PyPy was created approximately 10 years ago, with the exact date being lost in the annals of the version control system. We've come a long way during those 10 years, from a "minimal Python" that was supposed to serve mostly as an educational tool, through a vehicle for academic research, to a high performance VM for Python and beyond.
Some facts from the PyPy timeline:
- In 2007, at the end of the EU funding period, we promised the JIT was just around the corner. It turned out we misjudged it pretty badly -- the first usable PyPy was released in 2010.
- At some point we decided to have a JavaScript backend so one could compile RPython programs to JavaScript and run them in a browser. Turned out it was a horrible idea.
- Another option we tried was using RPython to write CPython C extensions. Again, it turned out RPython is a bad language and instead we made a fast JIT, so you don't have to write C extensions.
- We made N attempts to use LLVM. Seriously, N is 4 or 5. But we haven't fully given up yet :-) They all ran into issues one way or another.
- We were huge fans of ctypes at the beginning. Up to the point where we tried to make a restricted subset with static types, called rctypes for RPython. Turned out to be horrible. Twice.
- We were very hopeful about creating a JIT generator from the beginning. But the first one failed miserably, generating too much assembler. The second failed too. The third first burned down and then failed. However, we managed to release a working JIT in 2010, against all odds.
- Martijn Faassen used to ask us "how fast is PyPy" so we decided to name an option enabling all optimizations "--faassen". Then "--no-faassen" was naturally added too. Later we decided to grow up and renamed it to "-O2", and now "-Ojit".
- The first time the Python interpreter successfully compiled to C, it segfaulted because the code generator used signed chars instead of unsigned chars...
- To make it more likely to be accepted, the proposal for the EU project contained basically every feature under the sun a language could have. This proved to be annoying, because we had to actually implement all that stuff. Then we had to do a cleanup sprint where we deleted 30% of codebase and 70% of features.
- At one sprint someone proposed a new software development methodology: 'Terminology-Driven Programming' means to pick a fancy name, then discuss what it could mean, then implement it. Examples: timeshifter, rainbow interpreter, meta-space bubble, hint annotations (all but one of these really existed).
- There is a conspiracy theory that the reason why translation is so slow is because time is stored away during it, which is later retrieved when an actual program runs to make it appear faster.
Overall, it was a really long road. However, 10 years later we are in good shape. A quick look at the immediate future: we are approaching PyPy 2.0 with stackless+JIT and cffi support, the support for Python 3 is taking shape, non-standard extensions like STM are slowly getting ready (more soon), and there are several non-Python interpreters around the corner (Hippy, Topaz and more).
Cheers,
fijal, arigo, hodgestar, cfbolz and the entire pypy team.
Hello everyone.
We (Armin Rigo and Maciej Fijalkowski) are visiting San Francisco/Silicon Valley for PyCon and beyond. Alex Gaynor, another core PyPy dev, lives there permanently. My visiting dates are March 12-28; Armin's are March 11-21. If you want us to give a talk at your company or simply catch up with us over dinner, please get in touch. Write to pypy-dev@python.org if you want this publicly known, or simply send me a mail at fijall@gmail.com if you don't want it public.
Cheers,
fijal
Hello everyone
Last week, Alex Gaynor announced the first public release of Topaz, a Ruby interpreter written in RPython. This is the culmination of a part-time effort over the past 10 months to provide a Ruby interpreter that implements enough interesting constructs in Ruby to show that the RPython toolchain can produce a Ruby implementation fast enough to beat what is out there.
Disclaimer
Obviously the implementation is very incomplete currently in terms of available standard library. We are working on getting it useable. If you want to try it, grab a nightly build.
We have run some benchmarks from the Ruby benchmark suite and the metatracing VMs experiment. The preliminary results are promising, but at this point we are missing so many method implementations that most benchmarks won't run yet. So instead of performance, I'm going to talk about the high-level structure of the implementation.
Architecture
Topaz interprets a custom bytecode set. The basics are similar to Smalltalk VMs, with bytecodes for loading and storing locals and instance variables, sending messages, and stack management. Some syntactical features of Ruby, such as defining classes and modules, literal regular expressions, hashes, ranges, etc., also have their own bytecodes. The third kind of bytecodes are for control flow constructs in Ruby, such as loops, exception handling, break, continue, etc.
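To give a feel for what such a bytecode loop looks like, here is a toy stack-based interpreter in Python 3. It is not Topaz's bytecode set: the opcode names, the `interpret` function and the sample program are all invented for illustration, and it only covers locals, message sends and simple jumps.

```python
# Hypothetical opcodes; Topaz's real bytecode set is different and much larger.
LOAD_CONST, LOAD_LOCAL, STORE_LOCAL, SEND, JUMP_IF_FALSE, JUMP, RETURN = range(7)


def interpret(code, consts, num_locals):
    """Run a list of (opcode, argument) pairs on a small operand stack."""
    stack, local_vars, pc = [], [None] * num_locals, 0
    while True:
        op, arg = code[pc]
        pc += 1
        if op == LOAD_CONST:
            stack.append(consts[arg])
        elif op == LOAD_LOCAL:
            stack.append(local_vars[arg])
        elif op == STORE_LOCAL:
            local_vars[arg] = stack.pop()
        elif op == SEND:  # message send: receiver.<arg>(argument)
            argument = stack.pop()
            receiver = stack.pop()
            stack.append(getattr(receiver, arg)(argument))
        elif op == JUMP_IF_FALSE:  # control flow
            if not stack.pop():
                pc = arg
        elif op == JUMP:
            pc = arg
        elif op == RETURN:
            return stack.pop()


# i = 0; total = 0; while i < 5: total += i; i += 1; return total
CONSTS = [0, 5, 1]
PROGRAM = [
    (LOAD_CONST, 0), (STORE_LOCAL, 0),       # i = 0
    (LOAD_CONST, 0), (STORE_LOCAL, 1),       # total = 0
    (LOAD_LOCAL, 0), (LOAD_CONST, 1), (SEND, "__lt__"),
    (JUMP_IF_FALSE, 17),                     # leave the loop when i >= 5
    (LOAD_LOCAL, 1), (LOAD_LOCAL, 0), (SEND, "__add__"), (STORE_LOCAL, 1),
    (LOAD_LOCAL, 0), (LOAD_CONST, 2), (SEND, "__add__"), (STORE_LOCAL, 0),
    (JUMP, 4),                               # back to the loop condition
    (LOAD_LOCAL, 1), (RETURN, None),
]

print(interpret(PROGRAM, CONSTS, num_locals=2))
```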
In trying to get from Ruby source code to bytecode, we found that the easiest way to support all of the Ruby syntax is to write a custom lexer and use an RPython port of PLY (fittingly called RPly) to create the parser from the Ruby yacc grammar.
The Topaz interpreter uses an ObjectSpace (similar to how PyPy does
it), to interact with the Ruby world. The object space contains all
the logic for wrapping and interacting with Ruby objects from the
VM. Its __init__ method sets up the core classes, initial globals,
and creates the main thread (the only one right now, as we do not have
threading, yet).
Classes are mostly written in Python. We use ClassDef objects to define the Ruby hierarchy and attach RPython methods to Ruby via ClassDef decorators. These two points warrant a little explanation.
Hierarchies
All Ruby classes ultimately inherit from BasicObject. However, most
objects are below Object (which is a direct subclass of
BasicObject). This includes objects of type Fixnum, Float,
Class, and Module, which may not need all of the facilities of
full objects most of the time.
Most VMs treat such objects specially, using tagged pointers to represent Fixnums, for example. Other VMs (for example from the SOM Family) don't. In the latter case, the implementation hierarchy matches the language hierarchy, which means that objects like Fixnum share a representation with all other objects (e.g. they have class pointers and some kind of instance variable storage).
In Topaz, implementation hierarchy and language hierarchy are
separate. The first is defined through the Python inheritance. The
other is defined through the ClassDef for each Python class, where the
appropriate Ruby superclass is chosen. The diagram below shows how the
implementation class W_FixnumObject inherits directly from
W_RootObject. Note that W_RootObject doesn't have any attrs,
specifically no storage for instance variables and no map (for
determining the class - we'll get to that). These attributes are
instead defined on W_Object, which is what most other implementation
classes inherit from. However, on the Ruby side, Fixnum correctly
inherits (via Numeric and Integer) from Object.
This simple structural optimization gives a huge speed boost, but there are VMs out there that do not have it and suffer performance hits for it.
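The separation of the two hierarchies can be sketched in plain Python. The class names W_RootObject, W_Object and W_FixnumObject come from the text; everything else here, including this minimal `ClassDef` with its `superclass` argument and `ancestors` method, is a hypothetical stand-in and not Topaz's actual API.

```python
class ClassDef(object):
    """Hypothetical stand-in: a Ruby class name plus its Ruby superclass."""

    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass

    def ancestors(self):
        cdef, names = self, []
        while cdef is not None:
            names.append(cdef.name)
            cdef = cdef.superclass
        return names


# Ruby-side (language) hierarchy.
BASIC_OBJECT = ClassDef("BasicObject")
OBJECT = ClassDef("Object", BASIC_OBJECT)
NUMERIC = ClassDef("Numeric", OBJECT)
INTEGER = ClassDef("Integer", NUMERIC)
FIXNUM = ClassDef("Fixnum", INTEGER)


# Python-side (implementation) hierarchy.
class W_RootObject(object):
    """Implementation root: no instance-variable storage and no map."""
    __slots__ = ()
    classdef = BASIC_OBJECT


class W_Object(W_RootObject):
    """Full objects carry a map and storage for instance variables."""
    __slots__ = ("map", "storage")
    classdef = OBJECT


class W_FixnumObject(W_RootObject):  # skips W_Object on the Python side
    __slots__ = ("intvalue",)
    classdef = FIXNUM                # but is below Object on the Ruby side

    def __init__(self, intvalue):
        self.intvalue = intvalue


print([cls.__name__ for cls in W_FixnumObject.__mro__])
print(W_FixnumObject.classdef.ancestors())
```

In this sketch a Fixnum instance carries only its integer value, while Ruby code would still see Fixnum as a descendant of Object.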
Decorators
Ruby methods can have symbols in their names that are not allowed as part of Python method names, for example !, ?, or =, so we cannot simply define Python methods and expose them to Ruby by the same name.
For defining the Ruby method name of a function, as well as argument number checking, Ruby type coercion and unwrapping of Ruby objects to their Python equivalents, we use decorators defined on ClassDef. When the ObjectSpace initializes, it builds all Ruby classes from their respective ClassDef objects. For each method in an implementation class that has a ClassDef decorator, a wrapper method is generated and exposed to Ruby. These wrappers define the name of the Ruby method, coerce Ruby arguments, and unwrap them for the Python method.
Here is a simple example:
@classdef.method("*", times="int")
def method_times(self, space, times):
    return self.strategy.mul(space, self.str_storage, times)
This defines the method * on the Ruby String class. When this is
called, the first argument is converted into a Ruby Fixnum object
using the appropriate coercion method, and then unwrapped into a plain
Python int and passed as argument to method_times. The wrapper
method also supplies the space argument.
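A decorator of this kind could be built roughly as follows. This is a simplified Python 3 sketch under assumptions, not Topaz's implementation: the `ClassDef` and `Space` classes here are invented, `coerce_int` collapses Ruby coercion and unwrapping into a single `int()` call, and strings are stored as plain Python `str` values.

```python
import inspect


class Space(object):
    """Hypothetical object space that only knows one coercion."""

    def coerce_int(self, w_value):
        return int(w_value)


class ClassDef(object):
    """Hypothetical registry mapping Ruby method names to wrappers."""

    def __init__(self, name):
        self.name = name
        self.methods = {}

    def method(self, ruby_name, **coercions):
        def decorator(func):
            # Arguments after (self, space) are the Ruby-visible ones.
            arg_names = inspect.getfullargspec(func).args[2:]

            def wrapper(w_self, space, *w_args):
                if len(w_args) != len(arg_names):
                    raise TypeError("wrong number of arguments (%d for %d)"
                                    % (len(w_args), len(arg_names)))
                args = []
                for name, w_arg in zip(arg_names, w_args):
                    if coercions.get(name) == "int":
                        w_arg = space.coerce_int(w_arg)
                    args.append(w_arg)
                return func(w_self, space, *args)

            self.methods[ruby_name] = wrapper  # exposed under the Ruby name
            return func
        return decorator


class W_StringObject(object):
    classdef = ClassDef("String")

    def __init__(self, value):
        self.value = value

    @classdef.method("*", times="int")
    def method_times(self, space, times):
        return W_StringObject(self.value * times)


space = Space()
w_result = W_StringObject.classdef.methods["*"](W_StringObject("ab"), space, "3")
print(w_result.value)  # "3" is coerced to the int 3 before method_times runs
```

The Python method keeps an ordinary name (`method_times`), while the registry exposes it under the Ruby name `*`, which would not be a legal Python identifier.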
Object Structure
Ruby objects have dynamically defined instance variables and may change their class at any time in the program (a concept called singleton class in Ruby - it allows each object to have unique behaviour). To still efficiently access instance variables, you want to avoid dictionary lookups and let the JIT know about objects of the same class that have the same instance variables. Topaz, like PyPy (which got it from Self), implements instances using maps, which transforms dictionary lookups into array accesses. See the blog post for the details.
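The idea behind maps can be sketched in a few lines of Python. This is a minimal illustration, assuming a simplified design: the `Map` and `Instance` classes and their methods are invented for this example and do not mirror the Topaz or PyPy sources.

```python
class Map(object):
    """Shared layout: instance-variable names -> slots in a storage list."""

    def __init__(self, indexes=None):
        self.indexes = indexes or {}
        self.transitions = {}  # ivar name -> Map with that ivar added

    def index_of(self, name):
        return self.indexes.get(name, -1)

    def with_ivar(self, name):
        # Reuse one successor map so equal layouts stay the same object.
        if name not in self.transitions:
            indexes = dict(self.indexes)
            indexes[name] = len(indexes)
            self.transitions[name] = Map(indexes)
        return self.transitions[name]


EMPTY_MAP = Map()


class Instance(object):
    def __init__(self):
        self.map = EMPTY_MAP
        self.storage = []

    def set_ivar(self, name, value):
        index = self.map.index_of(name)
        if index == -1:
            self.map = self.map.with_ivar(name)
            self.storage.append(value)
        else:
            self.storage[index] = value

    def get_ivar(self, name):
        index = self.map.index_of(name)
        return None if index == -1 else self.storage[index]


a, b = Instance(), Instance()
for obj in (a, b):
    obj.set_ivar("@x", 1)
    obj.set_ivar("@y", 2)
print(a.map is b.map)  # objects built the same way share one map
print(b.get_ivar("@y"))
```

In this pure-Python sketch `index_of` is still a dictionary lookup. The benefit described in the text comes from the JIT: when the map of an object is known in a trace, the index can be treated as a constant, leaving only the access into `storage`.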
This is only a rough overview of the architecture. If you're interested, get in touch on #topaz.freenode.net, follow the Topaz Twitter account or contribute on GitHub.
Tim Felgentreff
Hi all,
A short notice to tell you that CFFI 0.5 was released. This contains a number of small improvements over 0.4, but otherwise seems to have been quite stable for a couple of months now --- no change since January 10, apart from the usual last-minute fixes for Python 3 and for Windows.
Have fun!
Armin
Introduction
Proposed herein is a part-time fellowship for developing NumPy in PyPy. The work will initially consist of 100 hours with the possibility of extension, until the funds run out. Development and improvement of PyPy's NumPyPy (as with most Open Source and Free Software) is done as a collaborative process between volunteer, paid, and academic contributors. Due to a successful funding drive but a lack of contributors willing to work directly for PyPy, we find ourselves in the enviable situation of being able to offer this position.
Background
PyPy's developers make all PyPy software available to the public without charge, under PyPy's Open Source copyright license, the permissive MIT License. PyPy's license assures that PyPy is equally available to everyone freely on terms that allow both non-commercial and commercial activity. This license allows for academics, for-profit software developers, volunteers and enthusiasts alike to collaborate together to make a better Python implementation for everyone.
NumPy support for PyPy is licensed similarly, and therefore NumPy in PyPy support can directly help researchers and developers who seek to do numeric computing but want an easier programming language to use than Fortran or C, which are typically used for these applications. Being licensed freely to the general public means that opportunities to use, improve and learn about how NumPy in PyPy works will be generally available to everyone.
The Need for a Part-Time Developer
The NumPy project in PyPy has seen slow but steady progress since we started working on it about a year ago. On one hand, it's actually impressive what we could deliver with the effort undertaken; on the other hand, we would like to see the development accelerated.
PyPy has strict coding, testing, documentation, and review standards, which ensure excellent code quality, continually improving documentation and code test coverage, and minimal regressions. A part-time developer will be able to bring us closer to the goal of a full NumPy API implementation and speed improvements.
Work Plan
The current proposal is split into two parts:
Compatibility:
This part covers the core NumPy Python API. We'll implement most NumPy APIs that are officially documented and we'll pass most of NumPy's tests that cover documented APIs and are not implementation details. Specifically, we don't plan to:
- implement NumPy's C API
- implement other scientific libraries, like SciPy, matplotlib or biopython
- implement details that are otherwise agreed by consensus to not have a place in PyPy's implementation of NumPy or agreed with NumPy community to be implementation details
Speed:
This part will cover significant speed improvements in the JIT that would make numeric computations faster. This includes, but is not necessarily limited to:
- writing a set of benchmarks covering various use cases
- teaching the JIT backend (or multiple backends) how to deal with vector operations, like SSE
- experimenting with automatic parallelization using multiple threads, akin to numexpr
- improving the JIT register allocator, which will make a difference especially for tight loops
As with all speed improvements, it's relatively hard to predict exactly how it will turn out; however, we expect the results to be within an order of magnitude of a handwritten C equivalent.
Position Candidate
We would like people who are proficient in NumPy and PyPy (but don't have to be core developers of either) to step up. The developer will be selected by consensus of the PyPy core developers, in consultation with the Software Freedom Conservancy to check for conflicts of interest. The main criterion will be past contributions to the PyPy project, but they don't have to be significant in size.
A candidate for the Developer position will demonstrate the following:
- The ability to write clear, stable, suitable and tested code.
- The ability to understand and extend the JIT capabilities used in NumPyPy.
- A positive presence in PyPy's online community on IRC and the mailing list.
Ideally the Developer will also:
- Have familiarity with the infrastructure of the PyPy project (including bug tracker and buildbot).
- Have worked to provide education or outreach on PyPy in other forums such as workshops, conferences, and user groups.
Conservancy and PyPy are excited to announce the Developer Position. Remuneration for the position will be at the rate of 60 USD per hour, through the Software Freedom Conservancy.
The PyPy community promises to provide the necessary guidance and help with the current codebase; however, we expect a successful candidate to be able to review code and incorporate external patches within two months of the starting date of the contract.
Candidates should submit their proposal (including their CV) to:
The deadline for this initial round of proposals is February 1, 2013.
This is the ninth status update about our work on the py3k branch, which
we can work on thanks to all of the people who donated to the py3k
proposal.
Just a very short update on December's work: we're now passing about 223 of
approximately 355 modules of CPython's regression test suite, up from passing
194 last month.
Some brief highlights:
- More encoding related issues were addressed, e.g. now most if not all the multibytecodec test modules pass.
- Fixed some path handling issues (test_os, test_ntpath and test_posixpath now pass)
- We now pass test_class, test_descr and almost test_builtin (among other things): these are notable as they are fairly extensive test suites of core aspects of the language.
- Amaury Forgeot d'Arc continued making progress on CPyExt (thanks again!)
cheers,
Phil
Hello everyone
I would like to advertise a PyPy-related summer internship at the National Center for Atmospheric Research, which is located in lovely Boulder, Colorado. As last year, the mentor will be Davide del Vento, with my possible support on the PyPy side.
The full details of the application can be found in the internship description; make sure you read the requirements first. Important requirements:
- Must currently be enrolled in a United States university.
- Only students authorized to work for any employer in the United States will be considered for the SIParCS program.
- Must be a graduate or undergraduate who has completed their sophomore year.
If you happen to fulfill the requirements, to me this sounds like a great opportunity to spend a summer at NCAR in Boulder hacking on atmospheric models using PyPy.
Cheers, fijal
This is the eighth status update about our work on the py3k branch, which
we can work on thanks to all of the people who donated to the py3k
proposal.
Just a short update on November's work: we're now passing about 194 of
approximately 355 modules of CPython's regression test suite, up from passing
160 last month. Many test modules only fail a small number of individual tests
now.
We'd like to thank Amaury Forgeot d'Arc for his contributions; in particular, he
has made significant progress on updating CPyExt for Python 3 this month.
Some other highlights:
- test_marshal now passes, and there's been significant progress on pickling (thanks Kenny Levinsen and Amaury for implementing int.{to,from}_bytes)
- We now have a _posixsubprocess module
- More encoding related fixes, which affect many failing tests
- _sre was updated and now test_re almost passes
- Exception behavior is almost complete per the Python 3 specs; what's mostly missing now are the new __context__ and __traceback__ attributes (PEP 3134)
- Fixed some crashes and deadlocks occurring during the regression tests
- We merged the unicode-strategies branch both to default and to py3k: now we have versions of lists, dictionaries and sets specialized for unicode elements, as we already had for strings.
- However, string-specialized containers are still faster in some cases because there are shortcuts which have not been implemented for unicode yet (e.g., constructing a set of strings from a list of strings). The plan is to completely kill the shortcuts and improve the JIT to produce the fast version automatically for both the string and unicode versions, to have a more maintainable codebase without sacrificing the speed. The autoreds branch (already merged) was a first step in this direction.
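For readers less familiar with the Python 3 features named in this list, here is a small sketch of what int.{to,from}_bytes and the PEP 3134 attributes look like from user code. It is plain Python 3, not PyPy-specific; the function name `failing_lookup` is made up for the example.

```python
# int.to_bytes / int.from_bytes convert between integers and raw bytes.
number = 1024
raw = number.to_bytes(4, byteorder="big")
restored = int.from_bytes(raw, byteorder="big")


# PEP 3134: an exception raised while another one is being handled keeps a
# reference to the original in __context__, and every exception carries its
# traceback in __traceback__.
def failing_lookup():
    try:
        {}["missing"]
    except KeyError:
        raise ValueError("lookup failed")


try:
    failing_lookup()
except ValueError as exc:
    caught = exc

print(raw)                              # b'\x00\x00\x04\x00'
print(type(caught.__context__))         # <class 'KeyError'>
print(caught.__traceback__ is not None) # True
```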
cheers,
Philip&Antonio
The next PyPy sprint will be in San Francisco, California. It is a
public sprint, suitable for newcomers. It will run on Saturday December 1st and
Sunday December 2nd. The goals for the sprint are continued work towards the
2.0 release as well as code cleanup; we of course welcome any topic which
contributors are interested in working on.
Some other possible topics are:
- running your software on PyPy
- work on PyPy's numpy (status)
- work on STM (status)
- JIT improvements
- any exciting stuff you can think of
If there are newcomers, we'll run the usual introduction to hacking on
PyPy.
Location
The sprint will be held at the Rackspace Office:
620 Folsom St, Ste 100
San Francisco
The doors will open at 10AM and the sprint will run until 6PM on both days.
Thanks to David Reid for helping get everything set up!
We're pleased to announce the 2.0 beta 1 release of PyPy. This release is not a typical beta, in the sense that its stability is the same as or better than that of 1.9 and it can be used in production. It does however include a few performance regressions, documented below, that don't allow us to label it as 2.0 final. (It also contains many performance improvements.)
The main features of this release are support for ARM processors and compatibility with CFFI. It also includes numerous improvements to the numpy in pypy effort, cpyext and performance.
You can download the PyPy 2.0 beta 1 release here:
http://pypy.org/download.html
What is PyPy?
PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7.3. It's fast (pypy 2.0 beta 1 and cpython 2.7.3 performance comparison) due to its integrated tracing JIT compiler.
This release supports x86 machines running Linux 32/64, Mac OS X 64 or Windows 32. It also supports ARM machines running Linux. Windows 64 work is still stalled; we would welcome a volunteer to handle that.
How to use PyPy?
We suggest using PyPy from a virtualenv. Once you have a virtualenv installed, you can follow instructions from pypy documentation on how to proceed. This document also covers other installation schemes.
Regressions
Reasons why this is not PyPy 2.0:
- the ctypes fast path is now slower than it used to be. In PyPy 1.9 ctypes was either much faster or slower than CPython, depending on whether you hit the fast path or not. Right now it's usually simply slower. We're probably going to rewrite ctypes using cffi, which will make it universally faster.
- cffi (an alternative way of interfacing with C code) is very fast, but it is missing one optimization that will make it as fast as a native call from C.
- numpypy lazy computation was disabled for the sake of simplicity. We should reenable this for the final 2.0 release.
Highlights
- cffi is officially supported by PyPy. You can install it normally by using pip install cffi once you have installed PyPy and pip. The corresponding 0.4 version of cffi has been released.
- ARM is now an officially supported processor architecture. PyPy now works on soft-float ARM/Linux builds. Currently, ARM processors that support the ARMv7 or a later ISA and include a floating-point unit are supported.
- This release contains the latest Python standard library 2.7.3 and is fully compatible with Python 2.7.3.
- It does not however contain hash randomization, since the solution present in CPython is not solving the problem anyway. The reason can be found on the CPython issue tracker.
- gc.get_referrers() is now faster.
- Various numpy improvements. The list includes:
  - axis argument support in many places
  - full support for fancy indexing
  - complex128 and complex64 dtypes
- JIT hooks are now a powerful tool to introspect the JITting process that PyPy performs.
- **kwds usage is much faster in the typical scenario
- operations on long objects are now as fast as in CPython (from roughly 2x slower)
- We now have special strategies for dict/set/list which contain unicode strings, which means that now such collections will be both faster and more compact.
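The idea behind the specialized strategies can be sketched with a toy list in Python 3 (where `str` is the unicode type). This is only an illustration of the concept under simplifying assumptions, not PyPy's implementation; the class `StrategyList` and its strategy names are invented here.

```python
class StrategyList:
    """Toy list that remembers whether all of its items are unicode strings."""

    def __init__(self):
        self.strategy = "empty"
        self.storage = []

    def append(self, item):
        wanted = "unicode" if isinstance(item, str) else "object"
        if self.strategy == "empty":
            self.strategy = wanted
        elif self.strategy != wanted:
            # Mixed content: fall back to the generic strategy for good.
            self.strategy = "object"
        self.storage.append(item)

    def contains(self, item):
        # Shortcut enabled by the strategy: a plain int or float can never be
        # equal to a string, so no element needs to be compared.
        # (Simplification: only exact built-in number types take the shortcut,
        # since arbitrary objects may define their own __eq__.)
        if self.strategy == "unicode" and type(item) in (int, float):
            return False
        return item in self.storage


words = StrategyList()
words.append("spam")
words.append("eggs")
print(words.strategy)     # unicode
print(words.contains(3))  # False, answered without scanning the items
words.append(3)
print(words.strategy)     # object
```

A real implementation can additionally store the elements in an unwrapped, more compact form while the specialized strategy is active, which is where the memory savings mentioned above come from.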
Things we're working on
There are a few things that did not make it to the 2.0 beta 1, which are being actively worked on. Greenlets support in the JIT is one that we would like to have before 2.0 final. Two important items that will not make it to 2.0, but are being actively worked on, are:
- Faster JIT warmup time.
- Software Transactional Memory.
Cheers,
Maciej Fijalkowski, Armin Rigo and the PyPy team
This is the seventh status update about our work on the py3k branch, which
we can work on thanks to all of the people who donated to the py3k
proposal.
The biggest news is that this month Philip started to work on py3k in parallel
to Antonio. As such, there was an increased amount of activity.
The py3k buildbots now fully translate the branch every night and run the
Python standard library tests.
We currently pass 160 out of approximately 355 modules of CPython's standard
test suite, fail 144 and skip approximately 51.
Some highlights:
- dictviews (the objects returned by dict.keys/values/items) have been greatly improved, and now they fully support set operators
- a lot of tests have been fixed wrt complex numbers (and in particular the __complex__ method)
- _csv has been fixed and now it correctly handles unicode instead of bytes
- more parser fixes, py3k list comprehension semantics; now you can no longer access the list comprehension variable after it finishes
- 2to3'd most of the lib_pypy modules (pypy's custom standard lib replacements/additions)
- py3-enabled pyrepl: this means that finally readline works at the command prompt, as well as builtins.input(). pdb seems to work, as well as fancycompleter to get colorful TAB completions :-)
- py3 round
- further tightening/cleanup of the unicode handling (more usage of surrogateescape, surrogatepass among other things)
- as well as keeping up with some big changes happening on the default branch and of course various other fixes.
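Three of the Python 3 behaviours from the list above are easy to show in a few lines. This is an ordinary Python 3 sketch; the dictionaries and variable names are made up for illustration.

```python
old = {"x": 1, "y": 2}
new = {"y": 20, "z": 30}

# Dict views support set operators directly.
shared_keys = old.keys() & new.keys()
removed_keys = old.keys() - new.keys()

# The list comprehension variable is local to the comprehension and is no
# longer visible once the comprehension has finished.
squares = [comp_var * comp_var for comp_var in range(4)]
try:
    comp_var
    leaked = True
except NameError:
    leaked = False

# surrogateescape lets undecodable bytes survive a decode/encode round trip.
raw = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8
text = raw.decode("utf-8", errors="surrogateescape")
round_tripped = text.encode("utf-8", errors="surrogateescape")

print(shared_keys, removed_keys, leaked, round_tripped == raw)
```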
Finally, we would like to thank Amaury Forgeot d'Arc for his significant
contributions.
cheers,
Philip&Antonio








