
Date: Monday, 08 Sep 2014 23:03
We're pleased to announce the availability of PyPy 2.4-beta1; faster, fewer bugs, and updated to the python 2.7.8 stdlib.

This release contains several bugfixes and enhancements. Among the user-facing improvements:
  • internal refactoring in string and GIL handling which led to significant speedups
  • improved handling of multiple objects (like sockets) in long-running programs. They are collected and released more efficiently, reducing memory use. In simpler terms - we closed what looked like a memory leak
  • Windows builds now link statically to zlib, expat, bzip, and openssl-1.0.1i
  • Many issues were resolved since the 2.3.1 release in June

You can download the PyPy 2.4-beta1 release here http://pypy.org/download.html.

We would like to also point out that in September, the Python Software Foundation will match funds for any donations up to $10k, so head over to our website and help this mostly-volunteer effort out.

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7 and 3.2.5. It's fast (pypy 2.4 and cpython 2.7.x performance comparison) due to its integrated tracing JIT compiler.

This release supports x86 machines running Linux 32/64, Mac OS X 64, Windows, and OpenBSD, as well as newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux. 
We would like to thank our donors for the continued support of the PyPy project.

The complete release notice is here.

Please try it out and let us know what you think. We especially welcome success stories, please tell us about how it has helped you!

Cheers, The PyPy Team

News Flash from the beta release cycle:
  • Note that the beta release mistakenly identifies itself in sys.pypy_version_info as releaselevel=='final', please do not mistake this for a final version
  • The beta can hit an "Illegal instruction" exception in jitted code on ARMv6 processors like the Raspberry Pi. This will be fixed for the release.

Author: "mattip (noreply@blogger.com)"
Date: Saturday, 06 Sep 2014 19:00

We're extremely excited to announce that for the month of September, any amount
you donate to PyPy will be matched (up to $10,000) by the Python Software
Foundation.

This includes any of our ongoing fundraisers: NumPyPy, STM, Python3, or our
general fundraising.

Here are some of the things your previous donations have helped accomplish:

  • Getting PyPy3 completed (currently 3.2, with 3.3 work underway)
  • New research and production engineering on STM for PyPy
  • Lots of progress on NumPy for PyPy
  • Significant performance improvements

You can see a preview of what's coming in our next 2.4 release in the draft
release notes

Thank you to all the individuals and companies which have donated so far.

So please, donate today: http://pypy.org/

(Please be aware that the donation progress bars are not live updating, so
don't be afraid if your donation doesn't show up immediately).

Author: "Alex (noreply@blogger.com)"
Date: Monday, 11 Aug 2014 11:57

Extending the Smalltalk RSqueakVM with STM

by Conrad Calmez, Hubert Hesse, Patrick Rein and Malte Swart supervised by Tim Felgentreff and Tobias Pape


Following pypy-stm, we can announce that a second VM implementation now supports software transactional memory: the RSqueakVM (which used to be called SPyVM). RSqueakVM is a Smalltalk implementation based on the RPython toolchain. We have added STM support based on the STM tools from RPython (rstm). The benchmarks indicate that linear scale-up is possible; however, in some situations the STM overhead limits the speedup.

The work was done as a master's project at the Software Architecture Group of Professor Robert Hirschfeld at the Hasso Plattner Institut at the University of Potsdam. We - four students - worked about one and a half days per week for four months on the topic. The RSqueakVM was originally developed during a sprint at the University of Bern. When we started the project we were new to the topic of building VMs / interpreters.

We would like to thank Armin, Remi and the #pypy IRC channel who supported us over the course of our project.

Introduction to RSqueakVM

Like the original Smalltalk implementation, the RSqueakVM executes a given Squeak Smalltalk image, containing the Smalltalk code and a snapshot of previously created objects and active execution contexts. These execution contexts are scheduled inside the image (as greenlets) and not mapped to OS threads. As a result, the non-STM RSqueakVM runs on only one OS thread.

Changes to RSqueakVM

The core adjustments to support STM were made inside the VM and are transparent to the Smalltalk user. Additionally, we added Smalltalk code to influence the behavior of the STM. Since the RSqueakVM had so far run in a single OS thread, we added the capability to start OS threads. Essentially, we added an additional way to launch a new Smalltalk execution context (thread); in contrast to the original one, it creates a new native OS thread rather than a Smalltalk-internal green thread.

STM (with automatic transaction boundaries) already solves the problem of concurrent access to a single value, as this is protected by the STM transactions (more precisely, by one instruction). But there are cases where the application relies on a bigger group of changes being executed either completely or not at all (atomically). Without further information, transaction borders could fall in the middle of such a set of atomic statements. rstm allows aggregating multiple statements into one higher-level transaction. To let the application mark the beginning and the end of these atomic blocks (high-level transactions), we added two more STM-specific extensions to Smalltalk.
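Why such atomic blocks matter can be sketched in Python (a hypothetical illustration: a plain lock stands in for rstm's high-level transactions so the example runs on any interpreter; all names here are invented for this sketch):

```python
import threading

# Lock-based stand-in for an rstm high-level transaction.
_atomic = threading.RLock()

balance_a = 100
balance_b = 0

def transfer(amount):
    global balance_a, balance_b
    with _atomic:
        # Without the atomic block, a transaction border could fall
        # between these two updates, briefly exposing an inconsistent
        # intermediate state to other threads.
        balance_a -= amount
        balance_b += amount

threads = [threading.Thread(target=transfer, args=(10,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# The invariant balance_a + balance_b == 100 survives all transfers.
```

In RSqueakVM the same grouping is expressed with the two new Smalltalk STM extensions marking the beginning and end of the atomic block.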


So far, RSqueak was executed in a single OS thread; rstm enables us to execute the VM using several OS threads. Using OS threads, we expected a speed-up in benchmarks which use multiple threads. We measured this speed-up using two benchmarks: a simple parallel summation where each thread sums up a predefined interval, and an implementation of Mandelbrot where each thread computes a range of predefined lines.
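The ParallelSum workload can be sketched in Python (the actual benchmarks are written in Smalltalk; the function and variable names here are invented for illustration):

```python
import threading

def parallel_sum(n, num_threads):
    """Sum 1..n by giving each thread a predefined interval."""
    results = [0] * num_threads
    chunk = n // num_threads

    def worker(idx):
        # Each thread sums up its own predefined interval.
        start = idx * chunk + 1
        end = n if idx == num_threads - 1 else (idx + 1) * chunk
        results[idx] = sum(range(start, end + 1))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

With an ideal STM and enough cores, the wall-clock time of `parallel_sum` would shrink as `num_threads` grows; the tables below show how close the RSqueak/STM VM gets to that.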

To assess the speed-up, we used one RSqueakVM compiled with rstm enabled, running the benchmarks once with OS threads and once with Smalltalk green threads. The workload always remained the same and only the number of threads increased. To assess the overhead imposed by the STM transformation, we also ran the green-threads version on an unmodified RSqueakVM. All VMs were translated with the JIT optimization, and all benchmarks were run once before the measurement to warm up the JIT. Since the JIT optimization works, it is likely to be adopted by VM creators (the baseline RSqueakVM did so), so results with this optimization are more relevant in practice than those without it. We measured the execution time by getting the system time in Squeak. The results are:

Parallel Sum Ten Million

Benchmark Parallel Sum 10,000,000
Threads | RSqueak green | RSqueak/STM green | RSqueak/STM OS | Slowdown (green -> STM green) | Speedup (STM green -> STM OS)
1       |  168.0 ms     |  240.0 ms         |   290.9 ms     | 0.70                          | 0.83
2       |  167.0 ms     |  244.0 ms         |   246.1 ms     | 0.68                          | 0.99
4       |  167.8 ms     |  240.7 ms         |   366.7 ms     | 0.70                          | 0.66
8       |  168.1 ms     |  241.1 ms         |   757.0 ms     | 0.70                          | 0.32
16      |  168.5 ms     |  244.5 ms         |  1460.0 ms     | 0.69                          | 0.17

Parallel Sum One Billion

Benchmark Parallel Sum 1,000,000,000

Threads | RSqueak green | RSqueak/STM green | RSqueak/STM OS | Slowdown (green -> STM green) | Speedup (STM green -> STM OS)
1       | 16831.0 ms    | 24111.0 ms        | 23346.0 ms     | 0.70                          | 1.03
2       | 17059.9 ms    | 24229.4 ms        | 16102.1 ms     | 0.70                          | 1.50
4       | 16959.9 ms    | 24365.6 ms        | 12099.5 ms     | 0.70                          | 2.01
8       | 16758.4 ms    | 24228.1 ms        | 14076.9 ms     | 0.69                          | 1.72
16      | 16748.7 ms    | 24266.6 ms        | 55502.9 ms     | 0.69                          | 0.44

Mandelbrot Iterative

Benchmark Mandelbrot
Threads | RSqueak green | RSqueak/STM green | RSqueak/STM OS | Slowdown (green -> STM green) | Speedup (STM green -> STM OS)
1       |  724.0 ms     |  983.0 ms         |   1565.5 ms    | 0.74                          | 0.63
2       |  780.5 ms     |  973.5 ms         |   5555.0 ms    | 0.80                          | 0.18
4       |  781.0 ms     |  982.5 ms         |  20107.5 ms    | 0.79                          | 0.05
8       |  779.5 ms     |  980.0 ms         | 113067.0 ms    | 0.80                          | 0.01

Discussion of benchmark results

First of all, the ParallelSum benchmarks show that the parallelism is actually paying off, at least for sufficiently large embarrassingly parallel problems. Thus RSqueak can also benefit from rstm.

On the other hand, our Mandelbrot implementation shows the limits of our current rstm integration. We implemented two versions of the algorithm: one using a single low-level array and one using two nested collections. In both versions, each job calculates only a distinct range of rows, and both lead to a slowdown. The summary of the state of rstm transactions shows that there are a lot of inevitable transactions (transactions which must be completed). One reason might be the interactions between the VM and its low-level extensions, so-called plugins. We have to investigate this further.


Although the current VM setup is working well enough to support our benchmarks, the VM still has limitations. First of all, as it is based on rstm, it has the current limitation of only running on 64-bit Linux.

Besides this, we also have two major limitations regarding the VM itself. First, the atomic interface exposed in Smalltalk currently does not work when the VM is compiled using the just-in-time compiler transformation. Simple examples such as a concurrent parallel sum work fine, while more complex benchmarks such as chameneos fail. The reasons for this are currently beyond our understanding. Second, Smalltalk supports green threads, i.e. threads which are managed by the VM and not mapped to OS threads. We currently support starting new Smalltalk threads as OS threads instead of green threads; however, existing threads in a Smalltalk image are not migrated to OS threads but remain running as green threads.

Future work for STM in RSqueak

The work we presented surfaced interesting problems; we propose the following problem statements for further analysis:
  • Inevitable transactions in benchmarks: this looks like it could limit other applications too, so it should be solved.
  • Collection implementations aware of STM: the current implementation of collections can cause a lot of STM collisions due to their internal memory structure. We believe there is potential for performance improvements if we replace these collections in an STM-enabled interpreter with implementations that cause fewer STM collisions. As already proposed by Remi Meier, bags, sets and lists are of particular interest.
  • Finally, we exposed STM through language features such as the atomic method, which is provided through the VM. Originally it was possible to model STM transaction barriers implicitly by using clever locks; now it is exposed via the atomic keyword. From a language-design point of view, the question arises whether this is a good solution, and what features an STM-enabled interpreter must provide to the user in general. Of particular interest are, for example, access to the transaction length and hints for transaction borders, and their performance impact.

    Details for the technically inclined

    • Adjustments to the interpreter loop were minimal.
    • STM works at bytecode granularity; that means there is an implicit transaction border after every executed bytecode. Possible alternatives: only break transactions after certain bytecodes, or break transactions one abstraction layer above, e.g. at object methods (setters, getters).
    • rstm calls were exposed using primitives (a way to expose native code in Smalltalk); this was mainly used for atomic.
    • Starting and stopping OS threads is exposed via primitives as well. Threads are started from within the interpreter.
    • For STM-enabled Smalltalk code we currently maintain different image versions. However, another way to add, load and replace code in the Smalltalk code base is required to make switching between STM and non-STM code simple.

      Details on the project setup

      From a non-technical perspective, a problem we encountered was the huge round-trip times (on our machines up to 600s, or 900s with JIT enabled). This led to a tendency toward bigger code changes ("Before we compile, let's also add this"), lost flow ("What were we doing before?") and testing different compiled interpreters in parallel ("How is this version different from the others?"). As a consequence, it was harder to test and correct errors. While this is not as much of a problem for other RPython VMs, the RSqueakVM needs to execute an entire image, which makes running it untranslated even slower.


      The benchmarks show that a speed-up is possible, but also that the STM overhead can in some situations eat up the speedup. The resulting STM-enabled VM still has some limitations: as rstm currently only runs on 64-bit Linux, so does the RSqueakVM. Even though it is now possible for us to create new threads that map to OS threads within the VM, the migration of existing Smalltalk threads remains problematic.

      We showed that an existing VM code base can benefit from STM in terms of scaling up. Furthermore, it was relatively easy to enable STM support. This may also be valuable to VM developers considering adding STM support to their VMs.

      Author: "Carl Friedrich Bolz (noreply@blogger.com)" Tags: "Smalltalk, Squeak, stm"
      Date: Tuesday, 08 Jul 2014 12:38

      Hi all,

      PyPy-STM is now reaching a point where we can say it's good enough to be a GIL-less Python. (We don't guarantee there are no more bugs, so please report them :-) The first official STM release:

      This corresponds roughly to PyPy 2.3 (not 2.3.1). It requires 64-bit Linux. More precisely, this release is built for Ubuntu 12.04 to 14.04; you can also rebuild it from source by getting the branch stmgc-c7. You need clang to compile, and you need a patched version of llvm.

      This version's performance can reasonably be compared with a regular PyPy, where both include the JIT. Thanks for following the meandering progress of PyPy-STM over the past three years --- we're finally getting somewhere really interesting! We cannot thank enough all contributors to the previous PyPy-STM money pot that made this possible. And, although this blog post is focused on the results from that period of time, I have of course to remind you that we're running a second call for donation for future work, which I will briefly mention again later.

      A recap of what we did to get there: around the start of the year we found a new model, a "redo-log"-based STM which uses a couple of hardware tricks to not require chasing pointers, giving it (in this context) exceptionally cheap read barriers. This idea was developed over the following months and (relatively) easily integrated with the JIT compiler. The most recent improvements on the Garbage Collection side are closing the gap with a regular PyPy (there is still a bit more to do there). There is some preliminary user documentation.

      Today, the result of this is a PyPy-STM that is capable of running pure Python code on multiple threads in parallel, as we will show in the benchmarks that follow. A quick warning: this is only about pure Python code. We didn't try so far to optimize the case where most of the time is spent in external libraries, or even manipulating "raw" memory like array.array or numpy arrays. To some extent there is no point because the approach of CPython works well for this case, i.e. releasing the GIL around the long-running operations in C. Of course it would be nice if such cases worked as well in PyPy-STM --- which they do to some extent; but checking and optimizing that is future work.

      As a starting point for our benchmarks, when running code that only uses one thread, we get a slow-down between 1.2 and 3: at worst, three times as slow; at best only 20% slower than a regular PyPy. This worst case has been brought down (it used to be 10x) by recent work on "card marking", a useful GC technique that is also present in the regular PyPy (and about which I can't find any blog post; maybe we should write one :-). The main remaining issue is fork(), or any function that creates subprocesses: it works, but is very slow. To remind you of this fact, it prints a line to stderr when used.

      Now the real main part: when you run multithreaded code, it scales very nicely with two threads, and less-than-linearly but still not badly with three or four threads. Here is an artificial example:

          total = 0
          lst1 = ["foo"]
          for i in range(100000000):
              lst1.append(i)
              total += lst1.pop()

      We run this code N times, once in each of N threads (full benchmark). Run times, best of three:

      Number of threads   Regular PyPy (head)          PyPy-STM
      N = 1               real 0.92s, user+sys 0.92s   real 1.34s, user+sys 1.34s
      N = 2               real 1.77s, user+sys 1.74s   real 1.39s, user+sys 2.47s
      N = 3               real 2.57s, user+sys 2.56s   real 1.58s, user+sys 4.106s
      N = 4               real 3.38s, user+sys 3.38s   real 1.64s, user+sys 5.35s

      (The "real" time is the wall clock time. The "user+sys" time is the recorded CPU time, which can be larger than the wall clock time if multiple CPUs run in parallel. This was run on a 4x2 cores machine. For direct comparison, avoid loops that are so trivial that the JIT can remove all allocations from them: right now PyPy-STM does not handle this case well. It has to force a dummy allocation in such loops, which makes minor collections occur much more frequently.)
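      A scaled-down harness for this kind of measurement might look like the following (an illustrative sketch, not the full benchmark script; the function names are invented here):

```python
import threading
import time

def workload(n=100000):
    # A scaled-down variant of the benchmark loop: an allocation
    # (list append) plus an addition on every iteration.
    total = 0
    lst1 = ["foo"]
    for i in range(n):
        lst1.append(i)
        total += lst1.pop()
    return total

def timed_run(n_threads):
    # Run the same workload once in each of N threads and report the
    # wall-clock ("real") time for all of them to finish.
    threads = [threading.Thread(target=workload) for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start
```

      On a GIL-based interpreter the "real" time of `timed_run` grows roughly linearly with N; on PyPy-STM it should stay much flatter, as the table above shows.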

      Four threads is the limit so far: only four threads can be executed in parallel. Similarly, the memory usage is limited to 2.5 GB of GC objects. These two limitations are not hard to increase, but at least increasing the memory limit requires fighting against more LLVM bugs. (Include here snark remarks about LLVM.)

      Here are some measurements from more real-world benchmarks. This time, the amount of work is fixed and we parallelize it on T threads. The first benchmark is just running translate.py on a trunk PyPy. The last three benchmarks are here.

      Benchmark                            PyPy 2.3 (PyPy head)   PyPy-STM, T=1   T=2             T=3             T=4
      translate.py --no-allworkingmodules
      (annotation step)                    184s (170s)            386s (2.10x)    n/a             n/a             n/a
      richards, 5000 iterations            24.2s (16.8s)          52.5s (2.17x)   37.4s (1.55x)   25.9s (1.07x)   32.7s (1.35x)
      mandelbrot, divided in 16-18 bands   22.9s (18.2s)          27.5s (1.20x)   14.4s (0.63x)   10.3s (0.45x)   8.71s (0.38x)
      btree                                2.26s (2.00s)          2.01s (0.89x)   2.22s (0.98x)   2.14s (0.95x)   2.42s (1.07x)

      This shows various cases that can occur:

      • The mandelbrot example runs with minimal overhead and very good parallelization. It's dividing the plane to compute in bands, and each of the T threads receives the same number of bands.
      • Richards, a classical benchmark for PyPy (tweaked to run the iterations in multiple threads), is hard to beat on regular PyPy: we suspect that the difference is due to the fact that a lot of paths through the loops don't allocate, triggering the issue already explained above. Moreover, the speed of Richards was again improved dramatically recently, in trunk.
      • The translation benchmark measures the time translate.py takes to run the first phase only, "annotation" (for now it consumes too much memory to run translate.py to the end). Moreover the timing starts only after the large number of subprocesses spawned at the beginning (mostly gcc). This benchmark is not parallel, but we include it for reference here. The slow-down factor of 2.1x is still too much, but we have some idea about the reasons: most likely, again the Garbage Collector, missing the regular PyPy's very fast small-object allocator for old objects. Also, translate.py is an example of application that could, with reasonable efforts, be made largely parallel in the future using atomic blocks.
      • Atomic blocks are also present in the btree benchmark. I'm not completely sure, but it seems that in this case the atomic blocks create too many conflicts between the threads for actual parallelization: the base time is very good, but running more threads does not help at all.

      As a summary, PyPy-STM looks already useful to run CPU-bound multithreaded applications. We are certainly still going to fight slow-downs, but it seems that there are cases where 2 threads are enough to outperform a regular PyPy, by a large margin. Please try it out on your own small examples!

      And, at the same time, please don't attempt to retrofit threads inside an existing large program just to benefit from PyPy-STM! Our goal is not to send everyone down the obscure route of multithreaded programming and its dark traps. We are finally going to shift our main focus to phase 2 of our research (donations welcome): how to enable a better way of writing multi-core programs. The starting point is to fix and test atomic blocks. Then we will have to debug common causes of conflicts and fix them or work around them; and try to see how common frameworks like Twisted can be adapted.

      Lots of work ahead, but lots of work behind too :-)

      Armin (thanks Remi as well for the work).

      Author: "Armin Rigo (noreply@blogger.com)" Tags: "stm"
      Date: Friday, 20 Jun 2014 23:31

      We're pleased to announce the first stable release of PyPy3. PyPy3
      targets Python 3 (3.2.5) compatibility.

      We would like to thank all of the people who donated to the py3k proposal
      for supporting the work that went into this.

      You can download the PyPy3 2.3.1 release here:



      • The first stable release of PyPy3: support for Python 3!
      • The stdlib has been updated to Python 3.2.5
      • Additional support for the u'unicode' syntax (PEP 414) from Python 3.3
      • Updates from the default branch, such as incremental GC and various JIT improvements
      • Resolved some notable JIT performance regressions from PyPy2:
      • Re-enabled the previously disabled collection (list/dict/set) strategies
      • Resolved performance of iteration over range objects
      • Resolved handling of Python 3's exception __context__ unnecessarily forcing
        frame object overhead

      What is PyPy?

      PyPy is a very compliant Python interpreter, almost a drop-in replacement for
      CPython 2.7.6 or 3.2.5. It's fast due to its integrated tracing JIT compiler.

      This release supports x86 machines running Linux 32/64, Mac OS X 64, Windows,
      and OpenBSD,
      as well as newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux.

      While we support 32-bit Python on Windows, work on native Windows 64-bit
      Python is still stalled; we would welcome a volunteer
      to handle that.

      How to use PyPy?

      We suggest using PyPy from a virtualenv. Once you have a virtualenv
      installed, you can follow instructions from pypy documentation on how
      to proceed. This document also covers other installation schemes.

      the PyPy team

      Author: "Philip Jenvey (noreply@blogger.com)"
      Date: Sunday, 08 Jun 2014 01:14
      We're pleased to announce PyPy 2.3.1, a feature-and-bugfix improvement over our recent 2.3 release last month.

      This release contains several bugfixes and enhancements. Among the user-facing improvements:
      • The built-in struct module was renamed to _struct, solving issues with IDLE and other modules
      • Support for compilation with gcc-4.9
      • A CFFI-based version of the gdbm module is now included in our binary bundle
      • Many issues were resolved since the 2.3 release on May 8

      You can download the PyPy 2.3.1 release here:


      PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (pypy 2.3.1 and cpython 2.7.x performance comparison) due to its integrated tracing JIT compiler.

      This release supports x86 machines running Linux 32/64, Mac OS X 64, Windows, and OpenBSD, as well as newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux. 
      We would like to thank our donors for the continued support of the PyPy project.

      The complete release notice is here.

      Please try it out and let us know what you think. We especially welcome success stories, please tell us about how it has helped you!

      Cheers, The PyPy Team
      Author: "mattip (noreply@blogger.com)"
      Date: Friday, 09 May 2014 10:38
      We’re pleased to announce PyPy 2.3, which targets version 2.7.6 of the Python language. This release updates the stdlib from 2.7.3, jumping directly to 2.7.6.

      This release also contains several bugfixes and performance improvements, many generated by real users finding corner cases. CFFI has made it easier than ever to use existing C code with both CPython and PyPy, easing the transition for packages like cryptography, Pillow (Python Imaging Library fork), a basic port of pygame-cffi, and others.

      PyPy can now be embedded in a hosting application, for instance inside uWSGI.

      You can download the PyPy 2.3 release here:


      PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (pypy 2.3 and cpython 2.7.x performance comparison; note that cpython's speed has not changed since 2.7.2) due to its integrated tracing JIT compiler.

      This release supports x86 machines running Linux 32/64, Mac OS X 64, Windows, and OpenBSD, as well as newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux. 

      We would like to thank our donors for the continued support of the PyPy project.

      The complete release notice is here

      Cheers, The PyPy Team
      Author: "mattip (noreply@blogger.com)"
      Date: Tuesday, 15 Apr 2014 22:08
      Work on NumPy on PyPy continued in March, though at a lighter pace than the previous few months. Progress was made on both compatibility and speed fronts. Several behavioral issues reported to the bug tracker were resolved. The most significant of these was probably the correction of casting to built-in Python types. Previously, int/long conversions of numpy scalars such as inf/nan/1e100 would return bogus results. Now, they raise or return values, as appropriate.
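      The corrected casting behavior matches CPython's float semantics, which can be checked with plain floats (numpy itself is not required for this check; `to_int` is a helper invented for this example):

```python
def to_int(x):
    # int() on a float either returns an exact integer or raises:
    # OverflowError for infinities, ValueError for NaN.
    try:
        return int(x)
    except (OverflowError, ValueError) as e:
        return type(e).__name__

results = [to_int(1e100), to_int(float('inf')), to_int(float('nan'))]
# 1e100 converts to a huge but exact integer (no bogus result);
# inf yields 'OverflowError'; nan yields 'ValueError'.
```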

      On the speed front, enhancements to the PyPy JIT were made to support virtualizing the raw_store/raw_load memory operations used in numpy arrays. Further work remains here in virtualizing the alloc_raw_storage when possible. This will allow scalars to have storages but still be virtualized when possible in loops.

      Aside from continued work on compatibility/speed of existing code, we also hope to begin implementing the C-level components of other numpy modules such as mtrand, nditer, linalg, and so on. Several approaches could be taken to get C-level code in these modules working, ranging from reimplementing in RPython to interfacing with existing code with CFFI, if possible. The appropriate approach depends on many factors and will probably vary from module to module.

      To try out PyPy + NumPy, grab a nightly PyPy and install our NumPy fork. Feel free to report comments/issues to IRC, our mailing list, or bug tracker. Thanks to the contributors to the NumPy on PyPy proposal for supporting this work.
      Author: "Brian Kearns (noreply@blogger.com)"
      Date: Wednesday, 09 Apr 2014 11:33

      Hi all,

      We now have a preliminary version of PyPy-STM with the JIT, from the new STM documentation page. This PyPy-STM is still not quite useful, failing to top the performance of a regular PyPy by a small margin on most benchmarks, but it's definitely getting there :-) The overheads with the JIT are still a bit too high. (I've been tracking an obscure bug for days. It turned out to be a simple buffer overflow. But if anybody has a clue about why a hardware watchpoint in gdb, set on one of the garbled memory locations, fails to trigger but the memory ends up being modified anyway... and, it turns out, by just a regular pointer write... ideas welcome.)

      But I go off-topic :-) The main point of this post is to announce the 2nd Call for Donation about STM. We achieved most of the goals laid out in the first call. We even largely overachieved them in terms of raw performance, even if there are many cases that are unreasonably slow for now. So, after the successful research, we are launching a second proposal about the development part of the project:

      1. Polish PyPy-STM to get a consistently reasonable speed, 25%-40% slower than a regular JITted PyPy when running single-threaded code. Of course it is supposed to scale nicely as long as there are no user-visible conflicts.

      2. Focus on developing the Python-facing interface: both internal things (e.g. do dictionaries need to be more TM-friendly in general?) as well as directly visible things (e.g. some profiler-like interface to explore common conflicts in a program).

      3. Regular multithreaded code should benefit out of the box, but the final goal is to explore and tweak some existing non-multithreaded frameworks and improve their TM-friendliness. So existing programs using Twisted or Stackless, for example, should run on multiple cores without any major change.

      See the full call for more details! I'd like to thank Remi Meier for getting involved. And a big thank you to everybody who contributed money on the first call. It took more time than anticipated, but it's there in good but rough shape. Now it needs a lot of polishing :-)


      Author: "Armin Rigo (noreply@blogger.com)" Tags: "stm"
      Date: Tuesday, 08 Apr 2014 20:44

      Hi all,

      Here is one of the first full PyPys (edit: it was r69967+, but the general list of versions is currently here) compiled with the new StmGC-c7 library. It has no JIT so far, but it runs some small single-threaded benchmarks by taking around 40% more time than a corresponding non-STM, no-JIT version of PyPy. It scales --- up to two threads only, which is the hard-coded maximum so far in the c7 code. But the scaling looks perfect in these small benchmarks without conflict: starting two threads each running a copy of the benchmark takes almost exactly the same amount of total time, simply using two cores.

      Feel free to try it! It is not actually useful so far, because it is limited to two cores and CPython is something like 2.5x faster. One of the important next steps is to re-enable the JIT. Based on our current understanding of the "40%" figure, we can probably reduce it with enough efforts; but also, the JIT should be able to easily produce machine code that suffers a bit less than the interpreter from these effects. This seems to mean that we're looking at 20%-ish slow-downs for the future PyPy-STM-JIT.

      Interesting times :-)

      For reference, this is what you get by downloading the PyPy binary linked above: a Linux 64 binary (Ubuntu 12.04) that should behave mostly like a regular PyPy. (One main missing feature is that destructors are never called.) It uses two cores, but obviously only if the Python program you run is multithreaded. The only new built-in feature is with __pypy__.thread.atomic: this gives you a way to enforce that a block of code runs "atomically", which means without any operation from any other thread randomly interleaved.
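      A minimal sketch of how the atomic context manager might be used (the fallback branch is an assumption added so the sketch also runs outside pypy-stm; a global lock gives the same "no interleaving" guarantee, minus the parallelism):

```python
import threading

try:
    from __pypy__.thread import atomic  # only available on pypy-stm
except ImportError:
    # Fallback for other interpreters: an RLock is also a context
    # manager and serializes the block, like the GIL would.
    atomic = threading.RLock()

counter = 0

def bump(n):
    global counter
    for _ in range(n):
        with atomic:
            # No operation from any other thread is interleaved here,
            # so the read-modify-write below cannot be torn.
            counter += 1

threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```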

      If you want to translate it yourself, you need a trunk version of clang with three patches applied. That's the number of bugs that we couldn't find workarounds for, not the total number of bugs we found by (ab)using the address_space feature...

      Stay tuned for more!

      Armin & Remi

      Author: "Armin Rigo (noreply@blogger.com)" Tags: "stm"
      Date: Monday, 31 Mar 2014 00:18
      Here is what has been happening with NumPy in PyPy in October thanks to the people who donated to the NumPyPy proposal:

      The biggest change is that we shifted to using an external fork of numpy rather than a minimal numpypy module. The idea is that we will be able to reuse most of the upstream pure-python numpy components, replacing the C modules with appropriate RPython micronumpy pieces at the correct places in the module namespace.

      The numpy fork should work just as well as the old numpypy for functionality that existed previously, and also include much new functionality from the pure-python numpy pieces that simply hadn't been imported yet in numpypy. However, this new functionality will not have been "hand picked" to only include pieces that work, so you may run into functionality that relies on unimplemented components (which should fail with user-level exceptions).

      This setup also allows us to run the entire numpy test suite, which will help in directing future compatibility development. The recent PyPy release includes these changes, so download it and let us know how it works! And if you want to live on the edge, the nightly includes even more numpy progress made in November.

      To install the fork, download the latest release, and then install numpy either with pip inside a virtualenv:

          pip install git+https://bitbucket.org/pypy/numpy.git

      or directly from a clone:

          git clone https://bitbucket.org/pypy/numpy.git
          cd numpy
          pypy setup.py install

      EDIT: if you install numpy as root, you may need to also import it once as root before it works: sudo pypy -c 'import numpy'

      Along with this change, progress was made in fixing internal micronumpy bugs and increasing compatibility:
      • Fixed a bug with strings in record dtypes
      • Fixed a bug where the multiplication of an ndarray with a Python int or float resulted in loss of the array's dtype
      • Fixed several segfaults encountered in the numpy test suite (suite should run now without segfaulting)

      We also began working on __array_prepare__ and __array_wrap__, which are necessary pieces for a working matplotlib module.

      Romain and Brian
      Author: "Romain Guillebert (noreply@blogger.com)"
      Date: Wednesday, 26 Mar 2014 17:28

      The Raspberry Pi aims to be a low-cost educational tool that anyone can use to learn about electronics and programming. Python and pygame are included in the Pi's programming toolkit. And since last year, thanks in part to sponsorship from the Raspberry Pi Foundation, PyPy also works on the Pi (read more here).

      With PyPy working on the Pi, game logic written in Python stands to gain an awesome performance boost. However, the original pygame is a Python C extension. This means it performs poorly on PyPy and negates any speedup in the Python parts of the game code.

      One solution to making pygame games run faster on PyPy, and eventually on the Raspberry Pi, comes in the form of pygame_cffi. pygame_cffi uses CFFI to wrap the underlying SDL library instead of a C extension. A few months ago, the Raspberry Pi Foundation sponsored a Cape Town Python User Group hackathon to build a proof-of-concept pygame using CFFI. This hackathon was a success and it produced an early working version of pygame_cffi.
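      Conceptually, wrapping a C library with CFFI looks like the sketch below. The SDL declarations are copied-from-headers style illustrations; loading the real shared library is commented out so the sketch runs anywhere, and the whole thing is guarded since cffi itself may not be installed:

```python
try:
    from cffi import FFI

    ffi = FFI()
    # Declare the C functions we want to call, as they appear in SDL's headers.
    ffi.cdef("""
        int SDL_Init(unsigned int flags);
        void SDL_Quit(void);
    """)
    # sdl = ffi.dlopen("SDL")   # load the shared library at runtime...
    # sdl.SDL_Init(0x20)        # ...then call C functions as plain Python calls
    have_cffi = True
except ImportError:
    have_cffi = False

print("cffi available:", have_cffi)
```

      Unlike a C extension, none of this is compiled against CPython's API, which is why PyPy's JIT can make the calls cheap.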

      So for the last 5 weeks Raspberry Pi has been funding work on pygame_cffi. The goal was a complete implementation of the core modules. We also wanted benchmarks to illuminate performance differences between pygame_cffi on PyPy and pygame on CPython. We are happy to report that those goals were met. So without further ado, here's a rundown of what works.

      Current functionality

      [Screenshots: Invention and Mutable Mamba]

      With the above-mentioned functionality in place we could get 10+ of the pygame examples to work, and a number of PyWeek games. At the time of writing, if a game doesn't work it is most likely due to an unimplemented transform or draw function. That will be remedied soon.


      In terms of performance, pygame_cffi on PyPy is showing a lot of promise. It beats pygame on CPython by a significant margin in our events processing and collision detection benchmarks, while blit and fill benchmarks perform similarly. The pygame examples we checked also perform better.

      However, there is still work to be done to identify and eliminate bottlenecks. On the Raspberry Pi performance is markedly worse compared to pygame (barring collision detection). The PyWeek games we tested also performed slightly worse. Fortunately there is room for improvement in various places.

      [Benchmark charts: Invention & Mutable Mamba (x86); standard pygame examples (Raspberry Pi)]

      Here's a summary of some of the benchmarks. Relative speed refers to the frame rate obtained in pygame_cffi on PyPy relative to pygame on CPython.

      Benchmark                                      Relative speed (PyPy speedup)
      Events (x86)                                   1.41
      Events (Pi)                                    0.58
      N² collision detection on 100 sprites (x86)    4.14
      N² collision detection on 100 sprites (Pi)     1.01
      Blit 100 surfaces (x86)                        1.06
      Blit 100 surfaces (Pi)                         0.60
      Invention (x86)                                0.95
      Mutable Mamba (x86)                            0.72
      stars example (x86)                            1.95
      stars example (Pi)                             0.84


      Some not-so-great news is that PyOpenGL performs poorly on PyPy since PyOpenGL uses ctypes. This translates into a nasty reduction in frame rate for games that use OpenGL surfaces. It might be worthwhile creating a CFFI-powered version of PyOpenGL as well.

      Where to now?

      Work on pygame_cffi is ongoing. Here are some things that are in the pipeline:

      • Get pygame_cffi on PyPy to a place where it is consistently faster than pygame on CPython.
      • Implement the remaining modules and functions, starting with draw and transform.
      • Improve test coverage.
      • Reduce the time it takes for CFFI to parse the cdef, which currently makes the initial pygame import slow.

      If you want to contribute, you can find pygame_cffi on GitHub. Feel free to find us in #pypy on freenode or post issues on GitHub.

      Rizmari Versfeld

      Author: "Maciej Fijalkowski (noreply@blogger.com)"
      Date: Wednesday, 12 Mar 2014 11:28
      Hello everyone

      There is an interview with Roberto De Ioris (of uWSGI fame) about embedding PyPy in uWSGI. It covers the recent addition of a PyPy embedding interface using CFFI and the experience of using it. Read the full interview.

      Author: "Maciej Fijalkowski (noreply@blogger.com)"
      Date: Monday, 10 Mar 2014 19:38
      More progress was made on the NumPy front in the past month. On the compatibility front, we now pass ~130 more tests from NumPy's suite since the end of January. Currently, we pass 2336 tests out of 3265 tests run, with many of the failures representing portions of NumPy that we don't plan to implement in the near future (object dtypes, unicode, etc). There are still some failures that do represent issues, such as special indexing cases and failures to respect subclassed ndarrays in return values, which we do plan to resolve. There are also some unimplemented components and ufuncs remaining which we hope to implement, such as nditer and mtrand. Overall, the most common array functionality should be working.

      Additionally, I began to take a look at some of the loops generated by our code. One widely used loop is dot, and we were running about 5x slower than NumPy's C version. I was able to optimize the dot loop and also the general array iterator to get us to ~1.5x NumPy C time on dot operations of various sizes. Further progress in this area could be made by using CFFI to tie into BLAS libraries, when available. Also, work remains in examining traces generated for our other loops and checking for potential optimizations.
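      For reference, the kind of loop being optimized is essentially the following (a toy pure-Python dot product for illustration, not micronumpy's actual RPython code): a tight multiply-accumulate loop whose per-iteration overhead the JIT must eliminate.

```python
def dot(a, b):
    """Naive dot product: the tight multiply-accumulate loop the JIT must optimize."""
    assert len(a) == len(b)
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

      The speedup work consists of making the traced version of loops like this avoid boxing, bounds-check duplication, and iterator overhead.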

      To try out PyPy + NumPy, grab a nightly PyPy and install our NumPy fork. Feel free to report comments/issues to IRC, our mailing list, or bug tracker. Thanks to the contributors to the NumPy on PyPy proposal for supporting this work.

      Author: "Brian Kearns (noreply@blogger.com)"
      Date: Tuesday, 18 Feb 2014 03:33

      This is the 13th status update about our work on the py3k branch, which we
      can work on thanks to all of the people who donated to the py3k proposal.

      We're just finishing up a cleanup of the int/long types. This work helps the py3k
      branch unify these types into the Python 3 int and restore JIT compilation of
      machine-sized integers.

      This cleanup also removes multimethods from these types. PyPy has
      historically used a clever implementation of multimethod dispatch for declaring
      methods of the __builtin__ types in RPython.

      This multimethod scheme provides some convenient features for declaring these
      methods, but we've come to the conclusion that it may be more trouble than it's
      worth. A major problem with multimethods is that they generate a large number of
      stub methods, which burden the already lengthy and memory-hungry RPython
      translation process. Also, their implementation and behavior can be somewhat magical.

      The alternative to multimethods involves doing the work of the type checking
      and dispatching rules in a more verbose, manual way. It's a little more work in
      the end but less magical.
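      To illustrate the difference, here is a simplified Python sketch (not RPython's actual machinery): a multimethod registers one implementation per argument-type pair and dispatches automatically, while the manual style spells out the type checks explicitly.

```python
# Multimethod style: implementations registered per argument-type pair,
# dispatched by a lookup the caller never sees.
registry = {}

def register(t1, t2):
    def deco(fn):
        registry[(t1, t2)] = fn
        return fn
    return deco

@register(int, int)
def add_int_int(a, b):
    return a + b

@register(str, str)
def add_str_str(a, b):
    return a + b

def mm_add(a, b):
    return registry[(type(a), type(b))](a, b)

# Manual style: more verbose, but the dispatch rules are spelled out in place.
def manual_add(a, b):
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    if isinstance(a, str) and isinstance(b, str):
        return a + b
    raise TypeError("unsupported operand types")

print(mm_add(2, 3), manual_add("py", "py"))
```

      In RPython the hidden dispatch comes at a cost: every registered combination turns into generated stub code that the translator must process.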

      Recently, Manuel Jacob finished a large cleanup effort of the
      unicode/string/bytearray types that also removed their multimethods. This work
      also benefits the py3k branch and will help with future PEP 393 work. The
      effort was partly sponsored by Google's Summer of Code: thanks Manuel and Google!

      Now there are only a couple of major pieces left in the multimethod removal (the
      float/complex types and the special marshaling code) and a few minor pieces that
      should be relatively easy.

      In conclusion, there's been some good progress made on py3k and multimethod
      removal this winter, albeit a bit slower than we would have liked.


      Author: "Philip Jenvey (noreply@blogger.com)"
      Date: Tuesday, 11 Feb 2014 03:02
      Work continued on the NumPy + PyPy front steadily in December and more lightly in January. The continued focus was compatibility, targeting incorrect or unimplemented features that appeared in multiple NumPy test suite failures. We now pass ~2/3 of the NumPy test suite. The biggest improvements were made in these areas:

      - Fixed bugs in conversions of arrays/scalars to/from native types
      - Fixed cases where we would choose incorrect dtypes when initializing or computing results
      - Improved handling of subclasses of ndarray through computations
      - Added support for some optional arguments for array methods that are used in the pure-Python part of NumPy
      - Added support for additional attributes in arrays, array.flags, and dtypes
      - Fixed some indexing corner cases that arise in NumPy testing
      - Implemented part of numpy.fft (cffti and cfftf)

      Looking forward, we plan to continue improving the correctness of the existing implemented NumPy functionality, while also beginning to look at performance. The initial focus for performance will be to look at areas where we are significantly worse than CPython+NumPy. Those interested in trying these improvements out will need a PyPy nightly, and an install of the PyPy NumPy fork. Thanks again to the NumPy on PyPy donors for funding this work.
      Author: "Brian Kearns (noreply@blogger.com)"
      Date: Sunday, 09 Feb 2014 23:16

      Hi all,

      A quick note about the Software Transactional Memory (STM) front.

      Since the previous post, we believe we have made a lot of progress by discovering an alternative core model for software transactions. Why do I say "believe"? Because it means, once again, that we have to rewrite from scratch the C library handling STM. This is currently work in progress. Once this is done, we should be able to adapt the existing pypy-stm to run on top of it without much rewriting effort; in fact it should simplify the difficult issues we ran into for the JIT. So while this is basically yet another restart similar to last June's, the difference is that the work we have already put into the PyPy part (as opposed to the C library) remains.

      You can read about the basic ideas of this new C library here. It is still STM-only, not HTM, but because it doesn't constantly move objects around in memory, it would be easier to adapt an HTM version. There are even potential ideas about a hybrid TM, like using HTM but only to speed up the commits. It is based on a Linux-only system call, remap_file_pages() (poll: who heard about it before? :-). As previously, the work is done by Remi Meier and myself.

      Currently, the C library is incomplete, but early experiments show good results in running duhton, the interpreter for a minimal language created for the purpose of testing STM. Good results meaning that we brought down the slow-downs from 60-80% (previous version) to around 15% (current version). This number measures the slow-down from the non-STM-enabled to the STM-enabled version, on one CPU core; of course, the idea is that the STM version scales up when using more than one core.

      This means that we are looking forward to a result that is much better than originally predicted. The pypy-stm has chances to run at a one-thread speed that is only "n%" slower than the regular pypy-jit, for a value of "n" that is optimistically 15 --- but more likely some number around 25 or 50. This is seriously better than the original estimate, which was "between 2x and 5x". It would mean that using pypy-stm is quite worthwhile even with just two cores.

      More updates later...


      Author: "Armin Rigo (noreply@blogger.com)" Tags: "stm"
      Date: Tuesday, 10 Dec 2013 16:48
      Since the PyPy 2.2 release last month, more progress has been made on the NumPy compatibility front. Initial work has been directed by running the NumPy test suite and targeting failures that appear most frequently, along with fixing the few bugs reported on the bug tracker.

      Improvements were made in these areas:
      - Added/fixed many missing or broken scalar functionalities; the scalar API should now match up more closely with arrays
      - Added some missing dtype functionality (newbyteorder, hasobject, descr, etc.)
      - Added support for optional arguments (axis, order) to some ndarray functions
      - Fixed some corner cases for string/record types

      Most of these improvements went onto trunk after 2.2 was split, so if you're interested in trying them out or running into problems on 2.2, try the nightly.

      Thanks again to the NumPy on PyPy donors who make this continued progress possible.

      Author: "Brian Kearns (noreply@blogger.com)"
      Date: Monday, 09 Dec 2013 13:32

      One of the RaspberryPi's goals is to be a fun toolkit for school children (and adults!) to learn programming and electronics with. Python and pygame are part of this toolkit. Recently the RaspberryPi Foundation funded part of the effort of porting PyPy to the Pi -- making Python programs on the Pi faster!

      Unfortunately pygame is written as a Python C extension that wraps SDL, which means the performance of pygame under PyPy remains mediocre. To fix this, pygame needs to be rewritten using CFFI to wrap SDL instead.

      RaspberryPi sponsored a CTPUG (Cape Town Python User Group) hackathon to put together a proof-of-concept pygame-cffi. The day was quite successful - we got a basic version of the bub'n'bros client working on pygame-cffi (and on PyPy). The results can be found on github with contributions from the five people present at the sprint.

      While far from complete, the proof of concept does show that there are no major obstacles to porting pygame to cffi and that cffi is a great way to bind your Python package to C libraries.

      Amazingly, we managed to have machines running all three major platforms (OS X, Linux and Windows) at the hackathon so the code runs on all of them!

      We would like to thank the Praekelt foundation for providing the venue and The Raspberry Pi foundation for providing food and drinks!

      Simon Cross, Jeremy Thurgood, Neil Muller, David Sharpe and fijal.

      Author: "Maciej Fijalkowski (noreply@blogger.com)"
      Date: Saturday, 30 Nov 2013 09:57

      The next PyPy sprint will be in Leysin, Switzerland, for the ninth time. This is a fully public sprint: newcomers and topics other than those proposed below are welcome.

      Goals and topics of the sprint

      • Py3k: work towards supporting Python 3 in PyPy
      • NumPyPy: work towards supporting the numpy module in PyPy
      • STM: work towards supporting Software Transactional Memory
      • And as usual, the main side goal is to have fun in winter sports :-) We can take a day off for skiing.

      Exact times

      For a change, and as an attempt to simplify things, I specified the dates as 11-19 January 2014, where 11 and 19 are travel days. We will work full days between the 12th and the 18th. You are of course welcome to attend for only part of that time.

      Location & Accommodation

      Leysin, Switzerland, "same place as before". Let me refresh your memory: both the sprint venue and the lodging will be in a very spacious pair of chalets built specifically for bed & breakfast: http://www.ermina.ch/. The place has a good ADSL Internet connection with wireless installed. You can of course arrange your own lodging anywhere (as long as you are in Leysin, you cannot be more than a 15-minute walk away from the sprint venue), but I definitely recommend lodging there too -- you won't find a better view anywhere else (though you probably won't get much worse ones easily, either :-)

      Please confirm that you are coming so that we can adjust the reservations as appropriate. The rate so far has been around 60 CHF a night all included in 2-person rooms, with breakfast. There are larger rooms too (less expensive per person) and maybe the possibility to get a single room if you really want to.

      Please register by Mercurial:


      or on the pypy-dev mailing list if you do not yet have check-in rights:


      You need a Swiss-to-(insert country here) power adapter. There will be some Swiss-to-EU adapters around -- bring an EU-format power strip if you have one.

      Author: "Armin Rigo (noreply@blogger.com)"