Monday, August 14, 2017

Let's remove the Global Interpreter Lock

Hello everyone

The Python community has been discussing removing the Global Interpreter Lock for a long time. There have been various attempts at removing it: Jython or IronPython successfully removed it with the help of the underlying platform, and some have yet to bear fruit, like gilectomy. Since our February sprint in Leysin, we have experimented with the topic of GIL removal in the PyPy project. We believe that the work done in IronPython or Jython can be reproduced with only a bit more effort in PyPy. Compared to that, removing the GIL in CPython is a much harder topic, since it also requires tackling the problem of multi-threaded reference counting. See the section below for further details.

As we announced at EuroPython, what we have so far is a GIL-less PyPy which can run very simple multi-threaded, nicely parallelized, programs. At the moment, more complicated programs probably segfault. The remaining 90% (and another 90%) of work is with putting locks in strategic places so PyPy does not segfault during concurrent accesses to data structures.

Since such work would complicate the PyPy code base and our day-to-day work, we would like to judge the interest of the community and the commercial partners to make it happen (we are not looking for individual donations at this point). We estimate a total cost of $50k, out of which we already have backing for about 1/3 (with a possible 1/3 extra from the STM money, see below). This would give us a good shot at delivering a good proof-of-concept working PyPy with no GIL. If we can get a $100k contract, we will deliver a fully working PyPy interpreter with no GIL as a release, possibly separate from the default PyPy release.

People asked several questions, so I'll try to answer the technical parts here.

What would the plan entail?

We've already done the work on the Garbage Collector to allow doing multi- threaded programs in RPython. "All" that is left is adding locks on mutable data structures everywhere in the PyPy codebase. Since it would significantly complicate our workflow, we require real interest in that topic, backed up by commercial contracts in order to justify the added maintenance burden.

Why did the STM effort not work out?

STM was a research project that proved that the idea is possible. However, the amount of user effort that is required to make programs run in a parallelizable way is significant, and we never managed to develop tools that would help in doing so. At the moment we're not sure if more work spent on tooling would improve the situation or if the whole idea is really doomed. The approach also ended up adding significant overhead on single threaded programs, so in the end it is very easy to make your programs slower. (We have some money left in the donation pot for STM which we are not using; according to the rules, we could declare the STM attempt failed and channel that money towards the present GIL removal proposal.)

Wouldn't subinterpreters be a better idea?

Python is a very mutable language - there are tons of mutable state and basic objects (classes, functions,...) that are compile-time in other language but runtime and fully mutable in Python. In the end, sharing things between subinterpreters would be restricted to basic immutable data structures, which defeats the point. Subinterpreters suffers from the same problems as multiprocessing with no additional benefits. We believe that reducing mutability to implement subinterpreters is not viable without seriously impacting the semantics of the language (a conclusion which applies to many other approaches too).

Why is it easier to do in PyPy than CPython?

Removing the GIL in CPython has two problems:

  • how do we guard access to mutable data structures with locks and
  • what to do with reference counting that needs to be guarded.

PyPy only has the former problem; the latter doesn't exist, due to a different garbage collector approach. Of course the first problem is a mess too, but at least we are already half-way there. Compared to Jython or IronPython, PyPy lacks some data structures that are provided by JVM or .NET, which we would need to implement, hence the problem is a little harder than on an existing multithreaded platform. However, there is good research and we know how that problem can be solved.

Best regards,
Maciej Fijalkowski


14 comments:

Patrick said...

Where can one donate? Is there a specific page for it? :)

Anonymous said...

Where can we we donate or forward a link to managing directors for corporate donations?

funny_falcon said...

Neither .Net, nor Java put locks around every mutable access. Why the hell PyPy should?

scott taggart said...

It sounds to me like you are just looking for money to spend. I see no reliable or commercial deliverable coming out of this effort (you listed a bucketload of caveats already). If it were doable in $100k, it would have been done long ago, no? Caveat Emptor to those who toss their money at this.

Ahmet Alp Balkan said...

200+ comments about this article are at: https://news.ycombinator.com/item?id=15008636

Zunzster said...

@funny_falcon: I don't read this as them arguing for putting "putting locks around *every* mutable access". Rather, just the core shared-mutable pieces of the run-time library and infrastructure, which in .NET and the JVM are provided by the VM itself for Jython and IronPython but which PyPy has to implement.

@scott_taggart: Your vision seems limited. Perhaps you aren't familiar with the PyPy team's strong history of delivering. It may well be 'doable in $100K' but how is that supposed to have spontaneously happened already without a viable plan and a trusted team which is exactly what the PyPy project is?

I always thought the STM concept was really clever and elegant in theory but that the overhead involved, both in recording and rollback-retries, could impact forward progress too much to be viable in practice. Essentially, STM and locks are dual's of each other, with STM having better composition and locks less overhead.

At least with a more traditional locking approach, the locks are still being inserted by the interpreter/library, so they can be reasoned about more carefully (and even instrumented programmatically) to avoid some of the classic problems with lock-based designs whilst regaining the performance lost to STM overhead.

If anyone can pull it off, the PyPy team can :-)

Tibor Vavra said...

+1

Rudolf Liebenberg said...

Why not rather implement immutable datastructures like Clojure does?

Anonymous said...

Oh, just shut up and take my money.

Anonymous said...

I have been very impressed with the PyPy developers accomplishments to date and sincerely hope that they find corporate sponsors for this worthwhile endeavor.

Simon Hughes said...

How can people donate? $50k seems a bargain for such an important achievement. That's pocket change to most moderately sized companies.

Joce said...

Sounds good, perhaps time to mark the STM effort as stale?

Maharshi Mukherjee said...

This would be awesome, please. :(

PvdE said...

I donated to the original STM and would be happy if it were reallocated to this.