Concurrency Part 2

Bryan Lott

Wow, even trivial concurrency in Python is hard...

So, you have a list of tasks that you want to consume lazily (as they're generated, so you don't blow up memory). Great! That's easy to do with a generator function using the yield keyword, or with a generator expression. But, oh no... you can't hand that generator to a multiprocessing worker, because generators aren't picklable. Drat!
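
To make the failure concrete, here's a minimal sketch (the names are illustrative): multiprocessing has to pickle whatever crosses the process boundary, and a generator object simply refuses.

import pickle

def rows():
    # lazy producer: nothing is materialized up front
    for i in range(5):
        yield i

gen = rows()

try:
    pickle.dumps(gen)  # what multiprocessing would have to do under the hood
except TypeError as e:
    print(e)  # "can't pickle generator objects" on 2.7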

Next, you go down the road of:

# note the call: islice needs the generator object, not the function
p.map(worker, itertools.islice(generator_function(), slice_size))
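
Spelled out a bit more (worker, slice_size, and the generator body here are stand-ins, not anything from a real codebase), the idea is to keep pulling fixed-size slices off the generator and mapping each batch until it's exhausted:

import itertools
from multiprocessing import Pool

def worker(item):
    return item * 2  # stand-in for the real per-task work

def generator_function():
    for i in range(100):  # pretend these are expensive to produce
        yield i

if __name__ == '__main__':
    p = Pool(4)
    gen = generator_function()
    slice_size = 10
    while True:
        batch = list(itertools.islice(gen, slice_size))
        if not batch:
            break
        p.map(worker, batch)  # blocks until the whole batch is done
    p.close()
    p.join()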

Which is great, until you want to lazily push results back to the main process, because now the returned result sets blow up memory instead.

for block in p.imap(worker, itertools.islice(generator_function(), slice_size)):
    do_additional_work(block)

The above won't work either because, again, generators aren't picklable: the worker can't yield rows back one at a time, so it has to build and return a complete result set instead.
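
So the shape you're forced into looks roughly like this (a sketch, with the chunking helper and worker body made up): chunk the input in the parent, have the worker build and return a complete list per chunk, and let imap at least stream those lists back one at a time:

import itertools
from multiprocessing import Pool

def worker(chunk):
    # forced to build the whole result set here; a "yield row"
    # version would return a generator, which can't be pickled back
    return [row * 2 for row in chunk]

def chunked(gen, size):
    # a picklable stream of lists, built from the unpicklable generator
    while True:
        block = list(itertools.islice(gen, size))
        if not block:
            break
        yield block

if __name__ == '__main__':
    p = Pool(4)
    gen = (n for n in range(100))
    for block in p.imap(worker, chunked(gen, 10)):
        for row in block:
            pass  # do_additional_work(row) goes here
    p.close()
    p.join()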

So, you end up shoving as much work as you can into the worker function so that you don't have to pass large chunks of data back and forth. I'm sorry, but this is a terrible state of things. I'm sure there are libraries that help ease this pain, but my point is there shouldn't NEED to be libraries. This should be core Python. And yes, sadly, I'm stuck on 2.7.x.