Parallel processing in Foreach loop


Problem Description


Hello, I have a situation where I am calling an API to get a list of movies. For each record in the list, I call another API. I would like to make that for loop parallel for better performance. The following is my sample code.

movies = []

for movie in collection:
    tmdb_movie = tmdb.get_movie(movie['detail']['ids']['tmdb_id'])
    movies.append(tmdb_movie)

return movies


So my solution using multiprocessing is as follows:

from multiprocessing import Pool

pool = Pool()
output = pool.map(tmdb.get_movie, [movie['detail']['ids']['tmdb_id'] for movie in collection])


But when I execute this code, I get the following error:

PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed


I would really appreciate it if someone could help me make this functionality parallel in Python 2.7.

Recommended Answer


The best option for this would be to use threads. Threads in Python cannot use CPUs in parallel, but they can execute concurrently while there are blocking operations. Processes, although they can truly run in parallel, are slow to start and communicate with, and are better suited to big CPU-bound workloads. Also, as you indicate in your question, processes can sometimes be difficult to launch.
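To see why threads help here despite the GIL, here is a minimal sketch using plain `threading`. `fake_api_call` is a hypothetical stand-in for a network request; `time.sleep` simulates its latency. Five 0.2-second calls finish in roughly 0.2 seconds total, not 1 second, because the threads block concurrently:

```python
import threading
import time

def fake_api_call(movie_id, results, index):
    # Simulated network latency; a real call would do I/O here.
    time.sleep(0.2)
    results[index] = {"id": movie_id}

results = [None] * 5
threads = [
    threading.Thread(target=fake_api_call, args=(movie_id, results, i))
    for i, movie_id in enumerate(range(5))
]

start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start  # roughly 0.2 s, not 5 * 0.2 s
```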


You can use the somewhat-secret (i.e. undocumented but actually well known) multiprocessing.pool.ThreadPool class. If you are going to be doing this many times, you can create a pool once at the beginning and reuse it. You just need to make sure pool.close() and maybe also pool.join() are called when the program exits.

from multiprocessing.pool import ThreadPool

# Global/class variables    
NUM_THREADS = 5
pool = ThreadPool(NUM_THREADS)

# Inside some function/method
return pool.map(lambda movie: tmdb.get_movie(movie['detail']['ids']['tmdb_id']), movies)

# On exit
pool.close()  # Prevents more jobs to be submitted
pool.join()  # Waits until all jobs are finished
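As a side note, on Python 3 the documented `concurrent.futures.ThreadPoolExecutor` offers the same pattern with an official API and automatic cleanup via a context manager. A minimal sketch, where `get_movie_stub` is a hypothetical stand-in for `tmdb.get_movie`:

```python
from concurrent.futures import ThreadPoolExecutor

def get_movie_stub(tmdb_id):
    # Hypothetical stand-in for tmdb.get_movie; a real call would hit the API.
    return {"tmdb_id": tmdb_id}

collection = [{"detail": {"ids": {"tmdb_id": i}}} for i in range(3)]

# The context manager shuts the pool down when the block exits,
# replacing the explicit close()/join() calls above.
with ThreadPoolExecutor(max_workers=5) as executor:
    movies = list(executor.map(
        lambda movie: get_movie_stub(movie["detail"]["ids"]["tmdb_id"]),
        collection,
    ))
```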

