Python multiprocessing.Pool & memory


Question

I am using Pool.map for a scoring procedure:

  1. a cursor with millions of arrays from a data source
  2. a calculation
  3. saving the result into a data sink

The results are independent.

I'm just wondering if I can avoid the memory demand. At first it seems that every array is loaded into Python before steps 2 and 3 proceed. In any case, I do see a speed improvement.

# the data source and sink are MongoDB collections
def scoring(some_arguments):
    ### some stuff, and finally persist the result ###
    collection.update({uid: _uid}, {'$set': res_profile}, upsert=True)


cursor = tracking.find(timeout=False)
score_proc_pool = Pool(options.cores)
# finally I use a wrapper so I have only the document as input for map
score_proc_pool.map(scoring_wrapper, cursor, chunksize=10000)


Am I doing something wrong or is there a better way with python for this purpose?

Answer


The map functions of a Pool internally convert the iterable to a list if it doesn't have a __len__ attribute. The relevant code is in Pool.map_async, as that is used by Pool.map (and starmap) to produce the result - which is also a list.


If you don't want to read all data into memory first, you should use Pool.imap or Pool.imap_unordered, which will produce an iterator that will yield the results as they come in.

