Python multiprocessing.Pool & memory
Question
I'm using Pool.map for a scoring procedure:
- a cursor with millions of arrays from a data source
- a calculation
- saving the result into a data sink
The results are independent of each other.
I'm just wondering if I can avoid the memory demand. At first it seems that every array goes into Python before steps 2 and 3 proceed. In any case, I do see a speed improvement.
# data source and sink are in MongoDB
def scoring(some_arguments):
    ### some stuff and finally persist ###
    collection.update({uid: _uid}, {'$set': res_profile}, upsert=True)

cursor = tracking.find(timeout=False)
score_proc_pool = Pool(options.cores)
# finally I use a wrapper so I have only the document as input for map
score_proc_pool.map(scoring_wrapper, cursor, chunksize=10000)
Am I doing something wrong, or is there a better way to do this in Python?
Answer
The map functions of a Pool internally convert the iterable to a list if it doesn't have a __len__ attribute. The relevant code is in Pool.map_async, as that is used by Pool.map (and starmap) to produce the result - which is also a list.
If you don't want to read all data into memory first, you should use Pool.imap or Pool.imap_unordered, which will produce an iterator that yields the results as they come in.