multiprocessing.pool.map and function with two arguments


Problem description

I am using multiprocessing.Pool()

Here is what I want to run in the Pool:

import os
from multiprocessing import Pool

def insert_and_process(file_to_process, db):
    db = DAL("path_to_mysql" + db)
    # Table definitions
    db.table.insert(**parse_file(file_to_process))
    return True

if __name__ == "__main__":
    file_list = os.listdir(".")
    P = Pool(processes=4)
    P.map(insert_and_process, file_list, db)  # this is where I have the problem

I want to pass 2 arguments. What I want to do is initialize only 4 DB connections. As written, the code tries to create a connection on every function call, so possibly millions of them, which would freeze I/O to death. If I could create 4 DB connections, one for each process, it would be OK.

Is there any solution for Pool? Or should I abandon it?

EDIT:

With help from both of you, I got this:

args=zip(f,cycle(dbs))
Out[-]: 
[('f1', 'db1'),
 ('f2', 'db2'),
 ('f3', 'db3'),
 ('f4', 'db4'),
 ('f5', 'db1'),
 ('f6', 'db2'),
 ('f7', 'db3'),
 ('f8', 'db4'),
 ('f9', 'db1'),
 ('f10', 'db2'),
 ('f11', 'db3'),
 ('f12', 'db4')]

So here is how it is going to work: I am going to move the DB connection code out to the main level and do this:

from itertools import cycle

def process_and_insert(args):
    # args is a (file, db connection) pair
    # Table definitions
    args[1].table.insert(**parse_file(args[0]))
    return True

if __name__ == "__main__":
    file_list = os.listdir(".")
    P = Pool(processes=4)

    # four connections, handed out round-robin across the files
    dbs = [DAL("path_to_mysql/database") for i in range(4)]
    args = zip(file_list, cycle(dbs))
    P.map(process_and_insert, args)

Yeah, I am going to test it out and let you guys know.

Solution

Pool.map itself offers no way of passing more than one parameter to the target function: it takes a single iterable and hands one item to each call. I tried just passing a sequence, but it does not get unpacked (one item of the sequence for each parameter).
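For readers on Python 3.3 or later, the unpacking that map lacks is available as Pool.starmap, and a fixed second argument can also be bound with functools.partial. The following is only a sketch: describe, run_starmap, and run_partial are hypothetical names, and describe stands in for the real database insert.

```python
from functools import partial
from itertools import repeat
from multiprocessing import Pool


def describe(file_name, db_name):
    # Hypothetical stand-in for insert_and_process: it only reports its two arguments.
    return "{} -> {}".format(file_name, db_name)


def run_starmap(file_names, db_name, processes=2):
    # starmap unpacks each (file_name, db_name) tuple into two positional arguments.
    with Pool(processes=processes) as pool:
        return pool.starmap(describe, zip(file_names, repeat(db_name)))


def run_partial(file_names, db_name, processes=2):
    # partial fixes db_name up front, so plain map only has to supply file_name.
    with Pool(processes=processes) as pool:
        return pool.map(partial(describe, db_name=db_name), file_names)


if __name__ == "__main__":
    print(run_starmap(["f1", "f2", "f3"], "db1"))
    print(run_partial(["f1", "f2", "f3"], "db1"))
```

Both helpers pass the same pair of arguments to every call; starmap is the more general of the two because the second element can differ per item.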

What you can do is write your target function to expect its first (and only) parameter to be a tuple, in which each element is one of the parameters you are expecting:

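import os
from itertools import repeat
from multiprocessing import Pool

def insert_and_process(args):
    # Pool.map passes one item per call, so unpack the (file, db) pair here
    file_to_process, db = args
    db = DAL("path_to_mysql" + db)
    # Table definitions
    db.table.insert(**parse_file(file_to_process))
    return True

if __name__ == "__main__":
    file_list = os.listdir(".")
    P = Pool(processes=4)
    # db (the database name) is assumed to be defined before this point
    P.map(insert_and_process, zip(file_list, repeat(db)))

(Note that insert_and_process takes a single parameter that should be a 2-item sequence. The first element of the sequence is assigned to the first variable, and the other to the second. In Python 2 the unpacking could be written in the signature itself, as def insert_and_process((file_to_process, db)):, with an extra pair of parentheses, but that syntax was removed in Python 3.)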
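The question also asks for one connection per worker process rather than one per call. One possible way to get that, sketched here under assumptions, is the initializer argument of Pool, which runs a function once in each worker process when it starts. FakeConnection, init_worker, and insert_file are hypothetical names, and FakeConnection only stands in for a real DAL(...) connection.

```python
from multiprocessing import Pool

_connection = None  # set once per worker process by init_worker


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    def __init__(self, db_name):
        self.db_name = db_name
        self.rows = []

    def insert(self, row):
        self.rows.append(row)


def init_worker(db_name):
    # Runs once in each worker process, so each worker opens its own connection.
    global _connection
    _connection = FakeConnection(db_name)


def insert_file(file_name):
    # Reuses the worker's connection instead of opening a new one on every call.
    _connection.insert({"name": file_name})
    return (file_name, _connection.db_name)


if __name__ == "__main__":
    with Pool(processes=4, initializer=init_worker, initargs=("db1",)) as pool:
        for result in pool.map(insert_file, ["f1", "f2", "f3", "f4", "f5"]):
            print(result)
```

In this sketch the number of connections follows the number of worker processes, not the number of files, provided no worker is restarted while the pool is running.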
