How to use an initializer to set up my multiprocess pool?


Question

I'm trying to use the multiprocessing Pool object. I'd like each process to open a database connection when it starts, then use that connection to process the data that is passed in (rather than opening and closing the connection for each bit of data). This seems like what the initializer is for, but I can't wrap my head around how the worker and the initializer communicate. So I have something like this:

import psycopg2
from multiprocessing import Pool

def get_cursor():
    # Open a new connection and return a cursor on it
    return psycopg2.connect(...).cursor()

def process_data(data):
    # here I'd like to have the cursor so that I can do things with the data
    ...

if __name__ == "__main__":
    pool = Pool(initializer=get_cursor, initargs=())
    pool.map(process_data, get_some_data_iterator())

How do I (or do I) get the cursor back from get_cursor() into process_data()?

Recommended answer

Inside each worker process, the initializer function is called thus:

def worker(...):
    ...
    if initializer is not None:
        # the return value is discarded
        initializer(*initargs)

So no return value is saved anywhere. You might think this dooms you, but no! Each worker is a separate process, so you can use an ordinary global variable: the initializer assigns to it, and the worker function reads it.

This is not exactly pretty, but it works:

cursor = None
def set_global_cursor(...):
    global cursor
    cursor = ...

Now you can just use cursor in your process_data function. The cursor variable in each process is separate from the one in every other process, so they do not step on each other.
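To make the pattern concrete, here is a minimal end-to-end sketch. It swaps in the standard-library sqlite3 module for psycopg2 so that it runs without a database server; the names init_worker and _conn, and the use of an in-memory database, are assumptions made for illustration and are not part of the original question or answer.

```python
import multiprocessing
import sqlite3

# Module-level global. Each worker process has its own copy, so the
# connection stored here is never shared between processes.
_conn = None


def init_worker(db_path):
    """Run once in each worker process when the pool starts it."""
    global _conn
    # Assumption: sqlite3 stands in for psycopg2.connect(...) here.
    _conn = sqlite3.connect(db_path)


def process_data(value):
    """Run once per item, reusing the connection opened by init_worker."""
    row = _conn.execute("SELECT ? * 2", (value,)).fetchone()
    return row[0]


def main():
    # ":memory:" gives every worker its own private in-memory database.
    with multiprocessing.Pool(
        processes=2, initializer=init_worker, initargs=(":memory:",)
    ) as pool:
        results = pool.map(process_data, range(5))
    print(results)  # [0, 2, 4, 6, 8]


if __name__ == "__main__":
    main()
```

The connection is opened once per worker rather than once per item, which is what the question asked for. With psycopg2, the same shape should apply with the connect call replaced accordingly.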

(I have no idea whether psycopg2 has a different way to deal with this that does not involve using multiprocessing in the first place; this is meant as a general answer to a general problem with the multiprocessing module.)
