multiprocessing module and distinct psycopg2 connections


Question

I am very puzzled by the behavior of some multiprocessing code that uses psycopg2 to run queries in parallel against a Postgres db.

Essentially, I am making the same query (with different params) to various partitions of a larger table. I am using multiprocessing.Pool to fork off a separate process for each query.

My multiprocessing call looks like this:

from multiprocessing import Pool

pool = Pool(processes=num_procs)
results = pool.map(run_sql, params_list)

My run_sql code looks like this:

import os
import psycopg2

def run_sql(zip2):
    conn = get_connection()
    curs = conn.cursor()
    # Show the object ids and the pid of the worker running this call
    print("conn: %s curs:%s pid=%s" % (id(conn), id(curs), os.getpid()))
    ...
    curs.execute(qry)
    records = curs.fetchall()
    return records  # hand the rows back to pool.map

def get_connection():
    ...
    conn = psycopg2.connect(user=db_user, host=db_host,
                            dbname=db_name, password=db_pwd)

    return conn

So my expectation is that each process gets a separate db connection via the call to get_connection(), and that printing id(conn) displays a distinct value in each. However, that doesn't seem to be the case, and I am at a loss to explain it. Even id(curs) is the same; only os.getpid() shows a difference. Does it somehow use the same connection for each forked process?

conn: 4614554592 curs:4605160432 pid=46802
conn: 4614554592 curs:4605160432 pid=46808
conn: 4614554592 curs:4605160432 pid=46810
conn: 4614554592 curs:4605160432 pid=46784
conn: 4614554592 curs:4605160432 pid=46811


Recommended answer

I think I've figured this out. Multiprocessing in Python is shared-nothing: when a worker is forked, it gets its own copy of the parent's entire memory space, functions and all. id() only reports an object's address within its own process (in CPython), and because the forked processes start from identical copies of memory, the new connection object tends to be allocated at the same address in each one. So the ids match even though the pids differ, and each process still holds its own separate connection. For the same reason, declaring a global connection pool as I did initially was useless: each process ended up with its own copy of the pool, with just one connection active at a time.
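To check this without a database, here is a minimal sketch. FakeConnection, report and collect are hypothetical names made up for this illustration (they are not part of psycopg2 or of the original code), and the "fork" start method is assumed, so it only runs on Unix-like systems. Each child builds its own object and reports its id together with the pid that created it. The ids will often coincide, but that is not guaranteed, while the creating pid always differs.

```python
import multiprocessing as mp
import os


class FakeConnection:
    """Hypothetical stand-in for a psycopg2 connection; no database needed."""

    def __init__(self):
        self.owner_pid = os.getpid()  # the process that created this object


def report(queue):
    # Runs in the child: build a brand-new object and describe it.
    conn = FakeConnection()
    queue.put((id(conn), conn.owner_pid, os.getpid()))


def collect(num_procs=4):
    # Assumption: the "fork" start method (Unix only), as in the question.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    procs = [ctx.Process(target=report, args=(queue,)) for _ in range(num_procs)]
    for p in procs:
        p.start()
    rows = [queue.get() for _ in procs]  # drain the queue before joining
    for p in procs:
        p.join()
    return rows


if __name__ == "__main__":
    for obj_id, owner_pid, pid in collect():
        print("id=%s created_in=%s pid=%s" % (obj_id, owner_pid, pid))
```

With real psycopg2 connections, a more direct check is to print conn.get_backend_pid() in each worker: it returns the pid of the Postgres server process handling that session, so distinct connections report distinct backend pids even when id(conn) matches.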
