Across process boundaries in scoped_session

Problem description

I'm using SQLAlchemy with multiprocessing. I also use scoped_session, since it avoids sharing the same session. I ran into a problem and found its solution, but I don't understand why it happens.

You can see my code below:

db.py

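from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

# connection_string is assumed to be defined elsewhere (not shown in the question)
engine = create_engine(connection_string)

Session = sessionmaker(bind=engine)
DBSession = scoped_session(Session)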

script.py

from multiprocessing import Pool, current_process
from db import DBSession

def process_feed(test):
    session = DBSession()
    print(current_process().name, session)

def run():
    session = DBSession()
    pool = Pool()
    print(current_process().name, session)
    pool.map_async(process_feed, [1, 2]).get()

if __name__ == "__main__":
    run()

When I run script.py the output is:

MainProcess <sqlalchemy.orm.session.Session object at 0xb707b14c>
ForkPoolWorker-1 <sqlalchemy.orm.session.Session object at 0xb707b14c>
ForkPoolWorker-2 <sqlalchemy.orm.session.Session object at 0xb707b14c>

Note that the session object is at the same address, 0xb707b14c, in the main process and in its workers (child processes).

But if I swap the first two lines of run():

def run():
    pool = Pool()  # <--- the pool is now created on the first line
    session = DBSession()  # <--- the session is now created on the second line
    print(current_process().name, session)
    pool.map_async(process_feed, [1, 2]).get()

and then run script.py again, the output is:

MainProcess <sqlalchemy.orm.session.Session object at 0xb66907cc>
ForkPoolWorker-1 <sqlalchemy.orm.session.Session object at 0xb669046c>
ForkPoolWorker-2 <sqlalchemy.orm.session.Session object at 0xb66905ec>

Now the session instances are different.

Recommended answer

To understand why this happens, you need to understand what scoped_session and Pool actually do. scoped_session keeps a registry of sessions, so that the following happens:

  • the first time you call DBSession, it creates a Session object for you in the registry
  • subsequently, if the necessary conditions are met (i.e. same thread, session has not been closed), it does not create a new Session object and instead returns the previously created Session object
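The registry behaviour described in the list above can be mimicked with a toy class built on the standard library. This is only a sketch: ToyScopedRegistry and FakeSession are hypothetical names invented for illustration, and the code is not SQLAlchemy's actual implementation. It assumes the registry is keyed by thread, which matches the "same thread" condition above.

```python
import threading


class ToyScopedRegistry:
    """Hypothetical stand-in for scoped_session: one object per thread."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def __call__(self):
        # Create the object on first use in this thread, then reuse it.
        if not hasattr(self._local, "value"):
            self._local.value = self._factory()
        return self._local.value

    def remove(self):
        # Discard this thread's object so the next call creates a new one.
        if hasattr(self._local, "value"):
            del self._local.value


class FakeSession:
    """Placeholder for a real Session; it holds no database state."""


def registry_demo():
    registry = ToyScopedRegistry(FakeSession)
    first = registry()
    second = registry()  # same thread, so the same object comes back

    # A different thread gets its own object from the same registry.
    seen_in_thread = []
    worker = threading.Thread(target=lambda: seen_in_thread.append(registry()))
    worker.start()
    worker.join()

    registry.remove()
    third = registry()  # a fresh object after remove()
    return first, second, third, seen_in_thread[0]
```

Calling registry_demo() returns four objects: the first two are identical, while the third and the one created in the other thread are distinct from the first.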

When you create a Pool, it starts its worker processes in the __init__ method. (There is nothing fundamental about starting the workers in __init__. An equally valid implementation could wait until workers are first needed before starting them, which would behave differently in your example.) When this happens on Unix, the parent process forks itself once per worker process. Forking gives the new process a copy of the memory of the currently running process, so each worker literally gets the exact same objects in the exact same places.

Putting these two together: in the first example you create a Session before forking, and it is copied into every worker process when the Pool is created, resulting in the same identity. In the second example you delay the creation of the Session object until after the worker processes have started, resulting in different identities.
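The effect of forking on object identity can be reproduced without SQLAlchemy. The following is a minimal sketch, assuming a Unix system and CPython (where id() is the memory address). Marker and fork_demo are hypothetical names, and Marker merely stands in for a Session created before the fork.

```python
import os


class Marker:
    """Hypothetical stand-in for a Session created before the fork."""

    def __init__(self):
        self.label = "parent"


def fork_demo():
    obj = Marker()              # created before forking, like the first run()
    read_fd, write_fd = os.pipe()
    pid = os.fork()             # Unix only
    if pid == 0:
        # Child: it sees a copy of obj at the same virtual address.
        os.close(read_fd)
        obj.label = "child"     # only changes the child's copy
        os.write(write_fd, str(id(obj)).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: read the id reported by the child, then reap it.
    os.close(write_fd)
    child_id = int(os.read(read_fd, 64).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return id(obj), child_id, obj.label
```

The two ids returned by fork_demo() are equal, yet the label seen by the parent is still "parent": the child changed only its own copy.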

It's important to note that although the Session objects share the same id, they are not the same object: a change made to the Session in the parent process is not reflected in the child processes. They just happen to share the same memory address because of the fork. OS-level resources such as connections, however, are shared. If you had run a query on session before Pool(), a connection would have been created for you in the connection pool and then inherited by the forked child processes. If you then tried to run queries in the child processes, you would run into weird errors, because your processes would be clobbering each other over the exact same connection!
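Sharing of OS-level resources across a fork can be seen with a plain file descriptor. This is a sketch, assuming a Unix system. A temporary file stands in for a database connection, and shared_descriptor_demo is a hypothetical name. Parent and child write through the same descriptor and therefore share one file offset.

```python
import os
import tempfile


def shared_descriptor_demo():
    fd, path = tempfile.mkstemp()   # opened before the fork, like a pooled connection
    try:
        pid = os.fork()             # Unix only
        if pid == 0:
            os.write(fd, b"child")  # advances the offset shared with the parent
            os._exit(0)
        os.waitpid(pid, 0)
        os.write(fd, b"parent")     # continues where the child stopped
        os.close(fd)
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        os.remove(path)
```

The parent's write lands after the child's instead of overwriting it, which shows that both processes were driving the same underlying resource. With a database connection, the two processes would interleave protocol traffic in the same way, which is where the strange errors come from.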

The above is moot on Windows, because Windows does not have fork().
