Celery worker variable sharing issues

Problem description

I am using Python and celery in a project. In the project, I have two files:

celeryconfig.py

BROKER_URL = "amqp://guest:guest@localhost:5672//"
CELERY_RESULT_BACKEND = "amqp"
CELERY_IMPORTS = ("example",)
CELERYD_CONCURRENCY = 2

example.py

from celery.task import task
import hashlib

md5 = hashlib.md5()  # module-level: one object shared by every task invocation

@task
def getDigest(text):
    print('Using md5 -', md5)
    md5.update(text)
    return md5.digest()

In celeryconfig.py, I set the CELERYD_CONCURRENCY to 2, which means that it will distribute the tasks in my task queue to 2 different processes.

From a Python console, I run:

from example import getDigest
getDigest.delay('foo');getDigest.delay('bar')

This creates two tasks that are simultaneously executed by the two workers. The problem is, as both of the worker processes run their task functions [getDigest()], they seem to be using the same hash object (md5). The output of celeryd confirms this as you can see below.

[PoolWorker-2] Using md5 -
[PoolWorker-2] <md5 HASH object @ 0x23e6870>
[PoolWorker-1] Using md5 -
[PoolWorker-1] <md5 HASH object @ 0x23e6870>

For the sake of simplicity, I am using hashlib's md5 object here, but in my actual project I use an object that cannot be accessed and modified by more than one process, which, as expected, makes the workers crash.

That brings up the question: How can I modify my code to make the worker processes initialize and use their very own (md5) object? Right now, they are sharing the same object - causing my application to crash. Is this possible?

Answer

They're using the same object because you're explicitly telling them to in your code. By creating the object outside the scope of the task and using it within the task, you are giving all workers access to the shared object. This is a concurrency issue, not necessarily a Celery issue. You could use a copy of the object if it's small, or use your own locking strategy. In general, though, if an object is going to be updated by more than one process at a time, it needs to employ some sort of synchronization, which is outside of the scope of Celery.
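As a concrete sketch of that advice (not part of the original answer): the simplest fix for the example above is to create the hash object inside the task body, so every invocation gets its own instance and nothing is shared between worker processes. The Celery decorator is omitted here so the snippet runs standalone.

```python
import hashlib

# Sketch of the fix: build the md5 object inside the task body instead of
# at module level. In the real project this function would carry Celery's
# @task decorator; it is left off so the example runs on its own.
def getDigest(text):
    md5 = hashlib.md5()              # fresh, per-call object - nothing shared
    md5.update(text.encode('utf-8'))  # hashlib requires bytes, so encode str input
    return md5.digest()

print(getDigest('foo'))
```

If the object is expensive to construct, Celery also offers the `worker_process_init` signal (`from celery.signals import worker_process_init`), which each forked worker process fires once at startup; a handler connected to it can build one private instance per worker rather than one per task call.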
