Should django model object instances be passed to celery?


Problem description



# models.py
from django.db import models

class Person(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)
    text_blob = models.CharField(max_length=50000)

# tasks.py
import celery
@celery.task
def my_task(person):
    # example operation: does something to person 
    # needs only a few of the attributes of person
    # and not the entire bulky record
    person.first_name = person.first_name.title()
    person.last_name = person.last_name.title()
    person.save()

In my application somewhere I have something like:

from models import Person
from tasks import my_task
import celery
g = celery.group([my_task.s(p) for p in Person.objects.all()])
g.apply_async()

  • Celery pickles p to send it to the worker, right?
  • If the workers are running on multiple machines, would the entire person object (along with the bulky text_blob which is primarily not required) be transmitted over the network? Is there a way to avoid it?
  • How can I efficiently and evenly distribute the Person records to workers running on multiple machines?

  • Could this be a better idea? Wouldn't it overwhelm the db if Person has a few million records?

    # tasks.py
    
    import celery
    from models import Person
    @celery.task
    def my_task(person_pk):
        # example operation that does not need text_blob
        person = Person.objects.get(pk=person_pk)
        person.first_name = person.first_name.title()
        person.last_name = person.last_name.title()
        person.save()
    
    
    #In my application somewhere
    from models import Person
    from tasks import my_task
    import celery
    g = celery.group([my_task.s(p.pk) for p in Person.objects.all()])
    g.apply_async()
    

Solution

I believe it is better and safer to pass the PK rather than the whole model object. Since a PK is just a number, serialization is also much simpler. Most importantly, you can use a safer serializer (json/yaml instead of pickle) and have peace of mind that you won't have any problems with serializing your model.
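To illustrate the serialization point, here is a framework-free sketch (the `Person` class below is a stand-in for a model instance, not real Django code): a bare PK is valid JSON and the payload stays tiny, while an arbitrary object instance is not JSON-serializable at all, which is why passing whole objects historically forced pickle.

```python
import json

class Person:
    """Stand-in for a Django model instance (illustrative only)."""
    def __init__(self, pk, first_name, text_blob):
        self.pk = pk
        self.first_name = first_name
        self.text_blob = text_blob

p = Person(42, "alice", "x" * 50000)

# A bare PK serializes trivially, and the payload stays tiny
# even though the row carries a 50 KB text_blob:
payload = json.dumps({"person_pk": p.pk})

# The full instance is not JSON-serializable at all:
try:
    json.dumps(p)
    instance_serializable = True
except TypeError:
    instance_serializable = False
```

With the PK-based task, setting Celery's `task_serializer` to `"json"` then works out of the box, since every task argument is a plain number.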

As this article says:

Since Celery is a distributed system, you can't know in which process, or even on what machine, the task will run. So you shouldn't pass Django model objects as arguments to tasks; it's almost always better to re-fetch the object from the database instead, as there are possible race conditions involved.
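On the follow-up about a few million records: a common pattern (my suggestion, not from the quoted article) is to stream only the primary keys from the database and enqueue them in batches, so you neither load full rows in the dispatcher nor create one task per row. A minimal, framework-free sketch of the batching helper:

```python
def chunked(pks, size):
    """Yield lists of at most `size` primary keys from any iterable."""
    chunk = []
    for pk in pks:
        chunk.append(pk)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

# With Django/Celery this would look something like (assumption,
# `my_batch_task` is a hypothetical task that processes a list of pks):
#
#   pks = Person.objects.values_list("pk", flat=True).iterator()
#   for batch in chunked(pks, 1000):
#       my_batch_task.delay(batch)

batches = list(chunked(range(1, 11), 4))
```

Celery also ships a built-in `chunks` canvas primitive for the same idea; the helper above just makes the mechanics explicit. Either way, each worker re-fetches only the rows it needs, spreading the database load over time instead of materializing millions of objects up front.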
