django: prefetch related objects of a GenericForeignKey


Problem description

Suppose I have a model Box with a GenericForeignKey that points to either an Apple instance or a Chocolate instance. Apple and Chocolate, in turn, have ForeignKeys to Farm and Factory, respectively. I want to display a list of Boxes, for which I need to access Farm and Factory. How do I do this in as few DB queries as possible?

Minimal illustrative example:

class Farm(Model):
    ...

class Apple(Model):
    farm = ForeignKey(Farm)
    ...

class Factory(Model):
    ...

class Chocolate(Model):
    factory = ForeignKey(Factory)
    ...

class Box(Model):
    content_type = ForeignKey(ContentType)
    object_id = PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
    ...

    def __unicode__(self):
        if self.content_type == ContentType.objects.get_for_model(Apple):
            apple = self.content_object
            return "Apple {} from Farm {}".format(apple, apple.farm)
        elif self.content_type == ContentType.objects.get_for_model(Chocolate):
            chocolate = self.content_object
            return "Chocolate {} from Factory {}".format(chocolate, chocolate.factory)

Here are a few things I tried. In all these examples, N is the number of Boxes. The query count assumes that the ContentTypes for Apple and Chocolate have already been cached, so the get_for_model() calls do not hit the DB.

1) Naive:

print [box for box in Box.objects.all()]

This does 1 (fetch Boxes) + N (fetch Apple or Chocolate for each Box) + N (fetch Farm for each Apple and Factory for each Chocolate) queries.

2) select_related doesn't help here, because Box.content_object is a GenericForeignKey.

3) As of Django 1.4, prefetch_related can fetch GenericForeignKeys.

print [box for box in Box.objects.prefetch_related('content_object').all()]

This does 1 (fetch Boxes) + 2 (fetch Apples and Chocolates for all Boxes) + N (fetch Farm for each Apple and Factory for each Chocolate) queries.

4) Apparently prefetch_related isn't smart enough to follow ForeignKeys of GenericForeignKeys. If I try:

print [box for box in Box.objects.prefetch_related('content_object__farm', 'content_object__factory').all()]

it rightfully complains that Chocolate objects don't have a farm field, and vice versa.

5) I could do:

apple_ctype = ContentType.objects.get_for_model(Apple)
chocolate_ctype = ContentType.objects.get_for_model(Chocolate)
boxes_with_apples = Box.objects.filter(content_type=apple_ctype).prefetch_related('content_object__farm')
boxes_with_chocolates = Box.objects.filter(content_type=chocolate_ctype).prefetch_related('content_object__factory')

This does 1 (fetch Boxes) + 2 (fetch Apples and Chocolates for all Boxes) + 2 (fetch Farms for all Apples and Factories for all Chocolates) queries. The downside is that I have to merge and sort the two querysets (boxes_with_apples, boxes_with_chocolates) manually. In my real application, I'm displaying these Boxes in a paginated ModelAdmin. It's not obvious how to integrate this solution there. Maybe I could write a custom Paginator to do this caching transparently?
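The manual merge mentioned in option 5 could look roughly like the sketch below. It assumes both querysets are ordered by the same key (here `pk`, which is an assumption) and uses plain stand-in objects instead of real Box instances so that it runs without a database. `FakeBox` and `merge_boxes` are hypothetical names, not part of Django.

```python
import heapq
from collections import namedtuple

# Stand-in for Box rows; in a real project these would be model instances.
FakeBox = namedtuple("FakeBox", ["pk", "kind"])


def merge_boxes(*sorted_groups):
    """Merge several iterables that are each already sorted by pk."""
    return list(heapq.merge(*sorted_groups, key=lambda box: box.pk))


# Two pre-sorted groups, standing in for the two filtered querysets.
boxes_with_apples = [FakeBox(1, "apple"), FakeBox(4, "apple")]
boxes_with_chocolates = [FakeBox(2, "chocolate"), FakeBox(3, "chocolate")]

merged = merge_boxes(boxes_with_apples, boxes_with_chocolates)
print([box.pk for box in merged])
```

Note that this evaluates both groups in full, so by itself it does not address the pagination concern.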

6) I could cobble together something based on this that also does O(1) queries. But I'd rather not mess with internals (_content_object_cache) if I can avoid it.

In summary: Printing a Box requires access to the ForeignKeys of a GenericForeignKey. How can I print N Boxes in O(1) queries? Is (5) the best I can do, or is there a simpler solution?

Bonus points: How would you refactor this DB schema to make such queries easier?

Solution

You can manually implement something like prefetch_related and use Django's select_related method, which performs a join in the database query.

apple_ctype = ContentType.objects.get_for_model(Apple)
chocolate_ctype = ContentType.objects.get_for_model(Chocolate)
boxes = Box.objects.all()
content_objects = {}
# apples
content_objects[apple_ctype.id] = Apple.objects.select_related(
    'farm').in_bulk(
        [b.object_id for b in boxes if b.content_type == apple_ctype]
    )
# chocolates
content_objects[chocolate_ctype.id] = Chocolate.objects.select_related(
    'factory').in_bulk(
        [b.object_id for b in boxes if b.content_type == chocolate_ctype]
    )

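This should make only 3 queries: one for the Boxes, one for the Apples joined with their Farms, and one for the Chocolates joined with their Factories (the get_for_model queries are not counted). The in_bulk method returns a dict in the format {id: model}. So to get your content_object you need code like: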

content_obj = content_objects[box.content_type_id][box.object_id]

However, I'm not sure this code will be quicker than your O(5) solution, as it requires an additional iteration over the boxes queryset and also generates a query with a WHERE id IN (...) clause.

But if you sort boxes only by fields of the Box model, you can fill the content_objects dict after pagination. You will still need to pass content_objects to __unicode__ somehow.
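As a rough illustration of filling the lookup after pagination, the sketch below groups the object IDs of one page by content type, does one bulk fetch per type, and stores the result on each box. It is plain Python with stand-in objects so it runs without Django. `FakeBox`, `attach_content`, `cached_content` and `fake_in_bulk` are hypothetical names. In a real project the fetch callables could be something like `Apple.objects.select_related('farm').in_bulk`, which is an assumption here.

```python
from collections import defaultdict


class FakeBox:
    """Stand-in for a Box row: only the two generic-key columns."""

    def __init__(self, content_type_id, object_id):
        self.content_type_id = content_type_id
        self.object_id = object_id


def attach_content(page_boxes, fetchers):
    """Set `cached_content` on each box using one bulk fetch per type.

    `fetchers` maps content_type_id -> callable(ids) -> {id: object}.
    """
    # Collect the object IDs needed for each content type in one pass.
    ids_by_type = defaultdict(list)
    for box in page_boxes:
        ids_by_type[box.content_type_id].append(box.object_id)
    # One bulk fetch per content type present on this page.
    content = {t: fetchers[t](ids) for t, ids in ids_by_type.items()}
    for box in page_boxes:
        box.cached_content = content[box.content_type_id].get(box.object_id)
    return page_boxes


# Demo with in-memory tables instead of database queries.
APPLE, CHOCOLATE = 1, 2
apples = {10: "Apple 10 from Farm A"}
chocolates = {20: "Chocolate 20 from Factory B"}
calls = []  # records every bulk fetch so they can be counted


def fake_in_bulk(table):
    def fetch(ids):
        calls.append(sorted(ids))
        return {i: table[i] for i in ids if i in table}
    return fetch


page = [FakeBox(APPLE, 10), FakeBox(CHOCOLATE, 20), FakeBox(APPLE, 10)]
attach_content(page, {APPLE: fake_in_bulk(apples),
                      CHOCOLATE: fake_in_bulk(chocolates)})
print([box.cached_content for box in page])
```

Storing the object on the box itself is one way to make it reachable from __unicode__ without passing a separate dict around.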

How would you refactor this DB schema to make such queries easier?

We have a similar structure. We store content_type in Box, but instead of object_id and content_object we use ForeignKey(Box) in Apple and Chocolate. In Box we have a get_object method that returns the Apple or Chocolate instance. In this case we can use select_related, but in most of our use cases we filter Boxes by content_type, so we have the same problems as with your fifth option. But we started the project on Django 1.2, when there was no prefetch_related.

If you rename farm/factory to some common name, like creator, will prefetch_related work?

About your option 6

I can't say anything against filling _content_object_cache. If you don't want to deal with internals, you can fill a custom property and then use

apple = getattr(self, 'my_custom_prop', None)
if apple is None:
    apple = self.content_object
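To show how that fallback behaves, here is a small self-contained sketch. `FakeBox`, `prefetched_content` and `resolved_content` are hypothetical names, and the `content_object` property only imitates a lazy lookup that would normally cost a query.

```python
class FakeBox:
    """Stand-in for Box; counts how often the lazy lookup runs."""

    def __init__(self, loader):
        self._loader = loader  # imitates the per-object database lookup
        self.lookups = 0

    @property
    def content_object(self):
        self.lookups += 1
        return self._loader()

    def resolved_content(self):
        # Prefer the attribute filled in by a bulk fetch, if any.
        cached = getattr(self, "prefetched_content", None)
        if cached is None:
            cached = self.content_object
        return cached


plain = FakeBox(lambda: "apple loaded lazily")
filled = FakeBox(lambda: "apple loaded lazily")
filled.prefetched_content = "apple from bulk fetch"

print(plain.resolved_content(), plain.lookups)
print(filled.resolved_content(), filled.lookups)
```

Only the box without the custom attribute triggers the lazy lookup, so code that never ran the bulk fetch still works.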
