为什么此Django(1.6)批注计数如此缓慢? [英] Why is this Django (1.6) annotate count so slow?

查看:43
本文介绍了为什么此Django(1.6)批注计数如此缓慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

摘要:在计算相关对象时,我得到的查询非常慢,只使用很少的查询,并且每项的注释和附加查询要多两个.数据库是PostgreSQL 9.3.5.

Summary: I'm getting very slow queries using few queries and annotate vs. two queries extra per item when counting related objects. Database is PostgreSQL 9.3.5.

我有一个看起来像这样的模型:

I have a model that looks something like this:

class Collection(models.Model):
    have  = models.ManyToManyField(Item, related_name='item_have', through='Have')
    want  = models.ManyToManyField(Item, related_name='item_want', through='Want')
    added = models.DateTimeField()

    class Meta:
        ordering = ['-last_bump']

class Have(models.Model):
    item       = models.ForeignKey(Item)
    collection = models.ForeignKey(Collection, related_name='have_set')
    price      = models.IntegerField(default=0)

class Want(models.Model):
    want       = models.ForeignKey(Item)
    collection = models.ForeignKey(Collection, related_name='want_set')
    price      = models.IntegerField(default=0)

在我看来,我列出了这些收藏集,我想通过注释来显示每个收藏集中有多少个需求和拥有的数量:

And in my view, I list these Collections, and I want to show a count of how many wants and haves there are in each of them, doing that by doing an annotate:

class ListView(generic.ListView):
    model = Collection
    queryset = Collection.objects.select_related()
    paginate_by = 20

    def get_queryset(self):
        queryset = super(ListView, self).get_queryset()
        queryset = queryset.annotate(have_count=Count("have", distinct=True),
                                     want_count=Count("want", distinct=True))

但是,这使我的查询非常慢!我在数据库中有大约650条记录,并且django-debug-toolbar说它进行2次查询,平均大约400-500ms.我已经尝试过prefetch_related,但是并没有使它更快.

This, however, makes my query very slow! I have about 650 records in the DB and django-debug-toolbar says it makes 2 queries and averaging around 400-500ms. I've tried with prefetch_related, but it doesn't make it any quicker.

我确实尝试了另一件事,在Collection模型中,我添加了以下内容:

I did try another thing, in the Collection model, I added this:

@property
def have_count(self):
    return self.have.count()

@property
def want_count(self):
    return self.want.count()

,并从我的视图中删除了注释.取而代之的是,它对数据库总共进行了42次查询,但是它是在20到25毫秒内完成的.

and removed the annotate from my view. With this instead, it makes a total of 42 queries to the database, but it's done in 20-25ms.

我在这里的注释出了什么问题?在一个查询中进行计数是否比在进行多个计数查询时更快?

What am I doing wrong with my annotation here? Shouldn't it be faster to do the count in one query, vs doing many count queries?

推荐答案

为什么变慢:如果您只使用了两个多对多字段的注释,那么您将创建一个将所有这些表不必要的大联接连接在一起.必须求值的行的笛卡尔积的大小大约为 Have.objects.count()* Want.objects.count().然后,您编写了 distinct = True 来最终限制重复项的数量,以免获得无效的巨大结果.

Why it is slow: If you simply used the annotation by two ManyToMany fields then you create an unwanted big join of all these tables together. The size of the Cartesian product of rows that must be evaluated is approximately Have.objects.count() * Want.objects.count(). You wrote then distinct=True to restrict finally the number of duplicated items to not get an invalid huge result.

修复旧版Django:如果仅使用 queryset.annotate(have_count = Count("have")),则无需 distinct = True ,即可快速获得正确的结果>或相同的结果也可以快速区分.然后,您可以在内存中通过Python组合两个查询的结果.

Fix for old Django: If you would use only queryset.annotate(have_count=Count("have")) you will get the right result fast without distinct=True or the same result also fast with distinct. Then you can to combine results of two queries by Python in memory.

解决方案,在 Django> = 1.11 (问题两年后)中,可以通过使用带有两个查询来实现一个很好的解决方案>子查询,一个用于 Have ,一个用于 Want ,都是通过一个请求完成的,但不要将所有表混合在一起.

Solution A nice solution is possible in Django >= 1.11 (two years after your question) by use a query with two subqueries, one for Have and one for Want, all by one request, but not to mix all tables together.

from django.db.models import Count, OuterRef, Subquery

sq = Collection.objects.filter(pk=OuterRef('pk')).order_by()
have_count_subq = sq.values('have').annotate(have_count=Count('have')).values('have_count')
want_count_subq = sq.values('want').annotate(have_count=Count('want')).values('want_count')
queryset = queryset.annotate(have_count=Subquery(have_count_subq),
                             want_count=Subquery(want_count_subq))


验证:您可以通过打印如上所述的 str(my_queryset.query)来检查慢速查询和固定SQL查询.


Verify: You can check both the slow and the fixed SQL query by printing str(my_queryset.query) that it is as described above.

这篇关于为什么此Django(1.6)批注计数如此缓慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆