Django条件子查询聚合 [英] Django conditional Subquery aggregate
问题描述
我的模型结构的简化示例是
An simplified example of my model structure would be
class Corporation(models.Model):
...
class Division(models.Model):
corporation = models.ForeignKey(Corporation)
class Department(models.Model):
division = models.ForeignKey(Division)
type = models.IntegerField()
现在,我想显示一个表,该表显示公司,其中一列将包含某种类型的部门数,例如 type = 10
.目前,这是通过 Corporation
模型上的帮助程序实现的,该帮助程序可以检索这些信息,例如
Now I want to display a table that display corporations where a column will contain the number of departments of a certain type, e.g. type=10
. Currently, this is implemented with a helper on the Corporation
model that retrieves those, e.g.
class Corporation(models.Model):
...
def get_departments_type_10(self):
return (
Department.objects
.filter(division__corporation=self, type=10)
.count()
)
这里的问题是,由于N + 1问题,这绝对会破坏性能.
The problem here is that this absolutely murders performance due to the N+1 problem.
我尝试使用 select_related
, prefetch_related
, annotate
和 subquery
解决此问题,但是我无法获得我需要的结果.
I have tried to approach this problem with select_related
, prefetch_related
, annotate
, and subquery
, but I havn't been able to get the results I need.
理想情况下,查询集中的每个 Corporation
都应使用整数 type_10_count
进行注释,该整数应反映该类型部门的数量.
Ideally, each Corporation
in the queryset should be annotated with an integer type_10_count
which reflects the number of departments of that type.
我确定我可以在 .extra()
中使用原始sql进行操作,但是文档宣布它将不推荐使用(我在Django 1.11上)
I'm sure I could do something with raw sql in .extra()
, but the docs announce that it is going to be deprecated (I'm on Django 1.11)
原始sql解决方案示例
corps = Corporation.objects.raw("""
SELECT
*,
(
SELECT COUNT(*)
FROM foo_division div ON div.corporation_id = c.id
JOIN foo_department dept ON dept.division_id = div.id
WHERE dept.type = 10
) as type_10_count
FROM foo_corporation c
""")
推荐答案
我认为,通过 Subquery
,我们可以使用此代码获得与您提供的SQL类似的SQL
I think with Subquery
we can get SQL similar to one you have provided, with this code
# Get amount of departments with GROUP BY division__corporation [1]
# .order_by() will remove any ordering so we won't get additional GROUP BY columns [2]
departments = Department.objects.filter(type=10).values(
'division__corporation'
).annotate(count=Count('id')).order_by()
# Attach departments as Subquery to Corporation by Corporation.id.
# Departments are already grouped by division__corporation
# so .values('count') will always return single row with single column - count [3]
departments_subquery = departments.filter(division__corporation=OuterRef('id'))
corporations = Corporation.objects.annotate(
departments_of_type_10=Subquery(
departments_subquery.values('count'), output_field=IntegerField()
)
)
生成的SQL是
SELECT "corporation"."id", ... (other fields) ...,
(
SELECT COUNT("division"."id") AS "count"
FROM "department"
INNER JOIN "division" ON ("department"."division_id" = "division"."id")
WHERE (
"department"."type" = 10 AND
"division"."corporation_id" = ("corporation"."id")
) GROUP BY "division"."corporation_id"
) AS "departments_of_type_10"
FROM "corporation"
这里有些担心的是,大表子查询的速度可能很慢.但是,数据库查询优化器足够聪明,可以将子查询提升为OUTER JOIN,至少我听说PostgreSQL做到了.
Some concerns here is that subquery can be slow with large tables. However, database query optimizers can be smart enough to promote subquery to OUTER JOIN, at least I've heard PostgreSQL does this.
这篇关于Django条件子查询聚合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!