Calculate Max of Sum of an annotated field over a grouped by query in Django ORM?
To keep it simple, I have four tables (A, B, Category and Relation). The Relation table stores the Intensity of A in B, and Category stores the type of B.
A <--- Relation ---> B ---> Category
(So the relation between A and B is n to n, while the relation between B and Category is n to 1.)
I need an ORM query that groups Relation records by Category and A, then calculates the Sum of Intensity in each (Category, A) pair (seems simple up to here), and then annotates the Max of the calculated Sum in each Category.
My code is something like:
A.objects.values('B_id').annotate(AcSum=Sum('Intensity')).annotate(Max('AcSum'))
Which throws the error:
django.core.exceptions.FieldError: Cannot compute Max('AcSum'): 'AcSum' is an aggregate
I also tried the django-group-by package, which fails with the same error.
For further information, please also see this Stack Overflow question.
I am using Django 2 and PostgreSQL.
Is there a way to achieve this using the ORM? If not, what would be the solution using a raw SQL expression?
Update
After a lot of struggling, I found out that what I wrote was indeed an aggregation. What I actually want is the maximum of the AcSum values of each A within each category, so I suppose I have to group the result once more after the AcSum calculation. Based on this insight, I found a Stack Overflow question that asks about the same concept (it was asked 1 year and 2 months ago and has no accepted answer).
Chaining another values('id') to the queryset works neither as a group-by nor as a filter for the output attributes; it removes AcSum from the set. Adding AcSum to values() is not an option either, because it changes the grouping of the result set.
I think what I am trying to do is regroup the grouped-by query based on the fields inside a column (i.e. id). Any thoughts?
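To make the goal concrete, here is a minimal sketch in plain Python (not Django) of the two-stage grouping described above: first a Sum per (category, A) pair, then a Max of those sums per category. The sample rows and variable names are made up for illustration only.

```python
from collections import defaultdict

# Hypothetical Relation rows: (a_id, category_id, intensity).
rows = [
    (1, "x", 2),
    (1, "x", 3),
    (2, "x", 4),
    (1, "y", 1),
    (2, "y", 6),
    (2, "y", 1),
]

# Stage 1: Sum of intensity for every (category, a) pair.
ac_sum = defaultdict(int)
for a_id, category_id, intensity in rows:
    ac_sum[(category_id, a_id)] += intensity

# Stage 2: regroup the sums by category alone and keep the largest one.
max_per_category = {}
for (category_id, _a_id), total in ac_sum.items():
    current = max_per_category.get(category_id, total)
    max_per_category[category_id] = max(current, total)

print(dict(ac_sum))
print(max_per_category)
```

The second loop is the "group once more" step: it aggregates over the output of the first aggregation, which is what a single annotate() chain refuses to do.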
You can't do an aggregate of an aggregate, Max(Sum()); it's not valid in SQL, whether you're using the ORM or not. Instead, you have to join the table to itself to find the maximum. You can do this using a subquery. The code below looks right to me, but keep in mind that I don't have anything to run it on, so it might not be perfect.
from django.db.models import OuterRef, Q, Subquery, Sum
annotation = {
'AcSum': Sum('intensity')
}
# The basic query is on Relation grouped by A and Category, annotated
# with the Sum of intensity
query = Relation.objects.values('a', 'b__category').annotate(**annotation)
# The subquery is joined to the outer query on the Category
sub_filter = Q(b__category=OuterRef('b__category'))
# The subquery is grouped by A and Category and annotated with the Sum
# of intensity, which is then ordered descending so that when a LIMIT 1
# is applied, you get the Max.
subquery = Relation.objects.filter(sub_filter).values('a', 'b__category').annotate(**annotation).order_by('-AcSum').values('AcSum')[:1]
query = query.annotate(max_intensity=Subquery(subquery))
This should generate SQL like:
SELECT a_id, category_id,
(SELECT SUM(U0.intensity) AS AcSum
FROM RELATION U0
JOIN B U1 on U0.b_id = U1.id
WHERE U1.category_id = B.category_id
GROUP BY U0.a_id, U1.category_id
ORDER BY SUM(U0.intensity) DESC
LIMIT 1
) AS max_intensity
FROM Relation
JOIN B on Relation.b_id = B.id
GROUP BY Relation.a_id, B.category_id
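Since I couldn't run the ORM code, here is a rough way to check the shape of that SQL using Python's built-in sqlite3 module. This is only a sketch: the table layout, column names and sample rows are assumptions made for illustration, and SQLite stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the columns the query needs.
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE relation (
        id INTEGER PRIMARY KEY, a_id INTEGER, b_id INTEGER, intensity INTEGER
    );
    INSERT INTO b (id, category_id) VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO relation (a_id, b_id, intensity) VALUES
        (1, 1, 2), (1, 2, 3), (2, 1, 4), (1, 3, 1), (2, 3, 6), (2, 3, 1);
""")

# Outer query: Sum per (a, category). The correlated subquery recomputes
# the sums for the same category and keeps the largest via ORDER BY ... LIMIT 1.
sql = """
    SELECT relation.a_id, b.category_id, SUM(relation.intensity) AS ac_sum,
        (SELECT SUM(u0.intensity)
         FROM relation u0
         JOIN b u1 ON u0.b_id = u1.id
         WHERE u1.category_id = b.category_id
         GROUP BY u0.a_id, u1.category_id
         ORDER BY SUM(u0.intensity) DESC
         LIMIT 1
        ) AS max_intensity
    FROM relation
    JOIN b ON relation.b_id = b.id
    GROUP BY relation.a_id, b.category_id
    ORDER BY b.category_id, relation.a_id
"""
result = conn.execute(sql).fetchall()
for row in result:
    print(row)  # (a_id, category_id, ac_sum, max_intensity)
```

Every row of a category carries the same max_intensity, so each A's own sum can be compared against the category maximum.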
It may be more performant to eliminate the join in the subquery by using a backend-specific feature such as array_agg (Postgres) or GroupConcat (MySQL) to collect the Relation ids that are grouped together in the outer query. But I don't know what backend you're using.
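As for the raw SQL part of the question: a directly nested MAX(SUM(...)) is rejected, but a common workaround is to compute the sums in a derived table (a subquery in FROM) and take the MAX over that. The sketch below tries this with Python's built-in sqlite3 module; the schema and sample rows are made up for illustration, and the same statement shape should carry over to PostgreSQL, though I haven't run it there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, just enough to exercise the query.
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE relation (
        id INTEGER PRIMARY KEY, a_id INTEGER, b_id INTEGER, intensity INTEGER
    );
    INSERT INTO b (id, category_id) VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO relation (a_id, b_id, intensity) VALUES
        (1, 1, 2), (1, 2, 3), (2, 1, 4), (1, 3, 1), (2, 3, 6), (2, 3, 1);
""")

# Inner query: Sum per (category, a). Outer query: Max of those sums per category.
sql = """
    SELECT category_id, MAX(ac_sum) AS max_intensity
    FROM (
        SELECT b.category_id AS category_id,
               relation.a_id AS a_id,
               SUM(relation.intensity) AS ac_sum
        FROM relation
        JOIN b ON relation.b_id = b.id
        GROUP BY b.category_id, relation.a_id
    ) AS per_pair
    GROUP BY category_id
    ORDER BY category_id
"""
max_rows = conn.execute(sql).fetchall()
print(max_rows)  # one (category_id, max_intensity) row per category
```

In Django, a statement like this could be sent through a database cursor or a raw query, at the cost of leaving the queryset API.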