Calculate Max of Sum of an annotated field over a grouped by query in Django ORM?


Problem description

To keep it simple, I have four tables (A, B, Category and Relation). The Relation table stores the Intensity of A in B, and Category stores the type of B.

A <--- Relation ---> B ---> Category

(So the relation between A and B is n to n, while the relation between B and Category is n to 1.)

I need an ORM query that groups Relation records by Category and A, then calculates the Sum of Intensity for each (Category, A) pair (which seems simple up to here). After that, I want to annotate each Category with the Max of those calculated Sums.

My code is something like:

A.objects.values('B_id').annotate(AcSum=Sum('Intensity')).annotate(Max('AcSum'))

Which throws the error:

django.core.exceptions.FieldError: Cannot compute Max('AcSum'): 'AcSum' is an aggregate

The django-group-by package fails with the same error.

For further information, please also see this Stack Overflow question.

I am using Django 2 and PostgreSQL.

Is there a way to achieve this using the ORM? If there is not, what would be the solution using a raw SQL expression?

Update

After a lot of struggling I found out that what I wrote was indeed an aggregate. What I actually want is the maximum AcSum over all A within each category, so I suppose I have to group the result once more after AcSum has been calculated. Based on this insight I found a Stack Overflow question that asks about the same concept (it was asked 1 year and 2 months ago and has no accepted answer). Chaining another values('id') onto the queryset works neither as a group-by nor as a filter on the output attributes; it just removes AcSum from the result. Adding AcSum to values() is not an option either, because it changes the grouping of the result set. I think what I am trying to do is regroup an already grouped query by one of its columns (i.e. id). Any thoughts?

Solution

You can't compute an aggregate of an aggregate such as Max(Sum()) in a single grouped query. Nesting aggregates like that is not valid SQL, whether you're using the ORM or not. Instead, you have to join the table to itself to find the maximum, which you can do with a subquery. The code below looks right to me, but keep in mind that I don't have anything to run it on, so it might not be perfect.

from django.db.models import OuterRef, Q, Subquery, Sum
# Relation is the asker's model; import it from your own app.

annotation = {
    'AcSum': Sum('intensity')
}
# The basic query is on Relation grouped by A and Category, annotated
# with the Sum of intensity
query = Relation.objects.values('a', 'b__category').annotate(**annotation)

# The subquery is joined to the outer query on the Category
sub_filter = Q(b__category=OuterRef('b__category'))
# The subquery is grouped by A and Category and annotated with the Sum
# of intensity, which is then ordered descending so that when a LIMIT 1
# is applied, you get the Max.
subquery = (
    Relation.objects.filter(sub_filter)
    .values('a', 'b__category')
    .annotate(**annotation)
    .order_by('-AcSum')
    .values('AcSum')[:1]
)

query = query.annotate(max_intensity=Subquery(subquery))

This should generate SQL like:

SELECT a_id, category_id,
       (SELECT SUM(U0.intensity) AS AcSum
        FROM RELATION U0
        JOIN B U1 on U0.b_id = U1.id
        WHERE U1.category_id = B.category_id
        GROUP BY U0.a_id, U1.category_id
        ORDER BY SUM(U0.intensity) DESC
        LIMIT 1
       ) AS max_intensity
FROM Relation
JOIN B on Relation.b_id = B.id
GROUP BY Relation.a_id, B.category_id
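To check the shape of that SQL without a Django project, here is a minimal sketch that runs an equivalent correlated subquery against an in-memory SQLite database using Python's standard library. The table layout (relation, b), the column names and the sample rows are assumptions made up for illustration, and SQLite stands in for PostgreSQL, so treat it as a sanity check of the idea and not as the exact query Django emits.

```python
import sqlite3

# Hypothetical schema and data, loosely mirroring the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE relation (a_id INTEGER, b_id INTEGER, intensity INTEGER);
    INSERT INTO b VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO relation VALUES (1, 1, 5), (1, 2, 7), (2, 1, 4), (1, 3, 3), (2, 3, 9);
""")

# Outer query: one row per (A, category) with its own sum.
# Scalar subquery: the largest per-A sum within the same category.
rows = conn.execute("""
    SELECT r.a_id,
           b.category_id,
           SUM(r.intensity) AS ac_sum,
           (SELECT SUM(u0.intensity)
            FROM relation u0
            JOIN b u1 ON u0.b_id = u1.id
            WHERE u1.category_id = b.category_id
            GROUP BY u0.a_id, u1.category_id
            ORDER BY SUM(u0.intensity) DESC
            LIMIT 1) AS max_intensity
    FROM relation r
    JOIN b ON r.b_id = b.id
    GROUP BY r.a_id, b.category_id
    ORDER BY b.category_id, r.a_id
""").fetchall()

for a_id, category_id, ac_sum, max_intensity in rows:
    print(a_id, category_id, ac_sum, max_intensity)
```

With this sample data, every row in category 10 carries a max_intensity of 12 (A=1 has 5 + 7) and every row in category 20 carries 9.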

It may be more performant to eliminate the join in the subquery by using a backend-specific feature such as array_agg (PostgreSQL) or GROUP_CONCAT (MySQL) to collect the Relation ids that are grouped together in the outer query. But I don't know what backend you're using.
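Since the question also asks about raw SQL, the following sketch shows one common way to express "max of sums" there: compute the per-(category, A) sums in a derived table, then aggregate that table again. It again uses an in-memory SQLite database with invented table names and rows, so the names are assumptions and not the asker's real schema. The same two-level SELECT is plain enough that it should carry over to PostgreSQL, but that has not been verified here.

```python
import sqlite3

# Hypothetical schema and data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE relation (a_id INTEGER, b_id INTEGER, intensity INTEGER);
    INSERT INTO b VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO relation VALUES (1, 1, 5), (1, 2, 7), (2, 1, 4), (1, 3, 3), (2, 3, 9);
""")

# Nesting the aggregates directly is rejected by the database.
try:
    conn.execute("SELECT MAX(SUM(intensity)) FROM relation").fetchall()
    nested_error = None
except sqlite3.Error as exc:
    nested_error = str(exc)

# Two levels of grouping: the inner query sums per (category, A),
# the outer query takes the maximum of those sums per category.
rows = conn.execute("""
    SELECT category_id, MAX(ac_sum) AS max_intensity
    FROM (
        SELECT b.category_id AS category_id,
               r.a_id AS a_id,
               SUM(r.intensity) AS ac_sum
        FROM relation r
        JOIN b ON r.b_id = b.id
        GROUP BY b.category_id, r.a_id
    ) AS per_a
    GROUP BY category_id
    ORDER BY category_id
""").fetchall()

print(nested_error)
print(rows)
```

In a Django project the second statement could be sent through a database cursor, with the table names replaced by the ones Django generated for the real models.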
