Aggregation of an annotation in GROUP BY in Django


Question

Update

Thanks to the posted answer, I found a much simpler way to formulate the problem. The original question can be seen in the revision history.

I am trying to translate an SQL query into Django, but am getting an error that I don't understand.

Here is the Django model I have:

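from django.db import models

class Title(models.Model):
    title_id = models.CharField(primary_key=True, max_length=12)
    title = models.CharField(max_length=80)
    publisher = models.CharField(max_length=100)
    # DecimalField requires max_digits; 8 is an assumed value, not from the original post
    price = models.DecimalField(max_digits=8, decimal_places=2, blank=True, null=True)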

I have the following data:

publisher                    title_id      price  title
---------------------------  ----------  -------  -----------------------------------
New Age Books                PS2106         7     Life Without Fear
New Age Books                PS2091        10.95  Is Anger the Enemy?
New Age Books                BU2075         2.99  You Can Combat Computer Stress!
New Age Books                TC7777        14.99  Sushi, Anyone?
Binnet & Hardley             MC3021         2.99  The Gourmet Microwave
Binnet & Hardley             MC2222        19.99  Silicon Valley Gastronomic Treats
Algodata Infosystems         PC1035        22.95  But Is It User Friendly?
Algodata Infosystems         BU1032        19.99  The Busy Executive's Database Guide
Algodata Infosystems         PC8888        20     Secrets of Silicon Valley

Here is what I want to do: introduce an annotated field dbl_price which is twice the price, then group the resulting queryset by publisher, and for each publisher, compute the total of all dbl_price values for all titles published by that publisher.

The SQL query that does this is as follows:

SELECT SUM(dbl_price) AS tot_dbl_prices, publisher
FROM (
  SELECT price * 2 AS dbl_price, publisher
  FROM title
) AS A
GROUP BY publisher

The desired output would be:

publisher                    tot_dbl_prices
---------------------------  --------------
Algodata Infosystems                 125.88
Binnet & Hardley                      45.96
New Age Books                         71.86 
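As a quick sanity check on those totals, the nested query can be run against an in-memory SQLite database using Python's standard sqlite3 module. This is only a sketch: the table layout (a `title` table with TEXT and REAL columns) and the helper name `total_dbl_prices` are assumptions made for illustration, and the rows are copied from the data table above.

```python
import sqlite3

# Rows copied from the data table above: (publisher, title_id, price, title).
ROWS = [
    ("New Age Books", "PS2106", 7.00, "Life Without Fear"),
    ("New Age Books", "PS2091", 10.95, "Is Anger the Enemy?"),
    ("New Age Books", "BU2075", 2.99, "You Can Combat Computer Stress!"),
    ("New Age Books", "TC7777", 14.99, "Sushi, Anyone?"),
    ("Binnet & Hardley", "MC3021", 2.99, "The Gourmet Microwave"),
    ("Binnet & Hardley", "MC2222", 19.99, "Silicon Valley Gastronomic Treats"),
    ("Algodata Infosystems", "PC1035", 22.95, "But Is It User Friendly?"),
    ("Algodata Infosystems", "BU1032", 19.99, "The Busy Executive's Database Guide"),
    ("Algodata Infosystems", "PC8888", 20.00, "Secrets of Silicon Valley"),
]


def total_dbl_prices(rows):
    """Run the nested GROUP BY query on an in-memory SQLite table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: column types are a guess that mirrors the Django model.
    conn.execute(
        "CREATE TABLE title ("
        "publisher TEXT, title_id TEXT PRIMARY KEY, price REAL, title TEXT)"
    )
    conn.executemany("INSERT INTO title VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT publisher, SUM(dbl_price) AS tot_dbl_prices
        FROM (
          SELECT price * 2 AS dbl_price, publisher
          FROM title
        ) AS A
        GROUP BY publisher
        ORDER BY publisher
        """
    ).fetchall()
    conn.close()
    # REAL is floating point, so round each total to cents.
    return [(publisher, round(total, 2)) for publisher, total in result]


totals = total_dbl_prices(ROWS)
for publisher, total in totals:
    print(f"{publisher:<27}  {total:>14.2f}")
```

The ORDER BY is added here only to make the row order deterministic; it is not part of the query in the question.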



Django query

The query would look like:

from django.db.models import F, Sum

(Title.objects
 .annotate(dbl_price=2 * F('price'))
 .values('publisher')
 .annotate(tot_dbl_prices=Sum('dbl_price')))

But it gives the error:

KeyError: 'dbl_price'. 

which indicates that it can't find the field dbl_price in the queryset.

Here is why this error happens: the documentation (https://docs.djangoproject.com/en/1.10/topics/db/aggregation/#order-of-annotate-and-values-clauses) says


You should also note that average_rating has been explicitly included in the list of values to be returned. This is required because of the ordering of the values() and annotate() clause.

If the values() clause precedes the annotate() clause, any annotations will be automatically added to the result set. However, if the values() clause is applied after the annotate() clause, you need to explicitly include the aggregate column.

So, the dbl_price could not be found in aggregation, because it was created by a prior annotate, but wasn't included in values().

However, I can't include it in values either, because I want to use values (followed by another annotate) as a grouping device, since


If the values() clause precedes the annotate(), the annotation will be computed using the grouping described by the values() clause.

which is the basis of how Django implements SQL GROUP BY. This means that I can't include dbl_price inside values(), because then the grouping will be based on unique combinations of both fields publisher and dbl_price, whereas I need to group by publisher only.
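The effect of the extra grouping field can be illustrated without Django at all. The following is a plain-Python sketch, not ORM code: the helper `group_totals` and its `key` parameter are hypothetical names, and the (publisher, price) pairs are copied from the data table in the question. It groups once by publisher alone and once by the (publisher, dbl_price) pair.

```python
from collections import defaultdict
from decimal import Decimal

# (publisher, price) pairs copied from the data table in the question.
TITLES = [
    ("New Age Books", Decimal("7")),
    ("New Age Books", Decimal("10.95")),
    ("New Age Books", Decimal("2.99")),
    ("New Age Books", Decimal("14.99")),
    ("Binnet & Hardley", Decimal("2.99")),
    ("Binnet & Hardley", Decimal("19.99")),
    ("Algodata Infosystems", Decimal("22.95")),
    ("Algodata Infosystems", Decimal("19.99")),
    ("Algodata Infosystems", Decimal("20")),
]


def group_totals(rows, key):
    """Sum dbl_price per group; `key` chooses the grouping columns (hypothetical helper)."""
    totals = defaultdict(Decimal)  # Decimal() is 0, so every group starts at zero
    for publisher, price in rows:
        dbl_price = 2 * price
        totals[key(publisher, dbl_price)] += dbl_price
    return dict(totals)


# Grouping by publisher only: one row per publisher.
by_publisher = group_totals(TITLES, key=lambda publisher, dbl_price: publisher)

# Grouping by (publisher, dbl_price): one row per distinct pair.
by_both = group_totals(TITLES, key=lambda publisher, dbl_price: (publisher, dbl_price))

print(len(by_publisher), "groups when grouping by publisher")
print(len(by_both), "groups when grouping by publisher and dbl_price")
```

With this data every (publisher, dbl_price) pair is distinct, so the second grouping yields one group per title and nothing is actually summed across a publisher.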

So, the following query, which only differs from the above in that I aggregate over model's price field rather than annotated dbl_price field, actually works:

from django.db.models import F, Sum

(Title.objects
 .annotate(dbl_price=2 * F('price'))
 .values('publisher')
 .annotate(sum_of_prices=Sum('price')))

because the price field is in the model rather than being an annotated field, and so we don't need to include it in values to keep it in the queryset.

So, here we have it: I need to include the annotated field in values to keep it in the queryset, but I can't do that because values is also used for grouping (which would be wrong with an extra field). The problem is essentially due to the two very different ways that values is used in Django, depending on the context (whether or not values is followed by annotate): (1) value extraction (a plain SQL SELECT list) and (2) grouping plus aggregation over the groups (SQL GROUP BY). In this case these two uses seem to conflict.

My question is: is there any way to solve this problem (without things like falling back to raw sql)?

Please note: the specific example in question can be solved by moving all annotate statements after values, which was noted by several answers. However, I am more interested in solutions (or discussion) which would keep the annotate statement(s) before values(), for three reasons:

1. There are also more complex examples, where the suggested workaround would not work.
2. I can imagine situations where the annotated queryset has been passed to another function, which actually does the GROUP BY, so that the only thing we know is the set of names of annotated fields, and their types.
3. The situation seems to be pretty straightforward, and it would surprise me if this clash of two distinct uses of values() has not been noticed and discussed before.

Recommended answer

This is maybe a bit too late, but I have found the solution (tested with Django 1.11.1).

The problem is that the call to .values('publisher'), which is required for grouping, removes all annotations that are not included in the .values() fields parameter.

And we can't include dbl_price in the fields parameter, because that would add another field to the GROUP BY clause.

The solution is to first perform every aggregation that requires annotated fields, then call .values() and include those aggregations in the fields parameter (this won't add them to GROUP BY, because they are aggregations). Then we should call .annotate() with ANY expression: this makes Django add a GROUP BY clause to the SQL query using the only non-aggregated field in the query, publisher.

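from django.db.models import Count, F, Sum

(Title.objects
 .annotate(dbl_price=2 * F('price'))
 .annotate(sum_of_prices=Sum('dbl_price'))
 .values('publisher', 'sum_of_prices')
 # the model's primary key is title_id, so there is no 'id' field to count
 .annotate(titles_count=Count('title_id')))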
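To make the target of this chain more concrete, here is a sketch of the kind of single-level SQL it is aiming for: one GROUP BY on publisher, with the doubled price summed directly. The SQL below is written by hand as an assumption about the query's general shape; it is not output captured from Django, and the exact SQL may differ between Django versions. It runs against an in-memory SQLite table with an assumed schema, using rows copied from the question, and `run_flat_query` is a hypothetical helper name.

```python
import sqlite3

# (publisher, title_id, price) copied from the data table in the question.
ROWS = [
    ("New Age Books", "PS2106", 7.00),
    ("New Age Books", "PS2091", 10.95),
    ("New Age Books", "BU2075", 2.99),
    ("New Age Books", "TC7777", 14.99),
    ("Binnet & Hardley", "MC3021", 2.99),
    ("Binnet & Hardley", "MC2222", 19.99),
    ("Algodata Infosystems", "PC1035", 22.95),
    ("Algodata Infosystems", "BU1032", 19.99),
    ("Algodata Infosystems", "PC8888", 20.00),
]

# Hand-written assumption about the query shape; NOT captured from Django.
FLAT_SQL = """
SELECT publisher,
       SUM(price * 2)  AS sum_of_prices,
       COUNT(title_id) AS titles_count
FROM title
GROUP BY publisher
ORDER BY publisher
"""


def run_flat_query(rows):
    """Execute the flat GROUP BY query on an in-memory table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema that mirrors the Django model's columns.
    conn.execute(
        "CREATE TABLE title (publisher TEXT, title_id TEXT PRIMARY KEY, price REAL)"
    )
    conn.executemany("INSERT INTO title VALUES (?, ?, ?)", rows)
    result = conn.execute(FLAT_SQL).fetchall()
    conn.close()
    # Round the floating-point sums to cents.
    return [(publisher, round(total, 2), count) for publisher, total, count in result]


flat_result = run_flat_query(ROWS)
for row in flat_result:
    print(row)
```

The totals match the desired output from the question, which suggests the nested subquery is not essential as long as the grouping column is publisher alone.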

The only drawback of this approach: if you don't need any aggregation other than the one over the annotated field, you have to include one anyway. Without the last call to .annotate() (which must include at least one expression!), Django will not add GROUP BY to the SQL query. One way to deal with this is to create a copy of your field:

from django.db.models import F, Sum

(Title.objects
 .annotate(dbl_price=2 * F('price'))
 .annotate(_sum_of_prices=Sum('dbl_price'))  # note the underscore!
 .values('publisher', '_sum_of_prices')
 .annotate(sum_of_prices=F('_sum_of_prices')))

Also note that you should be careful with QuerySet ordering. It is best to call .order_by() either without parameters, to clear the ordering, or with your GROUP BY field. If the resulting query contains ordering by any other field, the grouping will be wrong. See https://docs.djangoproject.com/en/1.11/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
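The ordering caveat can be sketched at the SQL level. The example assumes, as the linked documentation describes, that a field used for ordering ends up among the grouping columns; the SQL is hand-written to imitate that effect and is not captured from Django. The table schema and the helper `count_groups` are hypothetical, and the (publisher, title) pairs are copied from the question's data.

```python
import sqlite3

# (publisher, title) pairs copied from the data table in the question.
ROWS = [
    ("New Age Books", "Life Without Fear"),
    ("New Age Books", "Is Anger the Enemy?"),
    ("New Age Books", "You Can Combat Computer Stress!"),
    ("New Age Books", "Sushi, Anyone?"),
    ("Binnet & Hardley", "The Gourmet Microwave"),
    ("Binnet & Hardley", "Silicon Valley Gastronomic Treats"),
    ("Algodata Infosystems", "But Is It User Friendly?"),
    ("Algodata Infosystems", "The Busy Executive's Database Guide"),
    ("Algodata Infosystems", "Secrets of Silicon Valley"),
]


def count_groups(group_by_columns):
    """Return how many rows a GROUP BY over the given columns yields (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE title (publisher TEXT, title TEXT)")
    conn.executemany("INSERT INTO title VALUES (?, ?)", ROWS)
    # group_by_columns is a fixed string chosen below, never user input.
    sql = f"SELECT publisher, COUNT(*) FROM title GROUP BY {group_by_columns}"
    groups = conn.execute(sql).fetchall()
    conn.close()
    return len(groups)


# Ordering cleared or limited to the grouping field: one row per publisher.
groups_clean = count_groups("publisher")

# An extra ordering field (here: title) leaking into GROUP BY splits the groups.
groups_with_ordering_field = count_groups("publisher, title")

print(groups_clean, groups_with_ordering_field)
```

Because every title is distinct, the second query returns one row per title instead of one per publisher, which is the "wrong grouping" the answer warns about.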

Also, you might want to remove that fake annotation from your output, so call .values() again. So, the final code looks like:

from django.db.models import F, Sum

(Title.objects
 .annotate(dbl_price=2 * F('price'))
 .annotate(_sum_of_prices=Sum('dbl_price'))
 .values('publisher', '_sum_of_prices')
 .annotate(sum_of_prices=F('_sum_of_prices'))
 .values('publisher', 'sum_of_prices')
 .order_by('publisher'))
