Django ORM - Grouped aggregates with different select clauses


Question

Imagine we have the Django ORM model Meetup with the following definition:

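from django.db import models

class Meetup(models.Model):
    # CharField normally needs max_length; 100 is an arbitrary value.
    language = models.CharField(max_length=100)
    speaker = models.CharField(max_length=100)
    date = models.DateField(auto_now=True)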

I'd like to use a single query to fetch the language, speaker and date of the latest event for each language.

>>> Meetup.objects.create(language='python', speaker='mike')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='python', speaker='ryan')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='noah')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='shawn')
<Meetup: Meetup object>
>>> Meetup.objects.values("language").annotate(latest_date=models.Max("date")).values("language", "speaker", "latest_date")
[
    {'speaker': u'mike', 'language': u'python', 'latest_date': ...}, 
    {'speaker': u'ryan', 'language': u'python', 'latest_date': ...}, 
    {'speaker': u'noah', 'language': u'node', 'latest_date': ...}, 
    {'speaker': u'shawn', 'language': u'node', 'latest_date': ...}, 
]

D'oh! We're getting the latest event, but for the wrong grouping!

It seems like I need a way to GROUP BY the language but SELECT a different set of fields?
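A minimal sketch of why four rows come back, assuming (the post does not confirm this) that naming "speaker" in the second values() call adds it to the GROUP BY. It uses Python's sqlite3 module rather than Django, and the dates are hypothetical values chosen for illustration:

```python
import sqlite3

# In-memory table mirroring the Meetup model; the dates are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [
        ("python", "mike", "2013-01-01"),
        ("python", "ryan", "2013-02-01"),
        ("node", "noah", "2013-01-15"),
        ("node", "shawn", "2013-03-01"),
    ],
)

# Grouping by language alone: one row per language.
by_language = conn.execute(
    "SELECT language, MAX(date) FROM app_meetup "
    "GROUP BY language ORDER BY language"
).fetchall()

# Grouping by language AND speaker: one row per (language, speaker) pair.
by_language_and_speaker = conn.execute(
    "SELECT language, speaker, MAX(date) FROM app_meetup "
    "GROUP BY language, speaker ORDER BY language, speaker"
).fetchall()

print(len(by_language), len(by_language_and_speaker))
```

With this data the first query yields two rows and the second yields four, which matches the shape of the unwanted result above.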

Update - this sort of query seems fairly easy to express in SQL:

SELECT language, speaker, MAX(date)
FROM app_meetup
GROUP BY language;

I'd love a way to do this without using Django's raw() - is it possible?

Update 2 - after much searching, it seems there are similar questions on SO:

  • Django Query that gets the most recent objects
  • How can I do a greatest n per group query in Django
  • MySQL calls this sort of query a group-wise maximum of a certain column.

Update 3 - in the end, with @danihp's help, it seems the best you can do is two queries. I've used the following approach:

from django.db.models import Max

# Abuse the fact that the latest Meetup always has a higher PK to build
# a ValuesList of the latest Meetups grouped by "language".
latest_meetup_pks = (Meetup.objects.values("language")
                                   .annotate(latest_pk=Max("pk"))
                                   .values_list("latest_pk", flat=True))

# Use a second query to grab those latest Meetups!
Meetup.objects.filter(pk__in=latest_meetup_pks)
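In SQL terms, the primary-key trick above corresponds roughly to an IN filter over a grouped subquery. The following sketch illustrates that shape with Python's sqlite3 module; it is not the SQL Django emits, the table layout and dates are assumptions, and it relies on rows being inserted in chronological order so that the highest id is the latest meetup:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
# Inserted in chronological order per language (hypothetical dates),
# so MAX(id) within a language identifies its latest meetup.
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [
        ("python", "mike", "2013-01-01"),
        ("python", "ryan", "2013-02-01"),
        ("node", "noah", "2013-01-15"),
        ("node", "shawn", "2013-03-01"),
    ],
)

latest_by_pk = conn.execute(
    """
    SELECT language, speaker, date
    FROM app_meetup
    WHERE id IN (SELECT MAX(id) FROM app_meetup GROUP BY language)
    ORDER BY language
    """
).fetchall()

print(latest_by_pk)
```

The approach only holds while primary keys grow with the date; a meetup back-filled with an older date would break that assumption.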






This question is a follow-up to my previous question:

Django ORM - Get the latest record for a group

Recommended answer

This is the kind of query that is easy to explain but hard to write. If this were SQL, I would suggest a CTE-filtered query with a row rank over a partition by language, ordered by date (descending).
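The CTE-with-rank idea can be sketched as follows, using Python's sqlite3 module in place of a production database. This assumes a SQLite build with window-function support (3.25 or newer); the table layout and dates are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [
        ("python", "mike", "2013-01-01"),
        ("python", "ryan", "2013-02-01"),
        ("node", "noah", "2013-01-15"),
        ("node", "shawn", "2013-03-01"),
    ],
)

# Number the rows within each language, newest first, then keep rank 1.
latest_by_rank = conn.execute(
    """
    WITH ranked AS (
        SELECT language, speaker, date,
               ROW_NUMBER() OVER (
                   PARTITION BY language ORDER BY date DESC
               ) AS rn
        FROM app_meetup
    )
    SELECT language, speaker, date
    FROM ranked
    WHERE rn = 1
    ORDER BY language
    """
).fetchall()

print(latest_by_rank)
```

ROW_NUMBER() keeps exactly one row per language even when two meetups share the latest date; which of the tied rows wins is then arbitrary unless the ORDER BY gets a tie-breaker.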

But this is not SQL, this is the Django query API. The easy way is to run one query per language:

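# values_list(..., flat=True) yields plain strings; order_by() clears any
# default ordering so distinct() applies to "language" alone.
languages = Meetup.objects.values_list("language", flat=True).order_by().distinct()
last_by_language = [
    Meetup.objects.filter(language=l).latest("date")
    for l in languages
]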

This crashes if some language has no meetups, because latest() raises DoesNotExist when nothing matches. The other approach is to get the maximum date for each language and then filter on those (language, date) pairs:

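from functools import reduce

from django.db import models
from django.db.models import Q

# Latest date per language.
last_dates = (Meetup.objects
              .values("language")
              .annotate(ldate=models.Max("date"))
              .order_by())

# OR together one (language, latest date) condition per group.
q = reduce(
    lambda q, meetup: q | (Q(language=meetup["language"]) & Q(date=meetup["ldate"])),
    last_dates,
    Q(),
)

your_query = Meetup.objects.filter(q)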
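Matching on (language, latest date) pairs is equivalent in spirit to joining the table against its own grouped maximum. The sketch below shows that join with Python's sqlite3 module, using a hypothetical table and dates, and includes a deliberate tie to show that this approach can return more than one row per language:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [
        ("python", "mike", "2013-01-01"),
        ("python", "ryan", "2013-02-01"),
        ("node", "noah", "2013-01-15"),
        ("node", "shawn", "2013-03-01"),
        ("node", "tara", "2013-03-01"),  # same latest date as shawn
    ],
)

# Join every row to the latest date of its language.
latest_by_date = conn.execute(
    """
    SELECT m.language, m.speaker, m.date
    FROM app_meetup AS m
    JOIN (
        SELECT language, MAX(date) AS ldate
        FROM app_meetup
        GROUP BY language
    ) AS g
      ON m.language = g.language AND m.date = g.ldate
    ORDER BY m.language, m.speaker
    """
).fetchall()

print(latest_by_date)
```

Both node speakers come back here because they share the latest date. Since the model stores a DateField rather than a timestamp, such ties are plausible and may need handling in the calling code.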

Perhaps someone can explain how to do this in a single query.

Edit, in response to the OP's comment

You are looking for:

"SELECT language, speaker, MAX(date) FROM app_meetup GROUP BY language"

Not every RDBMS supports this statement, because every column in the SELECT clause that is not enclosed in an aggregate function should also appear in the GROUP BY clause. In your case, speaker is in the SELECT clause (without an aggregate function) but does not appear in GROUP BY.

In MySQL there is no guarantee that the speaker shown in the result is the one that matches the maximum date. Because of this, we are not facing an easy query.

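Quoting the MySQL documentation:

In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause... However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group.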

The closest query to your requirements is:

results = (Meetup.objects
           .values("language", "speaker")
           .annotate(ldate=models.Max("date"))
           .order_by())
