How to fetch rows with max update datetime using GROUP BY and HAVING with SQLAlchemy and Postgresql

Question

I'm migrating from SQLite to Postgresql, and one of my queries has stopped working. It's not clear to me why this query is allowed in SQLite but not in Postgresql. The query in question is in the find_recent_by_section_id_list() function below.

I've tried rewriting the query in multiple ways, but what is confusing me is that this query worked when I was working with SQLite.

The setup is Flask, SQLAlchemy, Flask-SQLAlchemy and Postgresql.

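from typing import List

from sqlalchemy import UniqueConstraint, func

# db is assumed to be the application's Flask-SQLAlchemy instance.
class SectionStatusModel(db.Model):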

    __tablename__ = "sectionstatus"
    _id = db.Column(db.Integer, primary_key=True)
    update_datetime = db.Column(db.DateTime, nullable=False)
    status = db.Column(db.Integer, nullable=False, default=0)
    section_id = db.Column(db.Integer, db.ForeignKey("sections._id"), nullable=False)

    __table_args__ = (
        UniqueConstraint("section_id", "update_datetime", name="section_time"),
    )


    @classmethod
    def find_recent_by_section_id_list(
        cls, section_id_list: List
    ) -> List["SectionStatusModel"]:

        return (
            cls.query.filter(cls.section_id.in_(section_id_list))
            .group_by(cls.section_id)
            .having(func.max(cls.update_datetime) == cls.update_datetime)
        )

I would expect this query to return the latest section status for each section. Instead, I get the following error:

E       sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column "sectionstatus._id" must appear in the GROUP BY clause or be used in an aggregate function
E       LINE 1: SELECT sectionstatus._id AS sectionstatus__id, sectionstatus...
E                      ^
E       
E       [SQL: SELECT sectionstatus._id AS sectionstatus__id, sectionstatus.update_datetime AS sectionstatus_update_datetime, sectionstatus.status AS sectionstatus_status, sectionstatus.section_id AS sectionstatus_section_id 
E       FROM sectionstatus 
E       WHERE sectionstatus.section_id IN (%(section_id_1)s, %(section_id_2)s) GROUP BY sectionstatus.section_id 
E       HAVING max(sectionstatus.update_datetime) = sectionstatus.update_datetime]
E       [parameters: {'section_id_1': 1, 'section_id_2': 2}]
E       (Background on this error at: http://sqlalche.me/e/f405)

This is output from the test suite.

Recommended answer

SQLite allows the query because it lets SELECT list items refer to ungrouped columns outside of aggregate functions, even when those columns are not functionally dependent on the grouping expressions. The non-aggregate values are picked from an arbitrary row in the group.

In addition, the SQLite documentation mentions in a sidenote that "bare" columns in an aggregate query get special processing when the aggregate is min() or max() [1]:

When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum.

This only applies to simple queries, and ambiguity remains if more than one row has the same min/max, or if the query contains more than one call to min() / max().
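The bare-column behaviour described above can be observed with a minimal sketch using Python's built-in sqlite3 module. The table layout mirrors the model in the question, the sample rows are made up for illustration, and the aggregate is placed in the SELECT list because that is the case the quoted documentation covers:

```python
import sqlite3

# In-memory database with a table shaped like the question's model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sectionstatus ("
    " _id INTEGER PRIMARY KEY,"
    " update_datetime TEXT NOT NULL,"
    " status INTEGER NOT NULL,"
    " section_id INTEGER NOT NULL)"
)
# Illustrative data only; _id values are assigned 1..5 in insertion order.
conn.executemany(
    "INSERT INTO sectionstatus (update_datetime, status, section_id)"
    " VALUES (?, ?, ?)",
    [
        ("2019-01-01 10:00:00", 1, 1),
        ("2019-01-03 10:00:00", 2, 1),  # latest row for section 1
        ("2019-01-02 10:00:00", 3, 1),
        ("2019-01-05 10:00:00", 4, 2),  # latest row for section 2
        ("2019-01-04 10:00:00", 5, 2),
    ],
)

# _id and status are "bare" columns: neither grouped nor aggregated.
# With a single max() in the query, SQLite takes them from the row
# that holds the maximum update_datetime of each group.
bare_rows = conn.execute(
    "SELECT _id, section_id, status, max(update_datetime)"
    " FROM sectionstatus"
    " GROUP BY section_id"
    " ORDER BY section_id"
).fetchall()

for row in bare_rows:
    print(row)
```

Running the same statement against Postgresql would instead fail with the "must appear in the GROUP BY clause" error shown in the question.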

This makes SQLite non-conforming in this respect, at least with the SQL:2003 standard (I'm fairly certain that this has not changed much in the newer versions):

7.12 <query specification>

Function

Specify a table derived from the result of a <table expression>.

<query specification> ::=
    SELECT [ <set quantifier> ] <select list> <table expression>

...

3) Without Feature T301, "Functional dependencies", in conforming SQL language, if T is a grouped table, then in each <value expression> contained in the <select list>, each <column reference> that references a column of T shall reference a grouping column or be specified in an aggregated argument of a <set function specification>.

Most other SQL DBMS, such as Postgresql, follow the standard more closely in this respect, and require that the SELECT list of an aggregate query consist only of grouping expressions, aggregate expressions, or ungrouped columns that are functionally dependent on the grouped columns.

In Postgresql a different approach is then required in order to fetch this kind of greatest-n-per-group result. There are many great posts that cover this topic, but here's a summary of one Postgresql specific approach. Using the DISTINCT ON extension combined with ORDER BY you can achieve the same results:

@classmethod
def find_recent_by_section_id_list(
        cls, section_id_list: List) -> List["SectionStatusModel"]:
    return (
        cls.query
        .filter(cls.section_id.in_(section_id_list))
        .distinct(cls.section_id)
        # Use _id as a tie breaker, in order to avoid non-determinism
        .order_by(cls.section_id, cls.update_datetime.desc(), cls._id)
    )

Naturally this will then break in SQLite, as it does not support DISTINCT ON. If you need a solution that works in both, use the row_number() window function approach.
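As a rough sketch of the row_number() approach, the following uses plain SQL through Python's sqlite3 module, so it can be tried without a Postgresql server. It assumes an SQLite build with window function support (3.25 or newer), and the table contents are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sectionstatus ("
    " _id INTEGER PRIMARY KEY,"
    " update_datetime TEXT NOT NULL,"
    " status INTEGER NOT NULL,"
    " section_id INTEGER NOT NULL)"
)
# Illustrative data only; _id values are assigned 1..6 in insertion order.
conn.executemany(
    "INSERT INTO sectionstatus (update_datetime, status, section_id)"
    " VALUES (?, ?, ?)",
    [
        ("2019-01-01 10:00:00", 1, 1),
        ("2019-01-03 10:00:00", 2, 1),  # latest row for section 1
        ("2019-01-02 10:00:00", 3, 1),
        ("2019-01-05 10:00:00", 4, 2),  # latest row for section 2
        ("2019-01-04 10:00:00", 5, 2),
        ("2019-01-06 10:00:00", 6, 3),  # section 3 is not requested below
    ],
)

# Number the rows of each section from newest to oldest, using _id as a
# tie breaker, then keep only the first row of every section.
LATEST_PER_SECTION = """
SELECT _id, update_datetime, status, section_id
FROM (
    SELECT s.*,
           row_number() OVER (
               PARTITION BY section_id
               ORDER BY update_datetime DESC, _id
           ) AS rn
    FROM sectionstatus AS s
    WHERE section_id IN (?, ?)
) AS ranked
WHERE rn = 1
ORDER BY section_id
"""

latest = conn.execute(LATEST_PER_SECTION, (1, 2)).fetchall()
for row in latest:
    print(row)
```

The same statement uses only standard window function syntax, so it should also be accepted by Postgresql once the placeholders are adapted to the driver's parameter style. In SQLAlchemy the ranking column can be expressed as func.row_number().over(partition_by=..., order_by=...) inside a subquery that is then filtered on rn == 1.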

[1]: Note that this means that your HAVING clause does not in fact filter much at all, since the ungrouped value will always be picked from the row containing the maximum value. It is the mere presence of max(update_datetime) that does the trick.
