DISTINCT with PARTITION BY vs. GROUP BY
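Problem description
I have found some SQL queries like this in an application I am examining:
SELECT DISTINCT
    Company, Warehouse, Item,
    SUM(quantity) OVER (PARTITION BY Company, Warehouse, Item) AS stock
I'm quite sure this gives the same result as:
SELECT
    Company, Warehouse, Item,
    SUM(quantity) AS stock
GROUP BY Company, Warehouse, Item
Is there any benefit (performance, readability, additional flexibility in writing the query, maintainability, etc.) to using the first approach over the latter?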
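The claimed equivalence can be checked on a small data set. The following is a minimal sketch, not the original application's code: it assumes SQLite (via Python's built-in sqlite3 module, with a SQLite build of 3.25 or later for window functions), and the table name stock_moves and the sample rows are hypothetical, since the question names no table.

```python
import sqlite3

# Hypothetical table and sample rows; the original question names no table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_moves "
    "(Company TEXT, Warehouse TEXT, Item TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO stock_moves VALUES (?, ?, ?, ?)",
    [
        ("A", "W1", "bolt", 10),
        ("A", "W1", "bolt", 5),
        ("A", "W2", "bolt", 7),
        ("B", "W1", "nut", 3),
    ],
)

# First approach: window sum per partition, then DISTINCT to collapse duplicates.
distinct_rows = conn.execute("""
    SELECT DISTINCT Company, Warehouse, Item,
           SUM(quantity) OVER (PARTITION BY Company, Warehouse, Item) AS stock
    FROM stock_moves
""").fetchall()

# Second approach: plain aggregation.
group_rows = conn.execute("""
    SELECT Company, Warehouse, Item, SUM(quantity) AS stock
    FROM stock_moves
    GROUP BY Company, Warehouse, Item
""").fetchall()

# Row order is not guaranteed without ORDER BY, so compare sorted lists.
same_result = sorted(distinct_rows) == sorted(group_rows)
print(same_result)
```

On this sample both queries return one row per (Company, Warehouse, Item) combination with the summed quantity, which matches the intuition in the question.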
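Performance:
Winner: GROUP BY
Some very rudimentary testing on a large table with unindexed columns showed that, at least in my case, the two queries generated completely different query plans. The PARTITION BY query was significantly slower. The GROUP BY query plan included only a table scan and an aggregation operation, while the PARTITION BY plan had two nested-loop self-joins. On the second run, the PARTITION BY query took about 2800 ms and the GROUP BY query took only 500 ms.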
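A comparison of this kind can be repeated on another engine with a small harness. The sketch below is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module (SQLite 3.25 or later), a hypothetical stock_moves table, and synthetic data. The plans and timings it prints are specific to SQLite and will not match the plans or the 2800 ms / 500 ms figures reported above, which came from a different, unnamed DBMS.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_moves "
    "(Company TEXT, Warehouse TEXT, Item TEXT, quantity INTEGER)"
)
# Synthetic, unindexed data: 20,000 rows falling into 50 distinct groups
# (i % 50 already determines i % 2 and i % 5).
rows = [(f"C{i % 2}", f"W{i % 5}", f"I{i % 50}", i % 7) for i in range(20000)]
conn.executemany("INSERT INTO stock_moves VALUES (?, ?, ?, ?)", rows)

QUERIES = {
    "PARTITION BY": """
        SELECT DISTINCT Company, Warehouse, Item,
               SUM(quantity) OVER (PARTITION BY Company, Warehouse, Item) AS stock
        FROM stock_moves
    """,
    "GROUP BY": """
        SELECT Company, Warehouse, Item, SUM(quantity) AS stock
        FROM stock_moves
        GROUP BY Company, Warehouse, Item
    """,
}

results = {}
timings_ms = {}
plans = {}
for name, sql in QUERIES.items():
    # The fourth column of EXPLAIN QUERY PLAN output is the readable detail text.
    plans[name] = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    start = time.perf_counter()
    results[name] = sorted(conn.execute(sql).fetchall())
    timings_ms[name] = (time.perf_counter() - start) * 1000.0

for name in QUERIES:
    print(name, f"{timings_ms[name]:.1f} ms", plans[name])
```

A single timed run like this is noisy; repeating each query several times and comparing the plans on the DBMS actually in use gives a more reliable picture.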
Readability / Maintainability:
Winner: GROUP BY
Based on the opinions of the commenters here, PARTITION BY is less readable for most developers, so it will probably also be harder to maintain in the future.
Flexibility:
Winner: PARTITION BY
PARTITION BY gives you more flexibility in choosing the grouping columns. With GROUP BY, you can have only one set of grouping columns for all aggregated columns. With DISTINCT + PARTITION BY, you can have different columns in each partition. Also, on some DBMSs you can choose from more aggregation/analytic functions in the OVER clause.
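To illustrate the flexibility point, here is a minimal sketch of one query that aggregates over two different partitions at once, something a single GROUP BY clause cannot express directly. It assumes SQLite via Python's sqlite3 module (SQLite 3.25 or later); the stock_moves table, the sample rows, and the company_total column are hypothetical names introduced for this example.

```python
import sqlite3

# Hypothetical table and sample rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_moves "
    "(Company TEXT, Warehouse TEXT, Item TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO stock_moves VALUES (?, ?, ?, ?)",
    [
        ("A", "W1", "bolt", 10),
        ("A", "W1", "bolt", 5),
        ("A", "W2", "bolt", 7),
        ("B", "W1", "nut", 3),
    ],
)

# Each window function uses its own partition: one per item in a warehouse,
# one per company. DISTINCT then collapses the repeated rows.
mixed_rows = conn.execute("""
    SELECT DISTINCT Company, Warehouse, Item,
           SUM(quantity) OVER (PARTITION BY Company, Warehouse, Item) AS stock,
           SUM(quantity) OVER (PARTITION BY Company) AS company_total
    FROM stock_moves
    ORDER BY Company, Warehouse, Item
""").fetchall()

for row in mixed_rows:
    print(row)
```

Getting the same result with GROUP BY would typically need a subquery or a join between two aggregations at different levels.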