JOIN 后的 GROUP 或 DISTINCT 返回重复项 [英] GROUP or DISTINCT after JOIN returns duplicates

查看：21 发布时间：2021/12/17 21:01:43 sql postgresql join group-by distinct

本文介绍了JOIN 后的 GROUP 或 DISTINCT 返回重复项的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有两个表，products 和 meta.它们是 1:N 关系，其中每个产品行通过外键至少有一个元行.

I have two tables, products and meta. They are in relation 1:N where each product row has at least one meta row via foreign key.

(即 SQLfiddle:http://sqlfiddle.com/#!15/c8f34/1)

(viz. SQLfiddle: http://sqlfiddle.com/#!15/c8f34/1)

我需要加入这两个表，但我只需要过滤独特的产品.当我尝试此查询时，一切正常(返回 4 行):

I need to join these two tables but i need to filter only unique products. When I try this query, everything is ok (4 rows returned):

SELECT DISTINCT(product_id)
FROM meta JOIN products ON products.id = meta.product_id

但是当我尝试选择所有列时，DISTINCT 规则不再适用于结果，因为返回的是 8 行而不是 4 行.

but when I try to select all columns the DISTINCT rule no longer applies to results, as 8 rows instead of 4 is returned.

SELECT DISTINCT(product_id), *
FROM meta JOIN products ON products.id = meta.product_id

我尝试了很多方法，比如尝试在子查询上使用 DISTINCT 或 GROUP BY 但总是得到相同的结果.

I have tried many approaches like trying to DISTINCT or GROUP BY on sub-query but always with same result.

推荐答案

虽然从表中检索所有或大部分行，但此类查询的最快方法通常是首先聚合/消除歧义稍后加入:

While retrieving all or most rows from a table, the fastest way for this type of query typically is to aggregate / disambiguate first and join later:

SELECT *
FROM   products p
JOIN  (
   SELECT DISTINCT ON (product_id) *
   FROM   meta
   ORDER  BY product_id, id DESC
   ) m ON m.product_id = p.id;

products 中每行 meta 中的行数越多，对性能的影响就越大.

The more rows in meta per row in products, the bigger the impact on performance.

当然，您需要在子查询中添加一个 ORDER BY 子句来定义哪行以从子查询中的每个集合中选择.@Craig 和 @Clodoaldo 已经告诉过你了.我正在返回具有最高 id 的 meta 行.

Of course, you'll want to add an ORDER BY clause in the subquery do define which row to pick form each set in the subquery. @Craig and @Clodoaldo already told you about that. I am returning the meta row with the highest id.

SQL 小提琴.

DISTINCT ON 的详细信息:

选择每个 GROUP BY 组中的第一行?

不过，这并不总是最快的解决方案.根据数据分布，有各种其他查询样式.对于这个涉及另一个连接的简单案例，这个在大表测试中的运行速度要快得多:

Still, this is not always the fastest solution. Depending on data distribution there are various other query styles. For this simple case involving another join, this one ran considerably faster in a test with big tables:

SELECT p.*, sub.meta_id, m.product_id, m.price, m.flag
FROM  (
   SELECT product_id, max(id) AS meta_id
   FROM   meta
   GROUP  BY 1
   ) sub
JOIN meta     m ON m.id = sub.meta_id
JOIN products p ON p.id = sub.product_id;

如果您不使用非描述性的 id 作为列名，我们就不会遇到命名冲突，只需编写 SELECT p.*, m.*.(我从不使用 id 作为列名.)

If you wouldn't use the non-descriptive id as column names, we would not run into naming collisions and could simply write SELECT p.*, m.*. (I never use id as column name.)

如果性能是您的首要要求，请考虑更多选择:

If performance is your paramount requirement, consider more options:

a 实体化视图 使用来自 meta 的预聚合数据，如果您的数据没有变化(很多).
一个递归 CTE 模拟 松散索引扫描big meta 表，每个产品有许多行(相对较少的不同 product_id).
这是我所知道的对整个表的 DISTINCT 查询使用索引的唯一方法.

a MATERIALIZED VIEW with pre-aggregated data from meta, if your data does not change (much).
a recursive CTE emulating a loose index scan for a big meta table with many rows per product (relatively few distinct product_id).
This is the only way I know to use an index for a DISTINCT query over the whole table.

这篇关于JOIN 后的 GROUP 或 DISTINCT 返回重复项的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

JOIN 后的 GROUP 或 DISTINCT 返回重复项 [英] GROUP or DISTINCT after JOIN returns duplicates

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

JOIN 后的 GROUP 或 DISTINCT 返回重复项 [英] GROUP or DISTINCT after JOIN returns duplicates

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭