GROUP or DISTINCT after JOIN returns duplicates


Question



    I have two tables, products and meta. They are in a 1:N relation: each product row has at least one meta row via a foreign key.

    (viz. SQLfiddle: http://sqlfiddle.com/#!15/c8f34/1)

    I need to join these two tables, but I only want unique products. When I try this query, everything is OK (4 rows returned):

    SELECT DISTINCT(product_id)
    FROM meta JOIN products ON products.id = meta.product_id
    

    But when I try to select all columns, the DISTINCT rule no longer applies to the results: 8 rows are returned instead of 4.

    SELECT DISTINCT(product_id), *
    FROM meta JOIN products ON products.id = meta.product_id
    

    I have tried many approaches, such as using DISTINCT or GROUP BY in a subquery, but always with the same result.

    Answer

    While retrieving all or most rows from a table, the fastest way for this type of query typically is to aggregate / disambiguate first and join later:

    SELECT *
    FROM   products p
    JOIN  (
       SELECT DISTINCT ON (product_id) *
       FROM   meta
       ORDER  BY product_id, id DESC
       ) m ON m.product_id = p.id;
    

    The more rows in meta per row in products, the bigger the impact on performance.

    Of course, you'll want to add an ORDER BY clause in the subquery to define which row to pick from each set in the subquery. @Craig and @Clodoaldo already told you about that. I am returning the meta row with the highest id.
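To make the effect of ORDER BY concrete, here is a minimal sketch. The table layout is an assumption inferred from the columns used elsewhere in this answer (id, product_id, price, flag); the `name` column and all sample values are made up and are not the data from the original fiddle.

```sql
-- Hypothetical schema and data for illustration only.
CREATE TABLE products (id int PRIMARY KEY, name text);
CREATE TABLE meta (
   id         serial PRIMARY KEY,
   product_id int NOT NULL REFERENCES products(id),
   price      numeric,
   flag       boolean
);

INSERT INTO products VALUES (1, 'apple'), (2, 'pear');
INSERT INTO meta (product_id, price, flag) VALUES
   (1, 10, true)    -- meta id 1
 , (1,  8, false)   -- meta id 2
 , (2,  5, true)    -- meta id 3
 , (2,  7, false);  -- meta id 4

-- Highest id per product: keeps meta rows 2 and 4.
SELECT DISTINCT ON (product_id) *
FROM   meta
ORDER  BY product_id, id DESC;

-- Cheapest row per product instead: keeps meta rows 2 and 3.
-- The trailing id breaks ties between equal prices deterministically.
SELECT DISTINCT ON (product_id) *
FROM   meta
ORDER  BY product_id, price, id;
```

The leading ORDER BY expressions have to match the DISTINCT ON expressions; whatever follows them decides which row of each group comes first and is therefore kept.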

    SQL Fiddle.

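    Details for DISTINCT ON: Select first row in each GROUP BY group? (stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group/7630564#7630564)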

    Optimize performance

    Still, this is not always the fastest solution. Depending on data distribution there are various other query styles. For this simple case involving another join, this one ran considerably faster in a test with big tables:

    SELECT p.*, sub.meta_id, m.product_id, m.price, m.flag
    FROM  (
       SELECT product_id, max(id) AS meta_id
       FROM   meta
       GROUP  BY 1
       ) sub
    JOIN meta     m ON m.id = sub.meta_id
    JOIN products p ON p.id = sub.product_id;
    

    If you didn't use the non-descriptive id as a column name, we would not run into naming collisions and could simply write SELECT p.*, m.*. (I never use id as column name.)
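As a rough sketch of that point, suppose the key columns were renamed to products.product_id and meta.meta_id. These names are hypothetical; the tables in the question use id in both.

```sql
-- Assumes hypothetical column names: products.product_id, meta.meta_id.
SELECT p.*, m.*
FROM  (
   SELECT product_id, max(meta_id) AS meta_id
   FROM   meta
   GROUP  BY 1
   ) sub
JOIN meta     m ON m.meta_id    = sub.meta_id
JOIN products p ON p.product_id = sub.product_id;
```

The result has no two unrelated columns both called id. product_id still appears twice (once from each table) with the same value; joining with USING (product_id) and selecting * would merge it into one column.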

    If performance is your paramount requirement, consider more options:

    • a MATERIALIZED VIEW with pre-aggregated data from meta, if your data does not change (much).
    • a recursive CTE emulating a loose index scan for a big meta table with many rows per product (relatively few distinct product_id).
      This is the only way I know to use an index for a DISTINCT query over the whole table.
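Both options can be sketched roughly as follows. The object names latest_meta and meta_product_id_id_idx are made up for illustration, and the statements assume PostgreSQL 9.3 or later (materialized views and LATERAL) and a meta.product_id column without NULL values.

```sql
-- Option 1: materialized view holding the latest meta row per product.
CREATE MATERIALIZED VIEW latest_meta AS
SELECT DISTINCT ON (product_id) *
FROM   meta
ORDER  BY product_id, id DESC;

-- Re-run whenever meta has changed enough to matter.
REFRESH MATERIALIZED VIEW latest_meta;

-- Option 2: recursive CTE emulating a loose index scan.
-- A matching index lets every step be a cheap index lookup.
CREATE INDEX meta_product_id_id_idx ON meta (product_id, id DESC);

WITH RECURSIVE cte AS (
   (  -- start with the latest row of the smallest product_id
   SELECT *
   FROM   meta
   ORDER  BY product_id, id DESC
   LIMIT  1
   )
   UNION ALL
   SELECT m.*
   FROM   cte c
   CROSS  JOIN LATERAL (  -- jump to the next bigger product_id
      SELECT *
      FROM   meta
      WHERE  product_id > c.product_id
      ORDER  BY product_id, id DESC
      LIMIT  1
      ) m
   )
SELECT * FROM cte;
```

Either result can then be joined to products on product_id = products.id. The recursion stops on its own once no larger product_id is left, because the LATERAL subquery then returns no row.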
