带有两个联接的查询中的mysql聚合函数给出了意外的结果 [英] mysql aggregate functions in query with two joins gives unexpected results

查看:111
本文介绍了带有两个联接的查询中的mysql聚合函数给出了意外的结果的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出以下(非常简化的)mysql表结构:

Given the following (very simplified) mysql table structure:

产品

  • id

product_categories

  • id
  • product_id
  • 状态(整数)

product_tags

  • id
  • product_id
  • some_other_numeric_value

我正在尝试查找与某个product_tag有关联并且与至少一个其状态属性为1的类别相关的产品.

I am trying to find every product that has an association to a certain product_tag, and that a relation to at least one category whichs status-attribute is 1.

我尝试了以下查询:

SELECT *

FROM `product` p

JOIN `product_categories` pc
ON p.`product_id` = pc.`product_id`

JOIN `product_tags` pt
ON p.`product_id` = pt.`product_id`

WHERE pt.`some_value` = 'some comparison value'

GROUP BY p.`product_id`

HAVING SUM( pc.`status` ) > 0

ORDER BY SUM( pt.`some_other_numeric_value` ) DESC

现在我的问题是:SUM(pt.some_other_numeric_value)返回意外的值.

Now my problem is: The SUM(pt.some_other_numeric_value) returns unexpected values.

我意识到,如果所讨论的产品与 product_categories 表具有多个关联,那么与 product_tags 表的每个关联都会被计入与与 product_categories 表的关系!

I realized that if the product in question has more then one relation to the product_categories table, then every relation to the product_tags table is counted as many timed as there are relations to the product_categories table!

例如:如果id为1的产品与id为2、3和4的product_categories有关系,并且与id为5和6的product_tags有关系-那么如果我插入GROUP_CONCAT(pt.id),则它会给出 5,6,5,6,5,6 而不是预期的 5,6 .

For example: If product with id=1 has a relation to product_categories with ids = 2, 3 and 4, and a relation with the product_tags with ids 5 and 6 - then if I insert a GROUP_CONCAT(pt.id), then it does give 5,6,5,6,5,6 instead of the expected 5,6.

起初,我怀疑联接类型(左联接,右联接,内部联接等)存在问题,因此我尝试了我所知道的每种联接类型,但均无济于事.我还尝试在GROUP BY子句中包含更多id字段,但这也不能解决问题.

At first I suspected it was a problem with the join type (left join, right join, inner join, and so on), so I tried every join type that I know of, but to no avail. I also tried to include more id-fields into the GROUP BY clause, but this didn´t solve the problem either.

有人可以向我解释这里实际上出了什么问题吗?

Can somebody explain to me what is actually going wrong here?

推荐答案

您通过1:n关系将主"(product)表连接到两个表(tagscategories),所以这是预期,您正在创建一个迷你笛卡尔积.对于同时具有一个以上关联标签和一个以上关联类别的那些产品,将在结果集中创建多行.如果按分组依据,则汇总函数中的结果有误.

You join a "main" (product) table to two tables (tags and categories) via 1:n relationships, so this is expected, you are creating a mini cartesian product. For those products that have both more than one associated tags and more than one associated categories, multiple rows are created in the result set. If you Group By, you have wrong results in aggregate functions.

避免这种情况的一种方法是删除两个联接之一,如果不需要该表的结果,则这是一个有效的策略.假设您在product_categories表的SELECT列表中不需要任何内容​​.然后,您可以使用半联接(EXISTS subquery)到该表:

One way to avoid this is to remove one of the two joins, which is a valid startegy if you don't need results from that table. Say you don't need anything in the SELECT list from the product_categories table. Then you can use a semi-join (the EXISTS subquery)to that table:

SELECT p.*,
       SUM( pt.`some_other_numeric_value` )

FROM `product` p

JOIN `product_tags` pt
  ON p.`product_id` = pt.`product_id`

WHERE pt.`some_value` = 'some comparison value'

  AND EXISTS
      ( SELECT *
        FROM product_categories pc
        WHERE pc.product_id = pc.product_id
         AND  pc.status = 1
      ) 

GROUP BY p.`product_id`

ORDER BY SUM( pt.`some_other_numeric_value` ) DESC ;


规避此问题的另一种方法是-在GROUP BY MainTable.pk之后-在COUNT()GROUP_CONCAT()聚合函数内使用DISTINCT.此方法有效,但不能与SUM()一起使用.因此,它在您的特定查询中没有用.


Another way to circumvent this problem is - after the GROUP BY MainTable.pk - to use DISTINCT inside the COUNT() or GROUP_CONCAT() aggregate functions. This works but you can't use it with SUM(). So, it's not useful in your specific query.

第三个选项-始终有效-首先将两个(或更多)边桌分组,然后再连接到主表.在您的情况下是这样的:

A third option - which works always - is to first group by the two (or more) side tables and then join to the main table. Something like this in your case:

SELECT p.* ,
       COALESCE(pt.sum_other_values, 0) AS sum_other_values
       COALESCE(pt.cnt, 0) AS tags_count,
       COALESCE(pc.cnt, 0) AS categories_count,
       COALESCE(category_titles, '') AS category_titles

FROM `product` p

JOIN 
    ( SELECT product_id
           , COUNT(*) AS cnt
           , GROUP_CONCAT(title) AS category_titles
      FROM `product_categories` pc
      WHERE status = 1
      GROUP BY product_id
    ) AS pc
  ON p.`product_id` = pc.`product_id`

JOIN 
    ( SELECT product_id
           , COUNT(*) AS cnt
           , SUM(some_other_numeric_value) AS sum_other_values
      FROM `product_tags` pt
      WHERE some_value = 'some comparison value'
      GROUP BY product_id
    ) AS pt
ON p.`product_id` = pt.`product_id`

ORDER BY sum_other_values DESC ;

并不一定要在其中使用COALESCE()-以防万一您将内部联接更改为LEFT外部联接.

The COALESCE() are not strictly needed there - just in case you chnage the inner joins to LEFT outer joins.

这篇关于带有两个联接的查询中的mysql聚合函数给出了意外的结果的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆