在JOIN中使用DISTINCT会造成麻烦 [英] Using DISTINCT inside JOIN is creating trouble

查看:309
本文介绍了在JOIN中使用DISTINCT会造成麻烦的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

可能重复:
如何使用两个内部联接修改此查询,以使其不再给出重复的结果?

Possible Duplicate:
How can I modify this query with two Inner Joins so that it stops giving duplicate results?

我无法使查询正常工作.

I'm having trouble getting my query to work.

SELECT itpitems.identifier, itpitems.name, itpitems.subtitle, itpitems.description, itpitems.itemimg, itpitems.mainprice, itpitems.upc, itpitems.isbn, itpitems.weight, itpitems.pages, itpitems.publisher, itpitems.medium_abbr, itpitems.medium_desc, itpitems.series_abbr, itpitems.series_desc, itpitems.voicing_desc, itpitems.pianolevel_desc, itpitems.bandgrade_desc, itpitems.category_code, itprank.overall_ranking, itpitnam.name AS artist, itpitnam.type_code FROM itpitems 
        INNER JOIN  itprank ON (itprank.item_number = itpitems.identifier) 
        INNER JOIN  (SELECT DISTINCT type_code FROM itpitnam) itpitnam ON (itprank.item_number = itpitnam.item_number)   
        WHERE mainprice > 1    
        LIMIT 3

我一直在字段列表"中得到未知"列"itpitnam.name".

I keep getting Unknown column 'itpitnam.name' in 'field list'.

但是,如果我将DISTINCT type_code更改为*,我不会得到该错误,但是我也不会得到想要的结果.

However, if I change DISTINCT type_code to *, I do not get that error, but I do not get the results I want either.

这是一个很大的结果表,所以我在做一个虚拟的例子...

This is a big result table so I am making a dummy example...

有了*,我得到类似的东西:

With *, I get something like:

+-----------+---------+----------+
| identifier| name    | type_code|
+-----------+---------+----------+
| 2         | Joe     | A        |
| 2         | Amy     | R        |
| 7         | Mike    | B        |
+-----------+------------+-------+

这里的问题是我有两个标识符= 2的实例,因为type_code是不同的. 我在查询的末尾尝试了GROUP BY ,但是它正在筛选太多的记录,这对服务器造成了很大的压力,因此,我正在尝试寻找一种替代方法来获取我需要的结果.

The problem here is that I have two instances of identifier = 2 because the type_code is different. I have tried GROUP BY at the outside end of the query, but it is sifting through so many records it creates too much strain on the server, so I'm trying to find an alternative way of getting the results I need.

我想要实现的目标(使用相同的虚拟输出)看起来像这样:

What I want to achieve (using the same dummy output) would look something like this:

+-----------+---------+----------+
| identifier| name    | type_code|
+-----------+---------+----------+
| 2         | Joe     | A        |
| 7         | Mike    | B        |
| 8         | Sam     | R        |
+-----------+------------+-------+

无论type_code是否不同,都应跳过重复的标识符.

It should skip over the duplicate identifier regardless if type_code is different.

有人可以帮我修改此查询以获取如上图所示的结果吗?

Can someone help me modify this query to get the results as simulated in the above chart?

推荐答案

一种方法是使用内联视图,就像您已经拥有的查询一样.但是,您可以使用GROUP BY来消除重复项,而不是使用DISTINCT.满足您要求的最简单的内联视图是:

One approach is to use an inline view, like the query you already have. But instead of using DISTINCT, you would use a GROUP BY to eliminate duplicates. The simplest inline view to satisfy your requirements would be:

( SELECT n.item_number, n.name, n.type_code
    FROM itpitnam n
   GROUP BY n.item_number
) itpitnam

尽管不确定从itpitnam的哪一行检索name和type_code的值.更加精细的内联视图可以使其更加具体.

Although its not deterministic as to which row from itpitnam the values for name and type_code are retrieved from. A more elaborate inline view can make this more specific.

解决此类问题的另一种常用方法是在SELECT列表中使用相关的子查询.为了返回一小组行,这可以很好地执行.但是对于返回大集合,有更有效的方法.

Another common approach to this type of problem is to use a correlated subquery in the SELECT list. For returning a small set of rows, this can perform reasonably well. But for returning large sets, there are more efficient approaches.

SELECT i.identifier
     , i.name
     , i.subtitle
     , i.description
     , i.itemimg 
     , i.mainprice
     , i.upc
     , i.isbn
     , i.weight
     , i.pages
     , i.publisher
     , i.medium_abbr
     , i.medium_desc
     , i.series_abbr
     , i.series_desc
     , i.voicing_desc
     , i.pianolevel_desc
     , i.bandgrade_desc
     , i.category_code
     , r.overall_ranking
     , ( SELECT n1.name
           FROM itpitnam n1
          WHERE n1.item_number = r.item_number
          ORDER BY n1.type_code, n1.name
          LIMIT 1
       ) AS artist
     , ( SELECT n2.type_code
           FROM itpitnam n2
          WHERE n2.item_number = r.item_number
          ORDER BY n2.type_code, n2.name
          LIMIT 1
       ) AS type_code
  FROM itpitems i
  JOIN itprank r
    ON r.item_number = i.identifier
 WHERE mainprice > 1
 LIMIT 3

该查询将返回指定的结果集,但有明显的不同.原始查询显示对itpitnam表的INNER JOIN.这意味着将仅在itpitnam表中存在匹配行的情况下返回一行.但是,上面的查询模拟了一个OUTER JOIN,当itpitnam中找不到匹配的行时,该查询将返回一行.

That query will return the specified resultset, with one significant difference. The original query shows an INNER JOIN to the itpitnam table. That means that a row will be returned ONLY of there is a matching row in the itpitnam table. The query above, however, emulates an OUTER JOIN, the query will return a row when there is no matching row found in itpitnam.

为了使这些相关子查询的性能最佳,您需要一个合适的索引,

For best performance of those correlated subqueries, you'll want an appropriate index available,

... ON itpitnam (item_number, type_code, name)

该索引最合适,因为它是一个覆盖索引",可以完全从索引中满足查询,而无需引用基础表中的数据页,并且在前导列上有相等谓词,在下一个列上有ORDER BY两列,这样可以避免排序"操作.

That index is most appropriate because it's a "covering index", the query can be satisfied entirely from the index without referencing data pages in the underlying table, and there's equality predicate on the leading column, and an ORDER BY on the next two columns, so that will a avoid a "sort" operation.

-

如果您保证itpitnam表中的type_codename列不是NOT NULL,则可以添加谓词以消除缺少"匹配行的行,例如

If you have a guarantee that either the type_code or name column in the itpitnam table is NOT NULL, you can add a predicate to eliminate the rows that are "missing" a matching row, e.g.

HAVING artist IS NOT NULL

(添加可能会影响性能.)如果没有这种保证,则需要添加INNER JOIN或谓词来测试匹配行的存在,以获取INNER JOIN行为.

(Adding that will likely have an impact on performance.) Absent that kind of guarantee, you'd need to add an INNER JOIN or a predicate that tests for the existence of a matching row, to get an INNER JOIN behavior.

这篇关于在JOIN中使用DISTINCT会造成麻烦的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆