GROUP BY does not remove duplicates


Problem description

I have a watchlist system that I've coded. In the overview of a user's watchlist, they should see a list of records; however, the list shows duplicates, even though the database holds only the exact, correct number of rows.

I've tried GROUP BY watch.watch_id and GROUP BY rec.record_id, but none of the groupings I've tried seem to remove the duplicates. I'm not sure what I'm doing wrong.

SELECT watch.watch_date,
       rec.street_number,
       rec.street_name,
       rec.city,
       rec.state,
       rec.country,
       usr.username
FROM
(
    watchlist watch 

    LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id

    LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
)
WHERE  watch.user_id = 1 
GROUP BY watch.watch_id
LIMIT 0, 25

The watchlist table looks like this:

+----------+---------+-----------+------------+
| watch_id | user_id | record_id | watch_date |
+----------+---------+-----------+------------+
|       13 |       1 |        22 | 1314038274 |
|       14 |       1 |        25 | 1314038995 |
+----------+---------+-----------+------------+

Solution

GROUP BY does not "remove duplicates". GROUP BY allows for aggregation. If all you want is to combine duplicated rows, use SELECT DISTINCT.

If you need to combine rows that are duplicates in only some columns, use GROUP BY, but you then need to specify what to do with the other columns. You can either omit them (by not listing them in the SELECT clause) or aggregate them (using functions like SUM, MIN, and AVG). For example:

SELECT watch.watch_id, COUNT(rec.street_number), MAX(watch.watch_date)
... GROUP BY watch.watch_id
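
To see the effect of that aggregated query on a small data set, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The `records` rows are hypothetical: record 22 is deliberately stored twice so that the join fans out, which is only one possible cause of duplicates and not necessarily what happened in the question. The FROM and WHERE clauses are borrowed from the question's query, and the `members` join is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE watchlist (watch_id INTEGER, user_id INTEGER,
                            record_id INTEGER, watch_date INTEGER);
    CREATE TABLE records (record_id INTEGER, street_number TEXT);
    INSERT INTO watchlist VALUES (13, 1, 22, 1314038274),
                                 (14, 1, 25, 1314038995);
    -- Hypothetical data: record 22 is stored twice, so the join fans out.
    INSERT INTO records VALUES (22, '10'), (22, '10'), (25, '7');
""")

# Plain join: one output row per matching records row.
joined = conn.execute("""
    SELECT watch.watch_id, rec.street_number
    FROM watchlist watch
    LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
    WHERE watch.user_id = 1
    ORDER BY watch.watch_id
""").fetchall()

# Grouped join: one output row per watch_id, other columns aggregated.
grouped = conn.execute("""
    SELECT watch.watch_id, COUNT(rec.street_number), MAX(watch.watch_date)
    FROM watchlist watch
    LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
    WHERE watch.user_id = 1
    GROUP BY watch.watch_id
    ORDER BY watch.watch_id
""").fetchall()

print(joined)   # watch_id 13 appears twice
print(grouped)  # one row per watch_id
```

Every selected column here is either the grouping column or wrapped in an aggregate, so the result does not depend on which row of a group the database happens to look at.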

EDIT

The OP asked for some clarification.

Consider the "view" -- all the data put together by the FROMs and JOINs and the WHEREs -- call that V. There are two things you might want to do.

First, you might have completely duplicate rows that you wish to combine:

a b c
- - -
1 2 3
1 2 3
3 4 5

Then simply use DISTINCT:

SELECT DISTINCT * FROM V;

a b c
- - -
1 2 3
3 4 5
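
The fully duplicated case can be reproduced with a short sketch, again assuming SQLite (through Python's sqlite3 module) as a stand-in engine and modelling V as a plain table rather than a real joined view. The ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE V (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO V VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 2, 3), (3, 4, 5)],  # first two rows are exact duplicates
)

# DISTINCT collapses rows that are identical in every selected column.
distinct_rows = conn.execute(
    "SELECT DISTINCT * FROM V ORDER BY a, b, c"
).fetchall()
print(distinct_rows)  # [(1, 2, 3), (3, 4, 5)]
```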

Or, you might have partially duplicate rows that you wish to combine:

a b c
- - -
1 2 3
1 2 6
3 4 5

Those first two rows are "the same" in some sense, but clearly different in another sense (in particular, they would not be combined by SELECT DISTINCT). You have to decide how to combine them. You could discard column c as unimportant:

SELECT DISTINCT a,b FROM V;

a b
- -
1 2
3 4

Or you could perform some kind of aggregation on them. You could add them up:

SELECT a,b, SUM(c) "tot" FROM V GROUP BY a,b;

a b tot
- - ---
1 2 9
3 4 5

You could pick the smallest value:

SELECT a,b, MIN(c) "first" FROM V GROUP BY a,b;

a b first
- - -----
1 2 3
3 4 5

Or you could take the mean (AVG), the standard deviation (STD), and any of a bunch of other functions that take a bunch of values for c and combine them into one.
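
The options for partially duplicated rows can be compared side by side. This is a sketch under the same assumptions as before: SQLite via Python's sqlite3 module, with V modelled as a plain table. SQLite has no built-in standard deviation function, so only DISTINCT, SUM, MIN, and AVG are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE V (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO V VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 2, 6), (3, 4, 5)],  # first two rows differ only in c
)

def run(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Option 1: drop column c and de-duplicate what is left.
dropped = run("SELECT DISTINCT a, b FROM V ORDER BY a, b")

# Option 2: keep c, but state how the values in each group are combined.
summed = run("SELECT a, b, SUM(c) FROM V GROUP BY a, b ORDER BY a, b")
smallest = run("SELECT a, b, MIN(c) FROM V GROUP BY a, b ORDER BY a, b")
mean = run("SELECT a, b, AVG(c) FROM V GROUP BY a, b ORDER BY a, b")

print(dropped)   # [(1, 2), (3, 4)]
print(summed)    # [(1, 2, 9), (3, 4, 5)]
print(smallest)  # [(1, 2, 3), (3, 4, 5)]
print(mean)      # [(1, 2, 4.5), (3, 4, 5.0)]
```

Each query returns one row per (a, b) pair; they differ only in the decision made about c.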

What isn't really an option is doing nothing. If you just list the ungrouped columns, the DBMS will either throw an error (Oracle does that -- the right choice, imo) or pick one value more or less at random (MySQL, when the ONLY_FULL_GROUP_BY SQL mode is off; that mode is enabled by default since MySQL 5.7.5). But as Dr. Peart said, "When you choose not to decide, you still have made a choice."
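
The "pick one value" behaviour can be observed with SQLite, which also accepts ungrouped, unaggregated columns. This is only a sketch: SQLite stands in for a permissive engine here, V is modelled as a plain table, and which value comes back for c is not something to rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE V (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO V VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 2, 6), (3, 4, 5)],
)

# c is neither grouped nor aggregated. SQLite accepts the query and
# returns c from an arbitrary row of each group.
bare = conn.execute(
    "SELECT a, b, c FROM V GROUP BY a, b ORDER BY a, b"
).fetchall()
print(bare)  # c for the (1, 2) group may be 3 or 6
```

A stricter engine would reject this query outright, which is why it is safer to make the choice explicit with an aggregate.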
