SQL:删除重复的记录 - 尽管不同 [英] SQL: Removing Duplicate records - Albeit different kind

查看:93
本文介绍了SQL:删除重复的记录 - 尽管不同的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑下表:

  TAB6 
ABC
-------- - ---------- -
1 2 A
2 1 A
2 3 C
3 4 D

我认为,记录{1,2,A}和{2,1,A}是重复的。我需要选择并生成以下记录集:

  ABCABC 
---------- ---------- - ---------- ---------- -
1 2 A或2 1 A
2 3 C 2 3 C
3 4 D 3 4 D

我尝试了以下查询。但是没有用。

  select t1。* 
from t6 t1
,t6 t2
其中t1.a - t2.b
和t1.b<> t2.a
和t1.rowid<> t2.rowid
/

ABC
---------- ---------- -
1 2 A
2 1 A
2 1 A
2 3 C
3 4 D
3 4 D

选择6行。

甚至这样:

  select * 
from t6 t1
where exists(select * from t6 t2 where t1.a< t2.b and t1.b - t2.a )
/
ABC
---------- ---------- -
1 2 A
2 1 A
2 3 C
3 4 D

两者都没有起作用。 p>

数据库将是Oracle 10g。寻找纯SQL解决方案。

解决方案

使用GREATEST()和LEAST()函数来标识跨多个列的常用值。然后使用DISTINCT来确认重复项。

 选择不同的(a,b)作为
,最大( a,b)as b
,c
from t6

这给了你您要求的精确记录集。但是如果您需要包括T6中的其他列,事情会变得更加复杂。







但是我想知道这是否也适用于VARCHAR2字段?


是的,但它将使用ASCII值来确定顺序,不总是你可能期望的(或欲望)。


此外,我的表T6可能有几万条记录。 p>

今天的数据确实不是很多。 DISTINCT将导致排序,应该能够适应内存,除非 A B 真的很长VARCHAR2列 - 但是可能是这样。



如果这是一个查询,你将要运行很多,那么你可以构建一个基于函数的索引来满足它: / p>

 在t6上创建索引t6_fbi(最少(a,b)
,最大(a,b)
,c)
/

但是,如果你有一个真正的性能问题,我真的只会打扰查询。


Consider the following table:

TAB6
         A          B C
---------- ---------- -
         1          2 A
         2          1 A
         2          3 C
         3          4 D

I consider, the records {1,2, A} and {2, 1, A} as duplicate. I need to select and produce the below record set:

         A          B C                      A          B C
---------- ---------- -             ---------- ---------- -
         1          2 A         or           2          1 A
         2          3 C                      2          3 C
         3          4 D                      3          4 D

I tried the below queries. But to no avail.

select t1.*
from t6 t1
, t6 t2
where t1.a <> t2.b
and t1.b <> t2.a
and t1.rowid <> t2.rowid
/

         A          B C
---------- ---------- -
         1          2 A
         2          1 A
         2          1 A
         2          3 C
         3          4 D
         3          4 D

6 rows selected.

Or even this:

 select *
 from t6 t1
 where exists (select * from t6 t2 where t1.a <> t2.b and t1.b <> t2.a)
/
         A          B C
---------- ---------- -
         1          2 A
         2          1 A
         2          3 C
         3          4 D

Both did not work.

The database would be Oracle 10g. Looking for a pure SQL solution. Every help is appreciated.

解决方案

Use GREATEST() and LEAST() functions to identify the common values across multiple columns. Then use DISTINCT to winnow out the duplicates.

select distinct least(a, b) as a
       , greatest(a, b) as b
       , c
from t6 

This gives you the precise record set you asked for. But things will get more complicated if you need to include other columns from T6.


"But I was wondering if this will work for VARCHAR2 fields also?"

Yes but it will use ASCII values to determine order, which is not always what you might expect (or desire).

"Also, my table T6 might have tens of thousand of records."

That really isn't a lot of data in today's terms. The DISTINCT will cause a sort, which should be able to fit in memory unless A and B are really long VARCHAR2 columns - but probably even then.

If this is a query you're going to want to run a lot then you can build a function-based index to satisfy it:

create index t6_fbi on t6(least(a, b)
                           , greatest(a, b)
                           , c )
/

But I would really only bother if you have a genuine performance issue with the query.

这篇关于SQL:删除重复的记录 - 尽管不同的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆