MySQL statement to find nearly duplicate data with trailing slash

Question

I have a table named 'LINK_INFO' with URLs in a field called 'URL'. The problem is that many duplicate URLs exist, except that some have a trailing / added to get around the unique field requirement.

Example:

http://www.yahoo.com
http://www.yahoo.com/

What is the statement I can use to select these cases of near duplicates, so I can delete one of them? Many thanks if you can help.

Answer

You can just use TRIM to find all unique values:

-- one row per URL, with any trailing slashes ignored
SELECT DISTINCT TRIM(TRAILING '/' FROM url) AS url
FROM link_info;

An SQLfiddle to test with.
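To select only the near-duplicate cases themselves, as the question asks, grouping on the trimmed value should work. This is a minimal sketch, assuming the same `link_info` table and a unique `id` column (the same assumption the DELETE join below makes); `GROUP_CONCAT` lists the ids of the rows that collide:

```sql
-- List every normalized URL that appears more than once,
-- together with the ids of the rows that share it.
-- Assumes link_info has a unique id column, as the DELETE join does.
SELECT TRIM(TRAILING '/' FROM url) AS normalized_url,
       COUNT(*)                     AS copies,
       GROUP_CONCAT(id ORDER BY id) AS ids
FROM link_info
GROUP BY normalized_url
HAVING COUNT(*) > 1;
```

Note that `TRIM(TRAILING '/' ...)` strips every trailing slash, so `http://www.yahoo.com//` is grouped with the other two variants as well.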

To delete the duplicates right away, just do a delete join:

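-- delete every row for which another row with the same trimmed URL
-- and a higher id exists, i.e. keep the highest id in each group
DELETE li1
FROM link_info li1
JOIN link_info li2
  ON TRIM(TRAILING '/' FROM li1.url) =
     TRIM(TRAILING '/' FROM li2.url)
 AND li1.id < li2.id;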

Another SQLfiddle to test with.
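Before running the DELETE join, it can help to preview exactly which rows it would remove. A sketch that reuses the same join condition in a SELECT, again assuming the unique `id` column the DELETE relies on:

```sql
-- Preview: every li1 row listed here would be deleted by the DELETE join,
-- because some row (li2) has the same trimmed URL and a higher id.
-- DISTINCT because one li1 row can match several li2 rows.
SELECT DISTINCT li1.id, li1.url
FROM link_info li1
JOIN link_info li2
  ON TRIM(TRAILING '/' FROM li1.url) = TRIM(TRAILING '/' FROM li2.url)
 AND li1.id < li2.id
ORDER BY li1.id;
```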

Always back up your tables before running arbitrary SQL found on the net, even mine :)
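One way to follow that advice without leaving the MySQL client is to copy the table before running any of the statements here. This is a minimal sketch; `link_info_backup` is a hypothetical name, and an in-database copy is a convenience, not a replacement for a real dump made with a tool such as mysqldump:

```sql
-- Copy the table definition (columns and indexes), then all rows.
-- link_info_backup is a hypothetical name; pick one that does not exist yet.
CREATE TABLE link_info_backup LIKE link_info;
INSERT INTO link_info_backup SELECT * FROM link_info;
```

If a DELETE removes the wrong rows, the originals are still in `link_info_backup`.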

If your database machine is short on resources, you may want to do it using an index and avoid loading more into memory than necessary:

-- remove all trailing slashes
-- (if url already has a UNIQUE index, this UPDATE fails with a
-- duplicate-key error; drop that index first and add it back
-- after the DELETE below)
UPDATE link_info
SET url = TRIM(TRAILING '/' FROM url);

-- create an index on the resulting strings (if there isn't already one)
CREATE INDEX url_index ON link_info(url);

-- delete all duplicates, keeping the row with the highest id
DELETE li1
FROM link_info li1
JOIN link_info li2
  ON li1.url = li2.url
 AND li1.id < li2.id;

-- drop the index if not needed anymore
DROP INDEX url_index ON link_info;

And yet another SQLfiddle.
