MySQL statement to find near-duplicate data with a trailing slash

Published: 2020/5/27 19:21:30 | Tags: mysql, phpmyadmin

Problem description

I have a table named 'LINK_INFO' with URLs in a field called 'URL'. The problem is that many duplicate URLs exist, except that some have a trailing / added to get around the unique field requirement.

Example:

http://www.yahoo.com and http://www.yahoo.com/

What statement can I use to select these near duplicates, so I can delete one of each pair? Many thanks if you can help.

Recommended answer

You can just use TRIM to find all unique values:

SELECT DISTINCT TRIM(TRAILING '/' FROM url) AS url
FROM link_info;

An SQLfiddle to test with.
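The DISTINCT query above returns every normalized value, but it does not show which of them actually collide. A minimal sketch for listing only the near-duplicate groups, assuming the same `link_info` table and `url` column (the aliases `normalized_url` and `copies` are made up for illustration):

```sql
-- List each URL that occurs more than once when trailing slashes are
-- ignored, together with the number of rows that share it.
SELECT TRIM(TRAILING '/' FROM url) AS normalized_url,
       COUNT(*) AS copies
FROM link_info
GROUP BY normalized_url
HAVING COUNT(*) > 1;
```

An empty result would mean there are no near duplicates left to clean up.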

To delete the duplicates right away, just do a delete join:

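-- For each pair of rows whose URLs match once trailing slashes are ignored,
-- delete the row with the lower id (the highest id in each group survives).
DELETE li1
FROM link_info li1
JOIN link_info li2
  ON TRIM(TRAILING '/' FROM li1.url) =
     TRIM(TRAILING '/' FROM li2.url)
 AND li1.id < li2.id;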

Another SQLfiddle to test with.
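Before running the delete join, it may help to look at what it would remove. The following is a sketch of a dry run that reuses the same join condition as a plain SELECT; it assumes the `id` column used in the answer, and the column aliases are hypothetical:

```sql
-- Dry run: show each row the delete join would remove, next to a
-- higher-id row it matches. Nothing is modified.
SELECT li1.id  AS doomed_id,
       li1.url AS doomed_url,
       li2.id  AS matching_id,
       li2.url AS matching_url
FROM link_info li1
JOIN link_info li2
  ON TRIM(TRAILING '/' FROM li1.url) =
     TRIM(TRAILING '/' FROM li2.url)
 AND li1.id < li2.id
ORDER BY li1.id;
```

If a URL has three or more copies, the same doomed row appears once for every higher-id match, so the number of result rows can exceed the number of rows that would actually be deleted.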

Always back up your tables before running arbitrary SQL found on the net, even mine :)
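One quick way to take such a backup inside the database is to copy the table, sketched below. The name `link_info_backup` is an assumption chosen for illustration; a full dump with mysqldump is probably the more robust choice for anything important.

```sql
-- Create an empty table with the same columns and indexes as link_info.
CREATE TABLE link_info_backup LIKE link_info;

-- Copy every row into the backup table.
INSERT INTO link_info_backup
SELECT * FROM link_info;
```

If the cleanup goes wrong, the rows can be copied back from `link_info_backup`, and the backup table can be dropped once the result has been checked.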

If your database machine is limited, you may want to do it using indexes and avoid loading more into memory than necessary:

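-- remove all trailing slashes
-- (if url still has a UNIQUE index, this fails with a duplicate-key error as
-- soon as a trimmed value collides with an existing row; drop that index first)
UPDATE link_info
SET url = TRIM(TRAILING '/' FROM url);

-- create an index on the resulting strings (if there isn't already one)
CREATE INDEX url_index ON link_info(url);

-- delete all duplicates, keeping the row with the highest id in each group
DELETE li1
FROM link_info li1
JOIN link_info li2
  ON li1.url = li2.url
 AND li1.id < li2.id;

-- drop the index if not needed anymore
DROP INDEX url_index ON link_info;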

And yet another SQLfiddle.
