Removing duplicate email addresses in MySQL, keeping the lowest id


Problem description

I have a table called emaildata consisting of 4 columns: emailaddress, domainname, data and id.

The emailaddress column should contain only unique entries, but there are many duplicates. The domainname and data columns are not unique, so they will contain duplicates, which is fine. The id column is set to auto-increment, so it will contain only unique values.

My question is how do I get rid of all rows that feature duplicate email addresses, keeping the one with the lowest id?

There should be around 370,000 rows, but currently I've got 906,000.

I previously used an SQL statement on a similar table, and I've tried to adapt it to this one without success.

delete T1
from emaildata T1, emaildata T2
where T1.emailaddress = T2.emailaddress
and T1.id > T2.id

The above was based on the following statement, which was applied to another table and worked fine.

delete T1
from email_list_subscribers T1, email_list_subscribers T2
where T1.emailaddress = T2.emailaddress
and T1.subscriberid > T2.subscriberid

I've tried running this against my table on the remote server in phpMyAdmin. After pressing the GO button, the loading bar comes up in the middle, then disappears as if it were processing, but the query never completes.

I've tried repeating this against the same table running on my home server (XAMPP) via phpMyAdmin, and again with HeidiSQL. phpMyAdmin shows the same problem, and HeidiSQL appears to crash.

I've tried other solutions that I've seen on here, but I seem to be getting the same "timeout" / crash problem. I never had issues with the original statement running on the remote server, although that was against a database a third of the size.

Any info would be appreciated.

Solution

Your query appears to be correct. Your issue seems to be a performance issue, not a logic issue. You'll need to make sure that both your emailaddress and id fields are properly indexed in the database. Otherwise, with close to a million rows, I would expect your query to hang.

(I would guess that id is probably already indexed, but emailaddress is not. Especially when joining a table to itself like this, if either of these fields is not indexed, you're going to be looking at a LOT of full table scans.)
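One way to check this guess before changing anything is to list the table's indexes and ask MySQL how it would run the self-join. The following is a minimal sketch, assuming the table and column names from the question. The SELECT is only a stand-in for the join that the DELETE performs.

```sql
-- List the indexes currently defined on the table.
SHOW INDEX FROM emaildata;

-- Show the execution plan for the self-join behind the DELETE.
EXPLAIN
SELECT T1.id
FROM emaildata T1
JOIN emaildata T2
  ON T1.emailaddress = T2.emailaddress
 AND T1.id > T2.id;
```

In the EXPLAIN output, a row with `type` set to `ALL` and `key` set to `NULL` means that table is being read with a full scan and no index is used for it.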

Edit:

Seeing your comment that this is the case, you can follow the documentation at http://dev.mysql.com/doc/refman/5.0/en/create-index.html for creating indexes. So something like:

CREATE INDEX email_index ON emaildata(emailaddress) USING BTREE;
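Once the index exists, the remaining steps might look like the sketch below. It assumes emailaddress is a VARCHAR short enough to be indexed in full, and `uq_emailaddress` is a hypothetical index name. The DELETE is the statement from the question, unchanged.

```sql
-- Before deleting: compare total rows with distinct addresses.
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT emailaddress) AS distinct_emails
FROM emaildata;

-- Remove every row for which a row with the same address
-- and a lower id exists, so the lowest id per address survives.
DELETE T1
FROM emaildata T1, emaildata T2
WHERE T1.emailaddress = T2.emailaddress
  AND T1.id > T2.id;

-- Optional: once no duplicates remain, prevent new ones.
-- uq_emailaddress is a hypothetical name.
ALTER TABLE emaildata ADD UNIQUE INDEX uq_emailaddress (emailaddress);
```

After the DELETE, rerunning the first query should show total_rows equal to distinct_emails, provided no emailaddress values are NULL. If the unique index is added, the plain email_index becomes redundant and could be dropped.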
