Deleting Duplicate Rows from a MySQL Table

Question

I have a script that finds duplicate rows in my MySQL table, which contains 40,000,000 rows, but it is very slow. Is there an easier way to find the duplicate records without going in and out of PHP?

This is the script I currently use:

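// Note: the mysql_* extension was removed in PHP 7.0, so this only runs on PHP 5.x
$find = mysql_query("SELECT * FROM pst_nw WHERE ID < 1000");
while ($row = mysql_fetch_assoc($find)) {
    // Look for *other* rows with the same address; without the ID check
    // every row matches itself and would be deleted
    $find_1 = mysql_query("SELECT ID FROM pst_nw WHERE add1 = '$row[add1]' AND add2 = '$row[add2]' AND add3 = '$row[add3]' AND add4 = '$row[add4]' AND ID <> '$row[ID]'");
    if (mysql_num_rows($find_1) > 0) {
        mysql_query("DELETE FROM pst_nw WHERE ID = '$row[ID]'");
    }
}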

Solution

You have a number of options.

Let the DB do the work

Create a copy of your table with a unique index - and then insert the data into it from your source table:

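CREATE TABLE clean LIKE pst_nw;
-- clean is still empty, so IGNORE is not needed here
ALTER TABLE clean ADD UNIQUE INDEX (add1, add2, add3, add4);
-- rows that would violate the unique index are silently skipped
INSERT IGNORE INTO clean SELECT * FROM pst_nw;
DROP TABLE pst_nw;
RENAME TABLE clean TO pst_nw;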

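The advantage of doing things this way is that you can verify that the new table is correct before dropping the source table. The disadvantage is that it takes up twice as much space and is (relatively) slow to execute. Also note that a UNIQUE index allows multiple NULL values, so rows that contain a NULL in any of add1 to add4 are not treated as duplicates and will all be copied.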
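Before running the DROP TABLE, it can help to look at which rows the copy collapses; the same kind of query also answers the original "find the duplicates" question without any PHP. The sketch below is a self-contained demo on a hypothetical table `pst_demo` with the question's `ID` and `add1` to `add4` columns (the column types and sample values are assumptions, not taken from the question). Against the real data you would run only the final SELECT, with `pst_nw` in place of `pst_demo`.

```sql
-- Hypothetical demo table with the same key columns as pst_nw (types assumed)
DROP TABLE IF EXISTS pst_demo;
CREATE TABLE pst_demo (
  ID   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  add1 VARCHAR(50),
  add2 VARCHAR(50),
  add3 VARCHAR(50),
  add4 VARCHAR(50)
);

-- Made-up sample rows: the first two are duplicates of each other
INSERT INTO pst_demo (add1, add2, add3, add4) VALUES
  ('1 High St', 'Town', 'County', 'AB1 2CD'),
  ('1 High St', 'Town', 'County', 'AB1 2CD'),
  ('2 Low Rd',  'City', 'Shire',  'EF3 4GH');

-- List each address that occurs more than once, with its number of copies
SELECT add1, add2, add3, add4, COUNT(*) AS copies
FROM pst_demo
GROUP BY add1, add2, add3, add4
HAVING COUNT(*) > 1;
```

On the sample rows this returns a single group with `copies` = 2. Keep in mind that GROUP BY puts NULLs in the same group while a UNIQUE index treats them as distinct, so the two can disagree if the address columns are nullable.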

Let the DB do the work #2

You can also achieve the result you want by doing:

set session old_alter_table=1;
ALTER IGNORE TABLE pst_nw ADD UNIQUE INDEX (add1, add2, add3, add4);

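The first command is required as a workaround for the IGNORE flag being... ignored: with InnoDB, the in-place (fast) index build does not honor IGNORE, so without it the ALTER fails with a duplicate-entry error instead of discarding the extra rows. This option also only works up to MySQL 5.6: the IGNORE clause of ALTER TABLE was deprecated in MySQL 5.6.17 and removed in 5.7.4.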

The advantage here is that there's no messing about with a temporary table; the disadvantage is that you don't get to check that the change does exactly what you expect before you run it.

Example:

CREATE TABLE `foo` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `one` int(10) DEFAULT NULL,
  `two` int(10) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

insert into foo values (null, 1, 1);
insert into foo values (null, 1, 1);
insert into foo values (null, 1, 1);

select * from foo;
+----+------+------+
| id | one  | two  |
+----+------+------+
|  1 |    1 |    1 |
|  2 |    1 |    1 |
|  3 |    1 |    1 |
+----+------+------+
3 rows in set (0.00 sec)

set session old_alter_table=1;
ALTER IGNORE TABLE foo ADD UNIQUE INDEX (one, two);

select * from foo;
+----+------+------+
| id | one  | two  |
+----+------+------+
|  1 |    1 |    1 |
+----+------+------+
1 row in set (0.00 sec)
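On MySQL 5.7.4 and later, ALTER IGNORE TABLE no longer exists, so option #2 will not run there. A different in-database technique (swapped in here, not part of the answer above) is a multi-table DELETE that joins the table to itself and removes every row that has an identical twin with a lower ID, so the lowest ID in each group survives. The sketch again uses a hypothetical `pst_demo` table with made-up rows; on `pst_nw` you would run just the DELETE, and with 40 million rows it probably needs an index on (add1, add2, add3, add4) to finish in reasonable time.

```sql
-- Hypothetical demo table with the same key columns as pst_nw (types assumed)
DROP TABLE IF EXISTS pst_demo;
CREATE TABLE pst_demo (
  ID   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  add1 VARCHAR(50),
  add2 VARCHAR(50),
  add3 VARCHAR(50),
  add4 VARCHAR(50)
);

-- Made-up sample rows: IDs 1, 2 and 4 are the same address
INSERT INTO pst_demo (add1, add2, add3, add4) VALUES
  ('1 High St', 'Town', 'County', 'AB1 2CD'),
  ('1 High St', 'Town', 'County', 'AB1 2CD'),
  ('2 Low Rd',  'City', 'Shire',  'EF3 4GH'),
  ('1 High St', 'Town', 'County', 'AB1 2CD');

-- Delete t1 whenever some t2 has the same address and a smaller ID,
-- so only the lowest ID of each duplicate group is kept
DELETE t1
FROM pst_demo AS t1
JOIN pst_demo AS t2
  ON  t1.add1 = t2.add1
  AND t1.add2 = t2.add2
  AND t1.add3 = t2.add3
  AND t1.add4 = t2.add4
  AND t1.ID > t2.ID;

SELECT * FROM pst_demo ORDER BY ID;
```

With default auto-increment settings, the final SELECT should show only IDs 1 and 3. Because `=` never matches NULL, rows with a NULL in any of the four columns are left alone, which matches how the unique-index approach behaves.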

Don't do this kind of thing outside the DB

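Especially with 40 million rows, doing something like this outside the DB is likely to take a huge amount of time, and may not complete at all: the PHP loop issues a separate query (and a network round trip) for every row, and unless (add1, add2, add3, add4) is indexed, each of those lookups scans the whole table. Any solution that stays in the DB will be faster and more robust.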
