MySQL从重复的条目清除表,并在依赖表中重新链接FK [英] MySQL cleanup table from duplicated entries AND relink FK in depending table

查看:95
本文介绍了MySQL从重复的条目清除表,并在依赖表中重新链接FK的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是我的情况:我有2张表,患者研究



每个表都有自己的PK使用自动增量。



在我的情况下,pat_id应该是唯一的。它不是在数据库级别被声明为唯一的,因为它可能不是唯一的一些用途(它不是一个自制的系统)。我发现如何配置系统来考虑pat_id是唯一的,但是我现在需要清除重复的患者的数据库,并将研究表中的重复患者重新链接到剩余的唯一患者,然后再删除重复的患者



患者表:

  CREATE TABLE`patient`(
`pk` BIGINT(20)NOT NULL AUTO_INCREMENT,
`pat_id` VARCHAR(250)COLLATE latin1_bin DEFAULT NULL,
...
`pat_name` VARCHAR(250)COLLATE latin1_bin DEFAULT NULL,
...
`pat_custom1` VARCHAR(250)COLLATE latin1_bin DEFAULT NULL
....
PRIMARY KEY(`pk`)
)ENGINE = InnoDB;

学习表:

  CREATE TABLE`study`(
`pk` BIGINT(20)NOT NULL AUTO_INCREMENT,
`patient_fk` BIGINT(20)DEFAULT NULL,
...
PRIMARY KEY(`pk`),
...
CONSTRAINT`patient_fk` FOREIGN KEY(`patient_fk`)参考`patient`(`pk`)
)ENGINE = InnoDB;

我发现了一些类似的问题,但不完全一样的问题,特别是缺少链接外号到剩下的唯一病人。



重复条目的清理更新



仅更新MySQL中重复条目的第一条记录

解决方案

是我如何做的。



我在患者表中重新使用了一个未使用的字段,以标记非重复(N),1st重复(X)和其他重复患者(Y)。您还可以为此添加列(并在使用后将其删除)。



以下是我遵循的清理数据库的步骤:

  / * 1:重复列表* / 
选择pk,pat_id,t.`pat_id_issuer`,t.`pat_name`,t.pat_custom1
来自病人t
其中pat_id in(
从(
中选择pat_id)从患者
组中选择pat_id,count(*)
1
count(*)> 1
)xxx);

/ * 2:删除孤儿患者* /
从pk不在的患者中删除(选择patient_fk来自学习);

/ * 3:复制(或不)患者的标记* /
更新病人t set t.`pat_custom1` ='N';

/ * 4:标记所有重复的* /
更新病人t set t.`pat_custom1` ='Y'
其中pat_id in(
select pat_id from
从患者
组中选择pat_id,count(*)
由1
(count)(*)> 1
)xxx);

/ * 5:取消标记重复的第一个* /
更新病人t
加入(从(
中选择pk)选择最小(pk)为pk,pat_id从患者
其中pat_custom1 ='Y'
group by pat_id
)xxx)x
on(x.pk = t.pk)
set t.`pat_custom1 `='X'
其中pat_custom1 ='Y'
;

/ * 6:验证更新正确* /
从`patient`
中选择pk,pat_id,pat_custom1
其中pat_custom1!='N'
按pat_id,pat_custom1;

/ * 7:验证与重复病人有关的研究* /
从研究s
中选择p。*加入患者p(p.pk = s.patient_fk)
其中p.pat_custom1 ='Y';

/ * 8:重新复制病人* /
更新学习s
加入病人p(p.pk = s.patient_fk)
set patient_fk =(select pk from patient pp
其中pp.pat_id = p.pat_id和pp.pat_custom1 ='X')
其中p.pat_custom1 ='Y';

/ * 9:删除新孤儿患者* /
从pk不在的患者中删除(选择patient_fk来自学习);

/ * 10:复位标志* /
更新病人t set t.`pat_custom1` = null;

/ * 11:提交更改* /
提交;

肯定有一个更短的方法,有一些更聪明(复杂的)SQL,但我个人更喜欢简单的方法。这也让我检查每一步都在做我期望的。


Here is my situation: I have 2 tables, patient and study.

Each table has its own PK using autoincrement.

In my case, the pat_id should be unique. It's not declared as unique at database level since it could be non unique is some uses (it's not a home made system). I found out how to configure the system to consider the pat_id as unique, but I need now to cleanup the database for duplicated patients AND relink duplicated patients in study table to remaining unique patient, before deleting the duplicated patients.

Patient table:

CREATE TABLE `patient` (
  `pk` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `pat_id` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL,
...
  `pat_name` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL,
...
  `pat_custom1` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL
....
  PRIMARY KEY (`pk`)
)ENGINE=InnoDB;

Study table:

CREATE TABLE `study` (
  `pk` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `patient_fk` BIGINT(20) DEFAULT NULL,
...
  PRIMARY KEY (`pk`),
...
  CONSTRAINT `patient_fk` FOREIGN KEY (`patient_fk`) REFERENCES `patient` (`pk`)
)ENGINE=InnoDB;

I found some similar questions, but not exactly the same issue, especially it was missing the link of the foreign keys to the remaining unique patient.

Cleanup Update for Duplicate Entries

Update only first record from duplicate entries in MySQL

解决方案

This is how I did.

I reused an unused field in patient table to mark non duplicated (N), 1st of duplicated (X), and other duplicated patients (Y). You could also add a column for this (and drop it after use).

Here are the steps I followed to cleanup my database:

/*1: List duplicated */
select pk,pat_id, t.`pat_id_issuer`, t.`pat_name`, t.pat_custom1
from patient t
where pat_id in (
select pat_id from (
select pat_id, count(*)
from patient 
group by 1
having count(*)>1
) xxx);    

/*2: Delete orphan patients */
delete from patient where pk not in (select patient_fk from study);

/*3: Reset flag for duplicated (or not) patients*/
update patient t set t.`pat_custom1`='N';

/*4: Mark all duplicated */
update patient t set t.`pat_custom1`='Y' 
where pat_id in (
select pat_id from (
select pat_id, count(*)
from patient 
group by 1
having count(*)>1
) xxx) ;

/*5: Unmark the 1st of the duplicated*/
update patient t 
join (select pk from (
select min(pk) as pk, pat_id from patient 
where  pat_custom1='Y'  
group by pat_id
) xxx ) x
on (x.pk=t.pk)
set t.`pat_custom1`='X' 
where  pat_custom1='Y'
  ;

/*6: Verify update is correct*/
select pk, pat_id,pat_custom1  
from `patient` 
where  pat_custom1!='N'
order by pat_id, pat_custom1;

/*7: Verify studies linked to duplicated patient */
select p.* from study s
join patient p on (p.pk=s.patient_fk)
where p.pat_custom1='Y';

/*8: Relink duplicated patients */
update study s
join patient p on (p.pk=s.patient_fk)
set patient_fk = (select pk from patient pp
where pp.pat_id=p.pat_id and pp.pat_custom1='X')
where p.pat_custom1='Y';

/*9: Delete newly orphan patients */
delete from patient where pk not in (select patient_fk from study);

/* 10: reset flag */
update patient t set t.`pat_custom1`=null;

/* 11: Commit changes */
commit;

There is certainly a shorter way, with a some smarter (complicated?) SQL, but I personally prefer the simple way. This also allows me to check each step is doing what I expect.

这篇关于MySQL从重复的条目清除表,并在依赖表中重新链接FK的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆