How to speed up an update query on a massive table


Problem description

I am currently migrating one of our existing databases to a new ontology. The database follows a star schema, with observation_fact at the center of the star and concept_dimension as a lookup table. To move to the new ontology, I need to replace each concept_cd in observation_fact with a slightly different code so that it matches the concept_cds in the new ontology.

I tried writing an update query to accomplish this migration, but it has been running for 5 days and I don't think it will finish anytime soon. I have indexed the two relevant tables on concept_cd.

This is the query I originally wrote:

Update observation_fact ofact
Set concept_cd = q.cd
From (Select ofact2.ctid, Case 
    When split_part(ofact2.concept_cd, ':', 1) = 'ICD10-CM'  Then replace(ofact2.concept_cd, 'ICD10-CM:', 'ICD10CM:')
    When split_part(ofact2.concept_cd, ':', 1) = 'ICD10-PCS' Then replace(ofact2.concept_cd, 'ICD10-PCS:', 'ICD10PCS:')
    When split_part(ofact2.concept_cd, ':', 1) = 'ICD9' And cdim.concept_path like '\\i2b2\\Diagnoses\\%'  Then replace(ofact2.concept_cd, 'ICD9:', 'ICD9CM:')
    When split_part(ofact2.concept_cd, ':', 1) = 'ICD9' And cdim.concept_path like '\\i2b2\\Procedures\\%' Then replace(ofact2.concept_cd, 'ICD9:', 'ICD9PROC:')
  End as cd
  From observation_fact ofact2
  Left Outer Join concept_dimension_bak cdim
  On ofact2.concept_cd = cdim.concept_cd
) as q
Where ofact.ctid = q.ctid;

It felt very awkward to write: observation_fact has no true primary key or composite key, so I had to use ctid. I also referenced observation_fact twice, which I know from this answer, Speed up Postgres Update on Large Table, is a bad idea and probably part of the problem. I used a left outer join because some of the concept_cds in observation_fact do not exist in concept_dimension_bak. As you can see, the ICD10 replacements are straightforward; for ICD9, however, I need to look up the code in the old concept_dimension table to determine which type of code it is and replace it accordingly.

I expect this update query to perform the appropriate replacement on every row in observation_fact where the case statement matches, and to ignore everything else.
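
One detail worth checking against that expectation: a CASE expression with no ELSE branch evaluates to NULL when none of its WHEN branches match, so a query of this shape would assign NULL to rows outside the listed cases instead of leaving them alone. The following is a minimal sketch, not the original query. The sample codes in the VALUES list are made up for illustration, and only the two ICD10 branches are reproduced.

```sql
-- Hypothetical sample codes; none of these come from the real observation_fact.
SELECT s.concept_cd,
       -- Same shape as the original CASE: no ELSE, so unmatched rows yield NULL.
       CASE
         WHEN split_part(s.concept_cd, ':', 1) = 'ICD10-CM'
           THEN replace(s.concept_cd, 'ICD10-CM:', 'ICD10CM:')
         WHEN split_part(s.concept_cd, ':', 1) = 'ICD10-PCS'
           THEN replace(s.concept_cd, 'ICD10-PCS:', 'ICD10PCS:')
       END AS without_else,
       -- With an ELSE branch, unmatched rows keep their current value.
       CASE
         WHEN split_part(s.concept_cd, ':', 1) = 'ICD10-CM'
           THEN replace(s.concept_cd, 'ICD10-CM:', 'ICD10CM:')
         WHEN split_part(s.concept_cd, ':', 1) = 'ICD10-PCS'
           THEN replace(s.concept_cd, 'ICD10-PCS:', 'ICD10PCS:')
         ELSE s.concept_cd
       END AS with_else
FROM (VALUES ('ICD10-CM:E11.9'),
             ('ICD10-PCS:0DT70ZZ'),
             ('LOINC:2345-7')) AS s(concept_cd);
```

For the 'LOINC:2345-7' row, without_else is NULL while with_else keeps the original code. Adding an ELSE branch, or restricting the statement with a WHERE clause, are two ways to make "ignore everything else" explicit.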

Recommended answer

Instead of updating the table, try creating a new table using the logic you already have in the SQL; it will likely be faster, since an UPDATE in Postgres writes a new version of every modified row and maintains the indexes as it goes. Once the new table has been created, you can rename the old table and then rename the new table to observation_fact.

To reiterate:

1. Create the new table

insert into observation_fact_new 
select ...
from observation_fact




2. Rename the old table, keeping it for sanity checks

alter table observation_fact rename to observation_fact_old




3. Rename the new table to observation_fact

alter table observation_fact_new rename to observation_fact

After your checks and tests are done, drop the old table:

drop table observation_fact_old
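
The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the answerer's exact SQL: the scratch schema ontology_demo, the two-column table layout (patient_num, concept_cd), and the sample rows are all hypothetical stand-ins, and the backslash literals assume standard_conforming_strings is on. Two details differ from the question's query on purpose. The CASE has an ELSE branch so unmatched codes are carried over unchanged, and the lookup table is collapsed to one row per concept_cd so the left join cannot duplicate fact rows.

```sql
-- Scratch schema with toy stand-ins for the real tables (assumed layout).
CREATE SCHEMA IF NOT EXISTS ontology_demo;
SET search_path TO ontology_demo;

CREATE TABLE observation_fact (patient_num int, concept_cd text);
CREATE TABLE concept_dimension_bak (concept_cd text, concept_path text);

INSERT INTO observation_fact VALUES
  (1, 'ICD10-CM:E11.9'),
  (2, 'ICD9:250.00'),
  (3, 'ICD9:45.13'),
  (4, 'LOINC:2345-7');
INSERT INTO concept_dimension_bak VALUES
  ('ICD9:250.00', '\i2b2\Diagnoses\sample'),
  ('ICD9:45.13',  '\i2b2\Procedures\sample');

-- Step 1: build the new table in a single pass instead of updating in place.
CREATE TABLE observation_fact_new AS
SELECT ofact.patient_num,
       CASE
         WHEN split_part(ofact.concept_cd, ':', 1) = 'ICD10-CM'
           THEN replace(ofact.concept_cd, 'ICD10-CM:', 'ICD10CM:')
         WHEN split_part(ofact.concept_cd, ':', 1) = 'ICD10-PCS'
           THEN replace(ofact.concept_cd, 'ICD10-PCS:', 'ICD10PCS:')
         WHEN split_part(ofact.concept_cd, ':', 1) = 'ICD9' AND cdim.is_diagnosis
           THEN replace(ofact.concept_cd, 'ICD9:', 'ICD9CM:')
         WHEN split_part(ofact.concept_cd, ':', 1) = 'ICD9' AND cdim.is_procedure
           THEN replace(ofact.concept_cd, 'ICD9:', 'ICD9PROC:')
         ELSE ofact.concept_cd  -- carry every other code over unchanged
       END AS concept_cd
FROM observation_fact ofact
LEFT JOIN (
  -- One row per code, so the join cannot multiply fact rows.
  SELECT concept_cd,
         bool_or(concept_path LIKE '\\i2b2\\Diagnoses\\%')  AS is_diagnosis,
         bool_or(concept_path LIKE '\\i2b2\\Procedures\\%') AS is_procedure
  FROM concept_dimension_bak
  GROUP BY concept_cd
) cdim ON cdim.concept_cd = ofact.concept_cd;

-- Steps 2 and 3: swap the tables in one transaction.
BEGIN;
ALTER TABLE observation_fact RENAME TO observation_fact_old;
ALTER TABLE observation_fact_new RENAME TO observation_fact;
COMMIT;

-- Sanity check: row counts of the old and new tables should match.
SELECT (SELECT count(*) FROM observation_fact_old) AS old_rows,
       (SELECT count(*) FROM observation_fact)     AS new_rows;

-- Only after the checks pass:
-- DROP TABLE observation_fact_old;
```

On a real table, the SELECT list would name every column of observation_fact. CREATE TABLE ... AS copies only the data, so indexes, constraints, and defaults need to be recreated on the new table before or after the swap.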
