UPDATE FROM with a JOIN (Large Table Performance) in PostgreSQL?
Question
I'm trying to get the following query to run with reasonable performance:
UPDATE order_item_imprint SET item_new_id = oi.item_new_id
FROM order_item oi
INNER JOIN order_item_imprint oii ON oi.item_number = oii.item_id
Currently, it doesn't complete within 8 days, so we killed it. The query plan is as follows:
Merge Join (cost=59038021.60..33137238641.84 rows=1432184234121 width=1392)
  Merge Cond: ((oi.item_number)::text = (oii.item_id)::text)
  -> Nested Loop (cost=0.00..10995925524.15 rows=309949417305 width=1398)
        -> Index Scan using unique_order_item_item_number on order_item oi (cost=0.00..608773.05 rows=258995 width=14)
        -> Seq Scan on order_item_imprint (cost=0.00..30486.39 rows=1196739 width=1384)
  -> Materialize (cost=184026.24..198985.48 rows=1196739 width=6)
        -> Sort (cost=184026.24..187018.09 rows=1196739 width=6)
              Sort Key: oii.item_id
              -> Seq Scan on order_item_imprint oii (cost=0.00..30486.39 rows=1196739 width=6)
I have indexes on both tables, and I've ensured the compared fields are of identical type and size. I am now at the point of changing the PostgreSQL server configuration in the hope that it helps, but I am not sure it will.
The order_item_imprint table has about 1.1 million rows with a 145MB disk footprint, and the order_item table is about a third of that size.
The main goal is that I need to be able to run this, along with several other queries, during a maintenance window of a few hours.
Autovacuum and ANALYZE had been run before the execution plan was generated.
Recommended answer
I found an alternate way to write the query that allowed the pgsql optimizer to build the query much more efficiently.
Actually, what you did was remove the unconstrained self-join on order_item_imprint.
If you look at the first line you'll see the following row estimate:
rows=1432184234121
That's about 1.4 trillion row updates it is trying to do (1,196,739 × 1,196,739, the size of order_item_imprint joined against itself). When you aliased order_item_imprint in the join, it was treated as a separate table from the update target, and nothing in the query tied the two together.
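The rewritten query is not shown here, so the following is only a sketch of the usual way to avoid the extra self-join, assuming the table and column names from the question. The update target is named once, in the UPDATE clause, and order_item is joined to it through the WHERE clause rather than through a second, aliased copy of order_item_imprint. The oii alias is reused here purely for readability.

```sql
-- Check the plan first: EXPLAIN without ANALYZE does not execute the UPDATE.
-- The top-level row estimate should be on the order of the size of
-- order_item_imprint, not its square.
EXPLAIN
UPDATE order_item_imprint AS oii
SET item_new_id = oi.item_new_id
FROM order_item AS oi
WHERE oi.item_number = oii.item_id;

-- The update itself. order_item_imprint appears only as the target,
-- so each of its rows is matched against order_item exactly once.
UPDATE order_item_imprint AS oii
SET item_new_id = oi.item_new_id
FROM order_item AS oi
WHERE oi.item_number = oii.item_id;
```

If the estimate in the EXPLAIN output is still far larger than the target table, that is a hint that the target is being joined without a condition somewhere in the FROM list.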