如何批量更新Postgres中5500万条记录的单列 [英] How to Update a single column in postgres in a batch for 55 Million records
本文介绍了如何批量更新Postgres中5500万条记录的单列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我想更新postgres表的一列。记录大约有5,500万条,所以我们需要一批10000条记录来更新它。 注意:我们希望更新所有行。但我们不想锁定我们的表。
我正在尝试以下查询-
Update account set name = Some name where id between 1 and 10000
我们如何才能对每10000条记录更新一次循环?
如有任何建议和帮助,我们将不胜感激。
PostgreSQL 10.5
推荐答案
与其一次提交所有更改(或其他答案中建议的5,500万次),我更愿意尝试按您的建议将更新行拆分成小批,例如10k条记录。在PL/pgSQL中,可以使用关键字BY
以给定的步骤迭代集合。因此您可以在anonymous code block
中进行批量更新,如下所示:
PostgreSQL 11+
DO $$
DECLARE
page int := 10000;
min_id bigint; max_id bigint;
BEGIN
SELECT max(id),min(id) INTO max_id,min_id FROM account;
FOR j IN min_id..max_id BY page LOOP
UPDATE account SET name = 'your magic goes here'
WHERE id >= j AND id < j+page;
COMMIT;
END LOOP;
END; $$;
- 您可能需要调整
WHERE
子句以避免不必要的重叠。
测试
具有顺序ID的1051行数据示例:
CREATE TABLE account (id int, name text);
INSERT INTO account VALUES(generate_series(0,1050),'untouched record..');
正在执行匿名代码块...
DO $$
DECLARE
page int := 100;
min_id bigint; max_id bigint;
BEGIN
SELECT max(id),min(id) INTO max_id,min_id FROM account;
FOR j IN min_id..max_id BY page LOOP
UPDATE account SET name = now() ||' -> UPDATED ' || j || ' to ' || j+page
WHERE id >= j AND id < j+page;
RAISE INFO 'committing data from % to % at %', j,j+page,now();
COMMIT;
END LOOP;
END; $$;
INFO: committing data from 0 to 100 at 2021-04-14 17:35:42.059025+02
INFO: committing data from 100 to 200 at 2021-04-14 17:35:42.070274+02
INFO: committing data from 200 to 300 at 2021-04-14 17:35:42.07806+02
INFO: committing data from 300 to 400 at 2021-04-14 17:35:42.087201+02
INFO: committing data from 400 to 500 at 2021-04-14 17:35:42.096548+02
INFO: committing data from 500 to 600 at 2021-04-14 17:35:42.105876+02
INFO: committing data from 600 to 700 at 2021-04-14 17:35:42.114514+02
INFO: committing data from 700 to 800 at 2021-04-14 17:35:42.121946+02
INFO: committing data from 800 to 900 at 2021-04-14 17:35:42.12897+02
INFO: committing data from 900 to 1000 at 2021-04-14 17:35:42.134388+02
INFO: committing data from 1000 to 1100 at 2021-04-14 17:35:42.13951+02
..您可以成批更新您的行。为了证明我的观点,下面的查询按更新时间对记录进行分组:
SELECT DISTINCT ON (name) name, count(id)
FROM account
GROUP BY name ORDER BY name;
name | count
------------------------------------------------------+-------
2021-04-14 17:35:42.059025+02 -> UPDATED 0 to 100 | 100
2021-04-14 17:35:42.070274+02 -> UPDATED 100 to 200 | 100
2021-04-14 17:35:42.07806+02 -> UPDATED 200 to 300 | 100
2021-04-14 17:35:42.087201+02 -> UPDATED 300 to 400 | 100
2021-04-14 17:35:42.096548+02 -> UPDATED 400 to 500 | 100
2021-04-14 17:35:42.105876+02 -> UPDATED 500 to 600 | 100
2021-04-14 17:35:42.114514+02 -> UPDATED 600 to 700 | 100
2021-04-14 17:35:42.121946+02 -> UPDATED 700 to 800 | 100
2021-04-14 17:35:42.12897+02 -> UPDATED 800 to 900 | 100
2021-04-14 17:35:42.134388+02 -> UPDATED 900 to 1000 | 100
2021-04-14 17:35:42.13951+02 -> UPDATED 1000 to 1100 | 51
演示:db<>fiddle
这篇关于如何批量更新Postgres中5500万条记录的单列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文