Update query too slow on Postgres 9.1
My problem is a very slow update query on a table with 14 million rows. I tried several ways of tuning my server, which gave good performance in general but not for update queries.
I have two tables:
- T1 with 4 columns and 3 indexes on it (530 rows)
- T2 with 15 columns and 3 indexes on it (14 million rows)
I want to set the field vid (type integer) in T2 to the value of vid in T1 by joining the two tables on a text field stxt.
Here is my query and its output:
explain analyse
update T2
set vid=T1.vid
from T1
where stxt2 ~ stxt1 and T2.vid = 0;
Update on T2  (cost=0.00..9037530.59 rows=2814247 width=131) (actual time=25141785.741..25141785.741 rows=0 loops=1)
  ->  Nested Loop  (cost=0.00..9037530.59 rows=2814247 width=131) (actual time=32.636..25035782.995 rows=679354 loops=1)
        Join Filter: ((T2.stxt2)::text ~ (T1.stxt1)::text)
        ->  Seq Scan on T2  (cost=0.00..594772.96 rows=1061980 width=121) (actual time=0.067..5402.614 rows=1037809 loops=1)
              Filter: (vid= 1)
        ->  Materialize  (cost=0.00..17.95 rows=530 width=34) (actual time=0.000..0.069 rows=530 loops=1037809)
              ->  Seq Scan on T1  (cost=0.00..15.30 rows=530 width=34) (actual time=0.019..0.397 rows=530 loops=1)
Total runtime: 25141785.904 ms
As you can see, the query took approximately 25141 seconds (~7 hours). If I understood correctly, the planner estimates the execution time to be 9037 seconds (~2.5 hours). Am I missing something here?
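One way to read the plan: the cost figures are in the planner's abstract units, not milliseconds, so 9037530.59 is not a time estimate. The actual time follows from the nested loop, which applies the regex join filter once for every pairing of a filtered T2 row with a T1 row. A rough calculation, using only the row counts and timings shown in the plan, can be sketched as:

```sql
-- Rows from the Seq Scan on T2 (1037809) times rows in T1 (530)
-- gives the number of times the "~" join filter was evaluated.
-- Dividing the Nested Loop's actual time (ms) by that count gives
-- an approximate cost per evaluation.
SELECT 1037809::bigint * 530 AS regex_evaluations,
       round(25035782.995 / (1037809::bigint * 530), 4) AS ms_per_evaluation;
```

That works out to about 550 million regex evaluations at roughly 0.045 ms each, which suggests the time goes into the join filter itself rather than into writing the updated rows.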
Here is some information about my server config:
- CentOS 5.8, 20GB of RAM
- shared_buffers = 12GB
- work_mem = 64MB
- maintenance_work_mem = 64MB
- bgwriter_lru_maxpages = 500
- checkpoint_segments = 64
- checkpoint_completion_target = 0.9
- effective_cache_size = 10GB
I have run vacuum full and analyse several times on table T2, but this still does not improve the situation much.
PS: if I set full_page_writes to off, update queries improve considerably, but I don't want to risk data loss. Do you have any recommendations?
This is not a solution, but a data-modelling work-around:
- Break up the urls into {protocol,hostname,pathname} components.
- Now you can use exact matches to join on the hostname part, avoiding the leading wildcard in the pattern match.
- The view is intended to demonstrate that the full_url can be reconstructed if needed.
The update would then probably take a few minutes.
SET search_path='tmp';
DROP TABLE IF EXISTS urls CASCADE;
CREATE TABLE urls
( id SERIAL NOT NULL PRIMARY KEY
, full_url varchar
, proto varchar
, hostname varchar
, pathname varchar
);
INSERT INTO urls(full_url) VALUES
( 'ftp://www.myhost.com/secret.tgz' )
,( 'http://www.myhost.com/robots.txt' )
,( 'http://www.myhost.com/index.php' )
,( 'https://www.myhost.com/index.php' )
,( 'http://www.myhost.com/subdir/index.php' )
,( 'https://www.myhost.com/subdir/index.php' )
,( 'http://www.hishost.com/index.php' )
,( 'https://www.hishost.com/index.php' )
,( 'http://www.herhost.com/index.php' )
,( 'https://www.herhost.com/index.php' )
;
UPDATE urls
SET proto = split_part(full_url, '://' , 1)
, hostname = split_part(full_url, '://' , 2)
;
UPDATE urls
SET pathname = substr(hostname, 1+strpos(hostname, '/' ))
, hostname = split_part(hostname, '/' , 1)
;
-- the full_url field is now redundant: we can drop it
ALTER TABLE urls
DROP column full_url
;
-- and we could always reconstruct the full_url from its components.
CREATE VIEW vurls AS (
SELECT id
, proto || '://' || hostname || '/' || pathname AS full_url
, proto
, hostname
, pathname
FROM urls
);
SELECT * FROM urls;
SELECT * FROM vurls;
OUTPUT:
INSERT 0 10
UPDATE 10
UPDATE 10
ALTER TABLE
CREATE VIEW
id | proto | hostname | pathname
----+-------+-----------------+------------------
1 | ftp | www.myhost.com | secret.tgz
2 | http | www.myhost.com | robots.txt
3 | http | www.myhost.com | index.php
4 | https | www.myhost.com | index.php
5 | http | www.myhost.com | subdir/index.php
6 | https | www.myhost.com | subdir/index.php
7 | http | www.hishost.com | index.php
8 | https | www.hishost.com | index.php
9 | http | www.herhost.com | index.php
10 | https | www.herhost.com | index.php
(10 rows)
id | full_url | proto | hostname | pathname
----+-----------------------------------------+-------+-----------------+------------------
1 | ftp://www.myhost.com/secret.tgz | ftp | www.myhost.com | secret.tgz
2 | http://www.myhost.com/robots.txt | http | www.myhost.com | robots.txt
3 | http://www.myhost.com/index.php | http | www.myhost.com | index.php
4 | https://www.myhost.com/index.php | https | www.myhost.com | index.php
5 | http://www.myhost.com/subdir/index.php | http | www.myhost.com | subdir/index.php
6 | https://www.myhost.com/subdir/index.php | https | www.myhost.com | subdir/index.php
7 | http://www.hishost.com/index.php | http | www.hishost.com | index.php
8 | https://www.hishost.com/index.php | https | www.hishost.com | index.php
9 | http://www.herhost.com/index.php | http | www.herhost.com | index.php
10 | https://www.herhost.com/index.php | https | www.herhost.com | index.php
(10 rows)
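With the hostname split out as above, the original update could be rewritten as an equality join instead of a regex match. The following is only a minimal sketch: the tables t1_demo and t2_demo and their hostname columns are hypothetical stand-ins for T1 and T2, assuming both have been given a hostname column populated the same way as in the urls example.

```sql
-- Hypothetical stand-ins for T1 and T2 (names and columns are assumptions).
CREATE TEMP TABLE t1_demo (vid integer, hostname varchar);
CREATE TEMP TABLE t2_demo (id serial PRIMARY KEY, vid integer DEFAULT 0, hostname varchar);

INSERT INTO t1_demo (vid, hostname) VALUES
 (1, 'www.myhost.com')
,(2, 'www.hishost.com')
;
INSERT INTO t2_demo (hostname) VALUES
 ('www.myhost.com')
,('www.hishost.com')
,('www.herhost.com')
;

-- Equality on hostname replaces the "stxt2 ~ stxt1" regex condition.
UPDATE t2_demo
SET vid = t1_demo.vid
FROM t1_demo
WHERE t2_demo.hostname = t1_demo.hostname
  AND t2_demo.vid = 0
;

-- Rows without a matching hostname keep vid = 0.
SELECT id, vid, hostname FROM t2_demo ORDER BY id;
```

Because the join condition is a plain equality, the planner is free to choose a hash or merge join rather than testing every T2 row against every T1 pattern in a nested loop.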