Update query too slow on Postgres 9.1

Problem description

My problem is that an update query on a table with 14 million rows is very slow. I have tried various server-tuning changes. They improved performance in general, but not for update queries.

I have two tables:

  • T1 with 4 columns and 3 indexes on it (530 rows)
  • T2 with 15 columns and 3 indexes on it (14 million rows)
  • I want to set the integer field vid in T2 to the value of vid in T1, joining the two tables on their text fields (stxt2 in T2 is regex-matched against stxt1 in T1).

Here is my query and its output:

explain analyse 
update T2 
  set vid=T1.vid 
from T1 
where stxt2 ~ stxt1 and T2.vid = 0;

Update on T2  (cost=0.00..9037530.59 rows=2814247 width=131) (actual time=25141785.741..25141785.741 rows=0 loops=1)
 ->  Nested Loop  (cost=0.00..9037530.59 rows=2814247 width=131) (actual time=32.636..25035782.995 rows=679354 loops=1)
             Join Filter: ((T2.stxt2)::text ~ (T1.stxt1)::text)
             ->  Seq Scan on T2  (cost=0.00..594772.96 rows=1061980 width=121) (actual time=0.067..5402.614 rows=1037809 loops=1)
                         Filter: (vid= 1)
             ->  Materialize  (cost=0.00..17.95 rows=530 width=34) (actual time=0.000..0.069 rows=530 loops=1037809)
                         ->  Seq Scan on T1  (cost=0.00..15.30 rows=530 width=34) (actual time=0.019..0.397 rows=530 loops=1)
Total runtime: 25141785.904 ms

As you can see, the query took approximately 25,141 seconds (~7 hours). If I understood correctly, the planner estimated the execution time to be 9,037 seconds (~2.5 hours). Am I missing something here?
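A quick back-of-the-envelope check helps here. It uses only figures copied from the EXPLAIN ANALYZE output above. One point to keep in mind is that EXPLAIN cost figures are in the planner's arbitrary cost units, conventionally scaled so that one sequential page fetch costs 1.0. They are not milliseconds, so cost=9037530.59 is not a prediction of 9,037 seconds.

The Materialize node over T1 ran once per T2 row that passed the filter, and each run returned 530 rows. The regex join filter was therefore evaluated about 550 million times. The Python sketch below only does that arithmetic. The per-evaluation figure is an average that also absorbs the small T2 scan cost.

```python
# Figures copied from the EXPLAIN ANALYZE output in the question.
t2_rows_scanned = 1_037_809        # rows from "Seq Scan on T2"
t1_rows_per_loop = 530             # rows returned by "Materialize" on each loop
nested_loop_ms = 25_035_782.995    # actual total time of the Nested Loop node
total_runtime_ms = 25_141_785.904  # "Total runtime"

# The Nested Loop re-reads the materialized T1 rows once per T2 row,
# so the regex Join Filter is evaluated t2 * t1 times.
comparisons = t2_rows_scanned * t1_rows_per_loop

# Average time per regex evaluation, in microseconds.
us_per_comparison = nested_loop_ms * 1000 / comparisons

# Time spent in the Update node on top of producing the joined rows.
update_overhead_s = (total_runtime_ms - nested_loop_ms) / 1000

print(f"regex evaluations: {comparisons:,}")
print(f"avg per evaluation: {us_per_comparison:.1f} us")
print(f"runtime: {total_runtime_ms / 3_600_000:.2f} h, "
      f"update overhead: {update_overhead_s:.0f} s")
```

On these numbers, almost all of the roughly 7 hours goes into the regex comparisons themselves. Writing the updated rows accounts for only about two minutes, which suggests that tuning write settings can help only marginally.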

Here is some information about my server configuration:

  • CentOS 5.8, 20GB of RAM
  • shared_buffers = 12GB
  • work_mem = 64MB
  • maintenance_work_mem = 64MB
  • bgwriter_lru_maxpages = 500
  • checkpoint_segments = 64
  • checkpoint_completion_target = 0.9
  • effective_cache_size = 10GB

I have run VACUUM FULL and ANALYZE on table T2 several times, but this has not improved the situation much.

PS: Setting full_page_writes to off speeds up update queries considerably, but I don't want to risk data loss. Do you have any recommendations?

Solution

This is not a solution, but a data-modelling workaround:

  • Break up the URLs into {protocol, hostname, pathname} components.
  • Now you can join on the hostname part with exact matches, avoiding the unanchored regex match (the equivalent of a leading % in a LIKE pattern).
  • The view is intended to demonstrate that full_url can be reconstructed if needed.

With this layout, the update could probably finish in a few minutes.
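The following minimal Python model shows why exact matches help. It uses hypothetical in-memory stand-ins for T1 and T2. The table contents, a T1 keyed by hostname, and the function names are all assumptions made for illustration.

The regex version mirrors the Nested Loop in the question's plan: every T2 row is tested against every T1 pattern. PostgreSQL can use hash or merge joins only for equality-type join operators, not for `~`. The exact-match version is what an equality join on hostname allows: build a hash table on the small table once, then probe it once per row of the large table.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stand-ins: T1 is small (vid, hostname); T2 holds full URLs.
t1 = [(1, "www.myhost.com"), (2, "www.hishost.com"), (3, "www.herhost.com")]
t2 = [
    "http://www.myhost.com/index.php",
    "https://www.hishost.com/index.php",
    "ftp://www.myhost.com/secret.tgz",
    "http://www.unknown.org/",
]

def regex_nested_loop(t1, t2):
    """Like the original plan: test every T2 row against every T1 pattern."""
    evaluations = 0
    result = {}
    for url in t2:
        for vid, pattern in t1:
            evaluations += 1
            # re.search is unanchored, like PostgreSQL's ~ operator.
            if re.search(pattern, url):
                result[url] = vid
    return result, evaluations

def exact_match_join(t1, t2):
    """Like the workaround: hash T1 on hostname once, then probe once per T2 row."""
    by_host = {hostname: vid for vid, hostname in t1}
    lookups = 0
    result = {}
    for url in t2:
        lookups += 1
        # urlsplit stands in for the precomputed hostname column.
        vid = by_host.get(urlsplit(url).hostname)
        if vid is not None:
            result[url] = vid
    return result, lookups

slow, n_regex = regex_nested_loop(t1, t2)
fast, n_lookups = exact_match_join(t1, t2)
print(slow == fast, n_regex, n_lookups)
```

Both strategies find the same matches here. The regex join does |T1| x |T2| evaluations, while the hash join does one lookup per T2 row. With 530 T1 rows, that ratio is what separates hours from minutes.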

        -- assumes a schema named tmp already exists (CREATE SCHEMA tmp;)
SET search_path='tmp';

DROP TABLE IF EXISTS urls CASCADE;
CREATE TABLE urls
        ( id SERIAL NOT NULL PRIMARY KEY
        , full_url varchar
        , proto varchar
        , hostname varchar
        , pathname varchar
        );

INSERT INTO urls(full_url) VALUES
 ( 'ftp://www.myhost.com/secret.tgz' )
,( 'http://www.myhost.com/robots.txt' )
,( 'http://www.myhost.com/index.php' )
,( 'https://www.myhost.com/index.php' )
,( 'http://www.myhost.com/subdir/index.php' )
,( 'https://www.myhost.com/subdir/index.php' )
,( 'http://www.hishost.com/index.php' )
,( 'https://www.hishost.com/index.php' )
,( 'http://www.herhost.com/index.php' )
,( 'https://www.herhost.com/index.php' )
        ;

        -- step 1: split off the protocol; hostname temporarily holds 'host/path'
UPDATE urls
SET proto = split_part(full_url, '://' , 1)
        , hostname = split_part(full_url, '://' , 2)
        ;

        -- step 2: split 'host/path'; both SET expressions see the old hostname value
UPDATE urls
SET pathname = substr(hostname, 1+strpos(hostname, '/' ))
        , hostname = split_part(hostname, '/' , 1)
        ;

        -- the full_url field is now redundant: we can drop it
ALTER TABLE urls
        DROP column full_url
        ;
        -- and we could always reconstruct the full_url from its components.
CREATE VIEW vurls AS (
        SELECT id
        , proto || '://' || hostname || '/' || pathname AS full_url
        , proto
        , hostname
        , pathname
        FROM urls
        );

SELECT * FROM urls;
SELECT * FROM vurls;

OUTPUT:

INSERT 0 10
UPDATE 10
UPDATE 10
ALTER TABLE
CREATE VIEW
 id | proto |    hostname     |     pathname     
----+-------+-----------------+------------------
  1 | ftp   | www.myhost.com  | secret.tgz
  2 | http  | www.myhost.com  | robots.txt
  3 | http  | www.myhost.com  | index.php
  4 | https | www.myhost.com  | index.php
  5 | http  | www.myhost.com  | subdir/index.php
  6 | https | www.myhost.com  | subdir/index.php
  7 | http  | www.hishost.com | index.php
  8 | https | www.hishost.com | index.php
  9 | http  | www.herhost.com | index.php
 10 | https | www.herhost.com | index.php
(10 rows)

 id |                full_url                 | proto |    hostname     |     pathname     
----+-----------------------------------------+-------+-----------------+------------------
  1 | ftp://www.myhost.com/secret.tgz         | ftp   | www.myhost.com  | secret.tgz
  2 | http://www.myhost.com/robots.txt        | http  | www.myhost.com  | robots.txt
  3 | http://www.myhost.com/index.php         | http  | www.myhost.com  | index.php
  4 | https://www.myhost.com/index.php        | https | www.myhost.com  | index.php
  5 | http://www.myhost.com/subdir/index.php  | http  | www.myhost.com  | subdir/index.php
  6 | https://www.myhost.com/subdir/index.php | https | www.myhost.com  | subdir/index.php
  7 | http://www.hishost.com/index.php        | http  | www.hishost.com | index.php
  8 | https://www.hishost.com/index.php       | https | www.hishost.com | index.php
  9 | http://www.herhost.com/index.php        | http  | www.herhost.com | index.php
 10 | https://www.herhost.com/index.php       | https | www.herhost.com | index.php
(10 rows)
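The string logic of the two UPDATE statements can be checked without a database. The Python sketch below models PostgreSQL's split_part(), strpos() and two-argument substr() with hypothetical helpers of the same names. These are my own approximations, not a library API. The sketch then reproduces a few rows of the output above, including the round trip through the view's concatenation. It assumes, as all the sample rows do, that every URL contains '://'.

```python
def split_part(s: str, delimiter: str, field: int) -> str:
    """Model of PostgreSQL split_part(): 1-based field (>= 1), '' if out of range."""
    parts = s.split(delimiter)
    return parts[field - 1] if field <= len(parts) else ""

def strpos(s: str, sub: str) -> int:
    """Model of PostgreSQL strpos(): 1-based position, 0 if not found."""
    return s.find(sub) + 1

def substr(s: str, start: int) -> str:
    """Model of two-argument PostgreSQL substr() for a 1-based start >= 1."""
    return s[start - 1:]

def decompose(full_url):
    # Step 1: proto, and 'host/path' temporarily stored in hostname.
    proto = split_part(full_url, "://", 1)
    hostname = split_part(full_url, "://", 2)
    # Step 2: pathname is computed from the OLD hostname before hostname is
    # overwritten, matching how one SQL UPDATE evaluates its SET list.
    pathname = substr(hostname, 1 + strpos(hostname, "/"))
    hostname = split_part(hostname, "/", 1)
    return proto, hostname, pathname

def rebuild(proto, hostname, pathname):
    # Same concatenation as the vurls view.
    return proto + "://" + hostname + "/" + pathname

for url in ["ftp://www.myhost.com/secret.tgz",
            "http://www.myhost.com/subdir/index.php",
            "https://www.herhost.com/index.php"]:
    parts = decompose(url)
    print(parts, rebuild(*parts) == url)
```

The model also exposes one edge case. For a URL with no '/' after the host, such as 'http://host', strpos() returns 0, so pathname becomes a copy of the hostname. None of the sample rows hits this case.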
