PostgreSQL截断速度 [英] Postgresql Truncation speed

查看:85
本文介绍了PostgreSQL截断速度的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们正在使用Postgresql 9.1.4作为我们的数据库服务器.我一直在尝试加快测试套件的速度,因此我一直在盯着数据库进行性能分析,以了解到底发生了什么.我们正在使用 database_cleaner 在测试结束时截断表.是的,我知道交易速度更快,在某些情况下我无法使用它们,所以我对此并不担心.

We're using Postgresql 9.1.4 as our db server. I've been trying to speed up my test suite so I've stared profiling the db a bit to see exactly what's going on. We are using database_cleaner to truncate tables at the end of tests. YES I know transactions are faster, I can't use them in certain circumstances so I'm not concerned with that.

我关心的是,为什么TRUNCATION花费这么长的时间(比使用DELETE更长),为什么它在CI服务器上花费了EVEN LONGER.

What I AM concerned with, is why TRUNCATION takes so long (longer than using DELETE) and why it takes EVEN LONGER on my CI server.

现在,在本地(在Macbook Air上)完整的测试套件需要28分钟.每次我们截断表格时,都要记录日志...

Right now, locally (on a Macbook Air) a full test suite takes 28 minutes. Tailing the logs, each time we truncate tables... ie:

TRUNCATE TABLE table1, table2  -- ... etc

执行截断需要1秒钟以上的时间.在我们的CI服务器(Ubuntu 10.04 LTS)上记录日志,需要花费整整8秒钟来截断表,而构建则需要84分钟.

it takes over 1 second to perform the truncation. Tailing the logs on our CI server (Ubuntu 10.04 LTS), take takes a full 8 seconds to truncate the tables and a build takes 84 minutes.

当我切换到:deletion策略时,我的本地版本花费了20分钟,而CI服务器则减少到了44分钟.这是一个重大区别,我真的对为什么会这样感到惊讶.我已经已调整

When I switched over to the :deletion strategy, my local build took 20 minutes and the CI server went down to 44 minutes. This is a significant difference and I'm really blown away as to why this might be. I've tuned the DB on the CI server, it has 16gb system ram, 4gb shared_buffers... and an SSD. All the good stuff. How is it possible:

a.,它比2GB RAM的Macbook Air慢得多
b.,当
postgresql文档 明确声明,它应该快得多.

a. that it's SO much slower than my Macbook Air with 2gb of ram
b. that TRUNCATION is so much slower than DELETE when the postgresql docs state explicitly that it should be much faster.

有什么想法吗?

推荐答案

最近在SO和PostgreSQL邮件列表上都出现了几次.

This has come up a few times recently, both on SO and on the PostgreSQL mailing lists.

最后两点的 TL; DR :

(a)较大的shared_buffers可能是为什么CI服务器上TRUNCATE较慢的原因.不同的fsync配置或使用旋转介质代替SSD也会出现故障.

(a) The bigger shared_buffers may be why TRUNCATE is slower on the CI server. Different fsync configuration or the use of rotational media instead of SSDs could also be at fault.

(b)TRUNCATE具有固定成本,但不一定比DELETE慢,而且它的工作量更大.请参阅下面的详细说明.

(b) TRUNCATE has a fixed cost, but not necessarily slower than DELETE, plus it does more work. See the detailed explanation that follows.

更新:关于pgsql的重要讨论表现来自这篇文章.参见此线程.

UPDATE: A significant discussion on pgsql-performance arose from this post. See this thread.

更新2:已对9.2beta3添加了改进,应该对此有所帮助,请参见

UPDATE 2: Improvements have been added to 9.2beta3 that should help with this, see this post.

TRUNCATEDELETE FROM 的详细说明:

Detailed explanation of TRUNCATE vs DELETE FROM:

虽然不是该主题的专家,但我的理解是TRUNCATE每张表的成本几乎是固定的,而DELETE对于n行至少为O(n);更糟糕的是,如果有任何外键引用该表,则该表将被删除.

While not an expert on the topic, my understanding is that TRUNCATE has a nearly fixed cost per table, while DELETE is at least O(n) for n rows; worse if there are any foreign keys referencing the table being deleted.

我一直以为TRUNCATE的固定成本低于近空表上的DELETE成本,但这完全不是事实.

I always assumed that the fixed cost of a TRUNCATE was lower than the cost of a DELETE on a near-empty table, but this isn't true at all.

TRUNCATE table;的作用超过DELETE FROM table;

TRUNCATE table; does more than DELETE FROM table;

TRUNCATE table之后的数据库状态与您要运行的情况基本相同:

The state of the database after a TRUNCATE table is much the same as if you'd instead run:

  • DELETE FROM table;
  • VACCUUM (FULL, ANALYZE) table;(仅9.0+,请参见脚注)
  • DELETE FROM table;
  • VACCUUM (FULL, ANALYZE) table; (9.0+ only, see footnote)

...当然,TRUNCATE实际上并没有通过DELETEVACUUM达到其效果.

... though of course TRUNCATE doesn't actually achieve its effects with a DELETE and a VACUUM.

重点是DELETETRUNCATE做不同的事情,所以您不只是比较两个具有相同结果的命令.

The point is that DELETE and TRUNCATE do different things, so you're not just comparing two commands with identical outcomes.

A DELETE FROM table;允许保留无效行和膨胀,允许索引携带无效条目,不更新查询计划程序使用的表统计信息,等等.

A DELETE FROM table; allows dead rows and bloat to remain, allows the indexes to carry dead entries, doesn't update the table statistics used by the query planner, etc.

A TRUNCATE为您提供了一个全新的表和索引,就好像它们只是CREATE一样.就像您删除了所有记录,重新索引了表并执行了VACUUM FULL.

A TRUNCATE gives you a completely new table and indexes as if they were just CREATEed. It's like you deleted all the records, reindexed the table and did a VACUUM FULL.

如果您不打算在表中留下杂物,因为您将要再次填充它,那么使用DELETE FROM table;可能会更好.

If you don't care if there's crud left in the table because you're about to go and fill it up again, you may be better off using DELETE FROM table;.

因为您没有运行VACUUM,您会发现死行和索引条目累积为膨胀,必须对其进行扫描然后忽略;这会使您的所有查询变慢.如果您的测试实际上并未创建和删除所有可能不引起注意或关心的数据,并且您可以始终在测试运行的整个过程中进行VACUUM或两个操作.更好的是,让积极的自动真空设置确保自动真空在后台为您完成.

Because you aren't running VACUUM you will find that dead rows and index entries accumulate as bloat that must be scanned then ignored; this slows all your queries down. If your tests don't actually create and delete all that much data you may not notice or care, and you can always do a VACUUM or two part-way through your test run if you do. Better, let aggressive autovacuum settings ensure that autovacuum does it for you in the background.

在运行整个测试套件之后,您仍然可以TRUNCATE所有表,以确保在多次运行中不会产生任何影响.在9.0及更高版本上,全局上的VACUUM (FULL, ANALYZE);至少和以前一样好,而且要容易得多.

You can still TRUNCATE all your tables after the whole test suite runs to make sure no effects build up across many runs. On 9.0 and newer, VACUUM (FULL, ANALYZE); globally on the table is at least as good if not better, and it's a whole lot easier.

IIRC Pg有一些优化,这意味着它可能会在您的交易是唯一可以看到表格并立即将这些区块标记为自由的交易时注意到.在测试中,当我想要创建膨胀时,我必须有多个并发连接才能做到这一点.不过,我不会依靠它.

IIRC Pg has a few optimisations that mean it might notice when your transaction is the only one that can see the table and immediately mark the blocks as free anyway. In testing, when I've wanted to create bloat I've had to have more than one concurrent connection to do it. I wouldn't rely on this, though.

DELETE FROM table;对于没有f/k引用的小桌子来说非常便宜

DELETE FROM table; is very cheap for small tables with no f/k refs

DELETE表中没有外键引用的所有记录,所有Pg必须执行顺序表扫描并设置遇到的元组的xmax.这是非常便宜的操作-基本上是线性读取和半线性写入. AFAIK不必触摸索引.它们继续指向死元组,直到被后面的VACUUM清除为止,该VACUUM还将表中仅包含死元组的块标记为空闲.

To DELETE all records from a table with no foreign key references to it, all Pg has to do a sequential table scan and set the xmax of the tuples encountered. This is a very cheap operation - basically a linear read and a semi-linear write. AFAIK it doesn't have to touch the indexes; they continue to point to the dead tuples until they're cleaned up by a later VACUUM that also marks blocks in the table containing only dead tuples as free.

DELETE仅在有很多记录,必须检查大量外键引用或计算与VACUUM (FULL, ANALYZE) table;时才变得昂贵>的影响在您的DELETE范围内.

DELETE only gets expensive if there are lots of records, if there are lots of foreign key references that must be checked, or if you count the subsequent VACUUM (FULL, ANALYZE) table; needed to match TRUNCATE's effects within the cost of your DELETE .

在我这里的测试中,DELETE FROM table;通常比TRUNCATE快4倍(0.5毫秒对2毫秒).这是SSD上的测试数据库,运行fsync=off,因为我不在乎是否丢失所有这些数据.当然,DELETE FROM table;并没有完成所有相同的工作,如果我跟进VACUUM (FULL, ANALYZE) table;,则要花费21ms多得多的费用,因此,如果我实际上不需要桌子,则DELETE仅是一次胜利.原始的.

In my tests here, a DELETE FROM table; was typically 4x faster than TRUNCATE at 0.5ms vs 2ms. That's a test DB on an SSD, running with fsync=off because I don't care if I lose all this data. Of course, DELETE FROM table; isn't doing all the same work, and if I follow up with a VACUUM (FULL, ANALYZE) table; it's a much more expensive 21ms, so the DELETE is only a win if I don't actually need the table pristine.

TRUNCATE table;做的固定成本工作和客房清洁要比DELETE

TRUNCATE table; does a lot more fixed-cost work and housekeeping than DELETE

相比之下,TRUNCATE必须做很多工作.它必须为该表,其TOAST表(如果有)以及该表具有的每个索引分配新文件.标头必须写入这些文件中,并且系统目录也可能需要更新(不确定这一点,尚未检查).然后,它必须用新文件替换旧文件或删除旧文件,并必须确保文件系统已通过同步操作(fsync()或类似操作)赶上了更改,通常将所有缓冲区刷新到磁盘上.我不确定如果您正在使用(数据食用)选项fsync=off运行

By contrast, a TRUNCATE has to do a lot of work. It must allocate new files for the table, its TOAST table if any, and every index the table has. Headers must be written into those files and the system catalogs may need updating too (not sure on that point, haven't checked). It then has to replace the old files with the new ones or remove the old ones, and has to ensure the file system has caught up with the changes with a synchronization operation - fsync() or similar - that usually flushes all buffers to the disk. I'm not sure whether the the sync is skipped if you're running with the (data-eating) option fsync=off .

我最近了解到,TRUNCATE还必须刷新与旧表相关的所有PostgreSQL缓冲区.巨大的shared_buffers可能会花费很短的时间.我怀疑这就是为什么它在您的CI服务器上速度较慢.

I learned recently that TRUNCATE must also flush all PostgreSQL's buffers related to the old table. This can take a non-trivial amount of time with huge shared_buffers. I suspect this is why it's slower on your CI server.

余额

无论如何,您可以看到具有关联的TOAST表(大多数这样做)和多个索引的表的TRUNCATE可能需要一些时间.不长,但比近空表中的DELETE长.

Anyway, you can see that a TRUNCATE of a table that has an associated TOAST table (most do) and several indexes could take a few moments. Not long, but longer than a DELETE from a near-empty table.

因此,最好做一个DELETE FROM table;.

-

注意:在9.0之前的DB上,CLUSTER table_id_seq ON table; ANALYZE table;VACUUM FULL ANALYZE table; REINDEX table;会更接近于TRUNCATE. VACUUM FULL展示次数在9.0中变为更好.

Note: on DBs before 9.0, CLUSTER table_id_seq ON table; ANALYZE table; or VACUUM FULL ANALYZE table; REINDEX table; would be a closer equivalent to TRUNCATE. The VACUUM FULL impl changed to a much better one in 9.0.

这篇关于PostgreSQL截断速度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆