PostgreSQL truncation speed

Question

We're using PostgreSQL 9.1.4 as our DB server. I've been trying to speed up my test suite, so I've started profiling the DB a bit to see exactly what's going on. We are using database_cleaner to truncate tables at the end of tests. YES, I know transactions are faster; I can't use them in certain circumstances, so I'm not concerned with that.

What I AM concerned with is why TRUNCATE takes so long (longer than using DELETE) and why it takes EVEN LONGER on my CI server.

Right now, locally (on a MacBook Air), a full test suite takes 28 minutes. Tailing the logs, each time we truncate tables, i.e.:

TRUNCATE TABLE table1, table2  -- ... etc

it takes over 1 second to perform the truncation. Tailing the logs on our CI server (Ubuntu 10.04 LTS), it takes a full 8 seconds to truncate the tables, and a build takes 84 minutes.

When I switched over to the :deletion strategy, my local build took 20 minutes and the CI server went down to 44 minutes. This is a significant difference, and I'm really blown away as to why this might be. I've tuned the DB on the CI server: it has 16 GB of system RAM, 4 GB of shared_buffers... and an SSD. All the good stuff. How is it possible:

a. that it's SO much slower than my MacBook Air with 2 GB of RAM
b. that TRUNCATE is so much slower than DELETE when the PostgreSQL docs state explicitly that it should be much faster.

Any ideas?

Answer

This has come up a few times recently, both on SO and on the PostgreSQL mailing lists.

TL;DR on the last two points:

(a) The bigger shared_buffers may be why TRUNCATE is slower on the CI server. Different fsync configuration or the use of rotational media instead of SSDs could also be at fault.

(b) TRUNCATE has a fixed cost that is not necessarily lower than the cost of a DELETE, plus it does more work. See the detailed explanation that follows.

UPDATE: A significant discussion on pgsql-performance arose from this post. See this thread.

UPDATE 2: Improvements have been added to 9.2beta3 that should help with this, see this post.

Detailed explanation of TRUNCATE vs DELETE FROM:

While not an expert on the topic, my understanding is that TRUNCATE has a nearly fixed cost per table, while DELETE is at least O(n) for n rows, and worse if any foreign keys reference the table being deleted from.
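
A toy cost model makes that trade-off concrete. The constants below are made-up placeholders in microseconds. They are assumptions, not measurements of PostgreSQL, and only the shape matters: TRUNCATE pays a constant per table, while DELETE pays per row, plus more per row for each foreign key that references the table. The function names are hypothetical.

```python
# Toy cost model: TRUNCATE = fixed cost per table, DELETE = linear in row count.
# Every constant here is a hypothetical placeholder (microseconds), not a measurement.
TRUNCATE_FIXED_US = 2000   # assumed fixed cost per truncated table
DELETE_BASE_US = 100       # assumed per-statement overhead of a DELETE
DELETE_PER_ROW_US = 1      # assumed cost of marking one row dead
FK_CHECK_PER_ROW_US = 5    # assumed extra per-row cost for each referencing foreign key


def truncate_cost_us(tables: int) -> int:
    """Cost of truncating `tables` tables; independent of how many rows they hold."""
    return TRUNCATE_FIXED_US * tables


def delete_cost_us(rows: int, referencing_fks: int = 0) -> int:
    """Cost of DELETE FROM on one table: O(n) in rows, steeper with referencing FKs."""
    per_row = DELETE_PER_ROW_US + FK_CHECK_PER_ROW_US * referencing_fks
    return DELETE_BASE_US + per_row * rows


def break_even_rows(referencing_fks: int = 0) -> int:
    """Smallest row count at which DELETE costs at least as much as one TRUNCATE."""
    per_row = DELETE_PER_ROW_US + FK_CHECK_PER_ROW_US * referencing_fks
    gap = TRUNCATE_FIXED_US - DELETE_BASE_US
    return max(0, -(-gap // per_row))  # integer ceiling division
```

With these placeholder numbers, DELETE wins on any table under about 1,900 rows. A single referencing foreign key pulls that break-even point down to 317 rows.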

I always assumed that the fixed cost of a TRUNCATE was lower than the cost of a DELETE on a near-empty table, but this isn't true at all.

TRUNCATE table; does more than DELETE FROM table;

The state of the database after a TRUNCATE table is much the same as if you'd instead run:

  • DELETE FROM table;
  • VACUUM (FULL, ANALYZE) table; (9.0+ only, see footnote)

... though of course TRUNCATE doesn't actually achieve its effects with a DELETE and a VACUUM.

关键是 DELETETRUNCATE 做不同的事情,所以你不仅仅是比较两个具有相同结果的命令.

The point is that DELETE and TRUNCATE do different things, so you're not just comparing two commands with identical outcomes.

A DELETE FROM table; allows dead rows and bloat to remain, allows the indexes to carry dead entries, doesn't update the table statistics used by the query planner, etc.

A TRUNCATE gives you a completely new table and indexes, as if they had just been created with CREATE. It's like you deleted all the records, reindexed the table and did a VACUUM FULL.

If you don't care if there's crud left in the table because you're about to go and fill it up again, you may be better off using DELETE FROM table;.

Because you aren't running VACUUM you will find that dead rows and index entries accumulate as bloat that must be scanned then ignored; this slows all your queries down. If your tests don't actually create and delete all that much data you may not notice or care, and you can always do a VACUUM or two part-way through your test run if you do. Better, let aggressive autovacuum settings ensure that autovacuum does it for you in the background.
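
PostgreSQL documents when autovacuum vacuums a table: its dead tuples must exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples, and the defaults are 50 and 0.2. The small sketch below evaluates that formula to show why "aggressive" settings matter for small, frequently emptied test tables. The lower values in the usage note are illustrative assumptions, not tuning advice.

```python
def vacuum_threshold(reltuples: float, base_threshold: int = 50,
                     scale_factor: float = 0.2) -> float:
    """Dead-tuple count a table must exceed before autovacuum vacuums it.

    Defaults mirror PostgreSQL's autovacuum_vacuum_threshold (50)
    and autovacuum_vacuum_scale_factor (0.2).
    """
    return base_threshold + scale_factor * reltuples


def needs_autovacuum(dead_tuples: int, reltuples: float, **settings) -> bool:
    """True once the dead tuples exceed the threshold for this table size."""
    return dead_tuples > vacuum_threshold(reltuples, **settings)
```

For a 1,000-row table, the defaults make autovacuum wait for more than 250 dead tuples. With base_threshold=10 and scale_factor=0.01 it would act after more than 20.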

You can still TRUNCATE all your tables after the whole test suite runs to make sure no effects build up across many runs. On 9.0 and newer, a database-wide VACUUM (FULL, ANALYZE); (with no table name) is at least as good, if not better, and it's a whole lot easier.
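
One way to wire up that split is a cheap DELETE between tests and a single TRUNCATE once the whole suite finishes. The sketch below only builds the SQL strings. The function names are hypothetical, and this is not how database_cleaner is implemented. It assumes the caller lists referencing (child) tables before the tables they reference, so the per-test DELETEs don't trip foreign key checks. A single TRUNCATE naming all the tables does not need that ordering, because the tables are truncated together.

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap in double quotes, double embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def per_test_cleanup(tables: Iterable[str]) -> List[str]:
    """Cheap cleanup between tests: one DELETE per table (children listed first).

    Dead rows are left behind for VACUUM or autovacuum to reclaim later.
    """
    return [f"DELETE FROM {quote_ident(t)};" for t in tables]


def end_of_suite_cleanup(tables: Iterable[str]) -> str:
    """Thorough cleanup once the suite is done: one multi-table TRUNCATE."""
    names = ", ".join(quote_ident(t) for t in tables)
    return f"TRUNCATE TABLE {names};"
```

For example, per_test_cleanup(["table1", "table2"]) yields two DELETE statements, and end_of_suite_cleanup on the same list yields TRUNCATE TABLE "table1", "table2";.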

IIRC Pg has a few optimisations that mean it might notice when your transaction is the only one that can see the table and immediately mark the blocks as free anyway. In testing, when I've wanted to create bloat I've had to have more than one concurrent connection to do it. I wouldn't rely on this, though.

DELETE FROM table; is very cheap for small tables with no f/k refs

To DELETE all records from a table with no foreign key references to it, all Pg has to do is a sequential table scan, setting the xmax of the tuples it encounters. This is a very cheap operation: basically a linear read and a semi-linear write. AFAIK it doesn't have to touch the indexes; they continue to point to the dead tuples until they're cleaned up by a later VACUUM, which also marks blocks in the table that contain only dead tuples as free.
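
Here is a deliberately simplified model of that behaviour. DELETE only stamps each tuple's xmax, and the index keeps pointing at dead tuples. A later VACUUM removes those index entries and frees pages that hold nothing live. The class and field names are invented for illustration. Real heap pages, visibility rules and index cleanup are far more involved, and this toy does not prune partially dead pages at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class HeapTuple:
    key: int
    xmax: Optional[int] = None  # id of the deleting transaction; None while live


@dataclass
class ToyTable:
    rows_per_page: int = 4
    pages: List[List[HeapTuple]] = field(default_factory=list)
    # Toy index: key -> (page number, slot number) in the heap.
    index: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def insert(self, key: int) -> None:
        if not self.pages or len(self.pages[-1]) == self.rows_per_page:
            self.pages.append([])
        self.pages[-1].append(HeapTuple(key))
        self.index[key] = (len(self.pages) - 1, len(self.pages[-1]) - 1)

    def delete_all(self, xid: int) -> int:
        """DELETE FROM table: one sequential pass stamping xmax; index untouched."""
        marked = 0
        for page in self.pages:
            for tup in page:
                if tup.xmax is None:
                    tup.xmax = xid
                    marked += 1
        return marked

    def live_count(self) -> int:
        return sum(1 for page in self.pages for tup in page if tup.xmax is None)

    def vacuum(self) -> int:
        """Drop index entries for dead tuples, then free pages with no live tuples."""
        for key, (page_no, slot) in list(self.index.items()):
            if self.pages[page_no][slot].xmax is not None:
                del self.index[key]
        freed = 0
        for page_no, page in enumerate(self.pages):
            if page and all(tup.xmax is not None for tup in page):
                self.pages[page_no] = []  # the page becomes free space
                freed += 1
        return freed
```

After inserting ten keys and calling delete_all, no rows are live, yet the index still holds ten entries that lead only to dead tuples. Until vacuum() runs, that is the bloat the answer describes.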

DELETE only gets expensive if there are lots of records, if there are lots of foreign key references that must be checked, or if you count the subsequent VACUUM (FULL, ANALYZE) table; needed to match TRUNCATE's effects as part of the cost of your DELETE.

In my tests here, a DELETE FROM table; was typically 4x faster than TRUNCATE at 0.5ms vs 2ms. That's a test DB on an SSD, running with fsync=off because I don't care if I lose all this data. Of course, DELETE FROM table; isn't doing all the same work, and if I follow up with a VACUUM (FULL, ANALYZE) table; it's a much more expensive 21ms, so the DELETE is only a win if I don't actually need the table pristine.
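
Multiplying those per-table timings out over a run of cleanups shows how the choice compounds. The figures are 0.5 ms for DELETE, 2 ms for TRUNCATE, and 21 ms for DELETE followed by VACUUM (FULL, ANALYZE). The count of 1,000 cleanups and the "vacuum every 100 cleanups" strategy are hypothetical. The sketch also assumes the costs simply add, so the VACUUM step alone is taken as 21 − 0.5 = 20.5 ms.

```python
# Per-cleanup costs for one table in microseconds, from the timings quoted above.
DELETE_US = 500                  # DELETE FROM table;
TRUNCATE_US = 2_000              # TRUNCATE table;
DELETE_PLUS_VACUUM_US = 21_000   # DELETE, then VACUUM (FULL, ANALYZE) table;
VACUUM_ONLY_US = DELETE_PLUS_VACUUM_US - DELETE_US  # assumption: costs simply add


def total_ms(cleanups: int, strategy: str, vacuum_every: int = 100) -> float:
    """Total cleanup time in milliseconds for a hypothetical number of cleanups."""
    if strategy == "delete":
        us = cleanups * DELETE_US
    elif strategy == "truncate":
        us = cleanups * TRUNCATE_US
    elif strategy == "delete+vacuum":
        us = cleanups * DELETE_PLUS_VACUUM_US
    elif strategy == "delete+periodic_vacuum":
        us = cleanups * DELETE_US + (cleanups // vacuum_every) * VACUUM_ONLY_US
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return us / 1000
```

With these figures, 1,000 cleanups cost about 0.5 s with DELETE alone and 2 s with TRUNCATE. A VACUUM FULL after every DELETE pushes that to 21 s, while one every 100 cleanups costs about 0.7 s.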

TRUNCATE table; does a lot more fixed-cost work and housekeeping than DELETE

By contrast, a TRUNCATE has to do a lot of work. It must allocate new files for the table, its TOAST table if any, and every index the table has. Headers must be written into those files, and the system catalogs may need updating too (not sure on that point, haven't checked). It then has to replace the old files with the new ones or remove the old ones. It also has to ensure the file system has caught up with the changes through a synchronization operation (fsync() or similar), which usually flushes all buffers to disk. I'm not sure whether the sync is skipped if you're running with the (data-eating) option fsync=off.
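
The file-level part of that work can be illustrated with a generic analogy in plain Python. Instead of emptying a file in place, it creates a brand-new file, writes a header, fsyncs it, repoints a catalog entry at it, and unlinks the old file. This is not PostgreSQL's code: swap_in_empty_file and the dict standing in for a system catalog are hypothetical. The os.fsync call is the step whose cost depends most on the storage.

```python
import os
import tempfile


def swap_in_empty_file(directory: str, catalog: dict, name: str, header: bytes) -> str:
    """Give `name` a brand-new file rather than emptying its current one.

    `catalog` maps a logical name to its current file path; it is a
    hypothetical stand-in for a system catalog entry.
    """
    old_path = catalog.get(name)
    fd, new_path = tempfile.mkstemp(dir=directory, prefix=name + ".")
    try:
        os.write(fd, header)   # write a header into the fresh file
        os.fsync(fd)           # force it to stable storage: the synchronous, costly step
    finally:
        os.close(fd)
    catalog[name] = new_path   # repoint the catalog entry at the new file
    if old_path is not None:
        os.unlink(old_path)    # the old file and all its rows go away wholesale
    return new_path
```

In this analogy the work is per file, once each for the heap, its TOAST table and every index, regardless of how many rows the old files held. That is why the cost behaves like a fixed per-table charge.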

I learned recently that TRUNCATE must also flush all PostgreSQL's buffers related to the old table. This can take a non-trivial amount of time with huge shared_buffers. I suspect this is why it's slower on your CI server.
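
To see why a bigger shared_buffers could hurt, assume that dropping a relation's buffers means visiting every buffer header in the pool once per relation. That is my understanding of releases of that era, but treat it as an assumption. The work then grows with the pool size times the number of relations involved: the heap, its TOAST table and TOAST index, and each regular index. The sketch counts those visits using PostgreSQL's default 8 kB block size. The table layout and the 32 MB comparison pool are hypothetical.

```python
from typing import List, Tuple

BLOCK_SIZE = 8 * 1024  # PostgreSQL's default block size in bytes


def buffer_count(shared_buffers_bytes: int) -> int:
    """Number of buffer headers in a pool of the given size."""
    return shared_buffers_bytes // BLOCK_SIZE


def relations_per_table(indexes: int, has_toast: bool = True) -> int:
    """Heap + each index, plus the TOAST table and its index when present."""
    return 1 + indexes + (2 if has_toast else 0)


def buffer_headers_visited(shared_buffers_bytes: int,
                           tables: List[Tuple[int, bool]]) -> int:
    """Assumed work: one full pass over the pool per relation being dropped.

    `tables` is a hypothetical list of (index_count, has_toast) pairs.
    """
    relations = sum(relations_per_table(i, t) for i, t in tables)
    return relations * buffer_count(shared_buffers_bytes)
```

Under that assumption, truncating ten tables with two indexes and a TOAST table each means 50 passes. That is about 26 million header visits with the CI server's 4 GB of shared_buffers, 128 times as many as with a 32 MB pool.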

On balance

Anyway, you can see that a TRUNCATE of a table that has an associated TOAST table (most do) and several indexes could take a few moments. Not long, but longer than a DELETE from a near-empty table.

So you may be better off doing DELETE FROM table; instead.

--

Note: on DBs before 9.0, CLUSTER table_pkey ON table; ANALYZE table; (where table_pkey stands for an index on the table) or VACUUM FULL ANALYZE table; REINDEX table; would be a closer equivalent to TRUNCATE. The VACUUM FULL implementation changed to a much better one in 9.0.
