Optimise PostgreSQL for fast testing


Problem description

I am switching from SQLite to PostgreSQL for a typical Rails application.

The problem is that running the specs became slow with PG.
On SQLite they took ~34 seconds; on PG it's ~76 seconds, which is more than 2x slower.

So now I want to apply some techniques to bring the performance of the specs on par with SQLite, with no code modifications (ideally just by setting the connection options, which is probably not possible).

A couple of obvious things off the top of my head are:




  • RAM disk (a good setup with RSpec on OS X would be nice to see)

  • Unlogged tables (can this be applied to the whole database so I don't have to change all the scripts?)



As you may have understood, I don't care about reliability and the rest (the DB is just a throwaway thing here).
I need to get the most out of PG and make it as fast as it can possibly be.

The best answer would ideally describe the tricks for doing just that, the setup and the drawbacks of those tricks.



UPDATE: fsync = off + full_page_writes = off only decreased the time to ~65 seconds (about -16 secs). A good start, but far from the target of 34.



UPDATE 2: I tried using a RAM disk, but the performance gain was within the margin of error, so it doesn't seem to be worth it.



UPDATE 3: I found the biggest bottleneck, and now my specs run as fast as the SQLite ones.



The issue was the database cleanup that did the truncation. Apparently SQLite is way too fast there.



To "fix" it I open a transaction before each test and roll it back at the end.
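
At the SQL level the pattern is simply to wrap each test in a transaction that is never committed; test frameworks can issue these statements for you (in Rails/RSpec this is what transactional fixtures or DatabaseCleaner's :transaction strategy do). A minimal sketch, with the table and test data as hypothetical examples:

```sql
-- Start a transaction before the test runs.
BEGIN;

-- The test inserts and queries whatever it needs; "users" is a made-up table.
INSERT INTO users (name) VALUES ('test user');
SELECT count(*) FROM users;

-- Instead of cleaning up with TRUNCATE/DELETE, throw the changes away.
ROLLBACK;
```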



Some numbers for ~700 tests.




  • Truncation: SQLite - 34s, PG - 76s.

  • Transaction: SQLite - 17s, PG - 18s.

2x speed increase for SQLite. 4x speed increase for PG.

Solution

First, always use the latest version of PostgreSQL. Performance improvements are always coming, so you're probably wasting your time if you're tuning an old version. For example, PostgreSQL 9.2 significantly improves the speed of TRUNCATE and of course adds index-only scans. Even minor releases should always be followed; see the version policy.



Don'ts

Do NOT put a tablespace on a RAMdisk or other non-durable storage.



If you lose a tablespace, the whole database may be damaged and hard to use without significant work. There's very little advantage to this compared to just using UNLOGGED tables and having lots of RAM for cache anyway.



If you truly want a ramdisk-based system, initdb a whole new cluster on the ramdisk by initdb'ing a new PostgreSQL instance there, so you have a completely disposable PostgreSQL instance.



PostgreSQL server configuration



When testing, you can configure your server for non-durable but faster operation.



This is one of the only acceptable uses of the fsync = off setting in PostgreSQL. This setting pretty much tells PostgreSQL not to bother with ordered writes or any of that other nasty data-integrity-protection and crash-safety stuff, giving it permission to totally trash your data if you lose power or have an OS crash.



Needless to say, you should never enable fsync = off in production unless you're using Pg as a temporary database for data you can re-generate from elsewhere. If and only if you're turning fsync off, you can also turn full_page_writes off, as it no longer does any good then. Beware that fsync = off and full_page_writes apply at the cluster level, so they affect all databases in your PostgreSQL instance.
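
For a throwaway test cluster the two settings could be applied like this (a sketch; ALTER SYSTEM exists from PostgreSQL 9.4, on older versions set the same parameters in postgresql.conf and reload):

```sql
-- Test clusters only: this disables crash safety for the whole cluster.
ALTER SYSTEM SET fsync = off;
ALTER SYSTEM SET full_page_writes = off;

-- Both parameters take effect on a configuration reload.
SELECT pg_reload_conf();
```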



For production use you can possibly use synchronous_commit = off and set a commit_delay, as you'll get many of the same benefits as fsync = off without the giant data corruption risk. You do have a small window of loss of recent data if you enable async commit - but that's it.
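
A sketch of that safer variant; the values are illustrative only:

```sql
-- Asynchronous commit: a crash can lose the last few transactions,
-- but it cannot corrupt the database.
ALTER SYSTEM SET synchronous_commit = off;

-- Optional: group commits together (value is in microseconds).
ALTER SYSTEM SET commit_delay = 10000;

SELECT pg_reload_conf();
```

synchronous_commit can also be changed per session or per transaction with SET, so critical writes can keep full durability.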



If you have the option of slightly altering the DDL, you can also use UNLOGGED tables in Pg 9.1+ to completely avoid WAL logging and gain a real speed boost, at the cost of the tables getting erased if the server crashes. There is no configuration option to make all tables unlogged; it must be set during CREATE TABLE. In addition to being good for testing, this is handy if you have tables full of generated or unimportant data in a database that otherwise contains stuff you need to be safe.
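
The DDL change is small; a sketch with a made-up table:

```sql
-- WAL is skipped for this table; it is truncated after a crash.
CREATE UNLOGGED TABLE session_cache (
    session_id text PRIMARY KEY,
    payload    text,
    updated_at timestamptz DEFAULT now()
);

-- From PostgreSQL 9.5 an existing table can be switched either way:
-- ALTER TABLE session_cache SET LOGGED;    -- or SET UNLOGGED
```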



Check your logs and see if you're getting warnings about too many checkpoints. If you are, you should increase your checkpoint_segments. You may also want to tune your checkpoint_completion_target to smooth writes out.



Tune shared_buffers to fit your workload. This is OS-dependent, depends on what else is going on with your machine, and requires some trial and error. The defaults are extremely conservative. You may need to increase the OS's maximum shared memory limit if you increase shared_buffers on PostgreSQL 9.2 and below; 9.3 and above changed how they use shared memory to avoid that.
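
As a rough illustration of the checkpoint and shared_buffers tuning from the last two paragraphs (same ALTER SYSTEM caveat as above; the numbers are placeholders, not recommendations):

```sql
-- Fewer, smoother checkpoints. checkpoint_segments exists up to 9.4;
-- newer releases use max_wal_size instead.
ALTER SYSTEM SET checkpoint_segments = 32;
ALTER SYSTEM SET checkpoint_completion_target = 0.9;

-- More cache inside PostgreSQL.
ALTER SYSTEM SET shared_buffers = '1GB';

SELECT pg_reload_conf();  -- shared_buffers still requires a full restart
```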



If you're using just a couple of connections that do lots of work, increase work_mem to give them more RAM to play with for sorts etc. Beware that too high a work_mem setting can cause out-of-memory problems, because it's per-sort, not per-connection, so one query can have many nested sorts. You only really have to increase work_mem if you can see sorts spilling to disk in EXPLAIN or logged with the log_temp_files setting (recommended), but a higher value may also let Pg pick smarter plans.
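
One way to check for and react to spilled sorts, as a sketch (the table and the 64MB value are hypothetical):

```sql
-- Log every temporary file so spilled sorts show up in the server log.
ALTER SYSTEM SET log_temp_files = 0;
SELECT pg_reload_conf();

-- If the plan shows "Sort Method: external merge  Disk: ...",
-- raise work_mem for the current session and re-check.
SET work_mem = '64MB';
EXPLAIN (ANALYZE) SELECT * FROM events ORDER BY created_at;
```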



As said by another poster here, it's wise to put the xlog and the main tables/indexes on separate HDDs if possible. Separate partitions are pretty pointless; you really want separate drives. This separation has much less benefit if you're running with fsync = off, and almost none if you're using UNLOGGED tables.



Finally, tune your queries. Make sure that your random_page_cost and seq_page_cost reflect your system's performance, ensure your effective_cache_size is correct, etc. Use EXPLAIN (BUFFERS, ANALYZE) to examine individual query plans, and turn the auto_explain module on to report all slow queries. You can often improve query performance dramatically just by creating an appropriate index or tweaking the cost parameters.
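
A session-level sketch of both tools (the table and the 250 ms threshold are made up; auto_explain is normally loaded via shared_preload_libraries in the config, but LOAD works for a quick one-off session):

```sql
-- Inspect one query plan with buffer usage and real timings.
EXPLAIN (BUFFERS, ANALYZE)
SELECT * FROM orders WHERE customer_id = 42;

-- Have every statement slower than 250 ms logged together with its plan.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '250ms';
```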



AFAIK there's no way to set an entire database or cluster as UNLOGGED. It'd be interesting to be able to do so. Consider asking on the PostgreSQL mailing list.



Host OS tuning



There's some tuning you can do at the operating system level, too. The main thing you might want to do is convince the operating system not to flush writes to disk aggressively, since you really don't care when/if they make it to disk.



In Linux you can control this with the virtual memory subsystem's dirty_* settings, like dirty_writeback_centisecs.



The only issue with tuning the writeback settings to be too slack is that a flush by some other program may cause all of PostgreSQL's accumulated buffers to be flushed too, causing big stalls while everything blocks on writes. You may be able to alleviate this by running PostgreSQL on a different file system, but some flushes may be device-level or whole-host-level rather than filesystem-level, so you can't rely on that.



This tuning really requires playing around with the settings to see what works best for your workload.



On newer kernels, you may wish to ensure that vm.zone_reclaim_mode is set to zero, as it can cause severe performance issues on NUMA systems (most systems these days) due to interactions with how PostgreSQL manages shared_buffers.



Query and workload tuning



These are things that DO require code changes; they may not suit you. Some are things you might be able to apply.



If you're not batching work into larger transactions, start. Lots of small transactions are expensive, so you should batch stuff whenever it's possible and practical to do so. If you're using async commit this is less important, but still highly recommended.
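
As a small illustration (the table and rows are made up), each autocommitted statement pays for its own commit, while a batched transaction pays only once:

```sql
-- Autocommit: three separate transactions, three commits (slow).
INSERT INTO log_lines (msg) VALUES ('a');
INSERT INTO log_lines (msg) VALUES ('b');
INSERT INTO log_lines (msg) VALUES ('c');

-- Batched: the same work in one transaction, one commit.
BEGIN;
INSERT INTO log_lines (msg) VALUES ('a');
INSERT INTO log_lines (msg) VALUES ('b');
INSERT INTO log_lines (msg) VALUES ('c');
COMMIT;
```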



Whenever possible, use temporary tables. They don't generate WAL traffic, so they're lots faster for inserts and updates. Sometimes it's worth slurping a bunch of data into a temp table, manipulating it however you need to, then doing an INSERT INTO ... SELECT ... to copy it to the final table. Note that temporary tables are per-session; if your session ends or you lose your connection, the temp table goes away, and no other connection can see the contents of a session's temp table(s).
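
A minimal sketch of that staging pattern; the table names and columns are hypothetical:

```sql
-- Per-session staging area; temp tables generate no WAL.
CREATE TEMP TABLE staging_measurements (LIKE measurements INCLUDING DEFAULTS);

-- Load and massage the data cheaply in the temp table first...
INSERT INTO staging_measurements (sensor_id, value)
SELECT s.id, random() * 100 FROM sensors s;
UPDATE staging_measurements SET value = value * 1000;

-- ...then move it into the real table in one statement.
INSERT INTO measurements SELECT * FROM staging_measurements;
```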



If you're using PostgreSQL 9.1 or newer, you can use UNLOGGED tables for data you can afford to lose, like session state. These are visible across different sessions and preserved between connections. They get truncated if the server shuts down uncleanly, so they can't be used for anything you can't re-create, but they're great for caches, materialized views, state tables, etc.



In general, don't DELETE FROM blah;. Use TRUNCATE TABLE blah; instead; it's a lot quicker when you're dumping all rows in a table. Truncate many tables in one TRUNCATE call if you can. There's a caveat if you're doing lots of TRUNCATEs of small tables over and over again, though; see: Postgresql Truncation speed
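
For example, truncating several hypothetical tables in a single statement (RESTART IDENTITY and CASCADE are optional extras):

```sql
-- One statement covers all three tables and empties them immediately.
TRUNCATE TABLE orders, order_items, payments
    RESTART IDENTITY   -- also reset any serial/identity sequences
    CASCADE;           -- include tables that reference these via foreign keys
```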



If you don't have indexes on foreign keys, DELETEs involving the primary keys referenced by those foreign keys will be horribly slow. Make sure to create such indexes if you ever expect to DELETE from the referenced table(s). Indexes are not required for TRUNCATE.
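
A sketch with hypothetical parent/child tables; the index on the referencing column is what keeps DELETEs on the parent from scanning the whole child table:

```sql
CREATE TABLE customers (
    id   bigserial PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE orders (
    id          bigserial PRIMARY KEY,
    customer_id bigint NOT NULL REFERENCES customers (id)
);

-- PostgreSQL does not create this automatically; without it,
-- DELETE FROM customers has to scan orders for every deleted row.
CREATE INDEX orders_customer_id_idx ON orders (customer_id);
```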



Don't create indexes you don't need. Each index has a maintenance cost. Try to use a minimal set of indexes and let bitmap index scans combine them, rather than maintaining too many huge, expensive multi-column indexes. Where indexes are required, try to populate the table first, then create the indexes at the end.



Hardware



Having enough RAM to hold the entire database is a huge win if you can manage it.



If you don't have enough RAM, the faster the storage you can get, the better. Even a cheap SSD makes a massive difference over spinning rust. Don't trust cheap SSDs for production, though; they're often not crash-safe and might eat your data.



Learning



Greg Smith's book, PostgreSQL 9.0 High Performance, remains relevant despite referring to a somewhat older version. It should be a useful reference.



Join the PostgreSQL general mailing list and follow it.



Reading:


