Concurrent database (PostgreSQL) commands in the same transaction


Problem Description

I'm writing a .NET 4 application that imports a large amount of data from a file into a PostgreSQL 9.1 database. Profiling shows that the DB calls that actually INSERT the data take up over 90% of the time. The DB server appears to be CPU-bound, using all of one CPU.

If possible, I'd like to import the data faster by using all the CPUs. The input file could be broken up into pieces on the client, so this normally wouldn't be too hard, but I want to ensure that if any error occurs while importing a file, the DB is not modified at all. To accomplish this I'm doing the entire import in one transaction.

Is it possible to somehow send concurrent commands to the DB server (to utilise all of its CPUs), but still ensure that either the entire import succeeds or no changes are made? As far as I understand, a transaction cannot be used from multiple threads to run multiple commands concurrently, can it? I'm using Npgsql as the ADO.NET provider, if that makes a difference.

Recommended Answer

I am pretty sure that a transaction cannot be processed in parallel by multiple threads, at least not with standard PostgreSQL.

It seems suspicious, though, that your INSERT operation is CPU-bound. There are a couple of things you can possibly improve here. How exactly do you send the data to the server? There are basically four ways to INSERT data into a table:

  1. one row at a time with INSERT ... VALUES
  2. multiple rows at a time with a multi-row VALUES list
  3. INSERT ... SELECT
  4. COPY

COPY is by far the fastest method.
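Sketches of the four approaches, with illustrative table and column names (not from the original question):

```sql
-- 1. one row at a time
INSERT INTO big_tbl (id, payload) VALUES (1, 'a');

-- 2. multiple rows per statement
INSERT INTO big_tbl (id, payload) VALUES (2, 'b'), (3, 'c'), (4, 'd');

-- 3. INSERT by SELECT
INSERT INTO big_tbl (id, payload)
SELECT id, payload FROM staging_tbl;

-- 4. COPY, reading rows from a file or from STDIN
COPY big_tbl (id, payload) FROM STDIN;
```

COPY avoids per-statement parsing and planning overhead, which is why it wins for bulk loads.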

  • It is also substantially faster to drop indexes before a huge bulk INSERT / COPY and recreate them afterwards, since it is much more costly to incrementally adjust indexes for every inserted row than to create them once at the end.
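A sketch of that pattern (index name, table name, and file path are illustrative):

```sql
-- drop the index before the bulk load ...
DROP INDEX IF EXISTS big_tbl_payload_idx;

COPY big_tbl FROM '/path/to/data.csv' (FORMAT csv);

-- ... and build it once afterwards, which is much cheaper
CREATE INDEX big_tbl_payload_idx ON big_tbl (payload);
```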

Triggers, constraints, and foreign key constraints are other factors that can slow you down. Maybe you can disable / drop them before the bulk load and enable / recreate them afterwards?
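One possible sketch, assuming you have the required privileges (disabling all triggers also disables FK enforcement, so the incoming data must be trusted):

```sql
-- disabling internal (FK) triggers requires superuser rights
ALTER TABLE big_tbl DISABLE TRIGGER ALL;

-- ... bulk load here ...

ALTER TABLE big_tbl ENABLE TRIGGER ALL;
```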

There are also a number of settings that can make a substantial difference.

Disable autovacuum temporarily and run ANALYZE immediately afterwards. (Be careful with these!)
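One way to do that per table rather than server-wide (table name is illustrative):

```sql
-- per-table storage parameter; re-enable when the load is done
ALTER TABLE big_tbl SET (autovacuum_enabled = false);

-- ... bulk load ...

ALTER TABLE big_tbl SET (autovacuum_enabled = true);
ANALYZE big_tbl;  -- refresh planner statistics immediately afterwards
```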

Read the articles about Bulk Loading and Restores and Tuning Your PostgreSQL Server in the PostgreSQL wiki, especially the paragraphs on checkpoint_segments and checkpoint_completion_target.
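For illustration only; the right numbers depend on your hardware and workload and are discussed in those wiki articles:

```
# postgresql.conf (reload the server after editing)
checkpoint_segments = 32            # the default of 3 forces frequent checkpoints during bulk loads
checkpoint_completion_target = 0.9  # spread checkpoint I/O over more of the interval
```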

The operation may not be as CPU-bound as it seems. Have a look at this paragraph in the PostgreSQL wiki.

One more source of slowdown might be logging. For instance, log_statement = all can mean a dramatic slowdown with single-row inserts.

The PostgreSQL wiki also has a quick method to check all your custom settings at once.
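One common query for listing every setting that differs from its default, along with where it was set:

```sql
SELECT name, current_setting(name), source
FROM   pg_settings
WHERE  source NOT IN ('default', 'override');
```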

The following might speed things up, especially if you cannot turn off fsync. Create an empty temporary table like this (or several of them):

CREATE TEMP TABLE x_tmp AS SELECT * FROM real_tbl LIMIT 0;

You'll need to put some thought into how to deal with sequences and other defaults! INSERT into it (or them). Once you're done, write the data over into the real tables in one go, again with indexes and constraints off, but for a much shorter time.

INSERT INTO real_tbl SELECT * FROM x_tmp ORDER BY something;
DROP TABLE x_tmp;
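Note that a temp table created with SELECT * ... LIMIT 0 copies column types but not defaults, so a serial column must be handled explicitly when writing over; a hypothetical sketch (the sequence name is illustrative):

```sql
INSERT INTO real_tbl (id, payload)
SELECT nextval('real_tbl_id_seq'), payload
FROM   x_tmp
ORDER  BY payload;
```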

This could be substantially faster. Be sure to use enough RAM for the various settings; look at temp_buffers in particular (http://www.postgresql.org/docs/current/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY).
