Using libpqxx to store data in bulk, or how to use the COPY statement in libpqxx


Problem Description

To insert bulk data / populate a database in PostgreSQL, the fastest way would be to use COPY. Source

I have to populate a database. Right now I am getting write speeds as low as 100-200 inserts per second. This involves sending many individual INSERTs through the C++ library libpqxx. The two reasons I suppose are:


  1. The data has many repeated records. (I have raw logs, which I parse and send.) This causes primary key exceptions.
  2. The one-by-one sending of the INSERT statements.

The first one is out of my hands. However, I was reading about the second one.

As far as I know, the tablewriter class was suited to this purpose. However, it has apparently been deprecated. I have read that it is possible to use stdin as a parameter to COPY.

But after these clues I am lost. Can someone lead me to a solution?
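
For context, here is a minimal sketch (an editorial illustration, not part of the original question) of what the deprecated tablewriter interface looks like; it drives COPY ... FROM STDIN under the hood, and the logs table name and text-only row layout are assumptions:

#include <string>
#include <vector>
#include <pqxx/pqxx>

// Hedged sketch: pqxx::tablewriter streams rows through COPY. It is
// deprecated, but was still shipped with libpqxx at the time.
void bulkLoad(pqxx::connection &conn,
              const std::vector<std::vector<std::string>> &rows) {
    pqxx::work w(conn);
    pqxx::tablewriter writer(w, "logs");   // target table is assumed
    for (const auto &row : rows)
        writer << row;     // each container becomes one COPY row
    writer.complete();     // end the COPY stream before committing
    w.commit();
}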

Edit:
Here is the code, where I have a function which executes the statement:

#include <iostream>
#include <sstream>
#include <pqxx/pqxx>

// Log is the application's record type (defined elsewhere).
void pushLog(Log log, pqxx::connection *conn) {
    pqxx::work w(*conn);
    std::stringstream stmt;
    stmt << "INSERT INTO logs VALUES('" << log.getDevice() << "','"
         << log.getUser() << "','" << log.getDate() << "','"
         << log.getLabel() << "');";
    try {
        w.exec(stmt.str());     // run the INSERT
        w.commit();             // commits every single row
    }
    catch (const std::exception &e) {
        std::cerr << e.what() << std::endl;
        std::cout << "Exception on statement: [" << stmt.str() << "]\n";
    }
}

I establish the connection earlier, and pass a reference to it.

PS: This question may be missing some details. If so, please comment and I will edit and add them.

Answer

The pushLog function commits every insert separately, and committing is slow.

As said in the documentation on Populating a Database:


If you allow each insertion to be committed separately, PostgreSQL is doing a lot of work for each row that is added.

Also:


An additional benefit of doing all insertions in one transaction is that if the insertion of one row were to fail, then the insertion of all rows inserted up to that point would be rolled back, so you won't be stuck with partially loaded data.

In your case, however, that would be a problem rather than a benefit, because each INSERT may fail on a primary key violation, thus cancelling the previous INSERTs since the last commit. Note that this would also be a problem with COPY, should you use that.
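
To make that concrete, here is a minimal sketch (assuming the Log type and logs table from the question, and using libpqxx's quote() helper for escaping) of grouping a whole batch of INSERTs into one transaction with a single commit:

#include <sstream>
#include <vector>
#include <pqxx/pqxx>

// Sketch: one pqxx::work for the whole batch, one commit at the end.
// Caveat (see above): a single duplicate-key error still aborts the
// entire transaction; the two methods below address that.
void pushLogs(const std::vector<Log> &logs, pqxx::connection *conn) {
    pqxx::work w(*conn);
    for (const Log &log : logs) {
        std::stringstream stmt;
        stmt << "INSERT INTO logs VALUES("
             << w.quote(log.getDevice()) << ","   // quote() escapes and
             << w.quote(log.getUser()) << ","     // adds the surrounding
             << w.quote(log.getDate()) << ","     // single quotes
             << w.quote(log.getLabel()) << ");";
        w.exec(stmt.str());
    }
    w.commit();   // a single commit for the whole batch
}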

Since it's really necessary to group queries in transactions for performance, you need to deal with primary key violations in a way that doesn't abort the transaction.

Two methods are generally used:

  1. Avoid the error: INSERT INTO ... WHERE NOT EXISTS (SELECT 1 FROM table WHERE primary_key = ...). See the first sketch after this list.

  2. Trap the error by inserting inside a plpgsql function that has an EXCEPTION block ignoring it. The specific INSERT(s) causing a duplicate will be cancelled, but the transaction will not be aborted. See the second sketch after this list.

If you have concurrent inserts, these methods need to be refined with a locking strategy.
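
A sketch of the first method, kept in the same C++-with-embedded-SQL style; the column names (device, username, ts, label) and the deduplication key are assumptions, since the question never shows the table definition:

#include <sstream>
#include <pqxx/pqxx>

// Method 1 sketch: INSERT ... SELECT ... WHERE NOT EXISTS silently
// skips duplicates instead of raising unique-violation errors.
// Column names and the key (device, ts) are hypothetical.
void pushLogNoDup(const Log &log, pqxx::work &w) {
    std::stringstream stmt;
    stmt << "INSERT INTO logs (device, username, ts, label) "
         << "SELECT " << w.quote(log.getDevice()) << ","
         << w.quote(log.getUser()) << ","
         << w.quote(log.getDate()) << ","
         << w.quote(log.getLabel())
         << " WHERE NOT EXISTS (SELECT 1 FROM logs WHERE device = "
         << w.quote(log.getDevice())
         << " AND ts = " << w.quote(log.getDate()) << ");";
    w.exec(stmt.str());   // inserts zero rows for a duplicate, no error
}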
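
And a sketch of the second method: a one-time plpgsql wrapper (the function name insert_log and the columns are again hypothetical) whose EXCEPTION block swallows unique_violation, so a duplicate cancels only that one row:

#include <pqxx/pqxx>

// One-time setup: create the wrapper function. unique_violation is
// PostgreSQL's standard condition name for duplicate-key errors.
void createInsertLog(pqxx::work &w) {
    w.exec(R"(
        CREATE OR REPLACE FUNCTION insert_log(
            p_device text, p_user text, p_date text, p_label text)
        RETURNS void AS $$
        BEGIN
            INSERT INTO logs VALUES (p_device, p_user, p_date, p_label);
        EXCEPTION WHEN unique_violation THEN
            NULL;   -- do nothing: ignore the duplicate row
        END;
        $$ LANGUAGE plpgsql;
    )");
}

// Per row: call the function instead of issuing a bare INSERT.
void pushLogTrapped(const Log &log, pqxx::work &w) {
    w.exec("SELECT insert_log(" + w.quote(log.getDevice()) + "," +
           w.quote(log.getUser()) + "," + w.quote(log.getDate()) + "," +
           w.quote(log.getLabel()) + ");");
}

Note that plpgsql implements the EXCEPTION block with a subtransaction, which is exactly why a failure inside it does not poison the outer transaction, though it also adds some per-row overhead.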

