node-postgres with massive amount of queries


Problem description

I just started playing around with node.js with postgres, using node-postgres. One of the things I tried to do is to write a short js to populate my database, using a file with about 200,000 entries.

I noticed that after some time (less than 10 seconds), I start to get "Error: Connection terminated". I am not sure whether this is a problem with how I use node-postgres, or if it's because I was spamming postgres.

Anyway, here is some simple code that shows this behaviour:

var pg = require('pg');
var connectionString = "postgres://xxxx:xxxx@localhost/xxxx";

pg.connect(connectionString, function(err,client,done){
  if(err) {
    return console.error('could not connect to postgres', err);
  }

  client.query("DROP TABLE IF EXISTS testDB");
  client.query("CREATE TABLE IF NOT EXISTS testDB (id int, first int, second int)");
  done();

  for (i = 0; i < 1000000; i++){
    client.query("INSERT INTO testDB VALUES (" + i.toString() + "," + (1000000-i).toString() + "," + (-i).toString() + ")",   function(err,result){
      if (err) {
         return console.error('Error inserting query', err);
      }
      done();
    });
  }
});

It fails after about 18,000-20,000 queries. Is this the wrong way to use client.query? I tried changing the default client number, but it didn't seem to help.

client.connect() doesn't seem to help either, but that was because I had too many clients, so I definitely think client pooling is the way to go.

Thanks for all the help!

Recommended answer

UPDATE

This answer has since been superseded by the article Data Imports, which represents the most up-to-date approach.

In order to replicate your scenario I used the pg-promise library, and I can confirm that trying it head-on will never work; no matter which library you use, it is the approach that matters.

Below is a modified approach where we partition inserts into chunks and then execute each chunk within a transaction, which is load balancing (aka throttling):

var promise = require('bluebird'); // or any other Promises/A+ compliant library
var pgp = require('pg-promise')({promiseLib: promise});
var db = pgp('postgres://xxxx:xxxx@localhost/xxxx'); // connection details are placeholders

// inserts N records within a single transaction
function insertRecords(N) {
    return db.tx(function (ctx) {
        var queries = [];
        for (var i = 1; i <= N; i++) {
            queries.push(ctx.none('insert into test(name) values($1)', 'name-' + i));
        }
        return promise.all(queries);
    });
}

// runs 10 transactions of 100,000 inserts each, chained one after another
function insertAll(idx) {
    if (!idx) {
        idx = 0;
    }
    return insertRecords(100000)
        .then(function () {
            if (idx >= 9) {
                return promise.resolve('SUCCESS');
            } else {
                return insertAll(++idx);
            }
        }, function (reason) {
            return promise.reject(reason);
        });
}

insertAll()
    .then(function (data) {
        console.log(data);
    }, function (reason) {
        console.log(reason);
    })
    .done(function () {
        pgp.end(); // shuts down the connection pool when finished
    });

This produced 1,000,000 records in about 4 minutes, slowing down dramatically after the first 3 transactions. I was using Node JS 0.10.38 (64-bit), which consumed about 340MB of memory. This way we inserted 100,000 records at a time, 10 times in a row.

If we do the same, only this time inserting 10,000 records within each of 100 transactions, the same 1,000,000 records are added in just 1m25s, with no slowdown, and with Node JS consuming around 100MB of memory, which tells us that partitioning data like this is a very good idea.

It doesn't matter which library you use, the approach should be the same (a sketch using node-postgres itself follows the list below):

  1. Partition/throttle your inserts into multiple transactions;
  2. Keep the list of inserts in a single transaction at around 10,000 records;
  3. Execute all your transactions in a synchronous chain;
  4. Release the connection back to the pool after each transaction's COMMIT.
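
For illustration only, here is a minimal sketch of this chunked-transaction pattern using node-postgres itself (with a current version of the pg driver and its Pool API). The rows array, the column layout and the chunk size of 10,000 are assumptions for the sketch, not something taken from the answer above:

// A rough sketch of the chunk-per-transaction pattern with node-postgres (pg).
// Assumes `rows` already holds the parsed file entries, e.g. [[0, 1000000, 0], [1, 999999, -1], ...].
const { Pool } = require('pg');
const pool = new Pool({ connectionString: 'postgres://xxxx:xxxx@localhost/xxxx' });

async function insertInChunks(rows, chunkSize) {
    chunkSize = chunkSize || 10000; // rule 2: around 10,000 records per transaction
    for (let offset = 0; offset < rows.length; offset += chunkSize) {
        const chunk = rows.slice(offset, offset + chunkSize);
        const client = await pool.connect(); // one pooled connection per transaction
        try {
            await client.query('BEGIN');
            for (const [id, first, second] of chunk) {
                await client.query('INSERT INTO testDB VALUES ($1, $2, $3)', [id, first, second]);
            }
            await client.query('COMMIT'); // rules 1 and 3: one transaction at a time, in sequence
        } catch (err) {
            await client.query('ROLLBACK');
            throw err;
        } finally {
            client.release(); // rule 4: hand the connection back to the pool
        }
    }
}

Because each chunk is awaited before the next one starts, the whole run forms a synchronous chain and never holds more than one chunk's worth of pending queries in memory.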

If you break any of those rules, you're guaranteed trouble. For example, if you break rule 3, your Node JS process is likely to run out of memory real quick and throw an error. Rule 4 in my example was provided by the library.

And if you follow this pattern, you don't need to trouble yourself with the connection pool settings.

UPDATE 1

Later versions of pg-promise support such scenarios perfectly, as shown below:

function factory(index) {
    if (index < 1000000) {
        return this.query('insert into test(name) values($1)', 'name-' + index);
    }
}

db.tx(function () {
    return this.batch([
        this.none('drop table if exists test'),
        this.none('create table test(id serial, name text)'),
        this.sequence(factory), // key method
        this.one('select count(*) from test')
    ]);
})
    .then(function (data) {
        console.log("COUNT:", data[3].count);
    })
    .catch(function (error) {
        console.log("ERROR:", error);
    });

and if you do not want to include anything extra, like table creation, then it looks even simpler:

function factory(index) {
    if (index < 1000000) {
        return this.query('insert into test(name) values($1)', 'name-' + index);
    }
}

db.tx(function () {
    return this.sequence(factory);
})
    .then(function (data) {
        // success;
    })
    .catch(function (error) {
        // error;
    });

See Synchronous Transactions for details.

Using Bluebird as the promise library, for example, it takes 1m43s on my production machine to insert 1,000,000 records (without long stack traces enabled).

You would just have your factory method return requests according to the index, till you have none left, simple as that.
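
Applied to the original question, the factory would simply walk the parsed file entries instead of generating names. In this hypothetical sketch, the rows array, the testDB column names and the use of this.none rather than this.query are assumptions for illustration:

// Hypothetical: feed the ~200,000 parsed entries through a factory, one insert per index.
// `rows` is assumed to be an array like [[0, 1000000, 0], [1, 999999, -1], ...].
function factory(index) {
    if (index < rows.length) {
        return this.none('insert into testDB(id, first, second) values($1, $2, $3)', rows[index]);
    }
    // returning nothing ends the sequence
}

db.tx(function () {
    return this.sequence(factory); // same key method as above
})
    .then(function () {
        console.log('Inserted', rows.length, 'records');
    })
    .catch(function (error) {
        console.log('ERROR:', error);
    });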

And the best part: this isn't just fast, it also creates very little load on your NodeJS process. The test process stays under 60MB of memory for the entire run, consuming only 7-8% of the CPU time.

UPDATE 2

Starting with version 1.7.2, pg-promise supports super-massive transactions with ease. See chapter Synchronous Transactions.

For example, I could insert 10,000,000 records in a single transaction in just 15 minutes on my home PC, with Windows 8.1 64-bit.

For the test I set my PC to production mode and used Bluebird as the promise library. During the test, memory consumption didn't go over 75MB for the entire NodeJS 0.12.5 process (64-bit), while my i7-4770 CPU showed a consistent 15% load.

Inserting 100m records the same way would require just more patience, but not more computer resources.

In the meantime, the previous test for 1m inserts dropped from 1m43s to 1m31s.

UPDATE 3

The following considerations can make a huge difference: Performance Boost.

UPDATE 4

Related question, with a better implementation example: Massive inserts with pg-promise.

UPDATE 5

A better and newer example can be found here: nodeJS inserting Data into PostgreSQL error
