How to populate mongoose with a large data set

Question

I'm attempting to load a store catalog into MongoDb (2.2.2) using Node.js (0.8.18) and Mongoose (3.5.4) -- all on Windows 7 64bit. The data set contains roughly 12,500 records. Each data record is a JSON string.

My latest attempt looks like this:

var fs = require('fs');
var odir = process.cwd() + '/file_data/output_data/';
var mongoose = require('mongoose');
var Catalog = require('./models').Catalog;

var conn = mongoose.connect('mongodb://127.0.0.1:27017/sc_store');

exports.main = function(callback){
    var catalogArray = fs.readFileSync(odir + 'pc-out.json','utf8').split('\n');
    var i = 0;

    Catalog.remove({}, function(err){
        while(i < catalogArray.length){
            new Catalog(JSON.parse(catalogArray[i])).save(function(err, doc){
                if(err){
                    console.log(err);
                } else {
                    i++;                    
                }
            });
            if(i === catalogArray.length -1) return callback('database populated');
        }
    });
};

I have had a lot of problems trying to populate the database. In previous attempts (and this one), node pegs the processor and eventually runs out of memory. Note that in this scenario, I'm trying to let Mongoose save a record, and then iterate to the next record once that record saves.

But the iterator inside the Mongoose save callback never gets incremented, and it never throws any errors. However, if I increment the iterator (i) outside of the asynchronous call to Mongoose, it works, provided the number of records I try to load is not too large (I have successfully loaded 2,000 this way).

So my questions are: Why isn't the iterator inside of the Mongoose save call ever incremented? And, more importantly, what is the best way to load a large data set into MongoDb using Mongoose?

Rob

Answer

i is your index into catalogArray, where you're pulling the input data from, but you're also trying to use it to keep track of how many documents have been saved, which isn't possible. Try tracking them separately, like this:

var i = 0;      // index into catalogArray (input side)
var saved = 0;  // number of save callbacks that have completed (output side)
Catalog.remove({}, function(err){
    while(i < catalogArray.length){
        new Catalog(JSON.parse(catalogArray[i])).save(function(err, doc){
            saved++;
            if(err){
                console.log(err);
            }
            // Fire the callback once every save has completed,
            // even if some of them failed.
            if(saved === catalogArray.length) {
                return callback('database populated');
            }
        });
        i++;   // advance the input index synchronously
    }
});
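Two things are worth noting here. In the original code, the while loop runs synchronously, so node never gets a chance to run the save callbacks (and the i++ inside them) until the loop yields, which it never does; the loop spins forever re-saving catalogArray[0], which is what pegs the processor and exhausts memory. The version above increments i synchronously so the loop terminates, but it still fires all ~12,500 saves at once, with nothing limiting how many are in flight.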

Update

If you want to add tighter flow control to the process, you can use the async module's forEachLimit function to limit the number of outstanding save operations to whatever you specify. For example, to limit it to one outstanding save at a time:

var async = require('async');

Catalog.remove({}, function (err) {
    // Process catalogArray with at most 1 save in flight at a time.
    async.forEachLimit(catalogArray, 1, function (catalog, cb) {
        new Catalog(JSON.parse(catalog)).save(function (err, doc) {
            if (err) {
                console.log(err);
            }
            cb(err);  // tell forEachLimit this item is done
        });
    }, function (err) {
        // All items processed (or an item reported an error and processing stopped).
        callback('database populated');
    });
});
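As an aside, if you can use a newer Mongoose (4.4 or later; the 3.5.4 in the question predates it), Model.insertMany bulk-inserts an array of documents in one operation, avoiding the per-document save round trips entirely. A minimal sketch under that assumption, reusing the question's catalogArray of JSON lines:

// Requires Mongoose 4.4+ for Model.insertMany -- not available in 3.5.4.
var docs = catalogArray
    .filter(function (line) { return line.trim() !== ''; }) // skip blank lines
    .map(function (line) { return JSON.parse(line); });

Catalog.remove({}, function (err) {
    if (err) return callback(err);
    Catalog.insertMany(docs, function (err, result) {
        if (err) console.log(err);
        callback('database populated');
    });
});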
