nodejs running out of memory processing csv files


Problem description


I've read through a number of SO questions about nodejs running out of memory, but I haven't seen anything that sounds similar to my situation.


I'm trying to process about 20 GB of data across 250 csv files (so ~80 MB per file). I launch the node script with --max-old-space-size=8192 on a server with 90 GB of free memory, using node v5.9.1. After about 9 minutes of processing the script quits with an out-of-memory error.


I'm new to Node programming, but I thought I wrote the script to process data one line at a time and not to keep anything in memory. Yet it seems some object references are being held on to by something, so the script is leaking memory. Here's the full script:

var fs = require('fs');
var readline = require('readline');
var mongoose = require('mongoose');

mongoose.connect('mongodb://buzzard/xtra');
var db = mongoose.connection;
db.on('error', console.error.bind(console, 'connection error:'));

var DeviceSchema = mongoose.Schema({
    _id: String,
    serial: String
});

var Device = mongoose.model('Device', DeviceSchema, 'devices');

// parse one csv line and insert one device document per line
function processLine(line) {
    var serial = line.split(',')[8];

    Device({
        _id: serial,
        serial: serial
    }).save(function (err) {
        if (err) return console.error(err);
    });
}

// stream one matching csv file line by line with readline
function processFile(baseDir, fileName) {
    if(!fileName.startsWith('qcx3'))
        return;

    var fullPath = `${baseDir}/${fileName}`;

    var lineReader = readline.createInterface({
      input: fs.createReadStream(fullPath)
    });

    lineReader.on('line', processLine);
}

// walk rootDir and process every csv file found
function findFiles(rootDir) {
  fs.readdir(rootDir, function (error, files) {
    if (error) {
        console.log(`Error: ${error}` );
        return
    }

    files.forEach(function (file) {
        if(file.startsWith('.'))
            return;

        var fullPath = `${rootDir}/${file}`;

        fs.stat(fullPath, function(error, stat) {
            if (error) {
                console.log(`Error: ${error}` );
                return;
            }

            if(stat.isDirectory())
                findFiles(fullPath);   // recurse into subdirectories
            else
                processFile(rootDir, file);
        });
    });
  })
}  


findFiles('c://temp/logs/compress');


I also noticed that when I run the script on a much smaller test set, one it can process completely, the script doesn't exit at the end. It just keeps hanging there until I ctrl+c it. Could this be somehow related?

What am I doing wrong?

Recommended answer


  1. The script doesn't exit because you have an open connection to mongoose. After processing all the files you should close the connection, and the script will finish.

  2. You have the right idea with using streams, but I think you're missing something. I suggest the following article to refresh on the stream interface and its events: https://coderwall.com/p/ohjerg/read-large-text-files-in-nodejs. A minimal sketch combining both points follows this list.
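A minimal sketch of what both points look like in practice, reusing the Device model from the question (the done callback and the bookkeeping around it are assumptions for illustration, not part of the original script): pause the line reader while each save is in flight so unfinished inserts can't pile up in memory, and disconnect mongoose once the last file is done so the process can exit on its own.

var fs = require('fs');
var readline = require('readline');
var mongoose = require('mongoose');

function processFile(fullPath, done) {
    var lineReader = readline.createInterface({
        input: fs.createReadStream(fullPath)
    });

    lineReader.on('line', function (line) {
        // stop reading until this insert finishes; note that readline may
        // still emit a few lines it has already buffered after pause()
        lineReader.pause();

        var serial = line.split(',')[8];
        Device({ _id: serial, serial: serial }).save(function (err) {
            if (err) console.error(err);
            lineReader.resume();   // ask for the next line
        });
    });

    lineReader.on('close', function () {
        done();   // this file has been fully read
    });
}

// once done() has been called for every file:
// mongoose.disconnect();   // closes the connection so node can exit

The pause()/resume() pair is what actually bounds memory use here: without it, the 'line' events fire far faster than the save() callbacks complete, and every pending document and callback stays on the heap until the process runs out of memory.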


Another possible source of the problem is MongoDB: you make a lot of inserts, and the memory exhaustion could be related to hitting MongoDB's maximum I/O throughput, with pending inserts piling up faster than they can be written. One way to reduce that pressure is sketched below.
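A hedged sketch of that batching idea (assuming a Mongoose version that provides Model.insertMany; queueLine and BATCH_SIZE are made-up names for illustration): collect parsed lines and flush them as one bulk insert instead of issuing one save() per line.

var BATCH_SIZE = 1000;   // assumed batch size, tune for your data
var batch = [];

function queueLine(line, lineReader) {
    var serial = line.split(',')[8];
    batch.push({ _id: serial, serial: serial });

    if (batch.length >= BATCH_SIZE) {
        var docs = batch;
        batch = [];
        lineReader.pause();   // hold new lines while the bulk insert runs
        Device.insertMany(docs, function (err) {
            if (err) console.error(err);
            lineReader.resume();
        });
    }
}

You would still need to flush whatever is left in batch when the reader's 'close' event fires, but even with that extra bookkeeping a few hundred bulk inserts are far easier on MongoDB than one round trip per line.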
