"JavaScript堆内存不足";流大文件时 [英] "JavaScript heap out of memory" while streaming large file

Problem description

I am trying to run an XML -> JSON -> MongoDB pipeline on my server. I have a NodeJS application which streams the XML, converts it into JSON, then adds it to the MongoDB server in chunks of 1000. However, after about 75000 records, my Macbook's fans start spinning faster and the processing gets REALLY slow. After a few minutes, I get this error:

<--- Last few GCs --->

[30517:0x102801600] 698057 ms: Mark-sweep 1408.2 (1702.9) -> 1408.1 (1667.4) MB, 800.3 / 0.0 ms (+ 0.0 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 803 ms) last resort
[30517:0x102801600] 698940 ms: Mark-sweep 1408.1 (1667.4) -> 1408.1 (1667.4) MB, 882.2 / 0.0 ms last resort

and finally in the JS stacktrace:

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

I have a feeling my memory is running out, but increasing the allowed memory with --max-old-space-size (or whatever) doesn't work when the file is 70+ gigabytes and I only have 16GB of RAM.

Here's the code of what I am trying to do:

var fs = require('fs'),
    path = require('path'),
    XmlStream = require('xml-stream'),
    MongoClient = require('mongodb').MongoClient,
    url = 'mongodb://username:password@my.server:27017/mydatabase',
    amount = 0;

MongoClient.connect(url, function(err, db) {

    var stream = fs.createReadStream(path.join(__dirname, 'motor.xml'));
    var xml = new XmlStream(stream);

    var docs = [];
    xml.collect('ns:Statistik');

    // This is your event for the element matches
    xml.on('endElement: ns:Statistik', function(item) {
        docs.push(item);           // collect to array for insertMany
        amount++;

        if ( amount % 1000 === 0 ) { 
          xml.pause();             // pause the stream events
          db.collection('vehicles').insertMany(docs, function(err, result) {
            if (err) throw err;
            docs = [];             // clear the array
            xml.resume();          // resume the stream events
          });
        }
    });

    // End stream handler - insert remaining and close connection
    xml.on("end",function() {
      if ( amount % 1000 !== 0 ) {
        db.collection('vehicles').insertMany(docs, function(err, result) {
          if (err) throw err;
          db.close();
        });
      } else {
        db.close();
      }
    });

});

My question is something like: Do I have a memory leak? Why does Node allow the code to build up the memory like that? Is there a fix besides buying 70+ GB of RAM for my PC?

Recommended answer

Posting my comment as an answer, since it solved the issue and might be useful to others having difficulty using the xml-stream package in this way.

In the question, the collect method is causing the issue, as it forces the parser to collect all the instances of the processed node in an array as they are parsed. collect should only be used to collect child items of a certain type from each node that is being parsed. The default behaviour is not to do that (the streaming nature of the parser is what lets you process multi-gigabyte files with ease).

So the solution was to remove that line of code (xml.collect('ns:Statistik')) and just use the endElement event.
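
For illustration, here is a rough sketch of the batching logic with the collect call left out. It is not the asker's final code: batchInserter, its parameters, and the fake source in the demo are hypothetical names introduced here so the snippet runs without an XML file or a database. Unlike the question's code, it swaps in a fresh array before the insert starts rather than in the callback, which is a design choice of this sketch.

```javascript
// Sketch only: batchInserter is a hypothetical helper, not part of xml-stream or mongodb.
// source      - any emitter with on/pause/resume (the XmlStream instance in the question)
// eventName   - the element event to listen for, e.g. 'endElement: ns:Statistik'
// insertBatch - function (docs, callback); stands in for insertMany on a collection
// done        - called once, with an error or null, when everything has been flushed
function batchInserter(source, eventName, batchSize, insertBatch, done) {
    var docs = [];
    var finished = false;

    function finish(err) {
        if (finished) return;   // make sure done is only ever called once
        finished = true;
        done(err || null);
    }

    source.on(eventName, function (item) {
        docs.push(item);
        if (docs.length < batchSize) return;

        var batch = docs;
        docs = [];              // fresh array, so the old batch can be collected after the insert
        source.pause();         // stop element events while the insert is in flight
        insertBatch(batch, function (err) {
            if (err) return finish(err);
            source.resume();
        });
    });

    // End of input: flush whatever is left, then report completion.
    source.on('end', function () {
        if (docs.length === 0) return finish(null);
        insertBatch(docs, finish);
    });
}

// Demo with a fake source (no XML file or database needed).
var EventEmitter = require('events');
var fake = new EventEmitter();
fake.pause = function () {};
fake.resume = function () {};

batchInserter(fake, 'item', 2, function (docs, cb) {
    console.log('inserting', docs.length, 'docs');
    cb(null);
}, function (err) {
    console.log(err ? 'failed' : 'all batches flushed');
});

[1, 2, 3, 4, 5].forEach(function (n) { fake.emit('item', n); });
fake.emit('end');
```

In the question's setup, source would be the xml object, eventName would be 'endElement: ns:Statistik', and insertBatch would be a small function that calls db.collection('vehicles').insertMany(docs, callback), with db.close() in the done callback. The important part is simply that xml.collect('ns:Statistik') is never called.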
