Why does attempting to write a large file cause the JS heap to run out of memory?


Question

This code

const file = require("fs").createWriteStream("./test.dat");
for(var i = 0; i < 1e7; i++){
    file.write("a");
}

gives this error message after running for about 30 seconds:

<--- Last few GCs --->

[47234:0x103001400]    27539 ms: Mark-sweep 1406.1 (1458.4) -> 1406.1 (1458.4) MB, 2641.4 / 0.0 ms  allocation failure GC in old space requested
[47234:0x103001400]    29526 ms: Mark-sweep 1406.1 (1458.4) -> 1406.1 (1438.9) MB, 1986.8 / 0.0 ms  last resort GC in old space requested
[47234:0x103001400]    32154 ms: Mark-sweep 1406.1 (1438.9) -> 1406.1 (1438.9) MB, 2628.3 / 0.0 ms  last resort GC in old space requested


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x30f4a8e25ee1 <JSObject>
    1: /* anonymous */ [/Users/matthewschupack/dev/streamTests/1/write.js:~1] [pc=0x270efe213894](this=0x30f4e07ed2f1 <Object map = 0x30f4ede823b9>,exports=0x30f4e07ed2f1 <Object map = 0x30f4ede823b9>,require=0x30f4e07ed2a9 <JSFunction require (sfi = 0x30f493b410f1)>,module=0x30f4e07ed221 <Module map = 0x30f4edec1601>,__filename=0x30f493b47221 <String[49]: /Users/matthewschupack/dev/streamTests/...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::Abort() [/usr/local/bin/node]
 2: node::FatalException(v8::Isolate*, v8::Local<v8::Value>, v8::Local<v8::Message>) [/usr/local/bin/node]
 3: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [/usr/local/bin/node]
 4: v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) [/usr/local/bin/node]
 5: v8::internal::Runtime_AllocateInTargetSpace(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/local/bin/node]
 6: 0x270efe08463d
 7: 0x270efe213894
 8: 0x270efe174048
[1]    47234 abort      node write.js

This code

const file = require("fs").createWriteStream("./test.dat");
for(var i = 0; i < 1e6; i++){
    file.write("aaaaaaaaaa"); // ten a's
}

runs perfectly, almost instantly, and produces a 10MB file. As I understood it, the point of streams is that both versions should run in about the same amount of time, since the data is identical. Even increasing the number of a's to 100 or 1000 per iteration hardly increases the running time, and it writes a 1GB file without any issues. Writing a single character per iteration at 1e6 iterations also works fine.

What's going on here?

Answer

The out-of-memory error happens because you're not waiting for the drain event to be emitted; without waiting, Node.js will buffer every written chunk until maximum memory usage is reached.

.write will return false if the internal buffer is greater than highWaterMark, which defaults to 16384 bytes (16KB). In your code, you're not handling the return value of .write, so the buffer is never flushed.
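To make the threshold concrete, here is a minimal sketch (assuming Node v9.3.0+ so that writableHighWaterMark is available) showing the default limit and the false return value once the internal buffer exceeds it:

const file = require("fs").createWriteStream("./test.dat");

// Default highWaterMark for a fs write stream is 16384 bytes (16KB)
console.log(file.writableHighWaterMark); // 16384

// A single write larger than highWaterMark pushes the internal buffer
// over the threshold, so .write returns false: wait for 'drain' next
console.log(file.write("a".repeat(20000))); // false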

This can be tested very easily using: tail -f test.dat

When executing your script, you will see that nothing is written to test.dat until the script finishes.

For 1e7 writes, the buffer should be cleared 610 times:

1e7 / 16384 = 610


A solution is to check the return value of .write and, if false is returned, use file.once('drain') wrapped in a promise to wait until the drain event is emitted.

NOTE: writable.writableHighWaterMark was added in Node v9.3.0

const file = require("fs").createWriteStream("./test.dat");

(async() => {
    for(let i = 0; i < 1e7; i++) {
        if(!file.write('a')) {
            // Will pause every 16384 iterations until `drain` is emitted
            await new Promise(resolve => file.once('drain', resolve));
        }
    }
})();
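As a side note, on Node v11.13.0 and newer the promise wrapper can be replaced with events.once, which returns a promise that resolves on the next emission of the given event — a sketch of the equivalent loop:

const { once } = require("events");
const file = require("fs").createWriteStream("./test.dat");

(async() => {
    for(let i = 0; i < 1e7; i++) {
        if(!file.write('a')) {
            // Resolves on the next 'drain' emission
            await once(file, 'drain');
        }
    }
    file.end();
})();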

Now if you run tail -f test.dat you will see how data is being written while the script is still running.

As for why you get memory issues with 1e7 but not 1e6, we have to take a look at how Node.js does its buffering, which happens in the writeOrBuffer function.

This sample code will allow us to get a rough estimate of the memory usage:

const count = Number(process.argv[2]) || 1e6;
const state = { bufferedRequestCount: 0 };

function nop() {}

// Mimics the per-chunk bookkeeping done by writeOrBuffer
const buffer = (data) => {
    const last = state.lastBufferedRequest;
    state.lastBufferedRequest = {
      chunk: Buffer.from(data),
      encoding: 'buffer',
      isBuf: true,
      callback: nop,
      next: null
    };

    if(last)
      last.next = state.lastBufferedRequest;
    else
      state.bufferedRequest = state.lastBufferedRequest;

    state.bufferedRequestCount += 1;
}

const start = process.memoryUsage().heapUsed;
for(let i = 0; i < count; i++) {
    buffer('a');
}
const used = (process.memoryUsage().heapUsed - start) / 1024 / 1024;
console.log(`${Math.round(used * 100) / 100} MB`);

When executing it:

// node memory.js <count>
1e4: 1.98 MB
1e5: 16.75 MB
1e6: 160 MB
5e6: 801.74 MB
8e6: 1282.22 MB
9e6: 1442.22 MB - Out of memory
1e7: 1602.97 MB - Out of memory

So each object uses ~0.16 KB, and when doing 1e7 writes without waiting for the drain event, you end up with 10 million of those objects in memory (to be fair, it crashes before reaching 10M).

It doesn't matter whether you write a single 'a' or 1000 of them per call; the memory increase from that is negligible.
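You can check this with a variant of the sample above. In the sketch below, estimate is a hypothetical helper built from the same buffering logic (exact numbers vary by Node version); it shows that the per-request bookkeeping objects dominate the heap cost, while the Buffer contents are largely allocated outside the V8 heap:

function nop() {}

// Rough heap cost (MB) of `count` buffered requests of `size`-byte chunks
function estimate(count, size) {
    const state = { bufferedRequestCount: 0 };
    const data = 'a'.repeat(size);
    const start = process.memoryUsage().heapUsed;
    for(let i = 0; i < count; i++) {
        const last = state.lastBufferedRequest;
        state.lastBufferedRequest = {
          chunk: Buffer.from(data),
          encoding: 'buffer',
          isBuf: true,
          callback: nop,
          next: null
        };
        if(last)
          last.next = state.lastBufferedRequest;
        else
          state.bufferedRequest = state.lastBufferedRequest;
        state.bufferedRequestCount += 1;
    }
    return (process.memoryUsage().heapUsed - start) / 1024 / 1024;
}

for(const size of [1, 10, 100]) {
    console.log(`${size} byte(s) per chunk: ~${estimate(1e6, size).toFixed(2)} MB`);
}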

You can increase the maximum memory used by Node with the --max_old_space_size={MB} flag (of course this isn't a solution; it's just useful for checking the memory consumption without crashing the script):

node --max_old_space_size=4096 memory.js 1e7

UPDATE: I made a mistake in the memory snippet which led to a 30% increase in memory usage: I was creating a new callback for every .write, while Node reuses the nop callback.

UPDATE II

If you're always writing the same value (unlikely in a real scenario), you can greatly reduce memory usage and execution time by passing the same buffer every time:

const file = require("fs").createWriteStream("./test.dat");
const buf = Buffer.from('a');

(async() => {
    for(let i = 0; i < 1e7; i++) {
        if(!file.write(buf)) {
            // Will pause every 16384 iterations until `drain` is emitted
            await new Promise(resolve => file.once('drain', resolve));
        }
    }
})();
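The saving comes from how the chunk is queued: a string passed to .write is converted to a new Buffer for every call, whereas a Buffer is queued by reference, so every buffered request points at the same single Buffer instead of carrying its own copy.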
