NodeJS Copying File over a stream is very slow


Question

I am copying a file with Node on an SSD under VMware, but the performance is very low. The benchmark I ran to measure actual disk speed is as follows:

$ hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   12004 MB in  1.99 seconds = 6025.64 MB/sec
 Timing buffered disk reads: 1370 MB in  3.00 seconds = 456.29 MB/sec

However, the following Node code that copies the file is very slow; even subsequent runs do not make it faster:

var fs  = require("fs");
fs.createReadStream("bigfile").pipe(fs.createWriteStream("tempbigfile"));

Run as follows:

$ seq 1 10000000 > bigfile
$ ll bigfile -h
-rw-rw-r-- 1 mustafa mustafa 848M Jun  3 03:30 bigfile
$ time node test.js 

real    0m4.973s
user    0m2.621s
sys     0m7.236s
$ time node test.js 

real    0m5.370s
user    0m2.496s
sys     0m7.190s

What is the issue here, and how can I speed it up? I believe I could write it faster in C just by adjusting the buffer size. What confuses me is that when I wrote a simple, almost pv-equivalent program that pipes stdin to stdout, as below, it is very fast.

process.stdin.pipe(process.stdout);

Run as follows:

$ dd if=/dev/zero bs=8M count=128 | pv | dd of=/dev/null
128+0 records in 174MB/s] [        <=>                                                                                ]
128+0 records out
1073741824 bytes (1.1 GB) copied, 5.78077 s, 186 MB/s
   1GB 0:00:05 [ 177MB/s] [          <=>                                                                              ]
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 5.78131 s, 186 MB/s
$ dd if=/dev/zero bs=8M count=128 |  dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 5.57005 s, 193 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 5.5704 s, 193 MB/s
$ dd if=/dev/zero bs=8M count=128 | node test.js | dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 4.61734 s, 233 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 4.62766 s, 232 MB/s
$ dd if=/dev/zero bs=8M count=128 | node test.js | dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 4.22107 s, 254 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 4.23231 s, 254 MB/s
$ dd if=/dev/zero bs=8M count=128 | dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 5.70124 s, 188 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 5.70144 s, 188 MB/s
$ dd if=/dev/zero bs=8M count=128 | node test.js | dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 4.51055 s, 238 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 4.52087 s, 238 MB/s

Answer

I don't know the answer to your question, but perhaps this helps in your investigation of the problem.

In the Node.js documentation about stream buffering, it says:

Both Writable and Readable streams will store data in an internal buffer that can be retrieved using writable.writableBuffer or readable.readableBuffer, respectively.

The amount of data potentially buffered depends on the highWaterMark option passed into the stream's constructor. For normal streams, the highWaterMark option specifies a total number of bytes. For streams operating in object mode, the highWaterMark specifies a total number of objects....

A key goal of the stream API, particularly the stream.pipe() method, is to limit the buffering of data to acceptable levels such that sources and destinations of differing speeds will not overwhelm the available memory.

So, you can play with the buffer sizes to improve speed:

var fs = require('fs');
var path = require('path');

var from = path.normalize(process.argv[2]);
var to = path.normalize(process.argv[3]);

// Raise both stream buffers to 64 KiB.
var readOpts = {highWaterMark: Math.pow(2, 16)};  // 65536 bytes
var writeOpts = {highWaterMark: Math.pow(2, 16)}; // 65536 bytes

var source = fs.createReadStream(from, readOpts);
var destiny = fs.createWriteStream(to, writeOpts);

source.pipe(destiny);

