createReadStream in Node.JS


Problem Description

So I used fs.readFile() and it gives me


"FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory"

since fs.readFile() loads the whole file into memory before calling the callback, should I use fs.createReadStream() instead?

That's what I was doing previously with readFile:

var fs = require('fs');

fs.readFile('myfile.json', function (err1, data) {
    if (err1) {
        console.error(err1);
    } else {
        var myData = JSON.parse(data);
        //Do some operation on myData here
    }
});

Sorry, I'm kind of new to streaming; is the following the right way to do the same thing but with streaming?

var readStream = fs.createReadStream('myfile.json');

readStream.on('end', function () {  
    readStream.close();
    var myData = JSON.parse(readStream);
    //Do some operation on myData here
});

Thanks

Recommended Answer

If the file is enormous then yes, streaming will be how you want to deal with it. However, what you're doing in your second example is letting the stream buffer all the file data into memory and then handling it on end. It's essentially no different than readFile that way.
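Just to make that concrete, here is a sketch (for illustration only, not a recommendation) of what that buffering approach really amounts to once you collect the chunks yourself; by the time you parse, the whole file is in memory anyway:

var fs = require('fs');

var readStream = fs.createReadStream('myfile.json');
var chunks = [];

readStream.on('data', function (chunk) {
    chunks.push(chunk); // every chunk is kept around in memory...
});

readStream.on('end', function () {
    // ...so parsing here is effectively the same as fs.readFile for a huge file.
    var myData = JSON.parse(Buffer.concat(chunks).toString());
    //Do some operation on myData here
});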

You'll want to check out JSONStream. What streaming means is that you want to deal with the data as it flows by. In your case you obviously have to do this because you cannot buffer the entire file into memory all at once. With that in mind, hopefully code like this makes sense:

JSONStream.parse('rows.*.doc')

Notice that it has a kind of query pattern. That's because you will not have the entire JSON object/array from the file to work with all at once, so you have to think more in terms of how you want JSONStream to deal with the data as it finds it.
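For example, a selector like 'rows.*.doc' matches the doc property of every element of a top-level rows array. The following file layout is only an assumption (a CouchDB-style dump, say) to illustrate what that selector would pick out:

{
  "total_rows": 2,
  "rows": [
    { "id": "a", "doc": { "name": "first"  } },
    { "id": "b", "doc": { "name": "second" } }
  ]
}

With a file like that, the parse stream would emit two data events, one for each doc object.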

You can use JSONStream to essentially query for the JSON data that you are interested in. This way you're never buffering the whole file into memory. It does have the downside that if you do need all the data, then you'll have to stream the file multiple times, using JSONStream to pull out only the data you need right at that moment, but in your case you don't have much choice.
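As a sketch of what that multiple-pass approach might look like (the 'rows.*.id' selector and the empty handlers are hypothetical, just to show two separate passes over the same file):

var fs = require('fs');
var JSONStream = require('JSONStream');

// First pass: stream out only the docs.
fs.createReadStream('myfile.json')
  .pipe(JSONStream.parse('rows.*.doc'))
  .on('data', function (doc) {
    // do something with each doc as it arrives
  });

// Second pass: stream the same file again for a different slice of the data.
fs.createReadStream('myfile.json')
  .pipe(JSONStream.parse('rows.*.id'))
  .on('data', function (id) {
    // do something with each id
  });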

You could also use JSONStream to parse out data in order and do something like dump it into a database.

JSONStream.parse is similar to JSON.parse but instead of returning a whole object it returns a stream. When the parse stream gets enough data to form a whole object matching your query, it will emit a data event with the data being the document that matches your query. Once you've configured your data handler you can pipe your read stream into the parse stream and watch the magic happen.

Example:

var fs = require('fs');
var JSONStream = require('JSONStream');
var readStream = fs.createReadStream('myfile.json');
var parseStream = JSONStream.parse('rows.*.doc');
parseStream.on('data', function (doc) {
  db.insert(doc); // pseudo-code for inserting doc into a pretend database.
});
readStream.pipe(parseStream);

That's the verbose way to help you understand what's happening. Here is a more succinct way:

var fs = require('fs');
var JSONStream = require('JSONStream');
fs.createReadStream('myfile.json')
  .pipe(JSONStream.parse('rows.*.doc'))
  .on('data', function (doc) {
    db.insert(doc);
  });



Edit:



For further clarity about what's going on, try to think about it like this. Let's say you have a giant lake and you want to treat the water to purify it and move the water to a new reservoir. If you had a giant magical helicopter with a huge bucket then you could fly over the lake, put the lake in the bucket, add treatment chemicals to it, then fly it to its destination.

The problem of course being that there is no such helicopter that can deal with that much weight or volume. It's simply impossible, but that doesn't mean we can't accomplish our goal a different way. So instead you build a series of rivers (streams) between the lake and the new reservoir. You then set up cleansing stations in these rivers that purify any water that passes through it. These stations could operate in a variety of ways. Maybe the treatment can be done so fast that you can let the river flow freely and the purification will just happen as the water travels down the stream at maximum speed.

It's also possible that it takes some time for the water to be treated, or that the station needs a certain amount of water before it can effectively treat it. So you design your rivers to have gates and you control the flow of the water from the lake into your rivers, letting the stations buffer just the water they need until they've performed their job and released the purified water downstream and on to its final destination.

That's almost exactly what you want to do with your data. The parse stream is your cleansing station and it buffers data until it has enough to form a whole document that matches your query, then it pushes just that data downstream (and emits the data event).

Node streams are nice because most of the time you don't have to deal with opening and closing the gates. Node streams are smart enough to control backflow when the stream buffers a certain amount of data. It's as if the cleansing station and the gates on the lake are talking to each other to work out the perfect flow rate.
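If you ever do need to work the gates by hand, the underlying mechanism is pause() and resume() on the readable stream; pipe() takes care of this for you. A rough sketch of the manual version (slowAsyncWork here is a made-up stand-in for any slow per-chunk processing):

var fs = require('fs');

var readStream = fs.createReadStream('myfile.json');

readStream.on('data', function (chunk) {
  readStream.pause();                 // close the gate while we work on this chunk
  slowAsyncWork(chunk, function () {
    readStream.resume();              // open the gate again when we can take more
  });
});

readStream.on('end', function () {
  console.log('done');
});

// Stand-in for whatever slow, asynchronous processing you do per chunk.
function slowAsyncWork(chunk, callback) {
  setTimeout(callback, 100);
}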

If you had a streaming database driver then you'd theoretically be able to create some kind of insert stream and then do parseStream.pipe(insertStream) instead of handling the data event manually :D. Here's an example of creating a filtered version of your JSON file, in another file.

var fs = require('fs');
var JSONStream = require('JSONStream');

fs.createReadStream('myfile.json')
  .pipe(JSONStream.parse('rows.*.doc'))
  .pipe(JSONStream.stringify())
  .pipe(fs.createWriteStream('filtered-myfile.json'));
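And if you wanted to play with the insert-stream idea from above, a rough sketch using Node's stream.Writable could look like the following (db.insert is still pretend, as in the earlier example):

var fs = require('fs');
var JSONStream = require('JSONStream');
var Writable = require('stream').Writable;

// A tiny object-mode writable stream that "inserts" each parsed doc.
var insertStream = new Writable({
  objectMode: true, // we receive parsed objects from JSONStream, not raw buffers
  write: function (doc, encoding, callback) {
    db.insert(doc, callback); // pretend database; calling back signals we're ready for the next doc
  }
});

fs.createReadStream('myfile.json')
  .pipe(JSONStream.parse('rows.*.doc'))
  .pipe(insertStream);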

