Parse large JSON file in Nodejs and handle each object independently

Problem description

I need to read a large JSON file (around 630MB) in Nodejs and insert each object into MongoDB.

I've read the answer here: Parse large JSON file in Nodejs.

However, the answers there handle the JSON file line by line, instead of object by object. Thus, I still don't know how to get an object from this file and operate on it.

I have about 100,000 objects of this kind in my JSON file.

Data format:

[
  {
    "id": "0000000",
    "name": "Donna Blak",
    "livingSuburb": "Tingalpa",
    "age": 53,
    "nearestHospital": "Royal Children's Hospital",
    "treatments": {
        "19890803": {
            "medicine": "Stomach flu B",
            "disease": "Stomach flu"
        },
        "19740112": {
            "medicine": "Progeria C",
            "disease": "Progeria"
        },
        "19830206": {
            "medicine": "Poliomyelitis B",
            "disease": "Poliomyelitis"
        }
    },
    "class": "patient"
  },
 ...
]

Cheers,

Alex

Recommended answer

There is a nice module named 'stream-json' that does exactly what you want.

It can parse JSON files far exceeding available memory.

StreamArray handles a frequent use case: a huge array of relatively small objects, similar to Django-produced database dumps. It streams array components individually, taking care of assembling them automatically.

Here is a very basic example:

const StreamArray = require('stream-json/streamers/StreamArray');
const path = require('path');
const fs = require('fs');

const jsonStream = StreamArray.withParser();

//You'll get json objects here
//Key is an array-index here
jsonStream.on('data', ({key, value}) => {
    console.log(key, value);
});

jsonStream.on('end', () => {
    console.log('All done');
});

const filename = path.join(__dirname, 'sample.json');
fs.createReadStream(filename).pipe(jsonStream.input);

If you'd like to do something more complex, e.g. process one object after another sequentially (keeping the order) and apply some async operations to each of them, then you could create a custom Writable stream like this:

const StreamArray = require('stream-json/streamers/StreamArray');
const {Writable} = require('stream');
const path = require('path');
const fs = require('fs');

const fileStream = fs.createReadStream(path.join(__dirname, 'sample.json'));
const jsonStream = StreamArray.withParser();

const processingStream = new Writable({
    write({key, value}, encoding, callback) {
        //Save to mongo or do any other async actions

        setTimeout(() => {
            console.log(value);
            //Next record will be read only when the current one is fully processed
            callback();
        }, 1000);
    },
    //Don't skip this, as we need to operate with objects, not buffers
    objectMode: true
});

//Pipe the streams as follows
fileStream.pipe(jsonStream.input);
jsonStream.pipe(processingStream);

//So we're waiting for the 'finish' event when everything is done.
processingStream.on('finish', () => console.log('All done'));
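
Since the question is specifically about inserting each object into MongoDB, below is a minimal sketch of how the setTimeout placeholder above could be swapped for a real insert using the official 'mongodb' driver. This is an assumption built on top of the answer, not part of it: the connection URI, the database name 'hospital', and the collection name 'patients' are made up for illustration.

const StreamArray = require('stream-json/streamers/StreamArray');
const {Writable} = require('stream');
const {MongoClient} = require('mongodb');
const path = require('path');
const fs = require('fs');

(async () => {
    //Assumed local MongoDB instance; adjust the URI for your setup
    const client = new MongoClient('mongodb://localhost:27017');
    await client.connect();
    const patients = client.db('hospital').collection('patients');

    const fileStream = fs.createReadStream(path.join(__dirname, 'sample.json'));
    const jsonStream = StreamArray.withParser();

    const processingStream = new Writable({
        //We operate on parsed objects, not buffers
        objectMode: true,
        write({key, value}, encoding, callback) {
            //Call callback() only after the insert resolves, so back-pressure
            //ensures the next record is read only when this one is stored
            patients.insertOne(value)
                .then(() => callback())
                .catch(callback);
        }
    });

    fileStream.pipe(jsonStream.input);
    jsonStream.pipe(processingStream);

    processingStream.on('finish', async () => {
        await client.close();
        console.log('All done');
    });
})();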

Please note: the examples above were tested with 'stream-json@1.1.3'. For some previous versions (presumably prior to 1.0.0) you might have to use:

const StreamArray = require('stream-json/utils/StreamArray');

and then

const jsonStream = StreamArray.make();
