Dealing with a JSON object too big to fit into memory


Question


    I have a dump of a Firebase database representing our Users table stored in JSON. I want to run some data analysis on it, but the issue is that it's too big to load completely into memory and manipulate with pure JavaScript (or with _ and similar libraries).

    Up until now I've been using the JSONStream package to deal with my data in bite-sized chunks (it calls a callback once for each user in the JSON dump).

    I've now hit a roadblock though because I want to filter my user ids based on their value. The "questions" I'm trying to answer are of the form "Which users x" whereas previously I was just asking "How many users x" and didn't need to know who they were.

    The data format is like this:

    {
        users: {
            123: {
                foo: 4
            },
            567: {
                foo: 8
            }
        }
    }
    

    What I want to do is essentially get the user ID (123 or 567 in the above) based on the value of foo. Now, if this were a small list it would be trivial to use something like _.each to iterate over the keys and values and extract the keys I want.
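For a dump small enough to load whole, the "trivial" approach might look like the following sketch. It uses plain `Object.entries` rather than `_.each`, and the helper name `userIdsWhere` and the `foo > 5` condition are hypothetical, chosen only for illustration.

```javascript
// In-memory filtering: only workable when the whole dump fits in memory.
// `userIdsWhere` and the `foo > 5` condition are hypothetical examples.
function userIdsWhere(data, predicate) {
  return Object.entries(data.users) // [[id, user], ...]
    .filter(([, user]) => predicate(user)) // keep users matching the condition
    .map(([id]) => id); // keep only the IDs (object keys are strings)
}

const data = { users: { 123: { foo: 4 }, 567: { foo: 8 } } };
console.log(userIdsWhere(data, (user) => user.foo > 5)); // [ '567' ]
```

The catch is that `data.users` has to be fully parsed before `Object.entries` can run, which is exactly what is not possible once the dump outgrows memory.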

    Unfortunately, since it doesn't fit into memory, that doesn't work. With JSONStream I can iterate over it by using var parser = JSONStream.parse('users.*'); and piping it into a function that deals with it like this:

    var fs = require('fs');
    var JSONStream = require('JSONStream');

    var parser = JSONStream.parse('users.*');
    var stream = fs.createReadStream('my.json');

    stream.pipe(parser);

    parser.on('data', function(user) {
        // user is equal to { foo: bar } here
        // so it is trivial to do my filter
        // but I don't know which user ID owns the data
    });
    

    But the problem is that I don't have access to the key matched by the star wildcard that I passed into JSONStream.parse. In other words, I don't know whether { foo: bar } represents user 123 or user 567.

    The question is twofold:

    1. How can I get the current path from within my callback?
    2. Is there a better way to be dealing with this JSON data that is too big to fit into memory?

    Solution

    I went ahead and edited JSONStream to add this functionality.

    If anyone runs across this and wants to patch it similarly, you can replace line 83 which was previously

    stream.queue(this.value[this.key])
    

    with this:

    // Wrap the matched value in a one-key object so its key is not lost
    var ret = {};
    ret[this.key] = this.value[this.key];

    stream.queue(ret);
    

    In the code sample from the original question, rather than user being equal to { foo: bar } in the callback, it will now be { uid: { foo: bar } }, where uid is the actual user ID (for example { "123": { foo: 4 } }).
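With the patched output, recovering the ID is a matter of reading the single key of each emitted object. The sketch below is one way to do that; `collectMatchingIds` is a hypothetical helper (not part of JSONStream), and Node's `Readable.from()` stands in for the patched parser so the example runs without the modified package.

```javascript
const { Readable } = require('stream');

// Collects the IDs of users that satisfy `predicate` from an object-mode
// stream emitting the patched shape { <uid>: <userData> }.
// `collectMatchingIds` is a hypothetical helper, not part of JSONStream.
function collectMatchingIds(objectStream, predicate) {
  return new Promise((resolve, reject) => {
    const ids = [];
    objectStream.on('data', (entry) => {
      // Each emitted object has exactly one own key: the user ID.
      const [uid, user] = Object.entries(entry)[0];
      if (predicate(user)) ids.push(uid);
    });
    objectStream.on('end', () => resolve(ids));
    objectStream.on('error', reject);
  });
}

// Stand-in for the patched parser: Readable.from() emits these objects
// in object mode, in the same { <uid>: <userData> } shape.
const fakeParser = Readable.from([{ 123: { foo: 4 } }, { 567: { foo: 8 } }]);

collectMatchingIds(fakeParser, (user) => user.foo > 5).then((ids) => {
  console.log(ids); // [ '567' ]
});
```

Against the real dump, the idea would be to pass the `parser` from the question (after `stream.pipe(parser)`) in place of `fakeParser`. Only the matching IDs are kept in memory, never the full set of users.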

    Since this is a breaking change, I didn't submit a pull request to the original project, but I did mention it in the project's issues in case they want to add a flag or option for this in the future.
