Read top-level JSON dictionary incrementally using Python ijson

Question

I have the following data in my JSON file:

{
    "first": {
        "name": "James",
        "age": 30
    },
    "second": {
        "name": "Max",
        "age": 30
    },
    "third": {
        "name": "Norah",
        "age": 30
    },
    "fourth": {
        "name": "Sam",
        "age": 30
    }
}

I want to print the top-level key and object as follows:

import json
import ijson

fname = "data.json"

with open(fname) as f:
    raw_data = f.read()

data = json.loads(raw_data)

for k in data.keys():
    print k, data[k]

Output:

second {u'age': 30, u'name': u'Max'}
fourth {u'age': 30, u'name': u'Sam'}
third {u'age': 30, u'name': u'Norah'}
first {u'age': 30, u'name': u'James'}

So far, so good. However, if I want to do the same thing for a huge file, I would have to read it all into memory. This is very slow and requires lots of memory.

I want to use an incremental JSON parser (ijson in this case) to achieve what I described earlier:

The code below is taken from: No access to top-level elements with ijson?

with open(fname) as f:
    json_obj = ijson.items(f,'').next()  # '' loads everything as only one object.
    for (key, value) in json_obj.items():
        print key + " -> " + str(value)    

This is not suitable either, because it also reads the whole file into memory. It is not truly incremental.

How can I incrementally parse the top-level keys and their corresponding objects from a JSON file in Python?

Recommended answer

Since JSON files are essentially text files, consider stripping out the top level as a string. Basically, iterate over the file line by line, concatenating each line onto a string, and break out of the loop once the string contains the double braces }} that signal the end of the top level. Of course, the double-brace check must first strip out spaces and line breaks.

import json
import re

toplevelstring = ''

with open('data.json') as f:
    for line in f:
        # str.replace() matches literal text, not a regex, so use re.sub() to drop all whitespace
        if '}}' not in re.sub(r'\s+', '', toplevelstring):
            toplevelstring = toplevelstring + line
        else:
            break

data = json.loads(toplevelstring)

Now, if your larger JSON is wrapped in square brackets or other braces, still run the routine above, but add the line below to slice off the first character, [, and the last two characters, the comma and line break that follow the top level's final brace:

[{
    "first": {
        "name": "James",
        "age": 30
    },
    "second": {
        "name": "Max",
        "age": 30
    },
    "third": {
        "name": "Norah",
        "age": 30
    },
    "fourth": {
        "name": "Sam",
        "age": 30
    }
},
{
    "data1": {
        "id": "AAA",
        "type": 55
    },
    "data2": {
        "id": "BBB",
        "type": 1601
    },
    "data3": {
        "id": "CCC",
        "type": 817
    }
}]

...

toplevelstring = toplevelstring[1:-2]
data = json.loads(toplevelstring)
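
To see the accumulate-then-slice steps working together, here is a minimal sketch that wraps them in a function and runs them on a shortened, in-memory version of the array-wrapped sample. The function name `read_first_object` and its `wrapped_in_array` flag are hypothetical. It assumes the layout shown above, where the first object's closing brace sits on its own line followed by a comma and a line break, and it would be fooled by string values that themselves contain braces.

```python
import io
import json
import re


def read_first_object(lines, wrapped_in_array=False):
    """Accumulate lines until '}}' appears (ignoring whitespace), then parse."""
    buf = ""
    for line in lines:
        buf += line
        # Whitespace is removed only for the check, not from the buffer itself.
        if "}}" in re.sub(r"\s+", "", buf):
            break
    if wrapped_in_array:
        # Drop the leading '[' and the trailing comma plus line break.
        buf = buf[1:-2]
    return json.loads(buf)


sample = io.StringIO(
    '[{\n'
    '    "first": {\n'
    '        "name": "James",\n'
    '        "age": 30\n'
    '    }\n'
    '},\n'
    '{\n'
    '    "data1": {\n'
    '        "id": "AAA",\n'
    '        "type": 55\n'
    '    }\n'
    '}]\n'
)

first = read_first_object(sample, wrapped_in_array=True)
print(first)
```

Only the lines up to and including the first `},` are consumed from the stream; the second object is never read.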
