Load an element with Python from a large JSON file


Problem description

So, here is my JSON file. I want to load the data lists from it, one by one, and nothing else, and then, for example, plot them.

This is only an example. I am working with a large dataset, so I cannot load the whole file (that causes a memory error).

{
  "earth": {
    "europe": [
      {"name": "Paris", "type": "city"},
      {"name": "Thames", "type": "river"}, 
      {"par": 2, "data": [1,7,4,7,5,7,7,6]}, 
      {"par": 2, "data": [1,0,4,1,5,1,1,1]}, 
      {"par": 2, "data": [1,0,0,0,5,0,0,0]}
        ],
    "america": [
      {"name": "Texas", "type": "state"}
    ]
  }
}

Here is what I tried:

import ijson
filename = "testfile.json"

# ijson works best with a file opened in binary mode
f = open(filename, 'rb')
mylist = ijson.items(f, 'earth.europe[2].data.item')
print(mylist)

It returns nothing, even when I try to convert it into a list:

[]

Recommended answer

You need to specify a valid prefix. An ijson prefix is built from dictionary keys, plus the word item for the entries of a list. You can't select a specific list item, so [2] doesn't work.

If you want the value of the data key from every dictionary in the europe list, the prefix is:

earth.europe.item.data
# ^ ------------------- outermost key must be 'earth'
#       ^ ------------- next key must be 'europe'
#              ^ ------ any value in the array
#                   ^   the value for the 'data' key

This produces each such list:

>>> l = ijson.items(f, 'earth.europe.item.data')
>>> for data in l:
...     print(data)
...
[1, 7, 4, 7, 5, 7, 7, 6]
[1, 0, 4, 1, 5, 1, 1, 1]
[1, 0, 0, 0, 5, 0, 0, 0]

You can't put wildcards in a prefix, so you can't ask for earth.*.item.data, for example.

If you need more complex prefix matching, you have to use the ijson.parse() function and handle the events it produces. You can reuse the ijson.ObjectBuilder() class to turn the events you are interested in into Python objects:

parser = ijson.parse(f)
for prefix, event, value in parser:
    # Only the start of an array can begin a 'data' list
    if event != 'start_array':
        continue
    if prefix.startswith('earth.') and prefix.endswith('.item.data'):
        # 'earth.<continent>.item.data' -> '<continent>'
        continent = prefix.split('.', 2)[1]
        builder = ijson.ObjectBuilder()
        builder.event(event, value)
        # Feed the builder from the same parser until this array closes
        for nprefix, event, value in parser:
            if (nprefix, event) == (prefix, 'end_array'):
                break
            builder.event(event, value)
        data = builder.value
        print(continent, data)

This prints every array stored under a 'data' key inside a list (that is, under a prefix ending in '.item.data'), anywhere below the top-level 'earth' key. It also extracts the continent key.
