How do I use the 'json' module to read in one JSON object at a time?

Question

I have a multi-gigabyte JSON file. The file is made up of JSON objects that are no more than a few thousand characters each, but there are no line breaks between the records.

Using Python 3 and the json module, how can I read one JSON object at a time from the file into memory?

The data is in a plain text file. Here is an example of a similar record. The actual records contain many nested dictionaries and lists.

Record in readable format:

{
  "results": {
    "__metadata": {
      "type": "DataServiceProviderDemo.Address"
    },
    "Street": "NE 228th",
    "City": "Sammamish",
    "State": "WA",
    "ZipCode": "98074",
    "Country": "USA"
  }
}

Actual format. New records start one after the other without any breaks.

{"results": { "__metadata": {"type": "DataServiceProviderDemo.Address"},"Street": "NE 228th","City": "Sammamish","State": "WA","ZipCode": "98074","Country": "USA" } }{"results": { "__metadata": {"type": "DataServiceProviderDemo.Address"},"Street": "NE 228th","City": "Sammamish","State": "WA","ZipCode": "98074","Country": "USA" } }{"results": { "__metadata": {"type": "DataServiceProviderDemo.Address"},"Street": "NE 228th","City": "Sammamish","State": "WA","ZipCode": "98074","Country": "USA" } }

Recommended answer

Generally speaking, putting more than one JSON object into a file makes that file invalid, broken JSON. That said, you can still parse the data in chunks using the JSONDecoder.raw_decode() method, which parses a single JSON value from the start of a string and returns it together with the index at which that value ended.
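As a minimal sketch of how raw_decode() behaves on concatenated data, the snippet below decodes two objects from one string. The sample string is made up for illustration; it only mimics the shape of the records in the question.

```python
from json import JSONDecoder

decoder = JSONDecoder()

# Two JSON objects written back-to-back, as in the question's file.
text = '{"City": "Sammamish"}{"State": "WA"}'

# raw_decode() returns the decoded value and the index where it ended.
first, end = decoder.raw_decode(text)

# Passing that index as the second argument resumes parsing from there.
second, end2 = decoder.raw_decode(text, end)

print(first, second)
```

Note that raw_decode() does not skip leading whitespace, so any whitespace between objects has to be stripped or stepped over before the next call.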

The following will yield complete objects as the parser finds them:

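from functools import partial
from json import JSONDecoder


def json_parse(fileobj, decoder=JSONDecoder(), buffersize=2048):
    """Yield JSON objects one at a time from back-to-back JSON in fileobj."""
    buffer = ''
    # iter() with a sentinel keeps calling fileobj.read(buffersize) until it returns ''
    for chunk in iter(partial(fileobj.read, buffersize), ''):
        # raw_decode() does not skip leading whitespace, so strip it first
        buffer = (buffer + chunk).lstrip()
        while buffer:
            try:
                result, index = decoder.raw_decode(buffer)
            except ValueError:
                # Not enough data to decode, read more
                break
            yield result
            buffer = buffer[index:].lstrip()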

This function reads the given file object in chunks of buffersize characters and has the decoder object parse whole JSON objects from the buffer. Each parsed object is yielded to the caller. When the buffer ends in the middle of an object, raw_decode() raises ValueError, and the function reads another chunk before trying again.
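The read-more-on-failure step can be illustrated in isolation. This is only a sketch with a made-up record, cut in two to simulate a chunk boundary falling inside an object; the variable names are hypothetical.

```python
from json import JSONDecoder

decoder = JSONDecoder()

record = '{"City": "Sammamish", "State": "WA"}'
# Simulate a chunk boundary that falls in the middle of the object.
first_half, second_half = record[:20], record[20:]

try:
    decoder.raw_decode(first_half)
    incomplete_failed = False
except ValueError:
    # json.JSONDecodeError is a subclass of ValueError.
    incomplete_failed = True

# Once the rest of the object is appended, decoding succeeds.
obj, end = decoder.raw_decode(first_half + second_half)
print(incomplete_failed, obj)
```

One consequence of this design is that genuinely malformed data looks the same as incomplete data, so the buffer would keep growing until the end of the file.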

Use it like this:

with open('yourfilename', 'r') as infh:
    for data in json_parse(infh):
        # process each object here
        print(data)

Use this only if your JSON objects are written to the file back-to-back, with no newlines in between. If you do have newlines, and each JSON object is limited to a single line, you have a JSON Lines document, in which case you can use the approach from "Loading and parsing a JSON file with multiple JSON objects in Python" instead.
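For the JSON Lines case, a minimal sketch might look like the following. The helper name read_json_lines and the sample data are assumptions made for illustration and are not taken from the linked answer.

```python
import io
import json


def read_json_lines(fileobj):
    """Yield one decoded object per non-empty line (hypothetical helper)."""
    for line in fileobj:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# io.StringIO stands in for a real file opened in text mode.
sample = io.StringIO('{"City": "Sammamish"}\n{"City": "Redmond"}\n')
records = list(read_json_lines(sample))
print(records)
```

Iterating over the file object reads one line at a time, so only a single record is held in memory at once.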
