How can I lazily read multiple JSON values from a file/stream in Python?

Problem description

I'd like to read multiple JSON objects from a file/stream in Python, one at a time. Unfortunately, json.load() just calls .read() until end-of-file; there doesn't seem to be any way to use it to read a single object or to lazily iterate over the objects.

Is there any way to do this? Using the standard library would be ideal, but if there's a third-party library I'd use that instead.

At the moment I'm putting each object on a separate line and using json.loads(f.readline()), but I would really prefer not to need to do this.
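
For reference, the one-object-per-line workaround can be sketched roughly as follows. The helper name iter_json_lines and the sample input are made up for illustration; only json.loads and ordinary iteration over a text stream come from the standard library.

```python
import io
import json


def iter_json_lines(stream):
    """Yield one decoded JSON value per non-blank line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines instead of failing on them
            yield json.loads(line)


# Hypothetical sample input: one JSON value per line.
sample = io.StringIO('{"foo": ["bar", "baz"]}\n1\n[]\n')
for value in iter_json_lines(sample):
    print("Working on a", type(value).__name__)
```

This only works when no value spans more than one line, which is exactly the restriction the question wants to avoid.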

Example use

example.py

import my_json as json  # hypothetical module providing the desired iterload()
import sys

for o in json.iterload(sys.stdin):
    print("Working on a", type(o).__name__)

in.txt

{"foo": ["bar", "baz"]} 1 2 [] 4 5 6

Example session

$ python3.2 example.py < in.txt
Working on a dict
Working on a int
Working on a int
Working on a list
Working on a int
Working on a int
Working on a int

Recommended answer

Here's a much simpler solution. The trick is to try, fail, and use the information in the exception to parse correctly. The only limitation is that the file must be seekable.

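import json


def stream_read_json(fn):
    """Yield JSON values one at a time from the file named fn."""
    start_pos = 0
    # NOTE: start_pos counts characters, so seeking with it assumes one
    # character per byte (e.g. ASCII data without newline translation).
    with open(fn, 'r') as f:
        while True:
            try:
                # Succeeds only when exactly one value is left in the file.
                obj = json.load(f)
                yield obj
                return
            except json.JSONDecodeError as e:
                if e.msg != 'Extra data':
                    raise  # genuinely malformed JSON, not a second value
                # e.pos is where the next value starts, relative to start_pos.
                f.seek(start_pos)
                json_str = f.read(e.pos)
                obj = json.loads(json_str)
                start_pos += e.pos
                yield obj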
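
To see what information the exception carries, the small sketch below feeds json.loads a string holding more than one value. The helper name first_value_end is hypothetical; the msg and pos attributes are part of json.JSONDecodeError in Python 3.5 and later.

```python
import json


def first_value_end(text):
    """Return the offset where the first JSON value (plus trailing whitespace) ends.

    Returns len(text) when the text holds exactly one value.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        if e.msg != "Extra data":
            raise  # genuinely malformed JSON, not a second value
        return e.pos
    return len(text)


text = '{"foo": ["bar", "baz"]} 1 2'
end = first_value_end(text)
print(repr(text[:end]))  # the slice that json.loads can parse on its own
```

The reported position points at the start of the next value, so everything before it can be parsed as a single value.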

Note that the json.JSONDecodeError version only works on Python >= 3.5. On earlier versions, failures raise a plain ValueError, and you have to parse the position out of the message string, e.g.

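import json
import re


def stream_read_json(fn):
    """Variant for Python < 3.5, where json raises a plain ValueError."""
    start_pos = 0
    with open(fn, 'r') as f:
        while True:
            try:
                obj = json.load(f)
                yield obj
                return
            except ValueError as e:
                # The message has the form
                # "Extra data: line L column C - line L column C (char START - END)".
                match = re.match(r'Extra data: line \d+ column \d+ .*\(char (\d+).*\)',
                                 e.args[0])
                if match is None:
                    raise  # some other decoding error
                f.seek(start_pos)
                end_pos = int(match.group(1))
                json_str = f.read(end_pos)
                obj = json.loads(json_str)
                start_pos += end_pos
                yield obj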
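
If the input is not seekable (for example sys.stdin, as in the question), a different technique is to buffer the text and call json.JSONDecoder.raw_decode, which returns a decoded value together with the index where it ended. This is a swapped-in alternative, not the method used above. It is a minimal sketch: the name iterload is borrowed from the question's hypothetical my_json module, and the chunk_size parameter is an assumption.

```python
import io
import json


def iterload(stream, chunk_size=4096):
    """Lazily yield JSON values from a text stream that need not be seekable."""
    decoder = json.JSONDecoder()
    buf = ""
    eof = False
    while not eof:
        chunk = stream.read(chunk_size)
        eof = not chunk
        buf += chunk
        while True:
            # raw_decode does not skip leading whitespace itself.
            buf = buf.lstrip(" \t\r\n")
            if not buf:
                break
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                if eof:
                    raise  # no further input can complete this value
                break  # the value may be incomplete; read another chunk
            # A number touching the end of the buffer might continue in the
            # next chunk (e.g. "12" split as "1" + "2", or "1e5" as "1" + "e5").
            if not eof and (end == len(buf) or buf[end] in ".eE"):
                break
            yield obj
            buf = buf[end:]


sample = io.StringIO('{"foo": ["bar", "baz"]} 1 2 [] 4 5 6')
for o in iterload(sample):
    print("Working on a", type(o).__name__)
```

One consequence of this design is that malformed input is only reported once the end of the stream is reached, because until then a decoding failure is treated as "value not complete yet".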
