How to read line-delimited JSON from a large file (line by line)

Question

I'm trying to load a large file (2GB in size) filled with JSON strings, delimited by newlines. For example:

{
    "key11": value11,
    "key12": value12,
}
{
    "key21": value21,
    "key22": value22,
}
…

The way I'm currently importing it is:

import json

with open(file_path, "r") as f:
    content = f.read()  # reads the entire 2GB file into one string
j_content = json.loads("[" + content.replace("}\n{", "},\n{") + "]")

This seems like a hack (adding commas between the JSON strings, plus an opening and a closing square bracket, to make it a proper list).

Is there a better way to specify the JSON delimiter (newline \n instead of comma ,)?

Also, Python can't seem to allocate enough memory for an object built from 2GB of data. Is there a way to construct each JSON object as I read the file line by line? Thanks!

Answer

Just read each line and construct a JSON object from it as you go:

import json

with open(file_path) as f:
    for line in f:
        j_content = json.loads(line)  # one complete JSON object per line

This way, you load one complete JSON object at a time (provided no \n appears inside a JSON value or anywhere else in the middle of an object), and you avoid the memory problem because each object is created only when it is needed.
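The loop above can be wrapped in a small generator so that callers simply iterate over parsed objects. This is a minimal sketch, assuming one JSON value per line and UTF-8 input; the name iter_json_lines is hypothetical. It also skips blank lines (such as a trailing newline) and reports which line failed to parse.

```python
import json
from typing import Any, Iterator


def iter_json_lines(file_path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-blank line of file_path.

    Only one line is held in memory at a time, so the total file size does
    not matter as long as each individual line fits in memory.
    """
    with open(file_path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                # Tolerate blank lines, e.g. a trailing newline at end of file.
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Say which line is malformed instead of failing anonymously.
                raise ValueError(f"invalid JSON on line {line_number}: {exc}") from exc
```

Usage would look like `for obj in iter_json_lines(file_path): ...`, handling each obj inside the loop body rather than collecting them all into a list, which would bring the memory problem back.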

There is also this answer:

https://stackoverflow.com/a/7795029/671543
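Line-by-line parsing only works when each object sits on a single line. If the objects are pretty-printed across several lines, as in the question's example, one option is the standard library's json.JSONDecoder.raw_decode, which parses one value from the start of a string and returns where it stopped. The sketch below is an illustration under stated assumptions, not necessarily the approach of the linked answer: it assumes the objects are valid JSON (the question's placeholders such as value11 and its trailing commas would be rejected by any JSON parser), and the names iter_concatenated_json and chunk_size are hypothetical. It reads the stream in chunks so that only the current object and one chunk need to be in memory.

```python
import json
from typing import Any, Iterator, TextIO


def iter_concatenated_json(f: TextIO, chunk_size: int = 65536) -> Iterator[Any]:
    """Yield JSON values stored back to back in a text stream.

    Values may span several lines and be separated by any whitespace.
    Malformed input is only reported once the end of the stream is reached.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    eof = False
    while True:
        # raw_decode does not skip leading whitespace, so strip it first.
        buffer = buffer.lstrip()
        if buffer:
            try:
                value, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                if eof:
                    raise  # leftover text can never become valid JSON
            else:
                # Only trust a match that stops before the end of the buffer
                # (or at EOF): a number like 12 may continue as 123 in the next chunk.
                if end < len(buffer) or eof:
                    yield value
                    buffer = buffer[end:]
                    continue
        elif eof:
            return
        # Need more text: the current value is incomplete or the buffer is empty.
        chunk = f.read(chunk_size)
        if not chunk:
            eof = True
        buffer += chunk
```

It would be used as `with open(file_path, encoding="utf-8") as f: for obj in iter_concatenated_json(f): ...`. Because it does not rely on newlines at all, it also handles the one-object-per-line format.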
