How to read line-delimited JSON from large file (line by line)
Question
I'm trying to load a large file (2GB in size) filled with JSON strings, delimited by newlines. Ex:
{
"key11": "value11",
"key12": "value12"
}
{
"key21": "value21",
"key22": "value22"
}
…
The way I'm importing it now is:
content = open(file_path, "r").read()
j_content = json.loads("[" + content.replace("}\n{", "},\n{") + "]")
Which seems like a hack (adding commas between each JSON string, plus a beginning and ending square bracket, to make it a proper list).
Is there a better way to specify the JSON delimiter (newline \n instead of comma ,)?
Also, Python can't seem to properly allocate memory for an object built from 2GB of data. Is there a way to construct each JSON object as I'm reading the file line by line? Thanks!
Answer
Just read each line and construct a JSON object as you go:
import json

with open(file_path) as f:
    for line in f:
        j_content = json.loads(line)
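A minimal, self-contained sketch of this approach; the sample file and keys below are hypothetical, just to show one object being parsed per line:

```python
import json
import tempfile

# Write a small line-delimited JSON file (hypothetical sample data).
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as tmp:
    tmp.write('{"id": 1, "name": "a"}\n')
    tmp.write('{"id": 2, "name": "b"}\n')
    file_path = tmp.name

# Parse one object per line; only one line is held in memory at a time,
# so a 2GB file never needs to fit in RAM all at once.
records = []
with open(file_path) as f:
    for line in f:
        records.append(json.loads(line))

print([r["id"] for r in records])  # → [1, 2]
```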
This way, you load proper, complete JSON objects (provided there is no \n in a JSON value or in the middle of an object), and you avoid the memory issue because each object is created only when needed.
There is also this answer: https://stackoverflow.com/a/7795029/671543