Loading and parsing a JSON file with multiple JSON objects


Question

I am trying to load and parse a JSON file in Python, but I'm stuck at the loading step:

import json
json_data = open('file')
data = json.load(json_data)

This yields:

ValueError: Extra data: line 2 column 1 - line 225116 column 1 (char 232 - 160128774)

I looked at section 18.2 of the Python documentation (the json module's JSON encoder and decoder), but it's pretty discouraging to read through this horrible-looking documentation.

First few lines of the file (anonymized with randomized entries):

{"votes": {"funny": 2, "useful": 5, "cool": 1}, "user_id": "harveydennis", "name": "Jasmine Graham", "url": "http://example.org/user_details?userid=harveydennis", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 1, "useful": 2, "cool": 4}, "user_id": "njohnson", "name": "Zachary Ballard", "url": "https://www.example.com/user_details?userid=njohnson", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 1, "useful": 0, "cool": 4}, "user_id": "david06", "name": "Jonathan George", "url": "https://example.com/user_details?userid=david06", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 6, "useful": 5, "cool": 0}, "user_id": "santiagoerika", "name": "Amanda Taylor", "url": "https://www.example.com/user_details?userid=santiagoerika", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 1, "useful": 8, "cool": 2}, "user_id": "rodriguezdennis", "name": "Jennifer Roach", "url": "http://www.example.com/user_details?userid=rodriguezdennis", "average_stars": 3.5, "review_count": 12, "type": "user"}

Recommended answer

You have a text file in JSON Lines format. You need to parse the file line by line:

import json

data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))

Each line contains valid JSON, but the file as a whole is not a valid JSON value, because there is no top-level list or object wrapping the lines.
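To make that concrete, here is a minimal sketch that reproduces the error on a small in-memory string and then parses the same text one line at a time. The sample records are made up for illustration and are not taken from the file in the question.

```python
import json

# Two JSON objects separated by a newline (made-up sample data).
text = (
    '{"user_id": "a", "review_count": 12}\n'
    '{"user_id": "b", "review_count": 7}\n'
)

try:
    # Parsing the whole text as a single JSON value fails after the first object.
    json.loads(text)
    whole_ok = True
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    whole_ok = False
    error_message = str(exc)

# Parsing each line on its own succeeds.
records = [json.loads(line) for line in text.splitlines()]
```

The decoder reads the first object, then finds more non-whitespace text after it, which is what the "Extra data" message reports.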

Note that because the file contains one JSON value per line, you are spared the headache of parsing it all in one go or finding a streaming JSON parser. You can process each line separately before moving on to the next, which saves memory. If your file is really big, you probably don't want to append every result to one list and only then process everything.
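One way to process records one at a time is a small generator. This is only a sketch: the helper name `iter_json_lines` and the `review_count` total are illustrative assumptions, and `io.StringIO` stands in for the real file so the example is self-contained.

```python
import io
import json


def iter_json_lines(lines):
    """Yield one parsed JSON value per non-blank line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, such as a trailing empty line
            yield json.loads(line)


# io.StringIO stands in for an open file; with a real file you would use
# `with open('file') as f:` and pass `f` to iter_json_lines.
sample = io.StringIO(
    '{"user_id": "a", "review_count": 12}\n'
    '\n'
    '{"user_id": "b", "review_count": 7}\n'
)

# Only one record is held in memory at a time.
total_reviews = 0
for record in iter_json_lines(sample):
    total_reviews += record["review_count"]
```

Because the generator yields each record as soon as its line is read, memory use stays roughly constant no matter how many lines the file has.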

If your file instead contains individual JSON objects with delimiters in between, use the approach from "How do I use the 'json' module to read in one JSON object at a time?" to parse out individual objects with a buffered method.
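As a rough illustration of the idea, and not the code from the linked answer, the sketch below uses `json.JSONDecoder.raw_decode` to pull consecutive objects out of a string. It assumes the objects are separated only by whitespace and that the whole text fits in memory; a truly buffered version would read the file in chunks. The function name `iter_concatenated_json` is made up for this example.

```python
import json


def iter_concatenated_json(text):
    """Yield JSON values that sit back to back in text, separated by whitespace.

    Hypothetical helper: simplified, in-memory variant of a buffered parser.
    """
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do that by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


objects = list(iter_concatenated_json('{"a": 1} {"b": 2}\n\n[3]'))
```

Unlike the line-by-line loop, this also handles objects that span several lines or share a line, since it relies on where each value ends and not on newlines.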
