JSON.loads() ValueError Extra Data in Python

Question

I'm trying to read individual values from a JSON feed. Here is an example of the feed data:

{
    "sendtoken": "token1",
    "bytes_transferred": 0,
    "num_retries": 0,
    "timestamp": 1414395374,
    "queue_time": 975,
    "message": "internalerror",
    "id": "mailerX",
    "m0": {
        "binding_group": "domain.com",
        "recipient_domain": "hotmail.com",
        "recipient_local": "destination",
        "sender_domain": "domain.com",
        "binding": "mail.domain.com",
        "message_id": "C1/34-54876-D36FA645",
        "api_credential": "creds",
        "sender_local": "localstring"
    },
    "rejecting_ip": "145.5.5.5",
    "type": "alpha",
    "message_stage": 3
}
{
    "sendtoken": "token2",
    "bytes_transferred": 0,
    "num_retries": 0,
    "timestamp": 1414397568,
    "queue_time": 538,
    "message": "internal error,
    "id": "mailerX",
    "m0": {
        "binding_group": "domain.com",
        "recipient_domain": "hotmail.com",
        "recipient_local": "destination",
        "sender_domain": "domain.com",
        "binding": "mail.domain.com",
        "message_id": "C1/34-54876-D36FA645",
        "api_credential": "creds",
        "sender_local": "localstring"
    },
    "rejecting_ip": "145.5.5.5",
    "type": "alpha",
    "message_stage": 3
}

I can't share the actual URL, but the above are the first 2 of roughly 150 results that are displayed if I run

print results

before the

json.loads()

line.

My code:

import urllib2
import json

results = urllib2.urlopen(url).read()
jsondata = json.loads(results)

for row in jsondata:
     print row['sendtoken']
     print row['recipient_domain']

I'd like output like

token1
hotmail.com

for each entry.

I get this error:

ValueError: Extra data: line 2 column 1 - line 133 column 1 (char 583 - 77680)
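For reference, this error is easy to reproduce: json.loads expects exactly one top-level JSON value, so feeding it two concatenated objects (a minimal sketch of the feed's shape, with made-up field values) raises "Extra data":

```python
import json

# json.loads parses a single JSON value; anything left over after that
# value triggers the "Extra data" ValueError seen in the question.
try:
    json.loads('{"sendtoken": "token1"}\n{"sendtoken": "token2"}')
except ValueError as e:
    print(e)  # mentions "Extra data" and where the leftover data starts
```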

I'm far from a Python expert, and this is my first time working with JSON. I've spent quite a bit of time looking on google and Stack Overflow, but I can't find a solution that works for my specific data format.

Answer

The problem is that your data don't form a JSON object, so you can't decode them with json.loads.

First, this appears to be a sequence of JSON objects separated by spaces. Since you won't tell us anything about where the data come from, this is really just an educated guess; hopefully whatever documentation or coworker or whatever told you about this URL told you what the format actually is. But let's assume that my educated guess is correct.

The easiest way to parse a stream of JSON objects in Python is to use the raw_decode method. Something like this:*

import json

def parse_json_stream(stream):
    decoder = json.JSONDecoder()
    while stream:
        # raw_decode parses one JSON value and returns it together with
        # the index just past where that value ended.
        obj, idx = decoder.raw_decode(stream)
        yield obj
        # Drop the parsed prefix and any whitespace before the next object.
        stream = stream[idx:].lstrip()
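To show the generator in action, here is a self-contained sketch using two whitespace-separated objects shaped like the feed in the question (field values are made up). Note that recipient_domain lives inside the nested "m0" object, so it's accessed as row['m0']['recipient_domain'], not row['recipient_domain'] as in the question's loop:

```python
import json

# The generator from the answer, repeated so this sketch runs on its own.
def parse_json_stream(stream):
    decoder = json.JSONDecoder()
    while stream:
        obj, idx = decoder.raw_decode(stream)
        yield obj
        stream = stream[idx:].lstrip()

# Two whitespace-separated JSON objects mimicking the feed's structure.
raw = ('{"sendtoken": "token1", "m0": {"recipient_domain": "hotmail.com"}}\n'
       '{"sendtoken": "token2", "m0": {"recipient_domain": "hotmail.com"}}')

for row in parse_json_stream(raw):
    print(row['sendtoken'])
    print(row['m0']['recipient_domain'])
```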


However, there's also an error in the second JSON object in the stream. Look at this part:

…
"message": "internal error,
"id": "mailerX",
…

There's a missing " after "internal error. If you fix that, then the function above will iterate two JSON objects.

Hopefully that error was caused by you trying to manually "copy and paste" data by rewriting it. If it's in your original source data, you've got a much bigger problem; you probably need to write a "broken JSON" parser from scratch that can heuristically guess at what the data were intended to be. Or, of course, get whoever's generating the source to generate it properly.

* In general, it's more efficient to use the second argument to raw_decode to pass a start index, instead of slicing off a copy of the remainder each time. But raw_decode can't handle leading whitespace. It's a little easier to just slice and strip than to write code that skips over whitespace from the given index, but if the memory and performance costs of those copies matter, you should write the more complicated code.
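For completeness, the more complicated index-based version described above might look like this (a sketch, not a drop-in replacement; the function name is mine): raw_decode accepts a start index as its second argument, so the code tracks a position and skips whitespace itself instead of slicing off a copy of the remainder each time.

```python
import json

# Index-based variant: no per-object copies of the remaining string.
def parse_json_stream_indexed(stream):
    decoder = json.JSONDecoder()
    idx = 0
    end = len(stream)
    while idx < end:
        # raw_decode chokes on leading whitespace, so advance past it first.
        while idx < end and stream[idx].isspace():
            idx += 1
        if idx >= end:
            break
        # The second argument tells raw_decode where to start parsing;
        # it returns the object and the absolute index just past it.
        obj, idx = decoder.raw_decode(stream, idx)
        yield obj
```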
