Can't parse simple JSON with Python

Question

I have a very simple JSON string that I can't parse with the simplejson module. Reproduction:

import simplejson as json
json.loads(r'{"translatedatt1":"Vari\351es"}')

Result:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.5/simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/pymodules/python2.5/simplejson/decoder.py", line 335, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/pymodules/python2.5/simplejson/decoder.py", line 351, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid \escape: line 1 column 23 (char 23)

Does anyone have an idea what's wrong and how to parse the JSON above correctly?

The string that is encoded there is: Variées

P.S. I use Python 2.5.

Thanks a lot!

Recommended answer

The error is correct: Vari\351es contains an invalid escape. The JSON standard does not allow a \ followed by just digits.
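To make the difference concrete, here is a minimal sketch that compares the broken input with the same string written using a valid \uXXXX escape. It assumes Python 3 and the standard-library json module rather than simplejson on Python 2.5, and the helper name parses_as_json is hypothetical.

```python
import json


def parses_as_json(text):
    """Return True if text is accepted by the JSON parser, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


# \u00e9 is a valid JSON escape for the character é.
print(parses_as_json(r'{"translatedatt1":"Vari\u00e9es"}'))

# \351 is an octal-style escape, which JSON does not define.
print(parses_as_json(r'{"translatedatt1":"Vari\351es"}'))
```

Only the first call should report success; the second input is rejected for the same reason as in the traceback above.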

Whatever produced that output should be fixed. If that is impossible, you'll need to use a regular expression to either remove those escapes or replace them with valid escapes.

If we interpret the 351 as an octal number, it points to the Unicode code point U+00E9, the é character (LATIN SMALL LETTER E WITH ACUTE). You can 'repair' your JSON input with:

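import re

# A backslash followed by octal digits, which JSON does not allow.
invalid_escape = re.compile(r'\\[0-7]{1,6}')  # up to 6 digits for codepoints up to FFFF

def replace_with_codepoint(match):
    # Drop the leading backslash, read the digits as octal, return that character.
    # unichr() is Python 2 only; on Python 3 the equivalent is chr().
    return unichr(int(match.group(0)[1:], 8))


def repair(brokenjson):
    # Replace every octal escape with the character it stands for.
    return invalid_escape.sub(replace_with_codepoint, brokenjson)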

Using repair(), your example can be loaded:

>>> json.loads(repair(r'{"translatedatt1":"Vari\351es"}'))
{u'translatedatt1': u'Vari\xe9es'}
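The answer also mentions replacing the invalid escapes with valid ones instead of inserting the raw character. A rough sketch of that variant follows, assuming Python 3 and the standard-library json module rather than simplejson. The names repair_to_unicode_escapes and _to_unicode_escape are hypothetical, and limiting the match to three octal digits and skipping already-escaped backslashes are my own assumptions about the input, not something the original data confirms.

```python
import json
import re

# Match either an escaped backslash (valid JSON, left alone) or a backslash
# followed by 1-3 octal digits (assumed to be a byte-sized octal escape).
_ESCAPE = re.compile(r'\\\\|\\([0-7]{1,3})')


def _to_unicode_escape(match):
    digits = match.group(1)
    if digits is None:
        # This was an escaped backslash; keep it unchanged.
        return match.group(0)
    # Rewrite the octal escape as a JSON \uXXXX escape.
    return '\\u%04x' % int(digits, 8)


def repair_to_unicode_escapes(broken_json):
    """Rewrite octal-style escapes as \\uXXXX escapes so the text is valid JSON."""
    return _ESCAPE.sub(_to_unicode_escape, broken_json)


fixed = repair_to_unicode_escapes(r'{"translatedatt1":"Vari\351es"}')
print(fixed)
print(json.loads(fixed))
```

Emitting \uXXXX keeps the repaired text pure ASCII and avoids producing a bare quote or control character inside a JSON string if an escape happens to decode to one.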

You may need to adjust the interpretation of the codepoints. I chose octal because it makes Variées an actual word, but you need to test this with other codepoints in your data.
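One way to check the octal interpretation against more samples is to print the character and its Unicode name for each escape seen in the data. This is only a sketch, assuming Python 3; the helper describe_octal_escape is hypothetical, and the extra values 344 and 374 are illustrative rather than taken from the original input.

```python
import unicodedata


def describe_octal_escape(digits):
    """Interpret digits as octal and return (codepoint, character, Unicode name)."""
    codepoint = int(digits, 8)
    char = chr(codepoint)
    return codepoint, char, unicodedata.name(char, '<unnamed>')


# 351 comes from the question; 344 and 374 are made-up extra samples.
for digits in ('351', '344', '374'):
    codepoint, char, name = describe_octal_escape(digits)
    print('\\%s -> U+%04X %s (%s)' % (digits, codepoint, char, name))
```

If the names that come out do not match the words you expect in the data, the escapes are probably not octal codepoints and the repair needs a different interpretation.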
