Can't parse simple JSON with Python
Question
I have a very simple JSON string that I can't parse with the simplejson module. Reproduction:
import simplejson as json
json.loads(r'{"translatedatt1":"Vari\351es"}')
Result:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.5/simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/pymodules/python2.5/simplejson/decoder.py", line 335, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/pymodules/python2.5/simplejson/decoder.py", line 351, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid \escape: line 1 column 23 (char 23)
Does anyone have an idea what's wrong and how to parse the JSON above correctly?
The string that is encoded there is: Variées
P.S. I use Python 2.5.
Thanks a lot!
Recommended answer
That error is quite correct: Vari\351es contains an invalid escape. The JSON standard does not allow a \ followed by just digits; after a backslash it accepts only ", \, /, b, f, n, r, t, or u followed by four hexadecimal digits.
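As a quick check of that rule, the sketch below feeds the parser the original string and a variant that uses the standard \u00e9 escape. It assumes Python 3 and the standard-library json module (on Python 2.5 you would import simplejson as json instead), and the helper name is_valid_json is made up for this illustration.

```python
import json

def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True

invalid = r'{"translatedatt1":"Vari\351es"}'    # octal-style escape, not JSON
valid = r'{"translatedatt1":"Vari\u00e9es"}'    # standard \uXXXX escape

print(is_valid_json(invalid))
print(is_valid_json(valid))
```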
Whatever produced that output should be fixed. If that is impossible, you'll need to use a regular expression to either remove those escapes or replace them with valid escapes.
If we interpret 351 as an octal number, it points to the Unicode code point U+00E9, the é character (LATIN SMALL LETTER E WITH ACUTE). You can 'repair' your JSON input with:
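import re

# A backslash followed by octal digits is not a valid JSON escape.
invalid_escape = re.compile(r'\\[0-7]{1,6}')  # up to 6 digits for codepoints up to FFFF

def replace_with_codepoint(match):
    # Read the digits after the backslash as octal (unichr is Python 2; use chr on Python 3).
    return unichr(int(match.group(0)[1:], 8))

def repair(brokenjson):
    # Swap every octal escape for the character it denotes.
    return invalid_escape.sub(replace_with_codepoint, brokenjson)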
Using repair(), your example can be loaded:
>>> json.loads(repair(r'{"translatedatt1":"Vari\351es"}'))
{u'translatedatt1': u'Vari\xe9es'}
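One possible variant, sketched here rather than taken from the original answer, rewrites each octal escape into a valid \uXXXX JSON escape instead of a raw character. That keeps the result parseable even when the escape denotes a quote or a control character, and the pattern skips digits that follow an already escaped backslash. The names OCTAL_ESCAPE, octal_to_json_escape and repair_to_unicode_escapes are hypothetical, the limit of three octal digits is an assumption borrowed from C-style escapes, and the code assumes Python 3 with the standard-library json module.

```python
import json
import re

# A backslash followed by 1-3 octal digits, where the backslash itself is not
# escaped (it must be preceded by an even number of backslashes).
OCTAL_ESCAPE = re.compile(r'(?<!\\)((?:\\\\)*)\\([0-7]{1,3})')

def octal_to_json_escape(match):
    # Keep any preceding escaped backslashes, then emit a \uXXXX escape.
    prefix, digits = match.group(1), match.group(2)
    return '%s\\u%04x' % (prefix, int(digits, 8))

def repair_to_unicode_escapes(broken_json):
    return OCTAL_ESCAPE.sub(octal_to_json_escape, broken_json)

fixed = repair_to_unicode_escapes(r'{"translatedatt1":"Vari\351es"}')
print(fixed)
print(json.loads(fixed))
```

Because the output stays pure ASCII, it can be logged or stored before parsing without any encoding concerns.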
You may need to adjust the interpretation of the codepoints; I chose octal (because Variées is an actual word), but you need to test this more with other codepoints.
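To see why the base matters, the following sketch reads the same digits as octal, decimal and hexadecimal and shows the character each interpretation would yield. The helper name candidate_characters is invented for this example, and the code assumes Python 3 (on Python 2, chr would have to be unichr). Only the octal reading of 351 gives é; the decimal reading gives ş, which does not produce a real word here.

```python
def candidate_characters(digits):
    """Map each base name to the character `digits` denotes in that base."""
    results = {}
    for name, base in (('octal', 8), ('decimal', 10), ('hexadecimal', 16)):
        try:
            results[name] = chr(int(digits, base))
        except ValueError:
            # The digits are not valid in this base (for example 8 or 9 in octal).
            continue
    return results

for name, char in sorted(candidate_characters('351').items()):
    print('%-11s U+%04X %r' % (name, ord(char), 'Vari' + char + 'es'))
```

Running the same comparison on a few more escapes from the real data should show quickly whether octal is the right reading throughout.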