Bad JSON - Keys are not quoted
Problem description
I am scraping some JSONP dictionaries from AWS (from JavaScript files). After reducing the raw data to just the JSON-like part, in some cases I get valid JSON and can successfully load it in Python (json_data = json.loads(json_like_data)). However, some of Amazon's JSONP payloads do not include quotes around their keys (see below).
...
{type:"storageCurrentGen",sizes:
[{size:"i2.xlarge",vCPU:"4",ECU:"14",memoryGiB:"30.5",storageGB:"1 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"0.938"}}]},
{size:"i2.2xlarge",vCPU:"8",ECU:"27",memoryGiB:"61",storageGB:"2 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"1.876"}}]},
{size:"i2.4xlarge",vCPU:"16",ECU:"53",memoryGiB:"122",storageGB:"4 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"3.751"}}]},
...
For JSONP, this still works because it is valid JavaScript syntax. However, Python's json.loads(json_str) fails because it is not valid JSON.
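For reference, a minimal reproduction of that failure using only the standard library. The snippet string is a shortened, made-up fragment in the same style as the data above, not the real AWS payload:

```python
import json

# Shortened, illustrative fragment with unquoted keys (not the real payload).
snippet = '{size:"i2.xlarge",vCPU:"4"}'

try:
    json.loads(snippet)
    failed = False
except ValueError as exc:  # json's decode errors are ValueError subclasses
    failed = True
    print("json.loads rejected the input:", exc)
```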
There is another Python module, YAML, which can handle unquoted keys, BUT there must be a space after each colon (:).
I figure that I have two options.
- Somehow add quotes around the characters between an open brace or comma ({ | ,) and a colon (:). Then use json.loads(...).
- Add a space after every colon (:). Then parse with yaml.load(...).
My guess is that option 2 is better than option 1. However, I am looking for suggestions for a better solution.
Has anyone encountered an ill-formatted JSON such as this before and used Python to parse it?
What you have is an HJSON document, so you can use the hjson project to parse it:
>>> import hjson
>>> hjson.loads('{javascript_style:"Look ma, no quotes!"}')
OrderedDict([('javascript_style', 'Look ma, no quotes!')])
HJSON is JSON without the requirement to quote object names or even certain string values, with added support for comments and multi-line strings, and with relaxed rules on where commas are required (including not using commas at all).
Or you could install and use the demjson library; it supports parsing valid JavaScript (missing quotes):
import demjson
result = demjson.decode(jsonp_payload)
Only when you set the strict=True flag does demjson refuse to parse your input:
>>> import demjson
>>> demjson.decode('{javascript_style:"Look ma, no quotes!"}')
{u'javascript_style': u'Look ma, no quotes!'}
>>> demjson.decode('{javascript_style:"Look ma, no quotes!"}', strict=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/demjson.py", line 5701, in decode
    return_stats=(return_stats or write_stats) )
  File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/demjson.py", line 4917, in decode
    raise errors[0]
demjson.JSONDecodeError: ('JSON does not allow identifiers to be used as strings', u'javascript_style')
You could also try to regex your way to valid JSON, but this can lead to false positives. The pattern would be:
import re
valid_json = re.sub(r'(?<={|,)([a-zA-Z][a-zA-Z0-9]*)(?=:)', r'"\1"', jsonp_payload)
This matches a { or a , followed by a JavaScript identifier (a letter, followed by more letters or digits), followed directly by a : colon. If your quoted values contain any such patterns, you'll get invalid JSON.
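One way to reduce those false positives is to let the regex consume whole double-quoted strings first, so anything inside a string value is never rewritten. The following is only a sketch, not part of the original answer: the names TOKEN and quote_keys and the sample payload are made up for illustration, and it assumes the input uses only double-quoted strings and contains no JavaScript comments or single-quoted strings.

```python
import json
import re

# Hypothetical helper names; assumes double-quoted strings only.
# Alternative 1 matches a complete string literal (returned unchanged).
# Alternative 2 captures a bare identifier that is followed by a colon.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|([A-Za-z_$][A-Za-z0-9_$]*)(?=\s*:)')


def quote_keys(js_object_text):
    """Wrap unquoted object keys in double quotes, skipping string values."""
    def fix(match):
        key = match.group(1)
        # group(1) is None when the match was a string literal.
        return match.group(0) if key is None else '"%s"' % key
    return TOKEN.sub(fix, js_object_text)


# Made-up payload; the "note" value would trip up the simpler pattern above.
payload = '{size:"i2.xlarge",note:"a,b:c",prices:{USD:"0.938"}}'
data = json.loads(quote_keys(payload))
print(data["note"])
```

Because string literals are matched before identifiers, the ,b: sequence inside the "note" value is left alone. For anything less regular than this, a real parser such as hjson or demjson is still the safer choice.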