Bad JSON - Keys are not quoted

Problem description

I am scraping some JSONP dictionaries from AWS (from JavaScript files). After stripping the raw data down to only the JSON-like part, in some cases I get valid JSON and can load it successfully in Python (json_data = json.loads(json_like_data)). However, some of Amazon's JSONP payloads do not include quotes around their keys (see the following).

...
{type:"storageCurrentGen",sizes:
[{size:"i2.xlarge",vCPU:"4",ECU:"14",memoryGiB:"30.5",storageGB:"1 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"0.938"}}]},
{size:"i2.2xlarge",vCPU:"8",ECU:"27",memoryGiB:"61",storageGB:"2 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"1.876"}}]},
{size:"i2.4xlarge",vCPU:"16",ECU:"53",memoryGiB:"122",storageGB:"4 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"3.751"}}]},
...

For JSONP, this still works because it is valid JavaScript syntax. However, Python's json.loads(json_str) fails because it is not valid JSON.
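To make the failure concrete, here is a minimal sketch, assuming Python 3 (where json.loads raises json.JSONDecodeError). The sample string is a shortened stand-in for the payload above, and try_load is a hypothetical helper introduced only for this illustration.

```python
import json

# Shortened sample in the style of the payload above (keys are unquoted).
js_like = '{size:"i2.xlarge",vCPU:"4",prices:{USD:"0.938"}}'

# The same data with every key wrapped in double quotes, as JSON requires.
quoted = '{"size":"i2.xlarge","vCPU":"4","prices":{"USD":"0.938"}}'


def try_load(text):
    """Return (parsed value, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, exc.msg


# Unquoted keys: the parser rejects the text at the first key.
data, error = try_load(js_like)
print("unquoted keys ->", data, "| error:", error)

# Quoted keys: the text parses into nested dicts.
data, error = try_load(quoted)
print("quoted keys   ->", data["prices"]["USD"], "| error:", error)
```

The only difference between the two strings is the quoting of the keys, which is exactly what the options below try to repair.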

There is also a YAML module for Python, which can handle unquoted keys, but there must be a space after each colon (:).

I figure that I have two options.

  1. Somehow quote the characters between an opening brace or comma ({ | ,) and a colon (:). Then use json.loads(...).
  2. Add a space after every colon (:). Then parse with yaml.load(...).

My guess is that option 2 is better than option 1. However, I am looking for suggestions for a better solution.

Has anyone encountered an ill-formatted JSON such as this before and used Python to parse it?

Solution

You have an HJSON document, so you can use the hjson project to parse it:

>>> import hjson
>>> hjson.loads('{javascript_style:"Look ma, no quotes!"}')
OrderedDict([('javascript_style', 'Look ma, no quotes!')])

HJSON is JSON without the requirement to quote object names or even certain string values. It also adds comment support and multi-line strings, and relaxes the rules on where commas are required (including not using commas at all).

Or you could install and use the demjson library; it can parse valid JavaScript object literals, including keys that are missing quotes:

import demjson

result = demjson.decode(jsonp_payload)

Only when you set the strict=True flag does demjson refuse to parse your input:

>>> import demjson
>>> demjson.decode('{javascript_style:"Look ma, no quotes!"}')
{u'javascript_style': u'Look ma, no quotes!'}
>>> demjson.decode('{javascript_style:"Look ma, no quotes!"}', strict=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/demjson.py", line 5701, in decode
    return_stats=(return_stats or write_stats) )
  File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/demjson.py", line 4917, in decode
    raise errors[0]
demjson.JSONDecodeError: ('JSON does not allow identifiers to be used as strings', u'javascript_style')

You can also try to regex your way to valid JSON; however, this can lead to false positives. The pattern would be:

import re

valid_json = re.sub(r'(?<={|,)([a-zA-Z][a-zA-Z0-9]*)(?=:)', r'"\1"', jsonp_payload)

This matches a simplified JavaScript identifier (a letter, followed by more letters or digits) that is preceded by a { or a , and followed directly by a : colon. If your quoted values contain any such pattern, it will be rewritten too and you'll get invalid JSON.
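The false-positive case can be reproduced with a small sketch, again assuming Python 3. NAIVE is the pattern from above; TOKEN and quote_keys are not part of the original answer, only one possible string-aware variant that consumes each double-quoted string literal as a whole so that its contents are never rewritten. The sample strings are made up for illustration.

```python
import json
import re

# The pattern from above: an identifier after { or , and directly before :
NAIVE = re.compile(r'(?<={|,)([a-zA-Z][a-zA-Z0-9]*)(?=:)')

# String-aware variant (illustrative): try to match a whole string literal
# first; only if that fails, match { or , followed by a bare key.
TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'                             # a complete double-quoted string
    r'|([{,]\s*)([A-Za-z_$][A-Za-z0-9_$]*)(?=\s*:)'  # or { / , then an unquoted key
)


def quote_keys_naive(text):
    """Quote keys with the simple pattern; may also rewrite string contents."""
    return NAIVE.sub(r'"\1"', text)


def quote_keys(text):
    """Quote bare keys while leaving double-quoted string literals untouched."""
    def fix(match):
        if match.group(2) is None:  # matched a string literal: keep it as-is
            return match.group(0)
        return f'{match.group(1)}"{match.group(2)}"'
    return TOKEN.sub(fix, text)


sample = '{size:"i2.xlarge",vCPU:"4",valueColumns:[{name:"linux",prices:{USD:"0.938"}}]}'
bad = '{note:"a,b:c"}'  # the value contains ",b:" which looks like a key

# Both approaches handle the AWS-style sample.
print(json.loads(quote_keys_naive(sample)) == json.loads(quote_keys(sample)))

# The naive pattern rewrites the inside of the string and breaks the JSON.
print(quote_keys_naive(bad))
# The string-aware variant leaves the value alone.
print(json.loads(quote_keys(bad)))
```

This variant still has limits: it does not handle single-quoted strings, JavaScript comments, or unquoted values, so a real parser such as hjson or demjson remains the safer choice for arbitrary input.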
