How do I automatically fix an invalid JSON string?


Problem description

From the 2gis API I got the following JSON string.

{
    "api_version": "1.3",
    "response_code": "200",
    "id": "3237490513229753",
    "lon": "38.969916127827",
    "lat": "45.069889625267",
    "page_url": null,
    "name": "ATB",
    "firm_group": {
        "id": "3237499103085728",
        "count": "1"
    },
    "city_name": "Krasnodar",
    "city_id": "3237585002430511",
    "address": "Turgeneva,   172/1",
    "create_time": "2008-07-22 10:02:04 07",
    "modification_time": "2013-08-09 20:04:36 07",
    "see_also": [
        {
            "id": "3237491513434577",
            "lon": 38.973110606808,
            "lat": 45.029031222211,
            "name": "Advance",
            "hash": "5698hn745A8IJ1H86177uvgn94521J3464he26763737242Cf6e654G62J0I7878e",
            "ads": {
                "sponsored_article": {
                    "title": "Center "ADVANCE"",
                    "text": "Business.English."
                },
                "warning": null
            }
        }
    ]
}

But Python doesn't recognize it:

json.loads(firm_str)

Expecting , delimiter: line 1 column 3646 (char 3645)

It looks like a problem with quotes in: "title": "Center "ADVANCE""

How can I fix it automatically in Python?

Solution

The answer by @Michael gave me an idea... not a very pretty idea, but it seems to work, at least on your example: Try to parse the JSON string, and if it fails, look for the character where it failed in the exception string and replace that character.

import json
import re

# s holds the invalid JSON string (firm_str in the question)
while True:
    try:
        result = json.loads(s)   # try to parse...
        break                    # parsing worked -> exit loop
    except ValueError as e:      # json.JSONDecodeError is a subclass of ValueError
        # e.g. "Expecting ',' delimiter: line 34 column 54 (char 1158)"
        # position of the unexpected character after '"'
        unexp = int(re.findall(r'\(char (\d+)\)', str(e))[0])
        # position of the unescaped '"' before that
        unesc = s.rfind('"', 0, unexp)
        s = s[:unesc] + r'\"' + s[unesc+1:]
        # position of the corresponding closing '"' (+2 for the inserted '\')
        closg = s.find('"', unesc + 2)
        s = s[:closg] + r'\"' + s[closg+1:]
print(result)

You may want to add some additional checks to keep this from ending in an infinite loop (e.g., allow at most as many iterations as there are characters in the string). Also, this will still not work if an incorrect " is actually followed by a comma, as pointed out by @gnibbler.

Update: This seems to work pretty well now (though still not perfect), even if the unescaped " is followed by a comma, or closing bracket, as in this case it will likely get a complaint about a syntax error after that (expected property name, etc.) and trace back to the last ". It also automatically escapes the corresponding closing " (assuming there is one).
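The loop guard suggested above could look roughly like the sketch below. It is not part of the original answer. The names `repair_json` and `max_attempts` are made up for illustration. It assumes Python 3.5 or later, where `json.JSONDecodeError` exposes the failing offset as `e.pos`, so the regular expression over the error message is no longer needed. The quote-escaping heuristic itself is unchanged and has the same limitations.

```python
import json


def repair_json(s, max_attempts=None):
    """Parse s, escaping one pair of stray double quotes per failed attempt.

    Hypothetical helper: same heuristic as the loop above, plus a cap on
    the number of attempts and explicit failure when no quote is found.
    """
    if max_attempts is None:
        # Cap suggested in the answer: no more attempts than characters.
        max_attempts = len(s)
    for _ in range(max_attempts):
        try:
            return json.loads(s)
        except json.JSONDecodeError as e:
            # e.pos is the offset reported as "(char N)" in the message.
            unesc = s.rfind('"', 0, e.pos)
            if unesc == -1:
                raise  # no quote before the error; the heuristic cannot help
            s = s[:unesc] + r'\"' + s[unesc + 1:]
            # Escape the matching closing quote (+2 skips the inserted '\"').
            closg = s.find('"', unesc + 2)
            if closg == -1:
                raise  # no closing quote to pair with
            s = s[:closg] + r'\"' + s[closg + 1:]
    raise ValueError("could not repair JSON in %d attempts" % max_attempts)


if __name__ == "__main__":
    bad = '{"title": "Center "ADVANCE"", "text": "Business.English."}'
    print(repair_json(bad)["title"])
```

Input that the heuristic cannot handle, such as unquoted property names, raises an error instead of looping forever. Because `json.JSONDecodeError` is a subclass of `ValueError`, callers can catch `ValueError` for both failure paths.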
