Dealing with mis-escaped characters in JSON

Views: 1906 Published: 2017/11/4 22:39:29 Tags: python json string file-io data-structures

Problem description

I am reading a JSON file into Python that contains escaped single quotes (\'). This leads to all kinds of hiccups, as nicely discussed e.g. here. However, I could not find anything on how to address the issue. I just did a
newstring = originalstring.replace(r"\'", "'")
and things worked out. But this seems rather ugly. I could not really find much material in the json docs on how to deal with this kind of thing (creating an exception, or something) either.

Is there a good, clean procedure for such an issue?

Going back to the source is not possible, unfortunately.

Thanks for your help!
Solution
The JSON standard defines a specific set of valid 2-character escape sequences: \\, \/, \", \b, \r, \n, \f and \t, and one six-character escape sequence to encode Unicode code points, \uhhhh (\u plus 4 hex digits). Any other sequence of a backslash plus another character is invalid JSON.
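To see the rule in action, here is a small sketch that feeds each escape to json.loads inside a one-character JSON string. The helper name is_valid_escape and the sample of invalid sequences are chosen here for illustration; they are not part of the json module.

```python
import json

# The eight two-character escapes listed above, plus one \uhhhh example.
valid = ['\\"', '\\\\', '\\/', '\\b', '\\f', '\\n', '\\r', '\\t', '\\u00e9']
# A few sequences outside that list (picked for illustration).
invalid = ["\\'", '\\a', '\\&', '\\$']


def is_valid_escape(seq):
    """Return True if json.loads accepts seq inside a JSON string."""
    try:
        json.loads('"' + seq + '"')
        return True
    except json.JSONDecodeError:
        return False


accepted = [seq for seq in valid if is_valid_escape(seq)]
rejected = [seq for seq in invalid if not is_valid_escape(seq)]
print(accepted == valid, rejected == invalid)
```

Both comparisons should come out true: every listed escape parses, and every other backslash sequence raises JSONDecodeError.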

If you have a JSON source you can't fix otherwise, the only way out is to remove the invalid sequences, like you did with str.replace(), even if that is a little fragile: it breaks when an even number of backslashes precedes the quote. For example, \\' is a valid escaped backslash followed by a plain quote, and the replacement turns it into the invalid \'.
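A minimal sketch of that fragility, assuming a hypothetical naive_fix wrapper around the one-line replace from the question and two made-up sample strings:

```python
import json


def naive_fix(text):
    # The one-line fix from the question: turn \' into '
    return text.replace(r"\'", "'")


# A lone \' is repaired as intended.
simple = r'''"it\'s"'''
assert json.loads(naive_fix(simple)) == "it's"

# \\' is already valid JSON: an escaped backslash, then a plain quote.
tricky = r'''"dir\\'name"'''
assert json.loads(tricky) == "dir\\'name"

# The replace eats the second backslash and leaves an invalid \' behind.
try:
    json.loads(naive_fix(tricky))
    broke = False
except json.JSONDecodeError:
    broke = True

print(broke)
```

So the replace can turn input that was valid into input that no longer parses, which is why it is only safe when you know the data never contains an escaped backslash directly before a single quote.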

You could use a regular expression too, where you remove any backslashes not used in a valid sequence:
fixed = re.sub(r'(?<!\\)\\(?!["\\/bfnrt]|u[0-9a-fA-F]{4})', r'', inputstring)
This won't catch an odd-count backslash sequence like \\\ (the lookbehind skips any backslash that follows another backslash), but it will catch anything else:
>>> import re, json
>>> broken = r'"JSON string with escaped quote: \' and various other broken escapes: \a \& \$ and a newline!\n"'
>>> json.loads(broken)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mjpieters/Development/Library/buildout.python/parts/opt/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/Users/mjpieters/Development/Library/buildout.python/parts/opt/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/mjpieters/Development/Library/buildout.python/parts/opt/lib/python3.5/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 34 (char 33)
>>> json.loads(re.sub(r'(?<!\\)\\(?!["\\/bfnrt]|u[0-9a-fA-F]{4})', r'', broken))
"JSON string with escaped quote: ' and various other broken escapes: a & $ and a newline!\n"
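If odd-count backslash runs do show up in your data, one possible variation (a sketch, not part of the answer above) is to let the regular expression consume each valid escape as a whole, so that \\ is always eaten as a pair before the scanner looks at what follows. The helper name strip_invalid_escapes and the sample string are made up for illustration.

```python
import json
import re

# Either a valid escape (captured in group 1) or a lone backslash.
_ESCAPES = re.compile(r'\\(["\\/bfnrt]|u[0-9a-fA-F]{4})|\\')


def strip_invalid_escapes(text):
    """Drop backslashes that do not start a valid JSON escape.

    Hypothetical helper, not part of the json module.
    """
    # Matches are consumed left to right, so an escaped backslash is
    # kept as a pair and only a truly stray backslash is removed.
    return _ESCAPES.sub(lambda m: m.group(0) if m.group(1) else '', text)


broken = r'''"odd run: \\\' and others: \a \& \$ plus a newline\n"'''
fixed = strip_invalid_escapes(broken)
print(json.loads(fixed))
```

Here \\\' is read as an escaped backslash followed by a stray \', so only the third backslash is dropped. Like the original expression, this runs over the whole document text rather than over string values only, which should be harmless because a backslash outside a string is not valid JSON in the first place.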
