How do I escape closing '/' in HTML tags in JSON with Python?
Question
Note: This question is very close to Embedding JSON objects in script tags, but the responses to that question provide what I already know (that in JSON, / == \/). I want to know how to do that escaping.
The HTML spec prohibits closing HTML tags anywhere within a <script> element. So, this causes parse errors:
<script>
var assets = [{
"asset_created": null,
"asset_id": "575155948f7d4c4ebccb02d4e8f84d2f",
"body": "<script></script>"
}];
</script>
In my case, I'm generating the invalid situation by rendering a JSON string inside a Django template, i.e.:
<script>
var assets = {{ json_string }};
</script>
I know that JSON parses \/ the same as /, so if I can just escape the closing HTML tags in the JSON string, I'll be good. But I'm not sure of the best way to do this.
My naive approach would just be this:
json_string = '[{"asset_created": null, "asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", "body": "<script></script>"}]'
escaped_json_string = json_string.replace('</', r'<\/')
Is there a better way? Or are there any gotchas that I'm overlooking?
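As a quick check of the premise behind the naive approach, the sketch below uses only the standard-library json module to confirm that the replaced string still decodes to the same data. The last few lines illustrate one possible gotcha, under the assumption that the data may contain an HTML comment opener: the replacement leaves <!-- untouched, and that sequence can also affect how a browser finds the end of a <script> element.

```python
import json

json_string = '[{"asset_created": null, "asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", "body": "<script></script>"}]'
escaped_json_string = json_string.replace('</', r'<\/')

# No closing-tag sequence remains in the text that goes into the page.
assert '</' not in escaped_json_string
# \/ is a legal JSON escape for /, so both strings decode to the same data.
assert json.loads(escaped_json_string) == json.loads(json_string)

# A case the replacement does not touch: a comment opener inside the data.
comment_case = json.dumps({"body": "<!-- <script>"}).replace('</', r'<\/')
assert '<!--' in comment_case
```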
Recommended answer
Updated answer
Okay, I assumed a few things incorrectly. For escaping the JSON, the simplejson library has an encoder class, JSONEncoderForHTML, that can be used. If the code doesn't work, you may need to install simplejson via pip or easy_install. Then you can do something like this:
import simplejson
# Decode the existing JSON string, then re-encode it with the HTML-safe encoder.
assets_json = simplejson.loads(json_string)
encoded = simplejson.encoder.JSONEncoderForHTML().encode(assets_json)
Where encoded will give you:
'{"asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", "body": "\\u003cscript\\u003e\\u003c/script\\u003e", "asset_created": null}'
This is a more general solution than the slash replacement, as it handles other encoding caveats as well.
The loads part is a side effect of the JSON already being encoded. It can be avoided by not using Django to generate the JSON, if possible, and using simplejson instead:
simplejson.dumps(your_object_to_encode, cls=simplejson.encoder.JSONEncoderForHTML)
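If installing simplejson is not an option, a similar effect can be approximated with the standard-library json module. The following is a minimal sketch, not simplejson's implementation: the helper name dumps_for_script and the escape table are assumptions made for illustration. It relies on the fact that <, > and & never appear in JSON syntax outside of strings, so replacing them in the serialized text only touches string contents.

```python
import json

# Characters that can end or confuse an inline <script> block, mapped to
# JSON \uXXXX escapes. This table is an assumption made for illustration.
_HTML_UNSAFE = {
    "<": "\\u003c",
    ">": "\\u003e",
    "&": "\\u0026",
}


def dumps_for_script(obj):
    """Serialize obj to JSON with <, > and & written as \\uXXXX escapes."""
    text = json.dumps(obj)
    for char, escape in _HTML_UNSAFE.items():
        text = text.replace(char, escape)
    return text


assets = [{
    "asset_created": None,
    "asset_id": "575155948f7d4c4ebccb02d4e8f84d2f",
    "body": "<script></script>",
}]

encoded = dumps_for_script(assets)
print(encoded)

# The escapes are ordinary JSON, so decoding restores the original data.
assert json.loads(encoded) == assets
```

Because the escapes are also valid inside JavaScript string literals, the result can be dropped into the template in place of the original JSON string.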
Old answer
Try wrapping the script in CDATA:
<script>
//<![CDATA[
var assets = [{
"asset_created": null,
"asset_id": "575155948f7d4c4ebccb02d4e8f84d2f",
"body": "<script></script>"
}];
//]]>
</script>
It's meant to flag the parser on this sort of thing. Otherwise, you'll need to use the character escapes that have been mentioned.