What's the easiest way to escape HTML in Python?
cgi.escape seems like one possible choice. Does it work well? Is there something that is considered better?
cgi.escape is fine. It escapes:
< to &lt;
> to &gt;
& to &amp;
That is enough for text that goes into HTML element content.
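As a minimal sketch for Python 3, where html.escape takes the place of cgi.escape, passing quote=False should reproduce the three replacements listed above. The sample string is only an illustration, not something from the original answer.

```python
import html

# Illustrative input; the double quotes are left alone because quote=False.
text = '<a href="x">Tom & Jerry</a>'

# Escapes only <, > and &, mirroring the default behavior of cgi.escape.
escaped = html.escape(text, quote=False)
print(escaped)  # &lt;a href="x"&gt;Tom &amp; Jerry&lt;/a&gt;
```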
EDIT: If you have non-ASCII characters that you also want to escape, for inclusion in another document that uses a different encoding, like Craig says, just use:
data.encode('ascii', 'xmlcharrefreplace')
Don't forget to decode data to unicode first, using whatever encoding it was encoded in.
However, in my experience that kind of encoding is unnecessary if you work with unicode all the time from the start. Just encode at the end to the encoding specified in the document header (utf-8 for maximum compatibility).
Example:
>>> import cgi
>>> cgi.escape(u'<a>bá</a>').encode('ascii', 'xmlcharrefreplace')
'&lt;a&gt;b&#225;&lt;/a&gt;'
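The example above is Python 2. A rough Python 3 equivalent, assuming html.escape as the replacement for cgi.escape, might look like this; note that encode returns a bytes object there rather than a str.

```python
import html

# Python 3 strings are already unicode, so no decode step is needed here.
data = '<a>bá</a>'

# Escape the markup characters first, then turn any non-ASCII character
# into a numeric character reference while encoding to ASCII bytes.
escaped = html.escape(data)
ascii_bytes = escaped.encode('ascii', 'xmlcharrefreplace')
print(ascii_bytes)  # b'&lt;a&gt;b&#225;&lt;/a&gt;'
```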
Also worth noting (thanks Greg) is the extra quote parameter that cgi.escape takes. With it set to True, cgi.escape also escapes double quote chars (") so you can use the resulting value in an XML/HTML attribute.
EDIT: Note that cgi.escape has been deprecated since Python 3.2 in favor of html.escape, which does the same except that quote defaults to True. cgi.escape was removed entirely in Python 3.8.
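A short sketch of what the quote default means in practice when building an attribute value. The input string and the input tag are made up for illustration. With quote=True, html.escape escapes single quotes as well as double quotes.

```python
import html

# Illustrative value containing both kinds of quotes and an ampersand.
value = 'say "hi" & \'bye\''

# quote=True is the default, so " becomes &quot; and ' becomes &#x27;.
attr = html.escape(value)

# The escaped value can now sit inside a double-quoted attribute.
tag = '<input value="{}">'.format(attr)
print(tag)  # <input value="say &quot;hi&quot; &amp; &#x27;bye&#x27;">
```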