Convert HTML entities to Unicode and vice versa
Problem description
Possible duplicate:
How do you convert HTML entities to Unicode and vice versa in Python?
Solution
You need to have BeautifulSoup.
# Python 2 with BeautifulSoup 3
from BeautifulSoup import BeautifulStoneSoup
import cgi

def HTMLEntitiesToUnicode(text):
    """Converts HTML entities to unicode. For example '&amp;' becomes '&'."""
    text = unicode(BeautifulStoneSoup(text, convertEntities=BeautifulStoneSoup.ALL_ENTITIES))
    return text

def unicodeToHTMLEntities(text):
    """Converts unicode to HTML entities. For example '&' becomes '&amp;'."""
    # cgi.escape handles &, < and >; xmlcharrefreplace turns non-ASCII into &#NNN;
    text = cgi.escape(text).encode('ascii', 'xmlcharrefreplace')
    return text

text = "&amp;, &reg;, &lt;, &gt;, &cent;, &pound;, &yen;, &euro;, &sect;, &copy;"

uni = HTMLEntitiesToUnicode(text)
htmlent = unicodeToHTMLEntities(uni)

print uni
print htmlent
# &, ®, <, >, ¢, £, ¥, €, §, ©
# &amp;, &#174;, &lt;, &gt;, &#162;, &#163;, &#165;, &#8364;, &#167;, &#169;
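The code above is written for Python 2 and BeautifulSoup 3, and cgi.escape is no longer available in recent Python 3 releases. If you are on Python 3, the same round trip can probably be done with the standard library's html module alone. This is only a sketch under that assumption, not the original answer's method, and the function names html_entities_to_unicode and unicode_to_html_entities are made up for illustration.

```python
import html


def html_entities_to_unicode(text: str) -> str:
    """Decode named and numeric character references, e.g. '&amp;' becomes '&'."""
    return html.unescape(text)


def unicode_to_html_entities(text: str) -> str:
    """Escape &, < and >, then replace non-ASCII characters with numeric references."""
    # quote=False leaves quote characters alone, which matches cgi.escape's default.
    escaped = html.escape(text, quote=False)
    # xmlcharrefreplace turns each non-ASCII character into &#NNN;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


text = "&amp;, &reg;, &lt;, &gt;, &cent;, &pound;, &yen;, &euro;, &sect;, &copy;"

uni = html_entities_to_unicode(text)
htmlent = unicode_to_html_entities(uni)

print(uni)
print(htmlent)
# &, ®, <, >, ¢, £, ¥, €, §, ©
# &amp;, &#174;, &lt;, &gt;, &#162;, &#163;, &#165;, &#8364;, &#167;, &#169;
```

As in the original, the reverse direction produces numeric references (&#174;) rather than named ones (&reg;), so the round trip gives equivalent HTML but not byte-identical text.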