Convert HTML entities to Unicode and vice versa


Question

Possible duplicates:

  • Convert XML/HTML Entities into Unicode String in Python

  • HTML Entity Codes to Text

How do you convert HTML entities to Unicode and vice versa in Python?

    Solution

    You need to have BeautifulSoup.

    from BeautifulSoup import BeautifulStoneSoup
    import cgi
    
    def HTMLEntitiesToUnicode(text):
        """Converts HTML entities to unicode.  For example '&amp;' becomes '&'."""
        text = unicode(BeautifulStoneSoup(text, convertEntities=BeautifulStoneSoup.ALL_ENTITIES))
        return text
    
    def unicodeToHTMLEntities(text):
        """Converts unicode to HTML entities.  For example '&' becomes '&amp;'."""
        text = cgi.escape(text).encode('ascii', 'xmlcharrefreplace')
        return text
    
    text = "&amp;, &reg;, &lt;, &gt;, &cent;, &pound;, &yen;, &euro;, &sect;, &copy;"
    
    uni = HTMLEntitiesToUnicode(text)
    htmlent = unicodeToHTMLEntities(uni)
    
    print uni
    print htmlent
    # &, ®, <, >, ¢, £, ¥, €, §, ©
    # &amp;, &#174;, &lt;, &gt;, &#162;, &#163;, &#165;, &#8364;, &#167;, &#169;
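
The answer above targets Python 2: `unicode` and the `print` statement are Python 2 only, the `from BeautifulSoup import ...` form belongs to BeautifulSoup 3, and `cgi.escape` was removed in Python 3.8. As a minimal sketch of the same round trip on Python 3, assuming the standard-library `html` module is an acceptable stand-in for BeautifulSoup, something like the following may work. The function names `html_entities_to_unicode` and `unicode_to_html_entities` are illustrative and do not come from the original answer.

```python
import html


def html_entities_to_unicode(text):
    """Decode named and numeric HTML entities, e.g. '&amp;' becomes '&'."""
    return html.unescape(text)


def unicode_to_html_entities(text):
    """Escape &, < and >, then turn non-ASCII characters into numeric references."""
    # quote=False mirrors the default behaviour of the old cgi.escape.
    escaped = html.escape(text, quote=False)
    # 'xmlcharrefreplace' rewrites every non-ASCII character as &#NNN;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


text = "&amp;, &reg;, &lt;, &gt;, &cent;, &pound;, &yen;, &euro;, &sect;, &copy;"

uni = html_entities_to_unicode(text)
htmlent = unicode_to_html_entities(uni)

print(uni)
print(htmlent)
```

As in the original answer, the reverse direction emits numeric character references rather than named entities, so `&reg;` comes back as `&#174;`.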
    
