HTMLParser.HTMLParser().unescape() doesn't work
Question
I would like to convert HTML entities back to their human-readable form, e.g. '&pound;' to '£', '&deg;' to '°', etc.
I've read several posts regarding this question, such as "Convert XML/HTML Entities into Unicode String in Python", and according to them, I chose to use the undocumented function unescape(), but it doesn't work for me...
My code sample is as follows:
import HTMLParser
htmlParser = HTMLParser.HTMLParser()
decoded = htmlParser.unescape('&copy; 2013')
print decoded
When I run this Python script, the output is still:
&copy; 2013
instead of
© 2013
I'm using Python 2.X on Windows 7, running in a Cygwin console. I googled and didn't find any similar problems. Could anyone help me with this?
Recommended answer
Apparently HTMLParser.unescape does not decode this entity before Python 2.6.
Python 2.5:
>>> import HTMLParser
>>> HTMLParser.HTMLParser().unescape('&copy;')
'&copy;'
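On an interpreter where unescape() leaves named entities such as '&copy;' untouched, one possible workaround is to decode them by hand from the standard entity table. The following is only a minimal sketch, not part of the original answer: decode_entities is a hypothetical helper name, and the sketch assumes that decoding named, decimal and hexadecimal entities is sufficient and that unknown entity names should be left as they are.

```python
import re
import sys

if sys.version_info[0] >= 3:
    from html.entities import name2codepoint  # Python 3 location
    unichr = chr  # Python 3 has no unichr; chr covers all code points
else:
    from htmlentitydefs import name2codepoint  # Python 2 location

# Matches &name;, &#123; and &#x7B; style references.
_ENTITY_RE = re.compile(r'&(#[xX][0-9a-fA-F]+|#[0-9]+|\w+);')


def decode_entities(text):
    """Hypothetical helper: replace HTML entities with their characters."""
    def replace(match):
        body = match.group(1)
        if body[:2] in ('#x', '#X'):
            return unichr(int(body[2:], 16))  # hexadecimal reference
        if body.startswith('#'):
            return unichr(int(body[1:]))      # decimal reference
        codepoint = name2codepoint.get(body)
        if codepoint is None:
            return match.group(0)             # unknown name: keep as-is
        return unichr(codepoint)
    return _ENTITY_RE.sub(replace, text)


decoded = decode_entities('&copy; 2013')
print(repr(decoded))
```

This sketch does not validate numeric references, so an out-of-range code point would raise ValueError.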
Python 2.6/2.7:
>>> import HTMLParser
>>> HTMLParser.HTMLParser().unescape('&copy;')
u'\xa9'
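For code that has to run on more than one Python version, a hedged sketch of a portable import is shown below. It is an addition, not part of the original answer: it assumes either Python 3.4+, where the documented html.unescape() function is available, or Python 2.6/2.7, where HTMLParser().unescape() behaves as in the session above.

```python
try:
    # Python 3.4+: documented replacement for HTMLParser().unescape()
    from html import unescape
except ImportError:
    # Python 2.6/2.7: fall back to the undocumented method
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

decoded = unescape('&copy; 2013')
print(repr(decoded))
```

Note that the unescape() method on HTMLParser was deprecated in Python 3.4 and removed in Python 3.9, so on Python 3 the html module is the place to look.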