'BeautifulSoup' has no attribute 'HTML_ENTITIES'
Question
I recently upgraded BeautifulSoup from version 3.0 to version 4.1 on a Windows machine.
I am now getting a strange error:
  File "C:\path\to\myscript.py", line 230, in soupify
    return BeautifulSoup(html, convertEntities=BeautifulSoup.HTML_ENTITIES)
AttributeError: type object 'BeautifulSoup' has no attribute 'HTML_ENTITIES'
Here is the snippet of code that raises the exception:
def soupify(html):
    return BeautifulSoup(html, convertEntities=BeautifulSoup.HTML_ENTITIES)
The BeautifulSoup documentation does not mention how the constructor signature changed from v3 to v4. How can I fix this?
Recommended answer
An incoming HTML or XML entity is always converted into the corresponding Unicode character. Beautiful Soup 3 had a number of overlapping ways of dealing with entities, which have been removed. The BeautifulSoup constructor no longer recognizes the smartQuotesTo or convertEntities arguments. (Unicode, Dammit still has smart_quotes_to, but its default is now to turn smart quotes into Unicode.)
If you want to turn those Unicode characters back into HTML entities on output, rather than turning them into UTF-8 characters, you need to use an output formatter.
Source: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#entities
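
In practice, the fix is to drop the convertEntities argument. Below is a minimal sketch of what soupify might look like under Beautiful Soup 4. It assumes the bs4 package is installed and names the standard-library "html.parser" explicitly; the parser choice and the sample markup are assumptions for illustration, not part of the original question.

```python
from bs4 import BeautifulSoup


def soupify(html):
    # Beautiful Soup 4 always converts entities to Unicode while parsing,
    # so the old convertEntities argument is simply dropped.
    # "html.parser" is an assumed parser choice (Python's built-in one).
    return BeautifulSoup(html, "html.parser")


# Hypothetical sample input containing named entities.
soup = soupify("<p>caf&eacute; &amp; cr&egrave;me</p>")

# In the parsed tree the entities are already Unicode characters.
text = soup.p.get_text()
print(text)  # café & crème

# On output, formatter="html" converts Unicode characters back into
# named HTML entities where possible.
html_out = soup.p.decode(formatter="html")
print(html_out)  # <p>caf&eacute; &amp; cr&egrave;me</p>
```

With the default formatter, str(soup.p) keeps é and è as Unicode characters and only escapes the bare ampersand, so formatter="html" is needed only when the output must contain named entities.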