'BeautifulSoup' has no attribute 'HTML_ENTITIES'

Problem description

I recently upgraded BeautifulSoup from version 3.0 to version 4.1 on a Windows machine.

I am now getting a strange error:

File "C:\path\to\myscript.py", line 230, in soupify
    return BeautifulSoup(html, convertEntities=BeautifulSoup.HTML_ENTITIES)
AttributeError: type object 'BeautifulSoup' has no attribute 'HTML_ENTITIES'

Here is the snippet of code that causes the exception to be thrown:

def soupify(html):
    return BeautifulSoup(html, convertEntities=BeautifulSoup.HTML_ENTITIES)

The BS documentation does not mention how the constructor signature changed from v3 to v4. How can I fix this?

Recommended answer

An incoming HTML or XML entity is always converted into the corresponding Unicode character. Beautiful Soup 3 had a number of overlapping ways of dealing with entities, which have been removed. The BeautifulSoup constructor no longer recognizes the smartQuotesTo or convertEntities arguments. (Unicode, Dammit still has smart_quotes_to, but its default is now to turn smart quotes into Unicode.)
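
In practice this suggests the fix is to drop the convertEntities argument altogether. The following is a minimal sketch of a ported soupify, assuming Beautiful Soup 4 is installed as the bs4 package; the "html.parser" argument and the sample markup are assumptions added for illustration and are not part of the original question.

```python
from bs4 import BeautifulSoup


def soupify(html):
    # Beautiful Soup 4 always converts entities to Unicode, so the
    # v3-only convertEntities argument is simply removed.
    # "html.parser" (the standard-library parser) is an assumed choice.
    return BeautifulSoup(html, "html.parser")


# Hypothetical sample input, used only to show the entity conversion.
soup = soupify("<p>caf&eacute; &amp; bar</p>")
text = soup.p.get_text()
print(text)  # café & bar
```

Naming the parser explicitly is optional, but it keeps the result from depending on which parsers happen to be installed on a given machine.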

If you want to turn those Unicode characters back into HTML entities on output, rather than turning them into UTF-8 characters, you need to use an output formatter.
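
As a rough illustration of the output formatter mentioned above, the sketch below renders the same tag twice: once with the default formatter and once with formatter="html". The sample markup and variable names are hypothetical, and "html.parser" is again an assumed parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input containing a named entity and an ampersand.
soup = BeautifulSoup("<p>caf&eacute; &amp; bar</p>", "html.parser")

# Default ("minimal") formatter: only characters that would break the
# markup, such as "&", are escaped; "é" stays a Unicode character.
default_output = soup.p.decode()

# "html" formatter: Unicode characters are written back as named
# HTML entities wherever a named entity exists.
html_output = soup.p.decode(formatter="html")

print(default_output)
print(html_output)
```

The same formatter argument is accepted by prettify() and encode(), so the choice of entity handling is made at output time rather than in the constructor.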

Source: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#entities
