bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Problem description

...
soup = BeautifulSoup(html, "lxml")
File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
% ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

The output above is what appears in my Terminal. I am on Mac OS 10.7.x with Python 2.7.1, and I followed this tutorial to get Beautiful Soup and lxml, which both installed successfully and work with a separate test file located here. In the Python script that causes this error, I have included this line:

from pageCrawler import comparePages

And in the pageCrawler file I have included the following two lines:

from bs4 import BeautifulSoup
from urllib2 import urlopen

Any help in figuring out what the problem is and how it can be solved would be much appreciated.

Recommended answer

I suspect this is related to the parser that BS uses to read the HTML. Their documentation is here, but if you're like me (on OSX) you might be stuck with something that requires a bit of work:

You'll notice that the BS4 documentation page above points out that by default BS4 will use the Python built-in HTML parser. Assuming you are on OSX, the Apple-bundled version of Python is 2.7.2, whose built-in parser is not lenient with character formatting. I hit this same problem, so I upgraded my version of Python to work around it. Doing this in a virtualenv will minimize disruption to other projects.

If doing that sounds like a pain, you can switch over to the LXML parser:

pip install lxml

Then try:

soup = BeautifulSoup(html, "lxml")
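If the script has to run on machines where lxml may or may not be installed, one option is to choose the parser name at run time instead of hard-coding "lxml". The following is only a minimal sketch, assuming Python 3.4+ (for importlib.util.find_spec); pick_parser and make_soup are hypothetical helper names, not part of Beautiful Soup:

```python
import importlib.util


def pick_parser(candidates=("lxml", "html5lib")):
    """Return the first candidate parser that is installed, else "html.parser".

    Hypothetical helper: it only checks that the module can be found by the
    interpreter running this script; it does not import or validate it.
    """
    for name in candidates:
        # find_spec returns None when the top-level module is not installed
        # for the current interpreter.
        if importlib.util.find_spec(name) is not None:
            return name
    # html.parser ships with Python, so it is always available.
    return "html.parser"


def make_soup(html):
    """Build a BeautifulSoup object with whichever parser pick_parser() finds."""
    # Imported here so pick_parser() stays usable even without bs4 installed.
    from bs4 import BeautifulSoup
    return BeautifulSoup(html, pick_parser())
```

With this in place, soup = make_soup(html) would replace the hard-coded call. Keep in mind that different parsers can build slightly different trees from the same malformed HTML, so a silent fallback may not suit every project.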

Depending on your scenario, that might be good enough. I found this annoying enough to warrant upgrading my version of Python. Using virtualenv, you can migrate your packages fairly easily.
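When several Python installations or virtualenvs coexist, it can also help to confirm which interpreter is actually running the failing script and whether that interpreter can see bs4 and lxml, since a package installed for one Python is not visible to another. A rough diagnostic sketch, again assuming Python 3.4+; describe_environment is a hypothetical name chosen for illustration:

```python
import importlib.util
import sys


def describe_environment(modules=("bs4", "lxml")):
    """Report the running interpreter and which of the given modules it can find."""
    report = {
        "executable": sys.executable,        # path of the interpreter in use
        "version": sys.version.split()[0],   # e.g. "3.11.4"
    }
    for name in modules:
        # True when the module is installed for this interpreter.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If lxml shows up as False here, installing it with the same interpreter (for example, python -m pip install lxml, run with the python that executes the script) is usually the more reliable route than a bare pip command, which may belong to a different installation.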
