bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml

Question

Can you please suggest a fix? The script downloads almost all the images from imgur pages that contain a single image, but I'm not sure why it fails in this case or how to fix it.

# Fragment from inside the submission loop; assumes requests, urllib2,
# datetime and BeautifulSoup (from bs4) were imported earlier in the script.
elif 'imgur.com' in submission.url and not (submission.url.endswith('gif')
        or submission.url.endswith('webm')
        or submission.url.endswith('mp4')
        or 'all' in submission.url
        or '#' in submission.url
        or '/a/' in submission.url):
    html_source = requests.get(submission.url).text  # download the image's page
    soup = BeautifulSoup(html_source, "lxml")  # this is the line that raises FeatureNotFound
    image_url = soup.select('img')[0]['src']
    # protocol-relative URLs ("//i.imgur.com/...") need an explicit scheme
    if image_url.startswith('//'):
        image_url = 'http:' + image_url
    image_id = image_url[image_url.rfind('/') + 1:image_url.rfind('.')]
    try:
        image_file = urllib2.urlopen(image_url, timeout = 5)
        with open('/home/mona/computer_vision/image_retrieval/images/'+ category+ '/'+ 'imgur_'+ datetime.datetime.now().strftime('%y-%m-%d-%s') + image_url[-9:], 'wb') as output_image:
            output_image.write(image_file.read())
    except urllib2.URLError as e:
        print(e)
        continue

The error is:

[LOG] Done Getting http://i.imgur.com/FoCjtI7.jpg
submission id is: 1alffm
[LOG] Getting url:  http://sphotos-a.ak.fbcdn.net/hphotos-ak-ash4/217834_10151246341237704_484810759_n.jpg
HTTP Error 403: Forbidden
[LOG] Getting url:  http://imgur.com/xp386
Traceback (most recent call last):
  File "download_images.py", line 67, in <module>
    soup = BeautifulSoup(html_source, "lxml")
  File "/usr/lib/python2.7/dist-packages/bs4/__init__.py", line 155, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Recommended answer

Open a Python shell and try the following:

from bs4 import BeautifulSoup
myHTML = "<html><head></head><body><strong>Hi</strong></body></html>"
soup = BeautifulSoup(myHTML, "lxml")

Does that work, or do you get the same error? If you get the same error, lxml is missing. Install it:

pip install lxml

I'm going through these steps because you indicate that the script works for a good while before crashing, in which case it seems you can't be missing the parser?
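
If installing lxml is not possible right away, one option is to fall back to Python's built-in "html.parser", which BeautifulSoup also accepts as a parser name. The following is only a minimal sketch: pick_parser is a hypothetical helper (it is not part of bs4), and checking that the top-level lxml module can be imported is an assumed approximation of what BeautifulSoup needs.

```python
import importlib


def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return `preferred` if a module of that name can be imported, else `fallback`.

    Hypothetical helper for illustration; not part of bs4.
    """
    try:
        importlib.import_module(preferred)
    except ImportError:
        # The preferred parser library is not installed for this interpreter.
        return fallback
    return preferred


parser_name = pick_parser()
print("Using parser: {}".format(parser_name))
# In the script, the chosen name would then be passed to BeautifulSoup, e.g.
#   soup = BeautifulSoup(html_source, parser_name)
```

The two parsers can build slightly different trees from malformed HTML, so installing lxml remains the more direct fix when the script was written against it.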

Added by the OP:

If you are using Python 2.7 on Ubuntu/Debian, this worked for me:

$ sudo apt-get build-dep python-lxml
$ sudo pip install lxml 

Test it like this:

mona@pascal:~/computer_vision/image_retrieval$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml

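A common reason for this error persisting after an install is that lxml was installed for a different Python interpreter than the one running the script. As a rough diagnostic sketch (describe_lxml is a hypothetical helper name, not something from bs4 or lxml), the same import check can be run from a script so that it also reports which interpreter and which lxml installation are in use:

```python
import sys


def describe_lxml():
    """Report whether lxml is importable from the running interpreter.

    Hypothetical helper for illustration only.
    """
    try:
        import lxml
        from lxml import etree
    except ImportError as exc:
        return "lxml is not importable from {}: {}".format(sys.executable, exc)
    # LXML_VERSION is a tuple of integers, e.g. (4, 9, 3, 0)
    version = ".".join(str(part) for part in etree.LXML_VERSION)
    return "lxml {} loaded from {} (interpreter: {})".format(
        version, lxml.__file__, sys.executable)


print(describe_lxml())
```

Running this with the same command used to start download_images.py shows whether that interpreter can see the lxml installation.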