What does this error in Beautiful Soup mean?


Question

I'm writing a small script using PyQt4 and BeautifulSoup. Basically, you specify a URL and the script is supposed to download all the pictures from that web page.

In the output, when I provide http://yahoo.com, it downloads all the pictures except one:

...
Download Complete
Download Complete
File name is wrong 
Traceback (most recent call last):
  File "./picture_downloader.py", line 41, in loadComplete
    self.download_image()
  File "./picture_downloader.py", line 58, in download_image
    print 'File name is wrong ',image['src']
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 879, in __getitem__
    return self.attrs[key]
KeyError: 'src'

For http://stackoverflow.com, the output is:

Download Complete
File name is wrong  h
Download Complete

And finally, here is part of the code:

# SLOT for loadFinished
def loadComplete(self): 
    self.download_image()

def download_image(self):
    html = unicode(self.frame.toHtml()).encode('utf-8')
    soup = bs(html)

    for image in soup.findAll('img'):
        try:
            file_name = image['src'].split('/')[-1]
            cur_path = os.path.abspath(os.curdir)
            if not os.path.exists(os.path.join(cur_path, 'images/')):
                os.makedirs(os.path.join(cur_path, 'images/'))
            f_path = os.path.join(cur_path, 'images/%s' % file_name)
            urlretrieve(image['src'], f_path)
            print "Download Complete"
        except:
            print 'File name is wrong ',image['src']
    print "No more pictures on the page"

Recommended answer

This means that the image element doesn't have a "src" attribute, and you hit the same error twice: once in file_name = image['src'].split('/')[-1], and then again in the except block, in print 'File name is wrong ',image['src']. The second KeyError is raised inside the handler itself, so nothing catches it, which is why the traceback points at the print statement.
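To illustrate the double failure without needing a web page, here is a minimal sketch. It assumes a plain dict can stand in for the tag's attributes, which matches the `return self.attrs[key]` line in the traceback. The names `broken` and `describe` are hypothetical and not part of the original script.

```python
# Stand-in for a parsed <img> tag that has no "src" attribute.
# Assumption: a plain dict behaves like the tag's attribute lookup here.
image = {"alt": "logo"}


def broken(image):
    """Mirrors the original handler: the except block repeats the lookup."""
    try:
        return image["src"].split("/")[-1]
    except Exception:
        # This second image["src"] raises KeyError again, and nothing
        # catches an exception raised inside the handler.
        return "File name is wrong " + image["src"]


def describe(image):
    """Same idea, but the handler uses .get(), which never raises."""
    try:
        return image["src"].split("/")[-1]
    except KeyError:
        return "no src attribute (got %r)" % image.get("src")


try:
    broken(image)
except KeyError as exc:
    print("broken() still raised KeyError for", exc)

print(describe(image))
print(describe({"src": "http://example.com/img/logo.png"}))
```

Catching only KeyError instead of using a bare except also keeps unrelated failures, such as a download error, from being reported as a bad file name.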

The simplest way to avoid the problem is to replace soup.findAll('img') with soup.findAll('img', {"src": True}), so that it only finds the elements that have a src attribute.

If the image URL can be in either of two attributes, try something like:

for image in soup.findAll('img'):
    v = image.get('src', image.get('dfr-src'))  # gets "src", else "dfr-src"
                                                # if both are missing: None
    if v is None:
        continue  # continue loop with the next image
    # do your stuff
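The same fallback idea can be generalized to any number of candidate attributes. The sketch below is only an illustration: `first_attr` is a hypothetical helper, and plain dicts stand in for the tags so it runs on its own. It relies only on a `.get(name)` method, which the tags in the loop above also provide. Unlike the nested `get` call, it also skips attributes that are present but empty.

```python
def first_attr(tag, names=("src", "dfr-src")):
    """Return the first non-empty attribute from `names`, else None."""
    for name in names:
        value = tag.get(name)
        if value:
            return value
    return None


# Dicts standing in for parsed <img> tags (assumption for this sketch).
images = [
    {"src": "http://example.com/a.png"},
    {"dfr-src": "http://example.com/b.png"},
    {"alt": "no url here"},
]

urls = []
for image in images:
    url = first_attr(image)
    if url is None:
        continue  # nothing to download for this tag
    urls.append(url)

print(urls)
```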
