Scrapy not downloading images and getting pipeline error

Problem description

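I have this code:

```python
from scrapy.contrib.pipeline.images import ImagesPipeline  # Scrapy 0.x path
from scrapy.http import Request

class MyImagesPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # one download request per URL that the spider put in image_urls
        for image_url in item['image_urls']:
            yield Request(image_url)
```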

And this is the spider, subclassed from BaseSpider. This BaseSpider is giving me nightmares:

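```python
from urlparse import urljoin  # Python 2, matching the Scrapy 0.x APIs below

from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector

from crawler.items import PanduItem  # assumed location of the project's item class

# Methods of the BaseSpider subclass (the class statement itself is not shown).


def parse(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//strong[@class="genmed"]')

    for site in sites[:5]:
        item = PanduItem()
        item['username'] = site.select('dl/dd/h2/a').select("string()").extract()
        item['number_posts'] = site.select('dl/dd/h2/em').select("string()").extract()
        item['profile_link'] = site.select('a/@href').extract()

        request = Request("http://www.example/profile.php?mode=viewprofile&u=5",
                          callback=self.parseUserProfile)
        # hand the partly filled item over to the profile-page callback
        request.meta['item'] = item
        return request


def parseUserProfile(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@id="current"]')
    myurl = sites[0].select('img/@src').extract()

    item = response.meta['item']

    # the src attribute may be relative, so resolve it against the page URL
    image_absolute_url = urljoin(response.url, myurl[0].strip())
    item['image_urls'] = [image_absolute_url]

    return item
```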
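parseUserProfile relies on urljoin to turn a possibly relative src attribute into the absolute URL that the images pipeline downloads. A quick check of that step, written for Python 3 (where urljoin lives in urllib.parse; in Python 2 it is urlparse.urljoin). The helper name and the src values are hypothetical:

```python
from urllib.parse import urljoin  # Python 3; Python 2 code uses urlparse.urljoin


def absolute_image_url(page_url, src):
    """Resolve an <img src> value against the URL of the page it was found on."""
    return urljoin(page_url, src.strip())


# Hypothetical profile URL and src values, for illustration only
page = "http://www.example/profile.php?mode=viewprofile&u=5"
print(absolute_image_url(page, " images/avatars/5.jpg "))    # → http://www.example/images/avatars/5.jpg
print(absolute_image_url(page, "/images/avatars/5.jpg"))     # → http://www.example/images/avatars/5.jpg
print(absolute_image_url(page, "http://cdn.example/5.jpg"))  # → http://cdn.example/5.jpg
```

Relative and root-relative paths both resolve against the host of the profile page, while an already absolute URL passes through unchanged.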


This is the error I am getting. I can't find the cause. It looks like it's getting the item, but I'm not sure.

ERROR

```
File "/app_crawler/crawler/pipelines.py", line 9, in get_media_requests
    for image_url in item['image_urls']:
exceptions.TypeError: 'NoneType' object has no attribute '__getitem__'
```
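The traceback says that `item` itself is None when get_media_requests runs. The Scrapy documentation requires every pipeline's process_item to return the item (or raise DropItem), and each pipeline receives whatever the previous one returned. A plain-Python model of that chaining (not Scrapy's actual code; both pipeline classes are hypothetical) shows one way the error can arise, assuming another pipeline runs before the images pipeline:

```python
# Plain-Python model of chained item pipelines (not Scrapy's real code):
# each stage receives whatever the previous stage's process_item returned.


class CleanupPipeline:
    """Hypothetical earlier pipeline that forgets to return the item."""

    def process_item(self, item, spider):
        item["username"] = [name.strip() for name in item["username"]]
        # missing `return item`, so this method returns None


class ImageUrlsPipeline:
    """Stand-in for MyImagesPipeline: iterates over item['image_urls']."""

    def process_item(self, item, spider):
        for image_url in item["image_urls"]:  # same loop as in the traceback
            print("would request", image_url)
        return item


def run_pipelines(pipelines, item, spider=None):
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


item = {"username": [" alice "], "image_urls": ["http://www.example/a.jpg"]}
try:
    run_pipelines([CleanupPipeline(), ImageUrlsPipeline()], item)
except TypeError as exc:
    # Python 3 wording of the same failure: 'NoneType' object is not subscriptable
    print("TypeError:", exc)
```

Adding `return item` at the end of CleanupPipeline.process_item lets the item reach the next stage intact.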

Solution

You are missing a method in your pipelines.py. That file should contain three methods:

  • process_item
  • get_media_requests
  • item_completed
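According to the Scrapy documentation, item_completed receives `results` as a list of (success, info) tuples, one per image request: on success, info is a dict with url, path and checksum keys; on failure it is a Twisted Failure. The logic of the documentation's example override, pulled out as a plain function so it runs without Scrapy (DropItem is a local stand-in, image_paths is an example field name, and the result values are hypothetical):

```python
class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem, so the sketch runs without Scrapy."""


def item_completed(results, item, info=None):
    """Logic of the documentation's example ImagesPipeline.item_completed override."""
    # results holds one (success, image_info_or_failure) pair per image request
    image_paths = [x["path"] for ok, x in results if ok]
    if not image_paths:
        raise DropItem("Item contains no images")
    item["image_paths"] = image_paths  # example field; a real Item must declare it
    return item  # return the item so later pipelines do not receive None


# Hypothetical results for an item with two image URLs, one of which failed;
# in Scrapy the failure entry would be a Twisted Failure, not a plain Exception.
results = [
    (True, {"url": "http://www.example/a.jpg", "path": "full/aaaa.jpg", "checksum": "1234"}),
    (False, Exception("download failed")),
]
item = item_completed(results, {"image_urls": ["http://www.example/a.jpg",
                                               "http://www.example/b.jpg"]})
print(item["image_paths"])  # ['full/aaaa.jpg']
```

Only successful downloads contribute a path; an item whose downloads all failed is dropped instead of being passed on.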

The item_completed method is called once all the image downloads for an item have finished (successfully or not), and the images themselves are saved under the path set in settings.py as below:

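```python
# Scrapy 0.x syntax; newer versions use a dict such as
# {'scrapy.pipelines.images.ImagesPipeline': 1}
ITEM_PIPELINES = ['scrapy.contrib.pipeline.images.ImagesPipeline']
IMAGES_STORE = '/your/path/here'
```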

As shown above, settings.py also contains the line that enables the images pipeline.
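Once the pipeline is enabled, one way to check whether any images were downloaded is to look under IMAGES_STORE. The Scrapy documentation describes the default layout as `full/<SHA1 hash of the image URL>.jpg` inside that directory. A sketch that computes the expected path, assuming the default file_path has not been overridden and the URL is plain ASCII (the helper name and the URL are hypothetical):

```python
import hashlib
import os


def expected_image_path(images_store, image_url):
    """Default location of a downloaded image: <IMAGES_STORE>/full/<sha1(url)>.jpg."""
    digest = hashlib.sha1(image_url.encode("utf-8")).hexdigest()
    return os.path.join(images_store, "full", digest + ".jpg")


print(expected_image_path("/your/path/here", "http://www.example/images/avatars/5.jpg"))
```

If nothing appears under `full/`, the downloads never completed, which points back at the item reaching the images pipeline in the first place.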

I've tried to explain it as best I understand it. For further reference, have a look at the official Scrapy documentation.
