Scrapy request returns NotImplementedError

Problem description

My Scrapy code doesn't work and I have no idea why. I want to scrape the Ikea website. I first wrote a CrawlSpider, but it was not specific enough to retrieve every link on the page, so I wrote a basic Spider that yields Request objects instead.

Here is my code:

class IkeaSpider(scrapy.Spider) :        
    name = "Ikea"
    allower_domains = ["http://www.ikea.com/"]
    start_urls = ["http://www.ikea.com/fr/fr/catalog/productsaz/8/"]



    def parse_url(self, response):

        for sel in response.xpath('//div[@id="productsAzLeft"]'):

            base_url = 'http://www.ikea.com/'
            follow_url = sel.xpath('//span[@class="productsAzLink"]/@href').extract()
            complete_url = urlparse.urljoin(base_url, follow_url)
            request = Request(complete_url, callback = self.parse_page)

            yield request


    def parse_page(self, response):

Here is the error log:

2016-01-04 22:06:31 [scrapy] ERROR: Spider error processing <GET http://www.ikea.com/fr/fr/catalog/productsaz/8/> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spiders/__init__.py", line 76, in parse
    raise NotImplementedError
NotImplementedError

Recommended answer

Your spider needs a parse method, which is the default callback for the requests generated from start_urls. The base scrapy.Spider class defines parse only as a stub that raises NotImplementedError, which is exactly what the traceback shows. Rename parse_url to parse and the initial response will reach your code.

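import scrapy


class IkeaSpider(scrapy.Spider):
    name = "Ikea"
    # The attribute is spelled allowed_domains and takes bare domain names, not URLs
    allowed_domains = ["ikea.com"]
    start_urls = ["http://www.ikea.com/fr/fr/catalog/productsaz/8/"]

    def parse(self, response):
        # parse is the default callback for requests built from start_urls
        for sel in response.xpath('//div[@id="productsAzLeft"]'):
            # extract() returns a list, so each href is joined and requested separately;
            # the leading "." keeps the query relative to the current div
            for href in sel.xpath('.//span[@class="productsAzLink"]/@href').extract():
                yield scrapy.Request(response.urljoin(href), callback=self.parse_page)

    def parse_page(self, response):
        # Body not shown in the question
        pass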
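
To illustrate why the missing parse method produces this traceback, here is a minimal sketch in plain Python. BaseSpider, handle, WithoutParse, WithParse and try_handle are hypothetical names invented for the illustration; this is a simplified stand-in for the fallback behaviour, not Scrapy's actual implementation.

```python
class BaseSpider:
    """Simplified stand-in for scrapy.Spider (not Scrapy's real code)."""

    def parse(self, response):
        # The base class cannot know how to handle a response
        raise NotImplementedError

    def handle(self, response, callback=None):
        # With no explicit callback, fall back to self.parse
        return (callback or self.parse)(response)


class WithoutParse(BaseSpider):
    # Mirrors the question: the handler exists but under another name
    def parse_url(self, response):
        return "parsed " + response


class WithParse(BaseSpider):
    # Mirrors the answer: the handler overrides the default callback
    def parse(self, response):
        return "parsed " + response


def try_handle(spider, response):
    """Return the callback result, or the name of the exception raised."""
    try:
        return spider.handle(response)
    except NotImplementedError as exc:
        return type(exc).__name__
```

Calling try_handle on a WithoutParse instance ends in the inherited stub, while a WithParse instance runs its own method, which is the difference the rename makes.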

Alternatives

You can also define a start_requests method and yield scrapy.Request objects manually, each with an explicit callback argument, just as the code above already does for the follow-up requests.
