Xpath 错误 - Spider 错误处理 [英] Xpath Error - Spider error processing

查看:50
本文介绍了Xpath 错误 - Spider 错误处理的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以我正在构建这个蜘蛛并且它可以很好地爬行,因为我可以登录到 shell 并浏览 HTML 页面并测试我的 Xpath 查询.

So i am building this spider and it crawls fine, because i can log into the shell and go through the HTML page and test my Xpath queries.

不确定我做错了什么.任何帮助,将不胜感激.我已经重新安装了 Twisted,但没有安装.

Not sure what i am doing wrong. Any help would be appreciated. I have re installed Twisted, but nothing.

我的蜘蛛看起来像这样 -

My spider looks like this -

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from spider_scrap.items import spiderItem

class spider(BaseSpider):
name="spider1"
#allowed_domains = ["example.com"]
start_urls = [                  
              "http://www.example.com"
            ]

def parse(self, response):
 items = [] 
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//*[@id="search_results"]/div[1]/div')

    for site in sites:
        item = spiderItem()
        item['title'] = site.select('div[2]/h2/a/text()').extract                            item['author'] = site.select('div[2]/span/a/text()').extract    
        item['price'] = site.select('div[3]/div[1]/div[1]/div/b/text()').extract()     
    items.append(item)
    return items

当我运行蜘蛛 - scrapy crawl Spider1 我收到以下错误 -

When i run spider - scrapy crawl Spider1 i get the following error -

    2012-09-25 17:56:12-0400 [scrapy] DEBUG: Enabled item pipelines:
    2012-09-25 17:56:12-0400 [Spider1] INFO: Spider opened
    2012-09-25 17:56:12-0400 [Spider1] INFO: Crawled 0 pages (at 0 pages/min), scraped  0 items (at 0 items/min)
    2012-09-25 17:56:12-0400 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
    2012-09-25 17:56:12-0400 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
   2012-09-25 17:56:15-0400 [Spider1] DEBUG: Crawled (200) <GET http://www.example.com>  (refere
   r: None)
   2012-09-25 17:56:15-0400 [Spider1] ERROR: Spider error processing <GET    http://www.example.com
    s>
    Traceback (most recent call last):
      File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 1178, in mainLoop
        self.runUntilCurrent()
      File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 800, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 368, in callback
        self._startRunCallbacks(result)
      File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 464, in _startRunCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 551, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "C:\Python27\lib\site-packages\scrapy\spider.py", line 62, in parse
        raise NotImplementedError
    exceptions.NotImplementedError:

     2012-09-25 17:56:15-0400 [Spider1] INFO: Closing spider (finished)
     2012-09-25 17:56:15-0400 [Spider1] INFO: Dumping spider stats:
    {'downloader/request_bytes': 231,
     'downloader/request_count': 1,
     'downloader/request_method_count/GET': 1,
     'downloader/response_bytes': 186965,
     'downloader/response_count': 1,
     'downloader/response_status_count/200': 1,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2012, 9, 25, 21, 56, 15, 326000),
     'scheduler/memory_enqueued': 1,
     'spider_exceptions/NotImplementedError': 1,
     'start_time': datetime.datetime(2012, 9, 25, 21, 56, 12, 157000)}
      2012-09-25 17:56:15-0400 [Spider1] INFO: Spider closed (finished)
      2012-09-25 17:56:15-0400 [scrapy] INFO: Dumping global stats:
    {}

推荐答案

对于遇到此问题的每个人,请确保您没有像我那样重命名 parse() 方法:

For everyone who faces this problem, please, make sure you didn't rename parse() method like I did:

class CakeSpider(CrawlSpider):
    name            = "cakes"
    allowed_domains = ["cakes.com"]
    start_urls      = ["http://www.cakes.com/catalog"]

    def parse(self, response): #this should be 'parse' and nothing else

        #yourcode#

否则它会抛出同样的错误:

Otherwise it throws the same error:

...
File "C:\Python27\lib\site-packages\scrapy\spider.py", line 62, in parse
    raise NotImplementedError
    exceptions.NotImplementedError:

我花了大约三个小时试图弄清楚-.-

I've spent like three hours trying to figure out -.-

这篇关于Xpath 错误 - Spider 错误处理的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆