Scrapy restrict_xpath syntax error


Problem description

I'm trying to restrict Scrapy to a particular XPath location when following links. The XPath is correct (according to the XPath Helper plugin for Chrome), but when I run my CrawlSpider I get a syntax error at my Rule.

My spider code is:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from tutorial.items import BassItem

import logging
from scrapy.log import ScrapyFileLogObserver

logfile = open('testlog.log', 'w')
log_observer = ScrapyFileLogObserver(logfile, level=logging.DEBUG)
log_observer.start()


class BassSpider(CrawlSpider):
    name = "bass"
    allowed_domains = ["talkbass.com"]
    start_urls = ["http://www.talkbass.com/forum/f126"]


    rules = [Rule(SgmlLinkExtractor(allow=['/f126/index*']), callback='parse_item', follow=True, restrict_xpaths=('//a[starts-with(@title,"Next ")]')]


    def parse_item(self, response):

        hxs = HtmlXPathSelector(response)


        ads = hxs.select('//table[@id="threadslist"]/tbody/tr/td[@class="alt1"][2]/div')
        items = []
        for ad in ads:
            item = BassItem()
            item['title'] = ad.select('a/text()').extract()
            item['link'] = ad.select('a/@href').extract()
            items.append(item)
        return items

So inside the rule, the XPath '//a[starts-with(@title,"Next ")]' raises an error, and I'm not sure why, since the XPath itself is valid. I'm simply trying to get the spider to crawl each "Next Page" link. Can anyone help me out? Please let me know if you need any other parts of my code.

Recommended answer

The XPath is not the issue; the syntax of the complete rule is incorrect. In the question's rule, the Rule( call is never closed before the final ], and restrict_xpaths is passed to Rule rather than to SgmlLinkExtractor, which is the object that accepts it. The following rule fixes the syntax error, but you should check that it does what you need:

rules = (
    Rule(
        SgmlLinkExtractor(allow=['/f126/index*'],
                          restrict_xpaths=('//a[starts-with(@title,"Next ")]')),
        callback='parse_item', follow=True),
)
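
One way to confirm that the problem is the Python syntax of the rule, not the XPath, is to parse both versions of the line without running them. The sketch below uses only the standard library; the helper name is_valid_python and the two string constants are made up for illustration, and the fixed rule is written on one line for brevity. Nothing from Scrapy is imported, because ast.parse only checks syntax and never resolves names such as Rule or SgmlLinkExtractor.

```python
import ast

# The rule exactly as written in the question: the Rule( call is never
# closed, so the final ] does not match the open parenthesis.
BROKEN_RULE = """
rules = [Rule(SgmlLinkExtractor(allow=['/f126/index*']), callback='parse_item', follow=True, restrict_xpaths=('//a[starts-with(@title,"Next ")]')]
"""

# The rule from the answer, with restrict_xpaths moved inside the
# link extractor and every parenthesis balanced.
FIXED_RULE = """
rules = (Rule(SgmlLinkExtractor(allow=['/f126/index*'], restrict_xpaths=('//a[starts-with(@title,"Next ")]')), callback='parse_item', follow=True),)
"""


def is_valid_python(source):
    """Return True if the source parses; names are not resolved or executed."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("question's rule parses:", is_valid_python(BROKEN_RULE))
    print("answer's rule parses:", is_valid_python(FIXED_RULE))
```

A check like this only shows that the line is syntactically valid; whether the rule follows the intended links still has to be verified by running the spider.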

As a general point, it is highly recommended to post the actual error message in a question, since what you perceive the error to be and the actual error may well differ.
