Scrapy Follow & Scrape next Pages
Problem Description
I am having a problem where none of my Scrapy spiders will crawl a website; each one scrapes just one page and then ceases. I was under the impression that the rules member variable was responsible for this, but I can't get it to follow any links. I have been following the documentation here: http://doc.scrapy.org/en/latest/topics/spiders.html#crawlspider
What could I be missing that is preventing any of my bots from crawling?
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.selector import Selector
from Example.items import ExItem


class ExampleSpider(CrawlSpider):
    name = "example"
    allowed_domains = ["example.ac.uk"]
    start_urls = (
        'http://www.example.ac.uk',
    )

    # Intended to follow every link on the domain and scrape each page.
    rules = (
        Rule(LinkExtractor(allow=("",)),
             callback="parse_items", follow=True),
    )
Answer
Replace your rule with this:
rules = (
    Rule(LinkExtractor(allow=('course-finder',),
                       restrict_xpaths=('//div[@class="pagination"]',)),
         callback='parse_items', follow=True),
)
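The key change is that LinkExtractor is now told both which URLs to accept (allow=('course-finder',) matches only links whose URL contains that substring) and where to look for them (restrict_xpaths confines extraction to the pagination div), so the spider walks the result pages one by one instead of stopping after the first. For context, here is a minimal sketch of the whole spider with the fixed rule, using the import paths of modern Scrapy (scrapy.spiders / scrapy.linkextractors rather than the deprecated scrapy.contrib ones); the domain, XPath, and parse_items body are illustrative, since the question elides them:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class ExampleSpider(CrawlSpider):
    name = "example"
    allowed_domains = ["example.ac.uk"]
    start_urls = ["http://www.example.ac.uk"]

    rules = (
        # Follow only links whose URL contains 'course-finder', and only
        # extract those links from inside the pagination <div>.
        Rule(
            LinkExtractor(
                allow=("course-finder",),
                restrict_xpaths=('//div[@class="pagination"]',),
            ),
            callback="parse_items",
            follow=True,
        ),
    )

    def parse_items(self, response):
        # Placeholder extraction; the real fields depend on the site.
        yield {"url": response.url, "title": response.css("title::text").get()}

Note that with CrawlSpider the callback must not be named parse, since CrawlSpider uses parse internally to drive the rules; the question's choice of parse_items is correct on that point.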