How can I make Scrapy process the URLs sequentially


Problem description

I have this code:

from urlparse import urljoin  # Python 2, matching this Scrapy version

from bs4 import BeautifulSoup
from scrapy import log
from scrapy.exceptions import CloseSpider
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector


def parse(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@class="headline_area"]')

    # One detail request per headline, first five headlines only.
    for ivar, site in enumerate(sites[:5]):
        item = StackItem()
        log.msg(' LOOP' + str(ivar), level=log.ERROR)
        item['title'] = "yoo ma"
        request = Request("blabla", callback=self.test1)
        request.meta['item'] = item
        yield request

    # Follow the "next page" link, stopping once page 500 is exceeded.
    soup = BeautifulSoup(response.body)
    mylinks = soup.find_all('a')
    if mylinks:
        nextlink = mylinks[0].get('href')
        page_number = nextlink.split("&")[-3].split("=")[-1]
        if int(page_number) > 500:
            raise CloseSpider('Search Exceeded 500')
        request = Request(urljoin(response.url, nextlink), callback=self.parse)
        request.meta['page'] = page_number
        yield request

Now my problem is that, suppose I want to stop at page_number = 5.

At the moment Scrapy reaches that page before all the items from page 1, page 2, and so on have been downloaded, and it stops as soon as it first gets there.

How can I keep the spider from moving on to page = 5 until the items from the earlier pages have been downloaded?
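For context, Scrapy schedules yielded requests through a concurrent priority queue, so there is no guarantee that the items of page 1 finish before page 2 is fetched. One common way to approximate sequential crawling, separate from the approach in the answer below, is to give each next-page request a lower priority than the item requests. The sketch below uses the current Scrapy API with assumed names and a placeholder URL:

import scrapy

class SequentialSpider(scrapy.Spider):
    # Minimal sketch: the name and start URL are placeholders.
    name = 'sequential'
    start_urls = ['http://www.example.com/forum?p=1']

    def parse(self, response):
        page = int(response.url.split('=')[-1])

        # Item requests keep the default priority of 0, so the scheduler
        # hands them out before any lower-priority request queued below.
        for href in response.xpath('//div[@class="headline_area"]//a/@href').extract()[:5]:
            yield scrapy.Request(response.urljoin(href), callback=self.parse_item)

        # Each next-page request gets a strictly lower priority, so it is
        # only dispatched once the item requests above have left the queue.
        if page < 5:
            next_url = response.url.replace('p=%d' % page, 'p=%d' % (page + 1))
            yield scrapy.Request(next_url, callback=self.parse, priority=-(page + 1))

    def parse_item(self, response):
        yield {'title': response.xpath('//title/text()').extract_first()}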

Recommended answer

Do the links follow some regular pattern across pages? For example, if the fifth page's link is www.xxxx.net/nForum/#!article/Bet/447540?p=5, you can scrape the link with p=5 directly.
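In other words, when each page is directly addressable through its p= query parameter, the spider can generate all the page URLs up front instead of discovering each one from the previous page. A minimal sketch of that idea, assuming a placeholder URL pattern and the five-page limit from the question:

import scrapy

class DirectPageSpider(scrapy.Spider):
    # Minimal sketch: the URL pattern is a placeholder standing in for
    # the real p=-style links mentioned above.
    name = 'direct_pages'
    start_urls = ['http://www.example.com/forum/article?p=%d' % p
                  for p in range(1, 6)]

    def parse(self, response):
        for site in response.xpath('//div[@class="headline_area"]')[:5]:
            yield {'title': site.xpath('.//a/text()').extract_first()}

Scrapy will still fetch these five pages concurrently, but because every page is requested explicitly, no request depends on an earlier page having been parsed first.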
