Run Multiple Spiders Sequentially

Question

import scrapy

class Myspider1(scrapy.Spider):
    name = "myspider1"
    # do something...

class Myspider2(scrapy.Spider):
    name = "myspider2"
    # do something...

The above is the structure of my spider.py file. I am trying to run Myspider1 first, and then run Myspider2 multiple times depending on some conditions. How can I do that? Any tips?

from twisted.internet import defer, reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

configure_logging()
runner = CrawlerRunner()

@defer.inlineCallbacks
def crawl():
    yield runner.crawl(Myspider1, ...)  # spider arguments elided
    yield runner.crawl(Myspider2, ...)
    reactor.stop()  # shut the reactor down once both crawls finish

crawl()
reactor.run()

I am trying to use this approach, but I have no idea how to run it. Should I run some command on the command line (and if so, which commands?), or just run the Python file?

Thanks a lot!!!

Answer

You need to use the Deferred object returned by process.crawl(), which allows you to add a callback to run when the crawl is finished.

Here is my code:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def start_sequentially(process: CrawlerProcess, crawlers: list):
    print('start crawler {}'.format(crawlers[0].__name__))
    deferred = process.crawl(crawlers[0])
    if len(crawlers) > 1:
        # when the current crawl finishes, start the next crawler in the list
        deferred.addCallback(lambda _: start_sequentially(process, crawlers[1:]))

def main():
    crawlers = [Crawler1, Crawler2]
    process = CrawlerProcess(settings=get_project_settings())
    start_sequentially(process, crawlers)
    process.start()  # blocks until all chained crawls have finished
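Note that with CrawlerProcess you run the script directly (for example, python spider.py); no scrapy crawl command is involved.

The question also asks how to run Myspider2 multiple times depending on some conditions. The same Deferred-chaining idea extends to that case: re-queue the spider from the callback for as long as the condition holds. Here is a minimal sketch, where should_run_again() is a hypothetical placeholder for whatever condition you need to check:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def crawl_repeatedly(process: CrawlerProcess, crawler, should_run_again):
    # run the spider once, then re-queue it when it finishes,
    # as long as the (hypothetical) condition still holds
    deferred = process.crawl(crawler)
    deferred.addCallback(
        lambda _: crawl_repeatedly(process, crawler, should_run_again)
        if should_run_again()
        else None
    )

def main():
    process = CrawlerProcess(settings=get_project_settings())
    # Myspider1 runs once; Myspider2 is then repeated while the condition holds
    d = process.crawl(Myspider1)
    d.addCallback(lambda _: crawl_repeatedly(process, Myspider2, should_run_again))
    process.start()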
