How to call particular Scrapy spiders from another Python script


Problem description

I have a script called algorithm.py and I want to be able to call Scrapy spiders during the script. The file structure is:

algorithm.py
MySpiders/

where MySpiders is a folder containing several Scrapy projects. I would like to create methods perform_spider1(), perform_spider2(), ... which I can call in algorithm.py.

How do I structure these methods?

I have managed to call one spider using the following code; however, it's not a method and it only works for one spider. I'm a beginner in need of help!

import sys,os.path
sys.path.append('path to spider1/spider1')
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log, signals
from scrapy.xlib.pydispatch import dispatcher
from spider1.spiders.spider1_spider import Spider1Spider

def stop_reactor():
    reactor.stop()

# Stop the reactor once the spider finishes
dispatcher.connect(stop_reactor, signal=signals.spider_closed)

spider = Spider1Spider()  # was RaListSpider(), which doesn't match the import above
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
log.msg('Running reactor...')
reactor.run() # the script will block here
log.msg('Reactor stopped.')

Recommended answer

Just go through your spiders and set each one up by calling configure, crawl and start, and only then call log.start() and reactor.run(). Scrapy will run the multiple spiders in the same process.
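Following that approach, the one-spider snippet from the question can be folded into a helper that is called once per spider before the reactor is started. This is only a sketch against the same pre-1.0 Scrapy API the question uses (Crawler, configure() and scrapy.log were removed in later Scrapy releases); Spider2Spider and its import path are hypothetical stand-ins for a second project.

```python
# Sketch only: same pre-1.0 Scrapy API as in the question.
# Spider2Spider / spider2.spiders.spider2_spider are hypothetical names.
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log, signals
from scrapy.xlib.pydispatch import dispatcher

from spider1.spiders.spider1_spider import Spider1Spider
from spider2.spiders.spider2_spider import Spider2Spider

running = {'count': 0}

def spider_closed():
    # Stop the reactor only after the last spider has closed.
    running['count'] -= 1
    if running['count'] == 0:
        reactor.stop()

dispatcher.connect(spider_closed, signal=signals.spider_closed)

def setup_crawler(spider):
    crawler = Crawler(Settings())
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    running['count'] += 1

def perform_spiders(*spiders):
    # Schedule every spider first, then run the reactor exactly once.
    for spider in spiders:
        setup_crawler(spider)
    log.start()
    reactor.run()  # blocks here until every spider_closed has fired

# e.g. in algorithm.py:
# perform_spiders(Spider1Spider(), Spider2Spider())
```

One caveat: the Twisted reactor cannot be restarted, so separate blocking perform_spider1() and perform_spider2() calls won't work in one process; schedule all the spiders you need, then start the reactor once.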

For more info, see the documentation and this thread.

Also, consider running your spiders via scrapyd.
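With scrapyd, algorithm.py would schedule crawls over scrapyd's HTTP JSON API instead of running them in-process. A sketch, assuming a scrapyd instance on its default port and a project already deployed under the (hypothetical) name spider1:

```shell
# Assumes scrapyd is running on localhost:6800 and the project has been
# deployed (e.g. with scrapyd-deploy) under the project name "spider1".
curl http://localhost:6800/schedule.json -d project=spider1 -d spider=spider1_spider

# Check the state of scheduled/running/finished jobs for the project:
curl "http://localhost:6800/listjobs.json?project=spider1"
```

Because scrapyd runs each crawl in its own process, this sidesteps the reactor-restart limitation entirely, at the cost of having to deploy the projects first.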

Hope that helps.
