Send email alert using Scrapy after multiple spiders have finished crawling

Problem Description

Just wondering what is the best way to implement this. I have 2 spiders and I want to send an email alert depending on what is scraped after the 2 spiders have finished crawling.

I'm using a script based on the tutorial to run both spiders like so:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# NqbpSpider and GladstoneSpider are imported from my project's spiders.

if __name__ == "__main__":
    process = CrawlerProcess(get_project_settings())
    process.crawl(NqbpSpider)
    process.crawl(GladstoneSpider)
    process.start()  # the script will block here until the crawling is finished

Is it best to call an email function after process.start(), or to code up an email function in the pipelines.py file under the close_spider function?

def close_spider(self, spider):
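For the first option: since process.start() blocks until every scheduled crawl has finished, calling a mail helper right after it covers both spiders at once. A minimal sketch using only the standard library, where the SMTP host and both addresses are placeholders:

import smtplib
from email.message import EmailMessage

def send_alert(subject, body):
    # Hypothetical helper; host and addresses are placeholders.
    msg = EmailMessage()
    msg['Subject'] = subject
    msg['From'] = 'scrapy@example.com'
    msg['To'] = 'alerts@example.com'
    msg.set_content(body)
    with smtplib.SMTP('localhost') as smtp:
        smtp.send_message(msg)

# called right after process.start() returns:
# send_alert('Crawl finished', 'Both spiders are done.')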

Recommended Answer

You can use this:

from scrapy import signals
from scrapy.spiders import CrawlSpider

class MySpider(CrawlSpider):

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # scrapy.xlib.pydispatch has been removed from recent Scrapy
        # releases; connect through the crawler's signal manager instead.
        spider = super().from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.spider_closed, signals.spider_closed)
        return spider

    def spider_closed(self, spider):
        # second param is the instance of the spider about to be closed.
        # Write the mail sending part here.
        pass
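For the mail-sending part itself, Scrapy ships a non-blocking mailer, scrapy.mail.MailSender. A minimal sketch of a spider_closed body using it, assuming the MAIL_* settings (MAIL_HOST, MAIL_FROM, ...) are configured in settings.py and the recipient address is a placeholder:

from scrapy.mail import MailSender

def spider_closed(self, spider):
    # Drop this body into the spider_closed method above.
    mailer = MailSender.from_settings(self.crawler.settings)
    mailer.send(
        to=['alerts@example.com'],  # placeholder recipient
        subject=f'Spider {spider.name} finished',
        body='Crawl complete.',
    )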

If you want to include the scraped data with the mail, write the script in the pipelines.py file.

class MyPipeline:

    def process_item(self, item, spider):
        if spider.name == 'Name of the spider':
            # Use the data and send the mail from here
            pass
        return item
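Putting the two pieces together, one possible shape (a sketch, not the answer's exact code) is a pipeline that buffers items while the spider runs and mails a summary from close_spider; the class name and recipient are illustrative:

from scrapy.mail import MailSender

class EmailSummaryPipeline:

    def open_spider(self, spider):
        # One item buffer per spider run.
        self.items = []

    def process_item(self, item, spider):
        self.items.append(item)
        return item

    def close_spider(self, spider):
        # Mail a short summary of what this spider scraped.
        mailer = MailSender.from_settings(spider.crawler.settings)
        mailer.send(
            to=['alerts@example.com'],  # placeholder recipient
            subject=f'{spider.name} finished',
            body=f'{spider.name} scraped {len(self.items)} item(s).',
        )

Remember to enable the pipeline via ITEM_PIPELINES in settings.py so Scrapy actually runs it.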
