Send email alert using Scrapy after multiple spiders have finished crawling
Question
Just wondering what the best way to implement this is. I have 2 spiders, and I want to send an email alert depending on what is scraped after both spiders have finished crawling.
I'm using a script based on the tutorial to run both spiders like so:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

if __name__ == "__main__":
    process = CrawlerProcess(get_project_settings())
    process.crawl(NqbpSpider)
    process.crawl(GladstoneSpider)
    process.start()  # the script will block here until the crawling is finished
Is it best to call an email function after process.start(), or to code up an email function in the pipelines.py file under the close_spider function?
def close_spider(self, spider):
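Since process.start() blocks until every crawl in the process has finished, one option is to build and send the alert right after it returns, which naturally covers the "after both spiders are done" requirement. A minimal sketch using only the standard library (the addresses and SMTP host below are placeholders, not from the original project):

```python
import smtplib
from email.message import EmailMessage

def build_alert(subject, body, sender, recipient):
    """Compose the alert email; composing is separate from sending
    so the message can be inspected without an SMTP server."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg

def send_alert(msg, host="localhost", port=25):
    # Placeholder SMTP details -- replace with your mail server's.
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)

if __name__ == "__main__":
    # process.start() would run here; once it returns, both spiders
    # have finished and it is safe to send the alert:
    msg = build_alert(
        "Crawl finished",
        "Both spiders have finished crawling.",
        "alerts@example.com",
        "me@example.com",
    )
    # send_alert(msg)  # uncomment once SMTP details are configured
```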
Answer
You can use this, connecting a handler to the spider_closed signal. (Note that scrapy.xlib.pydispatch has been removed from recent Scrapy versions; the supported way to connect signals is through the crawler, as below.)

from scrapy import signals
from scrapy.spiders import CrawlSpider

class MySpider(CrawlSpider):

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.spider_closed, signals.spider_closed)
        return spider

    def spider_closed(self, spider):
        # 'spider' is the instance of the spider about to be closed.
        # Write the mail sending part here
        pass
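For the mail-sending part itself, Scrapy ships scrapy.mail.MailSender. A sketch that formats the crawl stats into a plain-text report (the recipient address and stats keys below are illustrative; the actual send call is shown in a comment so the formatting helper can be tried stand-alone):

```python
def format_crawl_report(spider_name, stats):
    """Turn a Scrapy stats dict into a plain-text email body.

    'stats' is the dict that spider.crawler.stats.get_stats() returns,
    e.g. {'item_scraped_count': 12, 'finish_reason': 'finished'}.
    """
    lines = [f"Spider '{spider_name}' closed."]
    for key in ("item_scraped_count", "finish_reason"):
        if key in stats:
            lines.append(f"{key}: {stats[key]}")
    return "\n".join(lines)

# Inside spider_closed you could then send the report with Scrapy's
# built-in MailSender (the address below is a placeholder):
#
#   from scrapy.mail import MailSender
#   mailer = MailSender.from_settings(spider.crawler.settings)
#   body = format_crawl_report(spider.name, spider.crawler.stats.get_stats())
#   mailer.send(to=["me@example.com"], subject="Crawl finished", body=body)
```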
If you want to include the scraped data in the mail, write the code in the pipelines.py file.
(Item pipelines in Scrapy are plain classes; there is no Pipeline base class to inherit from.)

class MyPipeline:

    def process_item(self, item, spider):
        if spider.name == 'Name of the spider':
            # Use the data and send the mail from here
            pass
        return item
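Putting those pieces together, a pipeline can collect items as they are scraped and email one summary when the spider closes. A sketch under the same assumptions as above (the send step is left as a comment, and the spider object only needs a .name attribute, so the class runs without Scrapy installed):

```python
class EmailAlertPipeline:
    """Collect items per spider and build one summary when it closes.

    Sketch only: in a real project, hook this up in ITEM_PIPELINES and
    replace the comment in close_spider with an actual send call.
    """

    def open_spider(self, spider):
        self.items = []

    def process_item(self, item, spider):
        self.items.append(item)
        return item

    def close_spider(self, spider):
        subject = f"{spider.name}: {len(self.items)} items scraped"
        body = "\n".join(str(item) for item in self.items)
        # Send with smtplib or scrapy.mail.MailSender here, e.g.:
        #   mailer.send(to=["me@example.com"], subject=subject, body=body)
        return subject, body  # returned so the summary is easy to inspect
```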