How to get stats from a Scrapy run?


Question

I am running a Scrapy spider from an external script, following the example in the Scrapy docs. I want to grab the stats provided by the Core API and store them in a MySQL table after the crawl has finished.

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from test.spiders.myspider import *
from scrapy.utils.project import get_project_settings
from test.pipelines import MySQLStorePipeline
import datetime

spider = MySpider()


def run_spider(spider):        
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    log.start()
    reactor.run()
    mysql_insert = MySQLStorePipeline()
    # Placeholder values for now; the real crawl stats should go here.
    mysql_insert.cursor.execute(
        'insert into crawler_stats(sites_id, start_time, end_time, page_scraped, finish_reason) '
        'values(%s, %s, %s, %s, %s)',
        (1, datetime.datetime.now(), datetime.datetime.now(), 100, 'test'))

    mysql_insert.conn.commit()

run_spider(spider)

How can I get the values of stats such as start_time, end_time, pages_scraped, and finish_reason in the code above?

Recommended answer

Example code (collecting the stats in the spider_closed signal handler):

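# Same imports as in the question's script.
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from scrapy.utils.project import get_project_settings


def callback(spider, reason):
    # Called when the spider closes; `reason` is the close reason string.
    stats = spider.crawler.stats.get_stats()  # stats is a dictionary

    # write stats to the database here

    # Stop the reactor only after the stats have been handled.
    reactor.stop()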


def run_spider(spider):        
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.signals.connect(callback, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    log.start()
    reactor.run()


run_spider(spider)
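
The "write stats to the database here" step could look roughly like the sketch below. It is only an illustration under several assumptions: the key names (`start_time`, `finish_time`, `response_received_count`, `finish_reason`) depend on the Scrapy version and enabled extensions, so inspect the dictionary from your own run first. The helpers `stats_to_row` and `store_row` are hypothetical, and an in-memory `sqlite3` database stands in for MySQL so the example runs on its own. Because some keys may not be set yet when the handler fires, the sketch falls back to the `reason` argument and the current time.

```python
import datetime
import sqlite3


def stats_to_row(stats, reason, site_id=1):
    """Build a (sites_id, start_time, end_time, page_scraped, finish_reason) tuple.

    Hypothetical helper: the stats key names are assumptions, not guaranteed.
    """
    now = datetime.datetime.now()
    return (
        site_id,
        stats.get("start_time", now),
        # finish_time / finish_reason may be missing when the handler runs,
        # so fall back to "now" and to the reason passed to the handler.
        stats.get("finish_time", now),
        stats.get("response_received_count", 0),
        stats.get("finish_reason", reason),
    )


def store_row(conn, row):
    """Insert one row into crawler_stats (table name taken from the question)."""
    # Store datetimes as ISO strings to keep the example driver-neutral.
    values = tuple(
        v.isoformat() if isinstance(v, datetime.datetime) else v for v in row
    )
    # sqlite3 uses "?" placeholders; a MySQL driver would use "%s" instead.
    conn.execute(
        "insert into crawler_stats"
        "(sites_id, start_time, end_time, page_scraped, finish_reason) "
        "values (?, ?, ?, ?, ?)",
        values,
    )
    conn.commit()


# Made-up stats dictionary, shaped like what get_stats() might return.
sample_stats = {
    "start_time": datetime.datetime(2014, 1, 1, 12, 0, 0),
    "response_received_count": 42,
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table crawler_stats("
    "sites_id integer, start_time text, end_time text, "
    "page_scraped integer, finish_reason text)"
)
row = stats_to_row(sample_stats, reason="finished")
store_row(conn, row)
```

Inside the handler from the answer, the equivalent call would be `stats_to_row(stats, reason)`, with the resulting tuple passed to the existing `mysql_insert.cursor.execute(...)` call in place of the placeholder values.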
