How to log scrapy spiders running from script

Problem description

Hi all, I have multiple spiders running from a script. The script is scheduled to run once daily.

  1. I want to log info messages and errors separately. The log file names must be spider_infolog_[date] and spider_errlog_[date]. I am trying the following code,

spider __init__ file

from twisted.python import log
import logging
LOG_FILE = 'logs/spider.log'
ERR_FILE = 'logs/spider_error.log'
logging.basicConfig(level=logging.INFO, filemode='w+', filename=LOG_FILE)
logging.basicConfig(level=logging.ERROR, filemode='w+', filename=ERR_FILE)
observer = log.PythonLoggingObserver()
observer.start()

within the spider:

import logging
.
.
.
logging.error(message)

  2. If any exception happens in the spider code (for example, I am fetching start URLs from MySQL; if the connection fails I need to close that specific spider, but not the other spiders, because I am running all spiders from the script):

    raise CloseSpider(message)

Is the above code sufficient to close that particular spider?
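For reference, here is a minimal sketch of that pattern (the get_start_urls_from_db() helper is hypothetical, standing in for the MySQL lookup; CloseSpider is documented for spider callbacks, so depending on the Scrapy version raising it while generating start requests may only be logged as a start-requests error, but either way only this spider stops producing requests and the other spiders keep running):

from scrapy import Spider, Request
from scrapy.exceptions import CloseSpider


class MySpider(Spider):
    name = 'myspider'

    def start_requests(self):
        try:
            # Hypothetical helper that reads this spider's start URLs from MySQL.
            urls = get_start_urls_from_db(self.name)
        except Exception as exc:
            # Stop only this spider; spiders launched separately by the
            # scheduling script are unaffected.
            raise CloseSpider('MySQL connection failed: {}'.format(exc))
        for url in urls:
            yield Request(url, callback=self.parse)

    def parse(self, response):
        self.logger.info('Crawled %s', response.url)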

EDIT @eLRuLL

import logging
from scrapy.utils.log import configure_logging
LOG_FILE = 'logs/spider.log'
ERR_FILE = 'logs/spider_error.log'
configure_logging()
logging.basicConfig(level=logging.INFO, filemode='w+', filename=LOG_FILE)
logging.basicConfig(level=logging.ERROR, filemode='w+', filename=ERR_FILE)

I have put the above code in the script that schedules the spiders. It is not working: the log files are not created, but I do get log messages in the console.

EDIT 2

I have added install_root_handler=False to configure_logging(). It puts all of the console output into the spider.log file; errors are not differentiated.

configure_logging(install_root_handler=False)
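A note on why the files never pick up separate levels: logging.basicConfig() only configures the root logger when it has no handlers yet, so the second basicConfig() call above is always a no-op, and once configure_logging() has installed Scrapy's own root handler the first call is skipped as well. A minimal sketch of one way to get separate, dated info and error files is to keep install_root_handler=False and attach two explicit handlers yourself (the file names follow the spider_infolog_[date] / spider_errlog_[date] naming asked for above):

import logging
from datetime import date

from scrapy.utils.log import configure_logging

today = date.today().isoformat()
LOG_FILE = 'logs/spider_infolog_{}.log'.format(today)
ERR_FILE = 'logs/spider_errlog_{}.log'.format(today)

# Keep Scrapy from installing its own root handler so ours take effect.
configure_logging(install_root_handler=False)


class MaxLevelFilter(logging.Filter):
    """Pass only records below a given level (keeps ERROR out of the info file)."""

    def __init__(self, max_level):
        super().__init__()
        self.max_level = max_level

    def filter(self, record):
        return record.levelno < self.max_level


formatter = logging.Formatter('%(asctime)s [%(name)s] %(levelname)s: %(message)s')

info_handler = logging.FileHandler(LOG_FILE, mode='w')
info_handler.setLevel(logging.INFO)
info_handler.addFilter(MaxLevelFilter(logging.ERROR))
info_handler.setFormatter(formatter)

error_handler = logging.FileHandler(ERR_FILE, mode='w')
error_handler.setLevel(logging.ERROR)
error_handler.setFormatter(formatter)

root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(info_handler)
root.addHandler(error_handler)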

Solution

You can do this:

from scrapy import cmdline

cmdline.execute("scrapy crawl myspider --logfile mylog.log".split())

Put that script in the path where you put scrapy.cfg.
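As a usage note (not part of the original answer), the dated file name the question asks for can be built before calling execute. Also keep in mind that cmdline.execute() exits the process when the command finishes, so a scheduling script that runs several spiders would typically launch each crawl separately rather than calling execute() in a loop:

from datetime import date

from scrapy import cmdline

# Produces something like logs/spider_infolog_YYYY-MM-DD.log for today's date.
log_file = 'logs/spider_infolog_{}.log'.format(date.today().isoformat())

cmdline.execute('scrapy crawl myspider --logfile {}'.format(log_file).split())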
