How to log scrapy spiders running from script

Problem description

Hi all, I have multiple spiders running from a script. The script is scheduled to run once daily.

  1. I want to log info messages and errors separately. The log file names must be spider_infolog_[date] and spider_errlog_[date]. I am trying the following code,

spider __init__ file

from twisted.python import log
import logging
LOG_FILE = 'logs/spider.log'
ERR_FILE = 'logs/spider_error.log'
logging.basicConfig(level=logging.INFO, filemode='w+', filename=LOG_FILE)
logging.basicConfig(level=logging.ERROR, filemode='w+', filename=ERR_FILE)
observer = log.PythonLoggingObserver()
observer.start()

within the spider:

import logging
.
.
.
logging.error(message)

  2. If any exception happens in the spider code (for example, I am fetching start URLs from MySQL; if the connection fails I need to close that specific spider, but not the other spiders, because I am running all spiders from the script):

    raise CloseSpider(message)

Is the above code sufficient to close that particular spider?
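For reference, here is a minimal sketch of that pattern (the get_start_urls_from_db() helper is hypothetical, standing in for the MySQL lookup; CloseSpider is documented for spider callbacks, so depending on the Scrapy version raising it while generating start requests may only be logged as a start-requests error, but either way only this spider stops producing requests and the other spiders keep running):

from scrapy import Spider, Request
from scrapy.exceptions import CloseSpider


class MySpider(Spider):
    name = 'myspider'

    def start_requests(self):
        try:
            # Hypothetical helper that reads this spider's start URLs from MySQL.
            urls = get_start_urls_from_db(self.name)
        except Exception as exc:
            # Stop only this spider; spiders launched separately by the
            # scheduling script are unaffected.
            raise CloseSpider('MySQL connection failed: {}'.format(exc))
        for url in urls:
            yield Request(url, callback=self.parse)

    def parse(self, response):
        self.logger.info('Crawled %s', response.url)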

EDIT @eLRuLL

import logging
from scrapy.utils.log import configure_logging
LOG_FILE = 'logs/spider.log'
ERR_FILE = 'logs/spider_error.log'
configure_logging()
logging.basicConfig(level=logging.INFO, filemode='w+', filename=LOG_FILE)
logging.basicConfig(level=logging.ERROR, filemode='w+', filename=ERR_FILE)

I have put the above code in the script that schedules the spiders. It is not working: the log files are not created, but I do get log messages in the console.

EDIT 2

I have added install_root_handler=False to configure_logging(). It puts all of the console output into the spider.log file; errors are not differentiated.

configure_logging(install_root_handler=False)
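A note on why the files never pick up separate levels: logging.basicConfig() only configures the root logger when it has no handlers yet, so the second basicConfig() call above is always a no-op, and once configure_logging() has installed Scrapy's own root handler the first call is skipped as well. A minimal sketch of one way to get separate, dated info and error files is to keep install_root_handler=False and attach two explicit handlers yourself (the file names follow the spider_infolog_[date] / spider_errlog_[date] naming asked for above):

import logging
from datetime import date

from scrapy.utils.log import configure_logging

today = date.today().isoformat()
LOG_FILE = 'logs/spider_infolog_{}.log'.format(today)
ERR_FILE = 'logs/spider_errlog_{}.log'.format(today)

# Keep Scrapy from installing its own root handler so ours take effect.
configure_logging(install_root_handler=False)


class MaxLevelFilter(logging.Filter):
    """Pass only records below a given level (keeps ERROR out of the info file)."""

    def __init__(self, max_level):
        super().__init__()
        self.max_level = max_level

    def filter(self, record):
        return record.levelno < self.max_level


formatter = logging.Formatter('%(asctime)s [%(name)s] %(levelname)s: %(message)s')

info_handler = logging.FileHandler(LOG_FILE, mode='w')
info_handler.setLevel(logging.INFO)
info_handler.addFilter(MaxLevelFilter(logging.ERROR))
info_handler.setFormatter(formatter)

error_handler = logging.FileHandler(ERR_FILE, mode='w')
error_handler.setLevel(logging.ERROR)
error_handler.setFormatter(formatter)

root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(info_handler)
root.addHandler(error_handler)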

Solution

You can do this:

from scrapy import cmdline

cmdline.execute("scrapy crawl myspider --logfile mylog.log".split())

Put that script in the path where you put scrapy.cfg.
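As a usage note (not part of the original answer), the dated file name the question asks for can be built before calling execute. Also keep in mind that cmdline.execute() exits the process when the command finishes, so a scheduling script that runs several spiders would typically launch each crawl separately rather than calling execute() in a loop:

from datetime import date

from scrapy import cmdline

# Produces something like logs/spider_infolog_YYYY-MM-DD.log for today's date.
log_file = 'logs/spider_infolog_{}.log'.format(date.today().isoformat())

cmdline.execute('scrapy crawl myspider --logfile {}'.format(log_file).split())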
