How to log scrapy spiders running from script
Hi all, I have multiple spiders running from one script, which is scheduled once daily.
- I want to log the info and error messages separately. The log filenames must be spider_infolog_[date] and spider_errlog_[date]. I am trying the following code,
spider __init__ file
from twisted.python import log
import logging
LOG_FILE = 'logs/spider.log'
ERR_FILE = 'logs/spider_error.log'
logging.basicConfig(level=logging.INFO, filemode='w+', filename=LOG_FILE)
logging.basicConfig(level=logging.ERROR, filemode='w+', filename=ERR_FILE)
observer = log.PythonLoggingObserver()
observer.start()
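Note that logging.basicConfig configures the root logger only on its first call, so the second call above is silently ignored; a single basicConfig cannot split INFO and ERROR output. One way to get separate info and error files is to attach two file handlers with different levels to the root logger. A minimal sketch, reusing the same file paths as above:

```python
import logging
import os

LOG_FILE = 'logs/spider.log'
ERR_FILE = 'logs/spider_error.log'
os.makedirs('logs', exist_ok=True)

# INFO and above goes to the main log file
info_handler = logging.FileHandler(LOG_FILE, mode='w')
info_handler.setLevel(logging.INFO)

# ERROR and above additionally goes to its own file
error_handler = logging.FileHandler(ERR_FILE, mode='w')
error_handler.setLevel(logging.ERROR)

root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(info_handler)
root.addHandler(error_handler)
```

Because both handlers hang off the root logger, every logging.info/logging.error call made from the spiders is routed to the right file by level.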
within spider:
import logging
...
logging.error(message)
if any exception happens in the spider code [for example, I am fetching the start URLs from MySQL; if the connection fails I need to close that specific spider, but not the others, because I am running all the spiders from one script]
raise CloseSpider(message)
is the above code sufficient to close the particular spider?
EDIT @eLRuLL
import logging
from scrapy.utils.log import configure_logging
LOG_FILE = 'logs/spider.log'
ERR_FILE = 'logs/spider_error.log'
configure_logging()
logging.basicConfig(level=logging.INFO, filemode='w+', filename=LOG_FILE)
logging.basicConfig(level=logging.ERROR, filemode='w+', filename=ERR_FILE)
I have put the above code in the script that schedules the spiders. It is not working: the files are not created, but I do get the log messages in the console.
EDIT 2
I have added install_root_handler=False to configure_logging(); it puts all the console output into the spider.log file, but errors are not separated out.
configure_logging(install_root_handler=False)
You can do this:
from scrapy import cmdline
cmdline.execute("scrapy crawl myspider --logfile mylog.log".split())
Put that script in the directory where you keep scrapy.cfg.
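Since cmdline.execute takes over the current process (and --logfile names only a single file), an alternative sketch for a daily scheduling script is to launch each spider as its own `scrapy crawl` subprocess with a dated per-spider log file; the spider names below are assumptions:

```python
import subprocess
from datetime import date

def crawl_command(spider_name, day=None):
    """Build a `scrapy crawl` invocation that logs to a dated file."""
    day = day or date.today().isoformat()
    logfile = 'spider_infolog_{}_{}.log'.format(spider_name, day)
    return ['scrapy', 'crawl', spider_name, '--logfile', logfile]

# In the daily script, run each spider in its own process so that one
# spider failing cannot stop the others, e.g.:
#   for name in ('spider_one', 'spider_two'):   # assumed spider names
#       subprocess.run(crawl_command(name))
```

Running each spider in a separate process also keeps their log files from interleaving.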