Scrapy - logging to file and stdout simultaneously, with spider names


Problem description

I've decided to use the Python logging module because the messages generated by Twisted on standard error are too long, and I want meaningful INFO-level messages, such as those generated by the StatsCollector, to be written to a separate log file while keeping the on-screen messages.

from twisted.python import log
import logging

logging.basicConfig(level=logging.INFO, filemode='w', filename='buyerlog.txt')
observer = log.PythonLoggingObserver()
observer.start()

Well, this is fine, I've got my messages, but the downside is that I don't know which spider generated them! This is my log file, with "twisted" being displayed by %(name)s:

INFO:twisted:Log opened.
INFO:twisted:Scrapy 0.12.0.2543 started (bot: property)
INFO:twisted:scrapy.telnet.TelnetConsole starting on 6023
INFO:twisted:scrapy.webservice.WebService starting on 6080
INFO:twisted:Spider opened
INFO:twisted:Spider opened
INFO:twisted:Received SIGINT, shutting down gracefully. Send again to force unclean shutdown
INFO:twisted:Closing spider (shutdown)
INFO:twisted:Closing spider (shutdown)
INFO:twisted:Dumping spider stats:
{'downloader/exception_count': 3,
 'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 3,
 'downloader/request_bytes': 9973,

Compare this to the messages Twisted generates on standard error:

2011-12-16 17:34:56+0800 [expats] DEBUG: number of rules: 4
2011-12-16 17:34:56+0800 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-12-16 17:34:56+0800 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-12-16 17:34:56+0800 [iproperty] INFO: Spider opened
2011-12-16 17:34:56+0800 [iproperty] DEBUG: Redirecting (301) to <GET http://www.iproperty.com.sg/> from <GET http://iproperty.com.sg>
2011-12-16 17:34:57+0800 [iproperty] DEBUG: Crawled (200) <

I've tried %(name)s, %(module)s and others, but I can't seem to get the spider name to show up. Does anyone know the answer?
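For reference, the kind of format string presumably being tried looks like the sketch below (the exact format string is an assumption, not the question's actual code); it cannot show the spider name because PythonLoggingObserver forwards every Twisted message under the single logger name "twisted", so %(name)s never resolves to anything else.

import logging

# Hypothetical formatting attempt: %(name)s is the stdlib logger name,
# which PythonLoggingObserver hard-codes to "twisted" by default,
# so the spider name never appears here.
logging.basicConfig(
    level=logging.INFO,
    filemode='w',
    filename='buyerlog.txt',
    format='%(asctime)s %(name)s %(levelname)s: %(message)s',
)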

The problem with using LOG_FILE and LOG_LEVEL in the settings is that lower-level messages will not be shown on standard error.
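For reference, the settings-based approach mentioned above would look roughly like this (a sketch of a standard settings.py, not the question's actual project); everything is routed to one file at one threshold, which is why lower-level messages stop appearing on standard error.

# settings.py (sketch)
LOG_FILE = 'buyerlog.txt'   # all Scrapy/Twisted log output goes to this file
LOG_LEVEL = 'INFO'          # single threshold; no separate level for the console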

Recommended answer

You want to use ScrapyFileLogObserver.

import logging
from scrapy.log import ScrapyFileLogObserver

# Write everything from DEBUG up to a file, in the same format Scrapy
# uses on standard error (which includes the spider name).
logfile = open('testlog.log', 'w')
log_observer = ScrapyFileLogObserver(logfile, level=logging.DEBUG)
log_observer.start()

I'm glad you asked this question; I've been wanting to do this myself.
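For completeness, here is a minimal sketch of the two-observer setup this enables, assuming the pre-1.0 scrapy.log API shown above: one ScrapyFileLogObserver writes everything (including DEBUG) to a file, while a second one keeps only INFO and above on standard error, both in Scrapy's own format with the spider name.

import logging
import sys

from scrapy.log import ScrapyFileLogObserver

# File observer: capture everything, DEBUG and up, with spider names.
logfile = open('testlog.log', 'w')
ScrapyFileLogObserver(logfile, level=logging.DEBUG).start()

# Screen observer: keep only INFO and above on standard error.
ScrapyFileLogObserver(sys.stderr, level=logging.INFO).start()

Note that this setup code has to run before the crawl starts (for example at the top of the spider module or in an extension), so the observers are already attached when the first messages are emitted.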

