Suppress Scrapy Item printed in logs after pipeline


Problem Description

I have a scrapy project where the item that ultimately enters my pipeline is relatively large and stores lots of metadata and content. Everything is working properly in my spider and pipelines. The logs, however, are printing out the entire scrapy Item as it leaves the pipeline (I believe):

2013-01-17 18:42:17-0600 [tutorial] DEBUG: processing Pipeline pipeline module
2013-01-17 18:42:17-0600 [tutorial] DEBUG: Scraped from <200 http://www.example.com>
    {'attr1': 'value1',
     'attr2': 'value2',
     'attr3': 'value3',
     ...
     snip
     ...
     'attrN': 'valueN'}
2013-01-17 18:42:18-0600 [tutorial] INFO: Closing spider (finished)

I would rather not have all this data puked into log files if I can avoid it. Any suggestions about how to suppress this output?

Recommended Answer

Another approach is to override the __repr__ method of the Item subclasses to selectively choose which attributes (if any) to print at the end of the pipeline:

from scrapy.item import Item, Field

class MyItem(Item):
    attr1 = Field()
    attr2 = Field()
    # ...
    attrN = Field()

    def __repr__(self):
        """Only print out attr1 after exiting the pipeline."""
        # Scrapy Item fields must be read dict-style; attribute access
        # (self.attr1) raises AttributeError on Item subclasses.
        return repr({"attr1": self["attr1"]})

This way, you can keep the log level at DEBUG and show only the attributes that you want to see coming out of the pipeline (to check attr1, for example).
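For instance, with the __repr__ override above, building and printing the item shows only the whitelisted field (a quick sanity check you can run outside a crawl; the field values here are placeholders):

item = MyItem(attr1="value1", attr2="value2")
print(item)  # prints {'attr1': 'value1'} -- attr2 stays out of the logs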

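A related option, not part of this answer: modern Scrapy releases let you rewrite the "Scraped from ..." log line itself by subclassing scrapy.logformatter.LogFormatter, overriding its scraped() method, and pointing the LOG_FORMATTER setting at your subclass. A minimal sketch, with myproject.logformatter as a placeholder module path:

import logging

from scrapy import logformatter

class QuietItemLogFormatter(logformatter.LogFormatter):
    """Log the scrape source without dumping the whole item."""

    def scraped(self, item, response, spider):
        # Keep the response reference, drop the item dump from the message.
        return {
            "level": logging.DEBUG,
            "msg": "Scraped from %(src)s",
            "args": {"src": response},
        }

Then enable it in settings.py (again, the path is hypothetical):

LOG_FORMATTER = "myproject.logformatter.QuietItemLogFormatter"

This trims only the item dump while leaving every other DEBUG message intact, and it works for all item types without touching their __repr__.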