How to set the default value of an Item.Field() in Scrapy?

Problem description

I'm trying to scrape a website that does not display the same data on every page. I'd like my spider to return a default value for each attribute it could not scrape. I know that this could be done in the item declaration like this:

import scrapy

class MyItem(scrapy.Item):
    myfield = scrapy.Field(default='NULL')

However, this method no longer seems to work (I'm using Scrapy 1.3.0). If I try to export this particular field when the value has not been found, I get:

KeyError: 'myfield'

Is there a workaround?

Recommended answer

Support for default field values was removed from Scrapy about 4 years ago (I'm curious which version you were using previously). According to Pablo Hoffman, the recommended way is to populate items with default values through a pipeline:

class DefaultValuesPipeline(object):

    def process_item(self, item, spider):
        # setdefault() only assigns when the field has no value yet
        item.setdefault('field1', 'value1')
        item.setdefault('field2', 'value2')
        # ...
        return item

https://groups.google.com/d/msg/scrapy-users/-v1p5W41VDQ/0W9SIB07iDIJ
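
For the pipeline to run, it also has to be enabled in the project's ITEM_PIPELINES setting. Here is a minimal sketch of a table-driven variant of the same idea. The field names and the module path myproject.pipelines are assumptions made up for illustration, not something from the original thread. The pipeline only relies on the item being dict-like, so the demo at the bottom uses a plain dict instead of a scrapy.Item:

```python
class DefaultValuesPipeline(object):
    """Fill in fields the spider could not scrape (sketch, names are illustrative)."""

    # Hypothetical defaults; replace with the fields your own item declares.
    DEFAULTS = {
        'title': 'NULL',
        'price': 'NULL',
    }

    def process_item(self, item, spider):
        for field, value in self.DEFAULTS.items():
            # setdefault() only writes when the field has not been populated
            item.setdefault(field, value)
        return item


# In settings.py (the module path is an assumption about your project layout):
# ITEM_PIPELINES = {'myproject.pipelines.DefaultValuesPipeline': 100}

if __name__ == '__main__':
    scraped = {'title': 'Some page'}  # 'price' was not found on the page
    print(DefaultValuesPipeline().process_item(scraped, spider=None))
```

Note that with a real scrapy.Item, every key listed in DEFAULTS must be declared as a Field on the item, otherwise assigning it raises a KeyError.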

However, you can also extend the default Field class to implement the desired behavior.
