How to set the default value of an Item.Field() in Scrapy?
Question
I'm trying to scrape a website that does not display the same data on every page. I'd like my spider to return a default value for each attribute it could not scrape. I know this could be done in the item declaration like this:
```python
import scrapy

class MyItem(scrapy.Item):
    myfield = scrapy.Field(default='NULL')
```
However, this method no longer seems to work (I'm using Scrapy 1.3.0). If I try to export this field when its value was not found, I get:
KeyError: 'myfield'
Is there a workaround?
Recommended answer
Support for default field values was removed from Scrapy about 4 years ago (I'm just curious which version you used previously). According to Pablo Hoffman, the recommended way is to populate items with default values in a pipeline:
```python
class DefaultValuesPipeline(object):
    def process_item(self, item, spider):
        # setdefault only fills fields the spider did not set
        item.setdefault('field1', 'value1')
        item.setdefault('field2', 'value2')
        # ...
        return item
```
https://groups.google.com/d/msg/scrapy-users/-v1p5W41VDQ/0W9SIB07iDIJ
However, you can also extend the default Field class to implement the desired behavior.