Scrapy with a nested array
Question
I'm new to Scrapy and would like to understand how to scrape an object so that the output is nested JSON. Right now, I'm producing JSON that looks like this:
[
{'a' : 1,
'b' : '2',
'c' : 3},
]
I would prefer something like this:
[
{ 'a' : '1',
'_junk' : [
'b' : 2,
'c' : 3]},
]
...where I put some stuff in _junk subfields to post-process later.
The current code in the parser definition in my scrapername.py is:
item['a'] = x
item['b'] = y
item['c'] = z
It seems like
item['a'] = x
item['_junk']['b'] = y
item['_junk']['c'] = z
...might fix that, but I'm getting an error about the _junk key:
File "/usr/local/lib/python2.7/dist-packages/scrapy/item.py", line 49, in __getitem__
return self._values[key]
exceptions.KeyError: '_junk'
Does this mean I need to change my items.py somehow? Currently I have:
from scrapy.item import Item, Field

class Website(Item):
    a = Field()
    _junk = Field()  # will hold the nested values
    b = Field()
    c = Field()
Recommended answer
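You need to create the _junk dictionary before storing items in it. The statement item['_junk']['b'] = y first looks up item['_junk'], and that lookup raises the KeyError because nothing has been assigned to the field yet.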
item['a'] = x
item['_junk'] = {}
item['_junk']['b'] = y
item['_junk']['c'] = z
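As a rough illustration of why the extra assignment matters, here is a minimal sketch that uses a plain dict in place of the Scrapy item, so it runs without Scrapy installed. That substitution is an assumption, as are the sample values for x, y and z. It reproduces the failed lookup and then builds the nested structure before serialising it.

```python
import json

# Hypothetical scraped values standing in for x, y and z.
x, y, z = 1, 2, 3

# A plain dict stands in for the Scrapy item (an assumption for this sketch).
item = {}
item['a'] = x

# Assigning into a nested key that does not exist yet fails on the lookup
# of item['_junk'], before the inner assignment is ever attempted.
try:
    item['_junk']['b'] = y
    failed = False
except KeyError:
    failed = True

# Create the nested dictionary first (here in a single step), then serialise.
item['_junk'] = {'b': y, 'c': z}
output = json.dumps([item], sort_keys=True)
print(output)
```

Building the inner dictionary in one literal is equivalent to creating an empty dict and filling it key by key; either way the _junk entry has to exist before anything is stored under it.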