Scrapy with a nested array

Question

I'm new to Scrapy and would like to understand how to scrape an object so that it is output as nested JSON. Right now, I'm producing JSON that looks like this:

[
  {"a": 1,
   "b": 2,
   "c": 3}
]

I'd prefer something like this:

[
  {"a": 1,
   "_junk": {
     "b": 2,
     "c": 3}}
]

Here, the _junk key holds fields that I want to post-process later.

The current code in the parse method of my scrapername.py is:

item['a'] = x
item['b'] = y
item['c'] = z

It seems like this might fix it:

item['a'] = x
item['_junk']['b'] = y
item['_junk']['c'] = z

However, running it raises an error about the _junk key:

  File "/usr/local/lib/python2.7/dist-packages/scrapy/item.py", line 49, in __getitem__
    return self._values[key]
exceptions.KeyError: '_junk'

Does this mean I need to change my items.py somehow? Currently I have:

from scrapy.item import Item, Field

class Website(Item):
    a = Field()
    _junk = Field()
    b = Field()
    c = Field()

Answer

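You need to create the _junk dictionary before storing values in it. Declaring _junk = Field() in items.py only makes _junk an allowed key; it does not give it a value, so item['_junk'] raises KeyError until something is assigned to it. Your items.py does not need to change.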

item['a'] = x
# Assign an empty dict first; only then can keys be set inside it.
item['_junk'] = {}
item['_junk']['b'] = y
item['_junk']['c'] = z
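
The following is a minimal sketch that uses plain Python dicts instead of a Scrapy Item, so it runs without Scrapy installed; the function names build_record and build_record_setdefault are hypothetical. It reproduces the KeyError, applies the fix above, and shows an equivalent form using setdefault(). A Scrapy Item behaves like a dict, so the same indexing should apply to it, but treat that as an assumption to check against your Scrapy version.

```python
import json


def build_record(x, y, z):
    """Hypothetical helper: builds the nested structure from the question.

    A plain dict stands in for the Scrapy Item (assumption: the Item's
    key access behaves the same way for this purpose).
    """
    record = {}
    record['a'] = x
    # The nested dict must exist before keys can be assigned inside it.
    record['_junk'] = {}
    record['_junk']['b'] = y
    record['_junk']['c'] = z
    return record


def build_record_setdefault(x, y, z):
    """Hypothetical variant: creates '_junk' on first use via setdefault()."""
    record = {'a': x}
    # setdefault() inserts {} only if '_junk' is missing, then returns that dict.
    record.setdefault('_junk', {})['b'] = y
    record.setdefault('_junk', {})['c'] = z
    return record


# Reproduce the original error: '_junk' was never assigned a value.
broken = {'a': 1}
try:
    broken['_junk']['b'] = 2
except KeyError as exc:
    print('KeyError:', exc)

print(json.dumps(build_record(1, 2, 3)))
print(json.dumps(build_record_setdefault(1, 2, 3)))
```

Both helpers produce {"a": 1, "_junk": {"b": 2, "c": 3}}, so _junk ends up as a nested JSON object (not an array). The setdefault() form is convenient when the nested fields are filled in at several different places in the parse method.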
