How to implement nested items in Scrapy?

Question

I am scraping data with a complex hierarchical structure and need to export the results to JSON.

I defined the items as:

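from scrapy import Item, Field

# Each class must subclass Item for the Field() declarations to take effect.
class FamilyItem(Item):
    name = Field()
    sons = Field()       # list of SonsItem entries

class SonsItem(Item):
    name = Field()
    grandsons = Field()  # list of GrandsonsItem entries

class GrandsonsItem(Item):
    name = Field()
    age = Field()
    weight = Field()
    sex = Field()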

When the spider finishes running, the printed item output looks like this:

{'name': 'Jenny',
 'sons': [
     {'name': u'S1',
      'grandsons': [
          {'name': u'GS1',
           'age': 18,
           'weight': 50},
          {'name': u'GS2',
           'age': 19,
           'weight': 51}]}]}

But when I run scrapy crawl myscaper -o a.json, it always says the result "is not JSON serializable". When I copy and paste the item output into an IPython console and call json.dumps() on it, it works fine. So where is the problem? This is driving me nuts...

Recommended answer

When saving the nested items, make sure to wrap them in a call to dict(), e.g.:

gs1 = GrandsonsItem()
gs1['name'] = 'GS1'
gs1['age'] = 18
gs1['weight'] = 50

gs2 = GrandsonsItem()
gs2['name'] = 'GS2'
gs2['age'] = 19
gs2['weight'] = 51

s1 = SonsItem()
s1['name'] = 'S1'
s1['grandsons'] = [dict(gs1), dict(gs2)]

jenny = FamilyItem()
jenny['name'] = 'Jenny'
jenny['sons'] = [dict(s1)]
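
For deeper hierarchies, calling dict() by hand at every level gets tedious, and a small recursive helper may be easier to maintain. The following is only a sketch under stated assumptions: FakeItem is a hypothetical stand-in for a Scrapy item (a mapping that is not a dict), used so the example runs without Scrapy installed, and to_plain is an illustrative name, not part of Scrapy.

```python
import json
from collections.abc import Mapping, MutableMapping


class FakeItem(MutableMapping):
    """Hypothetical stand-in for a Scrapy item: a mapping that is not a dict."""

    def __init__(self, **values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


def to_plain(value):
    """Recursively convert mappings to dicts and lists/tuples to lists."""
    if isinstance(value, Mapping):
        return {key: to_plain(val) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(val) for val in value]
    return value


gs1 = FakeItem(name="GS1", age=18, weight=50)
s1 = FakeItem(name="S1", grandsons=[gs1])
jenny = FakeItem(name="Jenny", sons=[s1])

# The standard encoder rejects mapping objects that are not real dicts.
try:
    json.dumps(jenny)
    raw_ok = True
except TypeError:
    raw_ok = False

# After the recursive conversion, only dicts, lists and scalars remain.
plain = to_plain(jenny)
encoded = json.dumps(plain)
```

This also suggests why pasting the printed output into a console works: the pasted text is parsed as plain dicts and lists, whereas the spider holds item objects at each level.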
