How to produce custom JSON output from Scrapy?

Question

I am working on a Scrapy script that should produce output like this:

{
  "state": "FL",
  "date": "2017-11-03T14:52:26.007Z",
  "games": [
    {
      "name":"Game1"
    },
    {
      "name":"Game2"
    }
  ]
}

But when I run scrapy crawl items -o data.json -t json, I get the output below instead, with state repeated in every item:

[
{"state": "CA", "games": [], "crawlDate": "2014-10-04"},
{"state": "CA", "games": [], "crawlDate": "2014-10-04"},
]

The code is as follows:

items.py

import scrapy

class Item(scrapy.Item):
    state = scrapy.Field()
    games = scrapy.Field()
    crawlDate = scrapy.Field()

In the spider file, the item class is used like this:

item = Item()
item['state'] = state
item['crawlDate'] = '2014-10-04'
item['games'] = games

I know this is not the complete code, but it should give an idea of what I am after.

Recommended answer

Reference: https://stackoverflow.com/a/43698923/8964297

You could try writing your own pipeline. Put this into your pipelines.py file:

import json


class JsonWriterPipeline:
    def open_spider(self, spider):
        # Scraped items are saved in the file 'scraped_items.json'.
        # You can change the filename to whatever you want.
        self.file = open('scraped_items.json', 'w')
        self.file.write("[\n")
        self.first_item = True

    def close_spider(self, spider):
        self.file.write("\n]")
        self.file.close()

    def process_item(self, item, spider):
        # Write the separator before every item except the first, so the
        # file has no trailing comma and remains valid JSON.
        if not self.first_item:
            self.file.write(",\n")
        self.first_item = False
        line = json.dumps(
            dict(item),
            indent=4,
            sort_keys=True,
            separators=(',', ': ')
        )
        self.file.write(line)
        return item

Then modify your settings.py to include the following:

ITEM_PIPELINES = {
    'YourSpiderName.pipelines.JsonWriterPipeline': 300,
}

Change YourSpiderName to the name of your Scrapy project, that is, the Python package that contains pipelines.py.

Note that the file gets written directly by the pipeline, so you don't have to specify the file and format with the -o and -t command line parameters.
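After a crawl, one quick way to confirm that the pipeline wrote well-formed JSON is to load the file back. This is only a sketch: the helper name load_scraped_items is hypothetical, and the default path assumes the 'scraped_items.json' filename used in the pipeline.

```python
import json


def load_scraped_items(path='scraped_items.json'):
    # Hypothetical helper, not part of Scrapy.
    # json.load raises json.JSONDecodeError if the file is not valid JSON,
    # for example when a trailing comma precedes the closing bracket.
    with open(path) as f:
        return json.load(f)
```

Run scrapy crawl items first, then call load_scraped_items() from a Python shell in the same directory.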

Hope this gets you closer to what you need.
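To go from a list of items to the single nested object shown in the question, one possible variation is to collect everything in memory and write the file once when the spider closes. The following is a minimal sketch under stated assumptions, not the original answer's method: the class name SingleObjectJsonPipeline and the output_path attribute are hypothetical, every item is assumed to carry the same state, and games is assumed to be a list of dicts such as {"name": "Game1"}.

```python
import json
from datetime import datetime, timezone


class SingleObjectJsonPipeline:
    """Hypothetical pipeline that merges all items into one JSON object."""

    output_path = 'scraped_items.json'  # assumed filename

    def open_spider(self, spider):
        self.result = {'state': None, 'date': None, 'games': []}

    def process_item(self, item, spider):
        data = dict(item)
        # Assumption: all items share one state, so keep the first one seen.
        if self.result['state'] is None:
            self.result['state'] = data.get('state')
        # Assumption: 'games' is a list of dicts like {'name': 'Game1'}.
        self.result['games'].extend(data.get('games', []))
        return item

    def close_spider(self, spider):
        # Stamp the output with the UTC time at which the crawl finished.
        self.result['date'] = datetime.now(timezone.utc).isoformat()
        with open(self.output_path, 'w') as f:
            json.dump(self.result, f, indent=2)
```

It would be registered in ITEM_PIPELINES the same way as JsonWriterPipeline. Because the whole result is held in memory until the spider closes, this approach suits small crawls better than large ones.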
