How to produce custom JSON output from Scrapy?
Question
I am working on a Scrapy script that should produce output like this:
{
    "state": "FL",
    "date": "2017-11-03T14:52:26.007Z",
    "games": [
        {
            "name": "Game1"
        },
        {
            "name": "Game2"
        }
    ]
}
But when I run scrapy crawl items -o data.json -t json, I get the output below, with the state repeated in every item:
[
{"state": "CA", "games": [], "crawlDate": "2014-10-04"},
{"state": "CA", "games": [], "crawlDate": "2014-10-04"},
]
The code is as follows:
items.py
import scrapy

class Item(scrapy.Item):
    state = scrapy.Field()
    games = scrapy.Field()
In the spider file, the Item class is used like this:
item = Item()
item['state'] = state
item['Date'] = '2014-10-04'
item['games'] = games
I know this is not the complete code, but it should give an idea of what I am after.
Recommended answer
Reference: https://stackoverflow.com/a/43698923/8964297
You could try writing your own pipeline like this. Put this into your pipelines.py file:
import json

class JsonWriterPipeline(object):
    def open_spider(self, spider):
        # Your scraped items will be saved in the file 'scraped_items.json'.
        # You can change the filename to whatever you want.
        self.file = open('scraped_items.json', 'w')
        self.file.write("[")
        self.first_item = True

    def close_spider(self, spider):
        self.file.write("]")
        self.file.close()

    def process_item(self, item, spider):
        # Write the separator before every item except the first, so the
        # closing "]" is not preceded by a trailing comma (invalid JSON).
        if not self.first_item:
            self.file.write(",\n")
        self.first_item = False
        line = json.dumps(
            dict(item),
            indent=4,
            sort_keys=True,
            separators=(',', ': ')
        )
        self.file.write(line)
        return item
Then modify your settings.py to include the following:
ITEM_PIPELINES = {
    'YourSpiderName.pipelines.JsonWriterPipeline': 300,
}
Change YourSpiderName to the name of your Scrapy project package (the directory that contains pipelines.py), since the key is the import path of the pipeline class.
Note that the file is written directly by the pipeline, so you don't have to specify the file and format with the -o and -t command line parameters.
Hope this gets you closer to what you need.
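If the goal is the nested shape from the question (each state appearing once, with all of its games in one list), a variant of the pipeline could collect items in memory and write the file once when the spider closes. The following is only a minimal sketch under some assumptions: every item carries a 'state' key and a 'games' list as in the question, the class name GroupedJsonWriterPipeline and the path argument are hypothetical, and the 'date' value is taken from the current UTC time rather than from the item. It writes a list with one object per state.

```python
import json
from datetime import datetime, timezone


class GroupedJsonWriterPipeline:
    """Hypothetical pipeline: merges items by state and writes once at close."""

    def __init__(self, path="scraped_items.json"):
        # Hypothetical output file name; change it to whatever you want.
        self.path = path

    def open_spider(self, spider):
        # Maps each state to the combined list of its games.
        self.games_by_state = {}

    def process_item(self, item, spider):
        data = dict(item)
        # Assumes 'state' and 'games' keys, as in the question's Item class.
        games = self.games_by_state.setdefault(data["state"], [])
        games.extend(data.get("games", []))
        return item

    def close_spider(self, spider):
        # Assumption: one shared crawl timestamp, in UTC, ISO 8601 format.
        crawl_date = datetime.now(timezone.utc).isoformat()
        output = [
            {"state": state, "date": crawl_date, "games": games}
            for state, games in self.games_by_state.items()
        ]
        with open(self.path, "w") as f:
            json.dump(output, f, indent=4)
```

It would be registered in ITEM_PIPELINES the same way as JsonWriterPipeline. Because everything is held in memory until the spider closes, this suits small crawls better than very large ones.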