Running Scrapy from a script with file output


Question

I'm currently using Scrapy with the following command-line arguments:

scrapy crawl my_spider -o data.json

However, I'd prefer to 'save' this command in a Python script. Following https://doc.scrapy.org/en/latest/topics/practices.html, I have the following script:

import scrapy
from scrapy.crawler import CrawlerProcess

from apkmirror_scraper.spiders.sitemap_spider import ApkmirrorSitemapSpider

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(ApkmirrorSitemapSpider)
process.start() # the script will block here until the crawling is finished

However, it is unclear to me from the documentation what the equivalent of the -o data.json command-line argument should be within the script. How can I make the script generate a JSON file?

Answer

You need to add FEED_FORMAT and FEED_URI to the settings you pass to your CrawlerProcess:

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'FEED_FORMAT': 'json',
    'FEED_URI': 'data.json'
})

