Running Scrapy from a script with file output
Question
I'm currently using Scrapy with the following command line arguments:
scrapy crawl my_spider -o data.json
However, I'd prefer to 'save' this command in a Python script. Following https://doc.scrapy.org/en/latest/topics/practices.html, I have the following script:
import scrapy
from scrapy.crawler import CrawlerProcess
from apkmirror_scraper.spiders.sitemap_spider import ApkmirrorSitemapSpider
process = CrawlerProcess({
'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(ApkmirrorSitemapSpider)
process.start() # the script will block here until the crawling is finished
However, it is unclear to me from the documentation what the equivalent of the -o data.json command line argument should be within the script. How can I make the script generate a JSON file?
Answer
You need to add the FEED_FORMAT and FEED_URI settings to your CrawlerProcess:
process = CrawlerProcess({
'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
'FEED_FORMAT': 'json',
'FEED_URI': 'data.json'
})