Running Scrapy from a script with file output
Question
I'm currently using Scrapy with the following command line arguments:
scrapy crawl my_spider -o data.json
However, I'd prefer to 'save' this command in a Python script. Following https://doc.scrapy.org/en/latest/topics/practices.html, I have the following script:
import scrapy
from scrapy.crawler import CrawlerProcess
from apkmirror_scraper.spiders.sitemap_spider import ApkmirrorSitemapSpider
process = CrawlerProcess({
'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(ApkmirrorSitemapSpider)
process.start() # the script will block here until the crawling is finished
However, it is unclear to me from the documentation what the equivalent of the -o data.json command line argument should be within the script. How can I make the script generate a JSON file?
Answer
You need to add the FEED_FORMAT and FEED_URI settings to your CrawlerProcess:
process = CrawlerProcess({
'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
'FEED_FORMAT': 'json',
'FEED_URI': 'data.json'
})