Run Scrapy spider via script & configure the settings for output file


Problem description

I have written a spider in Scrapy and am running it from a Python script (not the scrapy command prompt). I want to configure the settings so that the scraped data is written to a particular file (say output.json).

I can get the result if I run the following command at the prompt: "scrapy crawl myspider -o scrapedData.json -t json"

But I want the same output by running a script, not via the command-line tool.

Thanks for your help!

Recommended answer

# Note: settings.overrides is the pre-1.0 Scrapy API; it was removed in later versions.
from scrapy.crawler import Crawler
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
settings.overrides['FEED_URI'] = 'dealsOutput.json'  # output file
settings.overrides['FEED_FORMAT'] = 'json'           # output format

spider = dealsSpider()
crawler = Crawler(settings)

I found this by looking at this code: https://github.com/scrapy/scrapy/blob/master/scrapy/commands/crawl.py#L34
