Run Scrapy spider via script & configure the settings for output file
Question
I have written a spider in Scrapy and I am running it from a Python script (not the scrapy command prompt). I want to configure the settings so that the scraped data goes to a particular file (say output.json).
I can get the result if I run the following command at the prompt: "scrapy crawl myspider -o scrapedData.json -t json"
But I want the same output by running a script, not via the command-line tool.
Thanks for your help!
Answer
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
settings.set('FEED_URI', 'dealsOutput.json')   # settings.overrides was removed in later Scrapy versions; use set()
settings.set('FEED_FORMAT', 'json')
process = CrawlerProcess(settings)
process.crawl(dealsSpider)  # pass the spider class, not an instance
process.start()
I found this by looking at this code: https://github.com/scrapy/scrapy/blob/master/scrapy/commands/crawl.py#L34