Scrapy process.crawl() to export data to json


Question

This might be a subquestion of "Passing arguments to process.crawl in Scrapy python", but the author accepted an answer that doesn't address the subquestion I'm asking.

Here's my problem: I cannot use scrapy crawl mySpider -a start_urls(myUrl) -o myData.json
Instead I want/need to use crawlerProcess.crawl(spider). I have already figured out several ways to pass the arguments (and in any case that is answered in the question I linked), but I can't grasp how I am supposed to tell it to dump the data into myData.json, i.e. the -o myData.json part.
Does anyone have a suggestion? Or am I just not understanding how it is supposed to work?

Here is the code:

crawlerProcess = CrawlerProcess(settings)
crawlerProcess.install()
crawlerProcess.configure()

spider = challenges(start_urls=["http://www.myUrl.html"])
crawlerProcess.crawl(spider)
# For now I am just trying to get this bit of code to work, but it will obviously become a loop later.

dispatcher.connect(handleSpiderIdle, signals.spider_idle)

log.start()
print "Starting crawler."
crawlerProcess.start()
print "Crawler stopped."

Recommended answer

You need to specify it in the settings:

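from scrapy.crawler import CrawlerProcess

# MySpider is your own spider class, imported from your project.
process = CrawlerProcess({
    'FEED_FORMAT': 'json',  # the default feed format is JSON lines, not JSON
    'FEED_URI': 'file:///tmp/export.json',
})

process.crawl(MySpider)
process.start()  # blocks until the crawl has finished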
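
On newer Scrapy releases (2.1 and later, as far as I know) the FEED_URI and FEED_FORMAT settings are deprecated in favour of a single FEEDS dictionary. Below is a minimal sketch of the same idea under that assumption. The names build_feed_settings and run_spider are hypothetical helpers made up for illustration, not part of Scrapy, and the spider class is assumed to be your own.

```python
def build_feed_settings(output_path, feed_format="json"):
    """Build a settings dict that plays the role of `-o <output_path>`.

    Hypothetical helper: it maps the output path to its export
    options using Scrapy's FEEDS setting (assumes Scrapy >= 2.1).
    """
    return {
        "FEEDS": {
            str(output_path): {"format": feed_format, "encoding": "utf8"},
        },
    }


def run_spider(spider_cls, output_path, **spider_kwargs):
    """Run one spider in-process and export its items to output_path.

    Hypothetical helper. Keyword arguments are forwarded to the
    spider, which is the in-code counterpart of `-a name=value`.
    """
    # Imported here so build_feed_settings can be used on its own.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings=build_feed_settings(output_path))
    process.crawl(spider_cls, **spider_kwargs)
    process.start()  # blocks until the crawl has finished
```

With a spider class of your own, a call could look like run_spider(MySpider, "myData.json", start_urls=["http://www.myUrl.html"]). Note that process.crawl() takes the spider class rather than an instance, so spider arguments go in as keyword arguments.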
