scrapy run spider from script
Question
I want to run my spider from a script rather than via scrapy crawl.
I found this page:
http://doc.scrapy.org/en/latest/topics/practices.html
but it doesn't actually say where to put that script.
Any help?
Solution
Nice and easy :)
Just check the official documentation. I would make one small change there, so the spider runs only when you execute python myscript.py and not every time you import from it. Just add an if __name__ == "__main__" guard:
import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    pass

if __name__ == "__main__":
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })
    process.crawl(MySpider)
    process.start()  # the script will block here until the crawling is finished
Now save the file as myscript.py and run python myscript.py.
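The guard itself is plain Python and can be sketched without Scrapy; the module and function names below are made up for illustration, not part of the answer:

```python
# guard_demo.py - minimal sketch of the __main__ guard (no Scrapy needed);
# start_crawl is a hypothetical stand-in for process.crawl(...) / process.start().

def start_crawl():
    # pretend this kicks off the crawl and reports its status
    return "crawling"

if __name__ == "__main__":
    # this branch runs for `python guard_demo.py`,
    # but NOT when another module does `import guard_demo`
    print(start_crawl())
```

Because the CrawlerProcess setup lives under the guard, importing MySpider from another file no longer starts a crawl as a side effect.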
Enjoy!