Running Multiple spiders in scrapy


Problem Description

  1. For example, suppose I have two URLs in Scrapy that contain different HTML. I want to write an individual spider for each of them and run both spiders at once. Is it possible to run multiple spiders at once in Scrapy?

  2. After writing multiple spiders, how can I schedule them to run every 6 hours (perhaps like cron jobs)?

I have no idea how to do the above; can you suggest how to accomplish these things, with an example?

Thanks in advance.

Answer

It would probably be easiest to just run the two Scrapy scripts at once from the OS level. They should both be able to save to the same database. Create a shell script that calls both Scrapy scripts at the same time:

#!/bin/sh
scrapy runspider foo &
scrapy runspider bar
wait  # also wait for the backgrounded spider to finish

Make sure the script is executable with chmod +x script_name.

To schedule the cron job every 6 hours, type crontab -e in your terminal and edit the file as follows:

0 */6 * * * path/to/shell/script_name >> path/to/file.log

The first field is minutes, then hours, and so on; an asterisk is a wildcard. So 0 */6 means run the script at minute 0 of any hour divisible by 6, i.e. every six hours. (Note that a wildcard in the minutes field would instead run the script every minute during those hours.)
