Is it possible to access the reactor from a Scrapy spider?


Question

I'm looking at ways of implementing a crawl delay inside Scrapy spiders. I was wondering if it is possible to access the reactor's callLater method from within a spider? That would make it quite easy to have a page parsed after n seconds.
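
For context, Twisted's reactor is a process-wide singleton, so it is importable from spider code and its callLater method is technically reachable there. Below is a minimal sketch of that access (the spider name, URL, and log message are illustrative only, not from the original question):

```python
import scrapy
from twisted.internet import reactor


class DelayDemoSpider(scrapy.Spider):
    name = "delay_demo"                   # illustrative name
    start_urls = ["https://example.com"]  # illustrative URL

    def parse(self, response):
        # The reactor is the event loop Scrapy itself runs on, so importing
        # it gives direct access to callLater. This schedules a log call
        # 5 seconds from now without blocking the crawl.
        reactor.callLater(5, self.logger.info, "fired 5 seconds later")
        yield {"url": response.url}
```

Note that callLater only schedules a plain function call; feeding a delayed Request or parsed item back into Scrapy's pipeline takes more plumbing, which is why the answer below recommends DOWNLOAD_DELAY instead.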

Recommended answer

You can actually set a delay quite easily by setting DOWNLOAD_DELAY in the settings file.
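
For example, in a project's settings.py (the 2-second value is just an illustration; DOWNLOAD_DELAY itself is a standard Scrapy setting):

```python
# settings.py
# Wait roughly 2 seconds between consecutive requests to the same site.
DOWNLOAD_DELAY = 2
```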

DOWNLOAD_DELAY

Default: 0

The amount of time (in seconds) that the downloader should wait before downloading consecutive pages from the same spider. This can be used to throttle the crawling speed to avoid hitting servers too hard. Decimal numbers are supported. Example:

```python
DOWNLOAD_DELAY = 0.25  # 250 ms of delay
```

This setting is also affected by the RANDOMIZE_DOWNLOAD_DELAY setting (which is enabled by default). By default, Scrapy doesn't wait a fixed amount of time between requests, but uses a random interval between 0.5 * DOWNLOAD_DELAY and 1.5 * DOWNLOAD_DELAY.
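
If a fixed, deterministic delay is wanted instead of the randomized interval, the randomization can be switched off in the same settings file:

```python
# settings.py
DOWNLOAD_DELAY = 0.25             # base delay of 250 ms
RANDOMIZE_DOWNLOAD_DELAY = False  # use exactly DOWNLOAD_DELAY, no jitter
```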

You can also change this setting per spider:
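
Scrapy exposes this either through a download_delay attribute on the spider or through its custom_settings dict; here is a short sketch (spider name and URL are illustrative):

```python
import scrapy


class PoliteSpider(scrapy.Spider):
    name = "polite"                       # illustrative name
    start_urls = ["https://example.com"]  # illustrative URL

    # Per-spider override of the project-wide DOWNLOAD_DELAY setting.
    download_delay = 1.5

    # Equivalent alternative: override via custom_settings.
    # custom_settings = {"DOWNLOAD_DELAY": 1.5}

    def parse(self, response):
        yield {"url": response.url}
```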

See also the Scrapy documentation - DOWNLOAD_DELAY
