Scrapy: send stats to a URL passed as a spider argument via a POST request every 5 minutes


Question

I need to send the crawler stats to a URL that is passed in as a spider argument, making a POST request every 5 minutes. How can I do that?

Answer

You will probably want to write an extension that simply makes a POST request every 5 minutes.
You can make these requests either using Scrapy's own machinery (e.g. engine.download()) or with a different async HTTP client (e.g. treq).

If you're not sure how to structure your extension, take a look at logstats.py, which does something very similar, except that it logs the stats locally instead of sending them over HTTP.
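Below is a minimal sketch of what such an extension could look like, modeled loosely on logstats.py: it hooks the spider_opened/spider_closed signals, uses Twisted's LoopingCall for the 5-minute timer, and POSTs the stats as JSON via treq. The specifics here (the class name PostStats, the setting names STATS_POST_URL and STATS_POST_INTERVAL, the JSON payload shape) are assumptions for illustration, not anything Scrapy defines; treq has to be installed separately.

# A minimal sketch, not a drop-in solution. Setting names and payload
# format are made up for this example; treq is an external dependency.
import json
import logging

from twisted.internet import task

from scrapy import signals
from scrapy.exceptions import NotConfigured

logger = logging.getLogger(__name__)


class PostStats:
    """Periodically POST the crawler stats to a configured URL."""

    def __init__(self, stats, url, interval):
        self.stats = stats
        self.url = url
        self.interval = interval
        self.task = None

    @classmethod
    def from_crawler(cls, crawler):
        url = crawler.settings.get("STATS_POST_URL")
        interval = crawler.settings.getfloat("STATS_POST_INTERVAL", 300.0)
        if not url:
            raise NotConfigured
        ext = cls(crawler.stats, url, interval)
        crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_opened(self, spider):
        # LoopingCall is the same mechanism logstats.py uses for its timer.
        self.task = task.LoopingCall(self.post_stats, spider)
        self.task.start(self.interval)

    def spider_closed(self, spider, reason):
        if self.task and self.task.running:
            self.task.stop()

    def post_stats(self, spider):
        import treq  # imported lazily so the module loads even without treq

        payload = json.dumps(self.stats.get_stats(), default=str).encode("utf-8")
        d = treq.post(
            self.url,
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        d.addErrback(
            lambda failure: logger.warning("Posting stats failed: %s", failure.value)
        )
        return d

Reading the URL from a setting here follows the recommendation below; if you prefer to keep it as a spider argument, you could instead read it off the spider in spider_opened, since spider arguments end up as attributes on the spider instance.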

Since you're writing an extension anyway, I'd recommend making the URL and the interval Scrapy settings, but that choice is up to you.
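For completeness, a hypothetical settings.py fragment that would enable the sketch above and supply those two settings (the module path myproject.extensions is made up):

# settings.py -- hypothetical module path and endpoint
EXTENSIONS = {
    "myproject.extensions.PostStats": 500,
}
STATS_POST_URL = "https://example.com/crawler-stats"  # where the stats get POSTed
STATS_POST_INTERVAL = 300  # seconds between posts (5 minutes)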
