Scrapy request + response + download time


Question


UPD: Please don't close this question; I don't think my approach is as clean as it should be.


Is it possible to get the download time of the current request + response so that I can save it to an Item?


In "plain" Python I do:

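from time import time
import urllib2  # Python 2 only; Python 3 uses urllib.request.urlopen

start_time = time()
urllib2.urlopen('http://example.com').read()
print(time() - start_time)  # elapsed seconds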
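For reference, a rough Python 3 equivalent of the plain-Python measurement above (`urllib2` only exists in Python 2). This is a sketch: `time_download` is a hypothetical helper name, the `opener` parameter exists only so the function can be exercised without network access, and it uses `time.perf_counter()`, which the standard library provides for measuring elapsed time, instead of `time()`.

```python
from time import perf_counter
from urllib.request import urlopen


def time_download(url, opener=urlopen):
    """Fetch url and return (body, elapsed_seconds).

    Hypothetical helper; `opener` defaults to urllib's urlopen but can be
    replaced by any callable returning a context manager with .read().
    """
    start = perf_counter()
    with opener(url) as resp:
        body = resp.read()
    elapsed = perf_counter() - start
    return body, elapsed
```

Usage: `body, seconds = time_download("http://example.com")` performs a real HTTP request and returns the body together with the elapsed wall-clock time.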


But how can I do this with Scrapy?

UPD:


The solution below is good enough for me, but I'm not sure about the quality of the results. If you have many connections with timeout errors, the download time may be wrong (even DOWNLOAD_TIMEOUT * 3).


settings.py

DOWNLOADER_MIDDLEWARES = {
    'myscraper.middlewares.DownloadTimer': 0,
}

middlewares.py

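from time import time

from scrapy.http import Response


class DownloadTimer(object):
    def process_request(self, request, spider):
        request.meta['__start_time'] = time()
        # Returning None lets middlewares with a higher order number
        # (and the download itself) continue processing the request
        return None

    def process_response(self, request, response, spider):
        request.meta['__end_time'] = time()
        # process_response must return a Response (or a Request, or raise IgnoreRequest)
        return response

    def process_exception(self, request, exception, spider):
        request.meta['__end_time'] = time()
        # Replace the failure with a placeholder response. 110 is not a
        # standard HTTP status; non-2xx responses are dropped by
        # HttpErrorMiddleware unless allowed (e.g. via handle_httpstatus_list).
        return Response(
            url=request.url,
            status=110,
            request=request)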

# in the spider
def parse(self, response):
    start = response.meta['__start_time']
    end = response.meta['__end_time']
    # scrapy.log is deprecated; Spider.logger is the standard replacement
    self.logger.debug('Download time: %.2f - %.2f = %.2f', end, start, end - start)
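To actually save the measurement into an item, as the question asks, the spider can read the values back from `response.meta`. A minimal sketch, assuming the `__start_time`/`__end_time` keys written by the `DownloadTimer` middleware in the question; `timing_fields` and the item field names are hypothetical. Scrapy also documents a read-only `download_latency` meta key that becomes available once the response has been downloaded, so the sketch copies it too. The two values cover different spans of the request's life, so they are kept as separate fields rather than merged.

```python
def timing_fields(meta):
    """Collect timing values from a request/response meta dict into item fields.

    Hypothetical field names: 'download_time' and 'download_latency'.
    """
    fields = {}
    start = meta.get("__start_time")
    end = meta.get("__end_time")
    if start is not None and end is not None:
        # Keys written by the DownloadTimer middleware
        fields["download_time"] = end - start
    if "download_latency" in meta:
        # Set by Scrapy itself once the response has been downloaded
        fields["download_latency"] = meta["download_latency"]
    return fields


# Usage inside a scrapy.Spider subclass, yielding a plain dict as the item:
#
#     def parse(self, response):
#         yield {"url": response.url, **timing_fields(response.meta)}
```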

Recommended answer


You could write a Downloader Middleware which would time each request. It would add a start time to the request before it's made and then a finish time when it's finished. Typically, arbitrary data such as this is stored in the Request.meta attribute. This timing information could later be read by your spider and added to your item.
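A minimal sketch of the middleware described here, recording a start time in `request.meta` before the download and the elapsed time when it finishes. The class name and the `timer_start`/`download_time` meta keys are hypothetical. It uses `time.monotonic()` so system clock adjustments do not distort the result, and on errors it returns `None` so Scrapy's normal exception handling (other middlewares, then the request's errback) continues instead of a fake response being produced.

```python
from time import monotonic


class RequestTimerMiddleware:
    """Downloader middleware sketch storing per-request timing in request.meta.

    Hypothetical meta keys: 'timer_start' and 'download_time' (seconds).
    """

    def process_request(self, request, spider):
        # Record when the request reaches this middleware; returning None
        # lets the remaining middlewares and the download proceed.
        request.meta["timer_start"] = monotonic()
        return None

    def process_response(self, request, response, spider):
        self._record(request)
        # Must return a Response (or a Request, or raise IgnoreRequest).
        return response

    def process_exception(self, request, exception, spider):
        self._record(request)
        # None means: keep processing the exception the default way.
        return None

    @staticmethod
    def _record(request):
        start = request.meta.get("timer_start")
        if start is not None:
            request.meta["download_time"] = monotonic() - start
```

It would be enabled through `DOWNLOADER_MIDDLEWARES` like the middleware in the question (the module path depends on your project). Placement matters: `process_request` methods run in increasing order, so a low number such as 0 starts the timer before the other middlewares run. As far as I can tell from Scrapy's downloader, the middlewares also run before the request waits in its per-domain slot, so the figure can include time spent waiting on concurrency limits or `DOWNLOAD_DELAY`.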


This downloader middleware sounds like it could be useful on many projects.
