Scrapy request+response+download time

Views: 49 Published: 2021/7/16 22:01:06 scrapy

Problem description

UPD: I am not closing this question, because I think my approach is not as clear as it should be.

Is it possible to get the current request + response + download time so that it can be saved to an Item?

In "plain" Python I do:

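from time import time
from urllib.request import urlopen  # this was urllib2.urlopen in Python 2

start_time = time()
urlopen('http://example.com').read()
print(time() - start_time)  # elapsed seconds for request + response + download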
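The same start/stop pattern can be wrapped so that it is reusable. The following is only a sketch: the helper name `timed` and the `elapsed` key are made up for illustration, `sleep` stands in for the real download, and `perf_counter` is used instead of `time` because it is a monotonic clock intended for measuring intervals.

```python
from contextlib import contextmanager
from time import perf_counter, sleep


@contextmanager
def timed(record, key='elapsed'):
    """Store the wall-clock duration of the with-block in record[key]."""
    start = perf_counter()
    try:
        yield record
    finally:
        # Runs even if the block raises, so failed downloads are timed too.
        record[key] = perf_counter() - start


timings = {}
with timed(timings):
    sleep(0.05)  # stand-in for urlopen('http://example.com').read()
print('%.2f s' % timings['elapsed'])
```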

But how can I do this with Scrapy?

UPD:

This solution is enough for me, but I'm not sure about the quality of the results. If many connections fail with timeout errors, the measured download time may be wrong (even as large as DOWNLOAD_TIMEOUT * 3).

settings.py

DOWNLOADER_MIDDLEWARES = {
    'myscraper.middlewares.DownloadTimer': 0,
}
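The order number 0 matters here. Scrapy calls `process_request` of downloader middlewares in increasing order and `process_response` in decreasing order, so a middleware at 0 starts its clock first and stops it last. The sketch below only simulates that call order in plain Python; `run_chain` and `OtherMiddleware` (with its order of 500) are hypothetical names, not part of Scrapy.

```python
def run_chain(middlewares):
    """Simulate the call order of downloader middlewares.

    `middlewares` maps a name to its order number, like DOWNLOADER_MIDDLEWARES.
    """
    ordered = sorted(middlewares, key=middlewares.get)
    # Requests travel through the middlewares in increasing order...
    calls = ['%s.process_request' % name for name in ordered]
    calls.append('download')
    # ...and responses travel back in decreasing order.
    calls += ['%s.process_response' % name for name in reversed(ordered)]
    return calls


calls = run_chain({'DownloadTimer': 0, 'OtherMiddleware': 500})
print(calls[0])   # DownloadTimer.process_request
print(calls[-1])  # DownloadTimer.process_response
```

With this ordering the recorded interval also includes whatever time the other middlewares spend on the request and the response, not just the network transfer.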

middlewares.py

from time import time

from scrapy.http import Response


class DownloadTimer(object):
    """Downloader middleware that records start/end timestamps in request.meta."""

    def process_request(self, request, spider):
        request.meta['__start_time'] = time()
        # Returning None does not block the middlewares that have a
        # higher order number than this one; they still process the request.
        return None

    def process_response(self, request, response, spider):
        request.meta['__end_time'] = time()
        # process_response must return a Response (or a Request).
        return response

    def process_exception(self, request, exception, spider):
        request.meta['__end_time'] = time()
        # Replace the exception with a placeholder response that carries
        # the made-up status code 110.
        return Response(
            url=request.url,
            status=110,
            request=request)

In def parse(...

# Older Scrapy versions used log.msg(..., level=log.DEBUG) from scrapy.log here.
self.logger.debug(
    'Download time: %.2f - %.2f = %.2f',
    response.meta['__end_time'], response.meta['__start_time'],
    response.meta['__end_time'] - response.meta['__start_time'],
)
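Since the goal is to save the value to an Item, the logging lines above could be extended roughly as follows. This is a minimal sketch under assumptions: the helper names `download_seconds` and `build_item` and the item field `download_time` are hypothetical, and plain dicts stand in for `response.meta` and for the item so the example runs without Scrapy. The fallback uses Scrapy's built-in `download_latency` meta key, which covers only the fetch itself rather than the full middleware round trip.

```python
def download_seconds(meta):
    """Return elapsed seconds from the DownloadTimer meta keys, or None."""
    start = meta.get('__start_time')
    end = meta.get('__end_time')
    if start is not None and end is not None:
        return end - start
    # Fallback: Scrapy's own 'download_latency' meta key, if present.
    return meta.get('download_latency')


def build_item(url, meta):
    """Build a plain-dict item carrying the timing (field names are hypothetical)."""
    return {'url': url, 'download_time': download_seconds(meta)}


# Inside a spider's parse() this would be called as:
#     yield build_item(response.url, response.meta)
item = build_item('http://example.com', {'__start_time': 10.0, '__end_time': 12.5})
print(item['download_time'])  # 2.5
```

Returning None when no timing is available keeps a missing key from raising KeyError inside parse().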
