Scrapy request + response + download time
Question
UPD: I'm not closing this question, because I think my approach is not as clear as it should be.
Is it possible to get the current request + response + download time so that I can save it to an Item?
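In "plain" Python I do:
from time import time
import urllib2  # Python 2 standard library

start_time = time()
urllib2.urlopen('http://example.com').read()
time() - start_time  # elapsed seconds for request + response + download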
But how can I do this with Scrapy?
UPD:
The solution below is good enough for me, but I'm not sure about the quality of the results. If many connections fail with timeout errors, the download time may be wrong (even DOWNLOAD_TIMEOUT * 3).
settings.py
DOWNLOADER_MIDDLEWARES = {
    'myscraper.middlewares.DownloadTimer': 0,
}
middlewares.py
from time import time

from scrapy.http import Response


class DownloadTimer(object):
    def process_request(self, request, spider):
        request.meta['__start_time'] = time()
        # Returning None lets middlewares with a higher order number keep processing the request
        return None

    def process_response(self, request, response, spider):
        request.meta['__end_time'] = time()
        return response  # process_response must return a Response or Request object

    def process_exception(self, request, exception, spider):
        request.meta['__end_time'] = time()
        # On a download error, record the end time and return a placeholder Response with status 110
        return Response(
            url=request.url,
            status=110,
            request=request)
In def parse(...
log.msg('Download time: %.2f - %.2f = %.2f' % (
    response.meta['__end_time'], response.meta['__start_time'],
    response.meta['__end_time'] - response.meta['__start_time']
), level=log.DEBUG)
Recommended answer
You could write a Downloader Middleware which would time each request. It would add a start time to the request before it's made and then a finish time when it's finished. Typically, arbitrary data such as this is stored in the Request.meta attribute. This timing information could later be read by your spider and added to your item.
This downloader middleware sounds like it could be useful on many projects.
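The answer's last step, reading the timing in the spider and adding it to an item, could look roughly like the sketch below. It reuses the `__start_time` / `__end_time` meta keys from the question; the helpers `download_time` and `build_item` and the item layout are hypothetical names introduced only for illustration, and a plain dict stands in for a Scrapy Item so the sketch runs without Scrapy installed.

```python
from time import time


class DownloadTimer:
    """Downloader middleware sketch: stamp each request with start/end times."""

    def process_request(self, request, spider):
        request.meta['__start_time'] = time()
        return None  # let the remaining middlewares and the downloader run

    def process_response(self, request, response, spider):
        request.meta['__end_time'] = time()
        return response


def download_time(meta):
    """Return elapsed seconds, or None if either timestamp is missing."""
    start = meta.get('__start_time')
    end = meta.get('__end_time')
    if start is None or end is None:
        return None
    return end - start


def build_item(url, meta):
    # Hypothetical item layout; a plain dict stands in for a scrapy Item.
    return {'url': url, 'download_time': download_time(meta)}
```

In a spider callback this would be used as something like `yield build_item(response.url, response.meta)`. Scrapy's documentation also lists a `download_latency` key among the special Request.meta keys, which may be enough if only the fetch time is needed.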