Scrapy: check if an image response is 404

Question

I want to process image URLs, and I enabled and configured it as described in the Scrapy docs. But what happens if an image URL returns 404 or is redirected? I want to log that, saving the failed URLs and the HTTP error/redirect code. Where can I put the code to do that?

Recommended answer

It is completely wrong to handle that in the pipeline, because the response would go through all the middlewares back to your spider and then to your pipeline, while your purpose is just to log the failure.

You should build your own middleware to handle any HTTP response code.

By default, Scrapy only passes responses with 2xx status codes (200 to 299) to your spider; the built-in HttpErrorMiddleware filters out the rest. You can change that by listing the status codes you would like to receive, like this:

import scrapy

class YourSpider(scrapy.Spider):
    handle_httpstatus_list = [404, 302]  # add any other codes you need
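
As a rough illustration of the rule above, here is a simplified model of the check HttpErrorMiddleware makes before a response reaches your spider. It is an assumption for explanation, not Scrapy's actual source, and the function name `reaches_spider` is hypothetical. With default settings, 3xx responses are normally followed by RedirectMiddleware before this point, so a 302 typically only reaches your code if you list it.

```python
def reaches_spider(status, handle_httpstatus_list=()):
    """Hypothetical, simplified model of Scrapy's HttpErrorMiddleware filter.

    It ignores the other knobs Scrapy also checks (for example the
    HTTPERROR_ALLOWED_CODES setting or the handle_httpstatus_all meta key);
    it only models the 2xx default plus the spider's list.
    """
    # 2xx responses always pass through
    if 200 <= status < 300:
        return True
    # anything else passes only if the spider explicitly asked for it
    return status in handle_httpstatus_list


print(reaches_spider(404))              # False: filtered out by default
print(reaches_spider(404, [404, 302]))  # True: listed on the spider
```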

Then you should build your middleware and add it to your configuration like this:

SPIDER_MIDDLEWARES = {
    # process_spider_input is a spider-middleware hook, so register it here
    'myProject.myMiddlewares.CustomSpiderMiddleware': 543,  # example order; pick one that suits you
}

In your CustomSpiderMiddleware, check the status like this:

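class CustomSpiderMiddleware:
    def process_spider_input(self, response, spider):
        if response.status == 404:
            # do whatever you want, e.g. log the failed URL
            spider.logger.warning("Got 404 for %s", response.url)
        return None  # let the response continue to the spider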
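
Expanding on that snippet, a minimal sketch of a middleware that does what the question asks (log and keep each failed URL, its status code and, for redirects, the `Location` target) might look like the following. The class name `FailedUrlRecorderMiddleware`, the `failures` attribute and the `REDIRECT_CODES` set are hypothetical. It assumes the spider lists the codes in `handle_httpstatus_list` and that the middleware is registered under `SPIDER_MIDDLEWARES` with an order above HttpErrorMiddleware's default of 50, so only 2xx and the listed codes reach it. It relies only on `response.status`, `response.url`, `response.headers` and `spider.logger`, so it needs no imports of its own.

```python
# Redirect codes whose Location header we want to keep (assumed set)
REDIRECT_CODES = {301, 302, 303, 307, 308}


class FailedUrlRecorderMiddleware:
    """Hypothetical spider middleware that records non-2xx responses."""

    def __init__(self):
        # (url, status, redirect_target) tuples collected during the crawl
        self.failures = []

    def process_spider_input(self, response, spider):
        status = response.status
        if 200 <= status < 300:
            return None  # normal response, nothing to record

        location = None
        if status in REDIRECT_CODES:
            # Scrapy header values are bytes; Location may be missing
            raw = response.headers.get("Location")
            location = raw.decode("latin-1") if raw else None

        self.failures.append((response.url, status, location))
        spider.logger.warning(
            "Non-2xx response %d for %s (Location: %s)",
            status, response.url, location,
        )
        return None  # still let the response reach the spider callback
```

Persisting `failures` to a file or database at the end of the crawl is left out here; one option is to connect a handler to Scrapy's `spider_closed` signal from a `from_crawler` classmethod.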
