How to handle 302 redirects in Scrapy


Question

I am receiving a 302 response from a server while scraping a website:

2014-04-01 21:31:51+0200 [ahrefs-h] DEBUG: Redirecting (302) to <GET http://www.domain.com/Site_Abuse/DeadEnd.htm> from <GET http://domain.com/wps/showmodel.asp?Type=15&make=damc&a=664&b=51&c=0>

I want the request to go to the original GET URL instead of being redirected. I found this middleware:

https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/downloadermiddleware/redirect.py#L31

I copied this redirect code into my middleware.py file and added the following to settings.py:

DOWNLOADER_MIDDLEWARES = {
 'street.middlewares.RandomUserAgentMiddleware': 400,
 'street.middlewares.RedirectMiddleware': 100,
 'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
}

But I am still being redirected. Is that all I have to do to get this middleware working? Am I missing something?

Recommended answer

Forget about middlewares in this scenario; this will do the trick:

meta = {'dont_redirect': True,'handle_httpstatus_list': [302]}

That said, you will need to pass the meta parameter when you yield your request:

yield Request(item['link'],meta = {
                  'dont_redirect': True,
                  'handle_httpstatus_list': [302]
              }, callback=self.your_callback)
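
With these meta keys set, the 302 response is not followed and is passed to the callback instead, so the callback has to cope with a response whose body is the redirect stub rather than the page. The following is a minimal sketch of how that could look, not part of the original answer. The helper names no_redirect_meta and describe_response are hypothetical, and describe_response only assumes the response exposes status, url and headers.get, as Scrapy responses do (header values arrive as bytes there).

```python
# Hypothetical helpers for illustration; the names are not part of Scrapy.

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def no_redirect_meta(statuses=(302,)):
    """Build a Request.meta dict that disables redirect following and
    lets the given status codes reach the spider callback."""
    return {
        "dont_redirect": True,
        "handle_httpstatus_list": list(statuses),
    }


def describe_response(response):
    """Summarize a response that may be an unfollowed redirect.

    Relies only on response.status, response.url and response.headers.get.
    """
    location = response.headers.get("Location")
    if isinstance(location, bytes):
        # Scrapy returns header values as bytes; decode for readability.
        location = location.decode("latin-1")
    is_redirect = response.status in REDIRECT_STATUSES
    return {
        "url": response.url,
        "status": response.status,
        "redirect_target": location if is_redirect else None,
    }
```

In a spider, this would be used as yield Request(item['link'], meta=no_redirect_meta(), callback=self.your_callback), with your_callback calling describe_response(response) to see where the server tried to send the request.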
