How to handle a 302 redirect in Scrapy
Question
I am receiving a 302 response from a server while scraping a website:
2014-04-01 21:31:51+0200 [ahrefs-h] DEBUG: Redirecting (302) to <GET http://www.domain.com/Site_Abuse/DeadEnd.htm> from <GET http://domain.com/wps/showmodel.asp?Type=15&make=damc&a=664&b=51&c=0>
I want the request to go to the original GET URL instead of being redirected. I found this middleware:
https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/downloadermiddleware/redirect.py#L31
I added this redirect code to my middleware.py file and added the following to settings.py:
DOWNLOADER_MIDDLEWARES = {
    'street.middlewares.RandomUserAgentMiddleware': 400,
    'street.middlewares.RedirectMiddleware': 100,
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
}
But I am still being redirected. Is that all I have to do to get this middleware working? Am I missing something?
Recommended answer
Forget about middlewares in this scenario; this will do the trick:
meta = {'dont_redirect': True, 'handle_httpstatus_list': [302]}
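Both keys matter because two different built-in middlewares read them. RedirectMiddleware (a downloader middleware) stops following the redirect when dont_redirect is set. HttpErrorMiddleware (a spider middleware) would then discard the raw non-2xx response before it reaches your callback, unless its status is listed in handle_httpstatus_list. The sketch below is a simplified, plain-Python model of that interaction, not Scrapy's source. The function names and the set of redirect status codes are assumptions made for illustration, and the real middlewares check additional conditions that are omitted here.

```python
# Simplified, hypothetical model of how two Scrapy middlewares react to the
# request meta keys from the answer. This is NOT Scrapy's source code;
# the real middlewares check more conditions than this.

# Assumed set of redirect statuses, for illustration only.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirect_is_followed(status, meta):
    """Approximates RedirectMiddleware: a 3xx is followed unless dont_redirect is set."""
    if meta.get("dont_redirect", False):
        return False
    return status in REDIRECT_STATUSES


def reaches_callback(status, meta):
    """Approximates HttpErrorMiddleware: non-2xx responses are filtered out
    unless the status appears in handle_httpstatus_list."""
    if 200 <= status < 300:
        return True
    return status in meta.get("handle_httpstatus_list", [])


def outcome(status, meta):
    """Combine both checks to describe what happens to a response."""
    if redirect_is_followed(status, meta):
        return "redirect followed"
    if reaches_callback(status, meta):
        return "passed to callback"
    return "dropped"


for meta in ({}, {"dont_redirect": True},
             {"dont_redirect": True, "handle_httpstatus_list": [302]}):
    print(meta, "->", outcome(302, meta))
# {} -> redirect followed
# {'dont_redirect': True} -> dropped
# {'dont_redirect': True, 'handle_httpstatus_list': [302]} -> passed to callback
```

In this model, setting only dont_redirect stops the redirect but leaves the callback with nothing to process. Adding handle_httpstatus_list is what lets the 302 response through.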
When you yield your request, pass that meta dictionary along:
yield Request(item['link'], meta={
    'dont_redirect': True,
    'handle_httpstatus_list': [302]
}, callback=self.your_callback)