Avoiding redirection
Question
I'm trying to parse a site (written in ASP), and the crawler gets redirected to the main site. What I'd like to do is parse the given URL, not the one it is redirected to. Is there a way to do this? I tried adding "REDIRECT=False" to the settings.py file, without success.
Here is some of the crawler's output:
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=500&id=500>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=1513&id=1513>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=476&id=476>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=472&id=472>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=457&id=457>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=1097&id=1097>
Recommended answer
http://www.cotodigital.com.ar/l.asp?cat=1097&id=1097
redirects to http://www.cotodigital.com.ar/default.asp
because the HTTP response says so. This happens because the ASP code is checking for some condition: a wrong page, cookies, the user agent, or the referrer. Check each of these conditions.
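One way to check those conditions outside the crawler is to request the URL once without following redirects and look at what the server sends back. The sketch below uses only the Python standard library; the names `NoRedirect` and `probe` are made up for illustration and are not part of any crawler framework. Note that in Scrapy the setting that turns off redirect handling is `REDIRECT_ENABLED`, not `REDIRECT`, but turning it off only stops the crawler from following the 302; the server still answers with the redirect.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response
        # instead of silently requesting the new URL.
        return None


def probe(url, headers=None):
    """Fetch url once, without following redirects, and report the reply."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, headers=headers or {})
    try:
        response = opener.open(request, timeout=10)
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code and headers of the reply.
        response = err
    try:
        return {
            "status": response.code,
            "location": response.headers.get("Location"),
            "set_cookie": response.headers.get_all("Set-Cookie") or [],
        }
    finally:
        response.close()
```

Calling `probe("http://www.cotodigital.com.ar/l.asp?cat=1097&id=1097")` and then calling it again with different `headers` (for example a browser-like `User-Agent`, a `Referer`, or a `Cookie`) should show which condition makes the 302 go away.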
UPDATE: I just checked in my browser: the browser is also redirected to the main page, where I click 'Skip ads'. After that it works OK.
This means the site sets some cookies, without which it redirects to the main page.
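If cookies are indeed the condition, the fix is to visit the main page first, keep whatever cookies it sets, and only then request the category pages. Here is a minimal standard-library sketch of that idea; `make_session` and `fetch_after_landing` are hypothetical helper names. It also assumes that loading the main page is enough to receive the cookies, which may not hold if the 'Skip ads' link is what actually sets them.

```python
import http.cookiejar
import urllib.request


def make_session():
    """Build an opener that stores cookies and sends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_after_landing(landing_url, target_url):
    """Visit landing_url first so its cookies exist, then fetch target_url.

    Returns the final URL, the response body, and the cookie names collected.
    """
    opener, jar = make_session()
    # Assumption: the landing page itself sets the cookies the site checks.
    with opener.open(landing_url, timeout=10) as landing:
        landing.read()
    with opener.open(target_url, timeout=10) as response:
        return response.geturl(), response.read(), [c.name for c in jar]
```

If the returned final URL is still default.asp, the cookies are probably set by a later step, such as the 'Skip ads' link. In a Scrapy spider the same approach would be to request the main page first and yield the category requests from its callback, since Scrapy's cookie handling is enabled by default.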