Avoiding redirection

Question

I'm trying to parse a site (written in ASP), and the crawler gets redirected to the main site. What I'd like to do is parse the given URL, not the redirected one. Is there a way to do this? I tried adding "REDIRECT=False" to the settings.py file without success.

Here is some of the crawler's output:

2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=500&id=500>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=1513&id=1513>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=476&id=476>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=472&id=472>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=457&id=457>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=1097&id=1097>

Recommended answer

http://www.cotodigital.com.ar/l.asp?cat=1097&id=1097 redirects to http://www.cotodigital.com.ar/default.asp because the HTTP response tells it to. This happens because the ASP code is checking for some condition: a wrong page, cookies, the user agent, or the referrer. Check each of those conditions.

UPDATE: I just checked in my browser: the browser is also redirected to the main page, where I click 'Skip ads'. After that it works OK.

This means the site sets some cookies, and without them it redirects to the main page.

See also: Scrapy - how to manage cookies/sessions
