Returning 403 Forbidden from simple get but loads okay in browser


Question

I'm trying to get some data from a page, but it returns the error 403 Forbidden.

I thought it was the user agent, but I tried several user agents and it still returns the error.

I also tried using the fake-useragent library, but I did not succeed.

import requests
from fake_useragent import UserAgent

with requests.Session() as c:
    url = '...'
    #headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2224.3 Safari/537.36'}
    ua = UserAgent()
    header = {'User-Agent': str(ua.chrome)}  # random Chrome user agent string
    page = c.get(url, headers=header)
    print page.content  # Python 2 print statement

When I access the page manually, everything works.

I'm using Python 2.7.14 and the requests library. Any ideas?

Answer

The site could be using anything in the request to trigger the rejection.

So, copy all the headers from the request that your browser makes. Then delete them one by one¹ to find out which ones are essential.

As per "Python requests. 403 Forbidden", to add custom headers to the request, do:

result = requests.get(url, headers={'header':'value', <etc>})
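
For instance, a minimal sketch of the "copy the browser's headers" approach. The URL and header values below are placeholders for a typical Chrome request, not from the original question; copy your own from the browser's developer tools (Network tab):

import requests

url = 'https://example.com/page'  # placeholder; use the real page URL

# Headers copied from the browser's request (placeholder values).
browser_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2224.3 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://example.com/',
    'Connection': 'keep-alive',
}

result = requests.get(url, headers=browser_headers)
print result.status_code  # anything other than 403 means the copied headers are sufficient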


¹ A faster way would be to delete half of them each time instead, but that's more complicated since there are probably multiple essential headers.
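
A sketch of that elimination process, assuming a hypothetical still_works() check and the browser_headers dict from the example above (the halving variant in the footnote would follow the same pattern, dropping several headers per trial):

import requests

def still_works(url, headers):
    # Hypothetical check: treat any response other than 403 as a success.
    return requests.get(url, headers=headers).status_code != 403

def find_essential(url, headers):
    # Try removing each header in turn; keep it only if the request
    # starts failing without it.
    essential = dict(headers)
    for name in list(essential):
        trial = dict(essential)
        del trial[name]
        if still_works(url, trial):
            del essential[name]  # the site did not need this header
    return essential

# Usage (with the browser_headers dict shown earlier):
# print find_essential('https://example.com/page', browser_headers)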
