requests.get returns 403 while the same url works in browser


Problem description

I'm trying to use the search form at rlsnet.ru. Here is the form's definition I've extracted from the source file:

<form id="site_search_form" action="/search_result.htm" method="get">
    <input id="simplesearch_text_input" class="search__field" type="text" name="word" value="" autocomplete="off">
    <input type="hidden" name="path" value="/" id="path">
    <input type="hidden" name="enter_clicked" value="1">
    <input id="letters_id" type="hidden" name="letters" value="">
    <input type="submit" class="g-btn search__btn" value="Найти" id="simplesearch_button">
    <div class="sf_suggestion">
        <ul style="display: none; z-index:1000; opacity:0.85;">
        </ul>
    </div>
    <div id="contentsf">
    </div>
</form>

Here is the code I used to send the search request:

import requests
from urllib.parse import urlencode 

root = "http://www.rlsnet.ru/search_result.htm?"
response = requests.get(root + urlencode({"word": "Церебролизин".encode('cp1251')}))

Each time I run it, the response status is 403. When I enter the same request URL (i.e. http://www.rlsnet.ru/search_result.htm?word=%D6%E5%F0%E5%E1%F0%EE%EB%E8%E7%E8%ED) into Safari/Chrome/Opera, it works fine and returns the expected page. What am I doing wrong? Googling the issue only turned up this SO question: "why url works in browser but not using requests get method", which was of little use.
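As a sanity check, here is a minimal stdlib-only sketch of how the query string above is built. It suggests the URL produced by the code is the same one pasted into the browser, so the difference must lie elsewhere in the request. The helper name `build_search_url` is hypothetical and used only for illustration.

```python
from urllib.parse import urlencode

ROOT = "http://www.rlsnet.ru/search_result.htm?"


def build_search_url(word, encoding="cp1251"):
    """Hypothetical helper: percent-encode `word` using the given byte encoding."""
    # urlencode percent-encodes bytes values byte by byte,
    # so encoding to cp1251 first yields the cp1251 escapes.
    return ROOT + urlencode({"word": word.encode(encoding)})


url = build_search_url("Церебролизин")
print(url)
```

No request is sent here; the snippet only builds and prints the URL.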

Solution

That's because the default User-Agent of requests is python-requests/2.13.0 (the version number depends on the installed release), and in your case the website doesn't like traffic from "non-browsers", so it tries to block such traffic.

>>> import requests
>>> session = requests.Session()
>>> session.headers
{'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.13.0'}

All you need to do is make the request look like it is coming from a browser, so just pass an extra headers parameter:

import requests

# A Chrome User-Agent string; any browser's User-Agent can be used instead.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'
}
response = requests.get('http://www.rlsnet.ru/search_result.htm?word=%D6%E5%F0%E5%E1%F0%EE%EB%E8%E7%E8%ED', headers=headers)

print(response.status_code)
print(response.url)

200 
http://www.rlsnet.ru/search_result.htm?word=%D6%E5%F0%E5%E1%F0%EE%EB%E8%E7%E8%ED
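If several requests go to the same site, one option is to set the header once on a `requests.Session` instead of passing it on every call. The following is a minimal sketch under that assumption, not the only way to do it; the names `BROWSER_UA` and `make_session` are hypothetical. It prepares the request without sending it, so the final headers and URL can be inspected offline.

```python
import requests

# Assumed browser User-Agent string (the same Chrome one as above).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36"
)


def make_session(user_agent=BROWSER_UA):
    """Hypothetical helper: a Session whose requests all carry `user_agent`."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = make_session()
request = requests.Request(
    "GET",
    "http://www.rlsnet.ru/search_result.htm",
    # Bytes values are percent-encoded as-is, so the cp1251 encoding is kept.
    params={"word": "Церебролизин".encode("cp1251")},
)
# prepare_request merges the session headers into the request; nothing is sent yet.
prepared = session.prepare_request(request)

print(prepared.headers["User-Agent"])
print(prepared.url)
```

To actually send it, something like `response = session.send(prepared, timeout=10)` followed by `response.raise_for_status()` would surface a 403 as an exception rather than a silently ignored status code.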
