Python requests.get fails to get an answer for a URL I can open in my browser

Problem description

I'm learning how to use the Python requests library (Python 3), and I'm trying to make a simple requests.get call to fetch the HTML from several websites. It works for most of them, but there is one I'm having trouble with.

When I request http://es.rs-online.com/, everything works fine:

In [1]: import requests
   ...: html = requests.get("http://es.rs-online.com/")

In [2]: html
Out[2]: <Response [200]>

However, when I try it with http://es.farnell.com/, Python never gets a response and the call hangs indefinitely. If I set a timeout, no matter how long, requests.get() is always interrupted by the timeout and never returns anything else. I have also tried adding headers, but that didn't solve the issue. I also don't think the problem is related to the proxy I'm using, since I can open this website in my browser. Currently, my code looks like this:

import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36"}
html = requests.get("http://es.farnell.com/", headers=headers, timeout=5, allow_redirects=True)

After 5 seconds, I get the expected timeout error:

ReadTimeout: HTTPConnectionPool(host='es.farnell.com', port=80): Read timed out. (read timeout=5)
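The error above is a ReadTimeout rather than a ConnectTimeout, which suggests the TCP connection to the server was made but no HTTP response came back in time. A minimal sketch for telling the two cases apart, assuming the requests library is installed: passing a (connect, read) tuple as timeout sets the two limits separately, and the exception type shows which one fired. The helper name probe and its return labels are hypothetical, chosen only for illustration.

```python
import requests


def probe(url, headers=None, connect_timeout=3.05, read_timeout=5.0):
    """Hypothetical helper: return a short label describing how a GET ended."""
    try:
        # A (connect, read) tuple sets the two timeouts independently.
        resp = requests.get(url, headers=headers,
                            timeout=(connect_timeout, read_timeout))
    except requests.exceptions.ConnectTimeout:
        # The TCP connection itself was not established in time
        # (for example a firewall, proxy or routing problem).
        return "connect-timeout"
    except requests.exceptions.ReadTimeout:
        # Connected, but the server sent no response within read_timeout.
        return "read-timeout"
    except requests.exceptions.ConnectionError:
        # Refused connection, DNS failure, connection reset, and so on.
        # (ConnectTimeout also subclasses ConnectionError, so it is caught first.)
        return "connection-error"
    return f"ok {resp.status_code}"
```

For example, if probe("http://es.farnell.com/") returns "read-timeout" while the same call with fuller browser-like headers returns an "ok" label, that would point at the request headers rather than at DNS or the network path.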

Does anyone know what could be the issue?

Recommended answer

The problem is in your headers. Keep in mind that some sites are more lenient than others about the headers you send. To fix the issue, replace your current headers with:

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36",
    "Upgrade-Insecure-Requests": "1",
    "DNT": "1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}

I would also recommend sending the GET request to https://es.farnell.com/ rather than http://es.farnell.com/, removing timeout=5, and removing allow_redirects=True (it is already True by default for requests.get).

Altogether, your code should look like this:

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36",
    "Upgrade-Insecure-Requests": "1",
    "DNT": "1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}
html = requests.get("https://es.farnell.com", headers=headers)
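If you make several requests with the same headers, one option (a sketch, not part of the original answer) is to store them once on a requests.Session. Session.prepare_request builds the request exactly as the session would send it, so you can check the merged headers and final URL without any network traffic. The helper names make_session and preview are hypothetical.

```python
import requests

# The browser-like headers recommended in the answer.
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36"),
    "Upgrade-Insecure-Requests": "1",
    "DNT": "1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}


def make_session():
    """Hypothetical helper: a session that sends BROWSER_HEADERS on every request."""
    session = requests.Session()
    # update() overrides the library defaults, e.g. the python-requests User-Agent.
    session.headers.update(BROWSER_HEADERS)
    return session


def preview(session, url):
    """Hypothetical helper: build the GET request as the session would, without sending it."""
    return session.prepare_request(requests.Request("GET", url))


session = make_session()
prepared = preview(session, "https://es.farnell.com")
# Print the headers that would actually go over the wire.
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

To actually fetch the page, call session.get("https://es.farnell.com"); the session also keeps the underlying connection open for later requests to the same host.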

Hope this helps.