Python requests is slow and takes very long to complete an HTTP or HTTPS request

Problem description

When requesting a web resource, website, or web service with the requests library, the request takes a long time to complete. The code looks similar to the following:

import requests
requests.get("https://www.example.com/")

This request takes over 2 minutes (exactly 2 minutes 10 seconds) to complete! Why is it so slow and how can I fix it?

Recommended answer

There can be multiple possible solutions to this problem. Stack Overflow has many answers covering each of them, so I will try to combine them all here to save you the hassle of searching for them.

In my search, I have uncovered the following layers to this problem:

For many problems, activating logging can help you uncover what goes wrong (source):

import logging
import http.client

import requests

# Print the raw request and response headers exchanged by http.client.
http.client.HTTPConnection.debuglevel = 1

# You must initialize logging, otherwise you won't see any debug output.
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
# requests uses urllib3 under the hood, which logs connection details
# (e.g. "Starting new HTTPS connection") under the "urllib3" logger.
urllib3_log = logging.getLogger("urllib3")
urllib3_log.setLevel(logging.DEBUG)
urllib3_log.propagate = True

requests.get("https://www.example.com")

In case the debug output does not help you solve the problem, read on.

If you don't need the response body, it can be faster not to request all the data and to send only a HEAD request (source):

requests.head("https://www.example.com")

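Some servers don't support HEAD requests; in that case, you can try streaming the response, which returns once the headers have arrived and downloads the body only when you access it (source):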

requests.get("https://www.example.com", stream=True)
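
The two approaches above can be combined into one helper. A minimal sketch, assuming you only need the status, the headers, and at most the first few bytes of the body; the helper name `peek` and its defaults are hypothetical, and treating 405/501 as "HEAD not supported" is an assumption about how such servers respond. Note that `requests.head` does not follow redirects unless you pass `allow_redirects=True`.

```python
import requests


def peek(url, max_bytes=1024, timeout=10):
    """Return (status_code, headers, first_bytes) without downloading the full body.

    Hypothetical helper: tries HEAD first and falls back to a streamed GET
    when the server answers 405 (Method Not Allowed) or 501 (Not Implemented).
    """
    # HEAD asks for the headers only. requests.head does not follow
    # redirects by default, so turn that on explicitly.
    r = requests.head(url, allow_redirects=True, timeout=timeout)
    if r.status_code not in (405, 501):
        return r.status_code, r.headers, b""

    # stream=True returns once the headers have arrived; the body is only
    # downloaded as we iterate over it, and we stop after max_bytes.
    with requests.get(url, stream=True, timeout=timeout) as r:
        first = b""
        for chunk in r.iter_content(chunk_size=256):
            first += chunk
            if len(first) >= max_bytes:
                break
        return r.status_code, r.headers, first[:max_bytes]
```

Leaving the `with` block closes the streamed response, so the unread remainder of the body is not downloaded.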

For multiple requests in a row, try utilizing a Session

If you send multiple requests in a row, you can speed them up by using a requests.Session. A Session keeps the connection to the server open and reuses it for subsequent requests to the same host (connection pooling), which saves repeating the TCP and TLS handshakes; as a nice bonus, it also persists cookies across requests. Try this (source):

import requests
session = requests.Session()
for _ in range(10):
    session.get("https://www.example.com")

To parallelize your requests (try this for more than 10 requests), use requests-futures

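If you send a large number of requests at once, each request blocks execution until it finishes. You can run them in parallel with, for example, requests-futures (from kederrac):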

from concurrent.futures import as_completed
from requests_futures.sessions import FuturesSession

with FuturesSession() as session:
    futures = [session.get("https://www.example.com") for _ in range(10)]
    for future in as_completed(futures):
        response = future.result()

Be careful not to overwhelm the server with too many requests at the same time.
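
If you would rather avoid the extra dependency, the standard library's `concurrent.futures.ThreadPoolExecutor` can do a similar job. A minimal sketch, assuming the requests are independent GETs; `fetch_all` and the `max_workers=5` cap are hypothetical choices, and the cap is what limits how hard you hit the server. Each task calls `requests.get` instead of sharing one `Session`, because requests does not document `Session` as thread-safe.

```python
from concurrent.futures import ThreadPoolExecutor

import requests


def fetch_all(urls, max_workers=5, timeout=10):
    """Fetch URLs concurrently; responses are returned in the same order as urls."""

    def fetch(url):
        # Each worker thread blocks on its own request; the pool size caps
        # how many requests are in flight against the server at once.
        return requests.get(url, timeout=timeout)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


# Usage: responses = fetch_all(["https://www.example.com"] * 10)
```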

If this also does not solve your problem, read on...

In many cases, the reason might lie with the server you are requesting from. First, verify this by requesting any other URL in the same fashion:

requests.get("https://www.google.com")
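
To make the comparison concrete, it helps to time both URLs the same way and with an explicit timeout, so that a hanging request fails after a bounded time instead of blocking for minutes. A small sketch; `time_get` is a hypothetical helper name:

```python
import time

import requests


def time_get(url, timeout=10):
    """Return (seconds taken, status code or exception class name) for one GET."""
    start = time.perf_counter()
    try:
        outcome = requests.get(url, timeout=timeout).status_code
    except requests.RequestException as exc:
        # Record the failure (e.g. ConnectTimeout) instead of aborting the comparison.
        outcome = type(exc).__name__
    return time.perf_counter() - start, outcome


# Usage:
# for url in ("https://www.example.com/", "https://www.google.com"):
#     seconds, outcome = time_get(url)
#     print(f"{url}: {outcome} after {seconds:.2f} s")
```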

If the other URL responds quickly, you can focus your efforts on the following possible problems:

The server might specifically block the requests library (for example, based on the User-Agent header it sends), only allow whitelisted clients, or reject the request for some other reason. To send a more browser-like User-Agent string, try this (source):

headers = {"User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36"}
requests.get("https://www.example.com", headers=headers)
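
By default, requests identifies itself with a User-Agent of the form `python-requests/<version>`, which is easy for a server to filter on. You can inspect it with `requests.utils.default_user_agent()` and, if you send many requests, set the replacement once on a `Session` instead of passing it on every call. A sketch; the browser string is simply the one from the example above:

```python
import requests

# The header requests sends when you don't override it, e.g. "python-requests/2.31.0".
print(requests.utils.default_user_agent())

BROWSER_UA = (
    "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36"
)

session = requests.Session()
# Headers set on the session are merged into every request it sends.
session.headers.update({"User-Agent": BROWSER_UA})
# session.get("https://www.example.com")  # now sends the browser-like User-Agent
```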

The server is rate-limiting you

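If this problem only occurs occasionally, e.g. after a few requests, the server might be rate-limiting you. Check the response to see whether it says something like "rate limit reached", "work queue depth exceeded", or similar (source).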

Here, the solution is just to wait longer between requests, for example by using time.sleep().
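
Many rate-limited services answer with HTTP status 429 (Too Many Requests) and sometimes a `Retry-After` header giving the number of seconds to wait. The sketch below assumes that convention, which not every server follows; `get_with_backoff` and its defaults are hypothetical. It honours a numeric `Retry-After` and otherwise doubles its own delay after each attempt.

```python
import time

import requests


def get_with_backoff(session: requests.Session, url: str, max_attempts: int = 5,
                     base_delay: float = 1.0, **kwargs) -> requests.Response:
    """GET url, waiting and retrying while the server answers 429 Too Many Requests."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        response = session.get(url, **kwargs)
        if response.status_code != 429 or attempt == max_attempts:
            return response
        # Retry-After may hold a number of seconds or an HTTP date; only the
        # numeric form is handled here, otherwise use our own delay.
        retry_after = response.headers.get("Retry-After", "")
        time.sleep(float(retry_after) if retry_after.isdigit() else delay)
        delay *= 2  # exponential backoff for the next attempt
    raise ValueError("max_attempts must be at least 1")


# Usage:
# with requests.Session() as session:
#     response = get_with_backoff(session, "https://www.example.com", timeout=10)
```

As a ready-made alternative, urllib3's `Retry` class (e.g. with `status_forcelist=[429]`) can be mounted on a session through `requests.adapters.HTTPAdapter(max_retries=...)`; it honours `Retry-After` by default.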

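The problem might also lie in processing the response. You can check this by not reading the response you receive from the server (for example, by passing stream=True and never accessing r.content or r.text). If the code is still slow, this is not your problem; if this fixes it, the problem probably lies with parsing the response.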
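
One way to run this check is to stream the response and time the two phases separately. `Response.elapsed` measures the time from sending the request until the response headers have been parsed, and with `stream=True` the body is read only when you access it. A sketch; `time_phases` is a hypothetical helper name:

```python
import time

import requests


def time_phases(url, timeout=10):
    """Return (seconds until the headers were parsed, seconds spent reading and decoding the body)."""
    with requests.get(url, stream=True, timeout=timeout) as r:
        # With stream=True, get() returns once the headers are in; r.elapsed
        # covers the time from sending the request to parsing those headers.
        headers_s = r.elapsed.total_seconds()
        start = time.perf_counter()
        # Accessing r.text downloads the body and decodes it. If r.encoding is
        # None, requests first guesses one (r.apparent_encoding), which can
        # itself take a while on large bodies.
        _ = r.text
        body_s = time.perf_counter() - start
    return headers_s, body_s
```

A large `body_s` compared with `headers_s` points at downloading or decoding the response rather than at connecting to the server.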

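  1. If certain headers are set incorrectly, this can cause parsing errors that prevent chunked transfer (source).
  2. In other cases, setting the encoding manually might solve parsing problems (source).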

To fix those, try:

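import requests

# stream=True defers reading the body, so the settings below apply before it is read.
r = requests.get("https://www.example.com", stream=True)
r.raw.chunked = True  # Fix issue 1: make urllib3 read the body as chunked transfer encoding
r.encoding = "utf-8"  # Fix issue 2: decode r.text as UTF-8 instead of the detected encoding
print(r.text)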

IPv6 does not work, but IPv4 does

This might be the worst problem of all. A simple but rather odd way to check for it is to add a timeout parameter like this:

requests.get("https://www.example.com/", timeout=5)

If this returns a successful response, the problem most likely lies with IPv6. The reason is that requests tries the addresses returned by DNS in order, usually IPv6 first, and only moves on to IPv4 when the IPv6 connection attempt times out. Because the timeout applies to each connection attempt, setting it low forces the switch to IPv4 much sooner.
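
The timeout parameter also accepts a (connect, read) tuple, which lets you keep the connect timeout short for this workaround without cutting off a server that is merely slow to respond. A sketch; `get_with_split_timeout` is a hypothetical wrapper, and 3.05 s / 27 s are only example values.

```python
import requests


def get_with_split_timeout(url, connect=3.05, read=27, **kwargs):
    """GET url with separate connect and read timeouts (in seconds)."""
    # The connect timeout bounds each connection attempt, so a hanging IPv6
    # attempt is abandoned quickly and the next resolved address (e.g. IPv4)
    # is tried; the read timeout bounds each wait for data from the server.
    return requests.get(url, timeout=(connect, read), **kwargs)
```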

Verify by utilizing, e.g., wget or curl:

wget --inet6-only https://www.example.com -O - > /dev/null
# or
curl --ipv6 -v https://www.example.com

In both cases, we force the tool to connect via IPv6 to isolate the issue. If this times out, try again forcing IPv4:

wget --inet4-only https://www.example.com -O - > /dev/null
# or
curl --ipv4 -v https://www.example.com
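
If neither wget nor curl is available, a similar check can be made from Python with the standard socket module: resolve the host for a single address family and try a plain TCP connection to each resulting address. A sketch; `can_connect` is a hypothetical helper, and it only tests whether a TCP connection can be opened, not TLS or HTTP.

```python
import socket


def can_connect(host, port=443, family=socket.AF_INET, timeout=5):
    """Try a TCP connection to host over one address family; return (ok, detail)."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # e.g. the host has no AAAA record when asking for IPv6
        return False, f"DNS lookup failed: {exc}"
    last_error = None
    for fam, socktype, proto, _canonname, addr in infos:
        sock = socket.socket(fam, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
            return True, addr[0]
        except OSError as exc:  # includes timeouts and refused connections
            last_error = exc
        finally:
            sock.close()
    return False, f"connection failed: {last_error}"


# Usage:
# for name, fam in (("IPv6", socket.AF_INET6), ("IPv4", socket.AF_INET)):
#     print(name, can_connect("www.example.com", family=fam))
```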

If this works fine, you have found your problem! But how to solve it, you ask?

  1. A brute-force solution is to disable IPv6 completely.
  2. You may also disable IPv6 for the current session only.
  3. You may just want to force requests to use IPv4. (In the linked answer, you have to adapt the code to always return socket.AF_INET for IPv4.)
  4. If you want to fix this problem for SSH, here is how to force IPv4 for SSH. (In short, add AddressFamily inet to your SSH config.)
  5. You may also want to check if the problem lies with your DNS or TCP.
