Why is Python 3 http.client so much faster than python-requests?
Question
I was testing different Python HTTP libraries today and noticed that the http.client library seems to perform much faster than requests.
To test it, you can run the following two code samples.
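    import http.client

    # One connection object, reused for all 1000 requests
    conn = http.client.HTTPConnection("localhost", port=8000)
    for i in range(1000):
        conn.request("GET", "/")
        r1 = conn.getresponse()
        body = r1.read()  # read the body so the connection can be reused
        print(r1.status)

    conn.close()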
And here is code doing the same thing with python-requests:
    import requests

    # One Session, reused for all 1000 requests
    with requests.Session() as session:
        for i in range(1000):
            r = session.get("http://localhost:8000")
            print(r.status_code)
If I start SimpleHTTPServer:
> python -m http.server
and run the above code samples (I'm using Python 3.5.2), I get the following results:
http.client:
0.35user 0.10system 0:00.71elapsed 64%CPU
python-requests:
1.76user 0.10system 0:02.17elapsed 85%CPU
Are my measurements and tests correct? Can you reproduce them too? If so, does anyone know what is going on inside http.client that makes it so much faster? Why is there such a big difference in processing time?
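One caveat about the figures above: the time command measures the whole process, including interpreter startup and module imports. A minimal sketch for timing only the request loop is shown below. It assumes a throwaway in-process server in place of python -m http.server; the names QuietHandler and time_http_client are hypothetical and used only for illustration.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class QuietHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for `python -m http.server`: empty 200 reply."""

    protocol_version = "HTTP/1.1"  # allow keep-alive between requests

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")  # no body to send
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging so it does not skew timing


def time_http_client(n_requests=100):
    """Return (elapsed_seconds, statuses) for n GETs over one connection."""
    server = HTTPServer(("127.0.0.1", 0), QuietHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    statuses = []
    try:
        start = time.perf_counter()
        for _ in range(n_requests):
            conn.request("GET", "/")
            response = conn.getresponse()
            response.read()  # drain the response so the connection is reusable
            statuses.append(response.status)
        elapsed = time.perf_counter() - start
    finally:
        conn.close()  # close first so the single-threaded server is not blocked
        server.shutdown()
        server.server_close()
    return elapsed, statuses


if __name__ == "__main__":
    elapsed, statuses = time_http_client(1000)
    print(f"{len(statuses)} requests in {elapsed:.3f}s")
```

The same loop shape could be timed with a requests.Session for comparison; the absolute numbers will depend on the machine and the server used.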
Recommended answer
Based on profiling both, the main difference appears to be that the requests version is doing a DNS lookup for every request, while the http.client version is doing so once.
    # http.client
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1974    0.541    0.000    0.541    0.000 {method 'recv_into' of '_socket.socket' objects}
         1000    0.020    0.000    0.045    0.000 feedparser.py:470(_parse_headers)
        13000    0.015    0.000    0.563    0.000 {method 'readline' of '_io.BufferedReader' objects}
    ...

    # requests
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1481    0.827    0.001    0.827    0.001 {method 'recv_into' of '_socket.socket' objects}
         1000    0.377    0.000    0.382    0.000 {built-in method _socket.gethostbyname}
         1000    0.123    0.000    0.123    0.000 {built-in method _scproxy._get_proxy_settings}
         1000    0.111    0.000    0.111    0.000 {built-in method _scproxy._get_proxies}
        92000    0.068    0.000    0.284    0.000 _collections_abc.py:675(__iter__)
    ...
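A profile in this format can be collected with the standard cProfile and pstats modules. The following is a minimal sketch, not the exact commands used for the output above: profile_top and repeated_lookups are hypothetical helper names, and the workload is a stand-in that simply resolves "localhost" repeatedly rather than issuing HTTP requests.

```python
import cProfile
import io
import pstats
import socket


def profile_top(func, limit=5):
    """Run func() under cProfile; return the top entries by tottime as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()


def repeated_lookups(n=100):
    """Stand-in workload (an assumption): resolve the same hostname n times."""
    for _ in range(n):
        socket.gethostbyname("localhost")


if __name__ == "__main__":
    print(profile_top(repeated_lookups))
```

To profile the actual samples, func would be a function wrapping either request loop; python -m cProfile -s tottime script.py gives a similar report from the command line.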
You're providing the hostname to http.client.HTTPConnection() once, so it makes sense that it would call gethostbyname once. requests.Session probably could cache hostname lookups, but it apparently does not.
Edit: After some further research, it's not just a simple matter of caching. There's a function for determining whether to bypass proxies, which ends up invoking gethostbyname regardless of the actual request itself.
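The _scproxy entries in the profile come from the standard library's macOS proxy detection in urllib.request. As a rough way to check whether the proxy-bypass check triggers hostname lookups on a given machine, the sketch below counts calls to socket.gethostbyname while calling urllib.request.proxy_bypass directly. It assumes this stdlib function is the relevant code path; count_lookups_in_proxy_bypass is a hypothetical helper name, and the result depends on the platform and proxy configuration (it may well be zero).

```python
import socket
import urllib.request
from unittest import mock


def count_lookups_in_proxy_bypass(host="localhost", repeats=10):
    """Count socket.gethostbyname calls made by the stdlib proxy-bypass check."""
    # wraps= keeps the real lookup behaviour while recording each call
    with mock.patch("socket.gethostbyname", wraps=socket.gethostbyname) as spy:
        for _ in range(repeats):
            urllib.request.proxy_bypass(host)
    return spy.call_count


if __name__ == "__main__":
    calls = count_lookups_in_proxy_bypass()
    print(f"gethostbyname called {calls} time(s) across 10 proxy_bypass checks")
```

On the requests side, setting session.trust_env = False tells the Session not to read proxy settings from the environment, which may avoid this check; whether it removes the per-request lookup in a particular version is best confirmed by profiling again.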