Why is Python 3 http.client so much faster than python-requests?

Question

I was testing different Python HTTP libraries today, and I realized that the http.client library seems to perform much, much faster than requests.

To test it, you can run the following two code samples.

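import http.client

# One HTTPConnection object is reused for all 1000 requests
conn = http.client.HTTPConnection("localhost", port=8000)
for i in range(1000):
    conn.request("GET", "/")
    r1 = conn.getresponse()
    body = r1.read()  # read the whole body before sending the next request
    print(r1.status)

conn.close()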

And here is code doing the same thing with python-requests:

import requests

# One Session object is reused for all 1000 requests
with requests.Session() as session:
    for i in range(1000):
        r = session.get("http://localhost:8000")  # body is read eagerly (stream=False)
        print(r.status_code)

If I start SimpleHTTPServer (the http.server module in Python 3):

> python -m http.server

and run the code samples above (I'm using Python 3.5.2), I get the following results:

http.client:

0.35user 0.10system 0:00.71elapsed 64%CPU 

python-requests:

1.76user 0.10system 0:02.17elapsed 85%CPU 
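To re-run the http.client measurement without a separate server process and without printing inside the timed loop, here is a minimal stdlib-only sketch. TinyHandler, start_server and time_http_client are hypothetical names; the in-process server on 127.0.0.1 with a free port is an assumption standing in for `python -m http.server` on localhost:8000, and the loop is timed with time.perf_counter rather than by timing the whole process.

```python
import http.client
import http.server
import threading
import time


class TinyHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a tiny fixed body."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the per-request log line http.server writes to stderr.
        pass


def start_server(host="127.0.0.1"):
    """Start a throwaway HTTPServer on a free port in a daemon thread."""
    server = http.server.HTTPServer((host, 0), TinyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def time_http_client(port, n=1000, host="127.0.0.1"):
    """Send n GETs through one HTTPConnection; return (statuses, seconds)."""
    conn = http.client.HTTPConnection(host, port=port)
    statuses = []
    start = time.perf_counter()
    for _ in range(n):
        conn.request("GET", "/")
        resp = conn.getresponse()
        resp.read()  # drain the body before the next request
        statuses.append(resp.status)
    elapsed = time.perf_counter() - start
    conn.close()
    return statuses, elapsed


if __name__ == "__main__":
    server, port = start_server()
    try:
        statuses, elapsed = time_http_client(port)
        print("http.client: %d requests in %.2f s" % (len(statuses), elapsed))
    finally:
        server.shutdown()
        server.server_close()
```

The requests loop from the question can be timed the same way by replacing the request/response lines with session.get(...) calls; absolute numbers will vary by machine and platform.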

Are my measurements and tests correct? Can you reproduce them too? If yes, does anyone know what's going on inside http.client that makes it so much faster? Why is there such a big difference in processing time?

Answer

Based on profiling both, the main difference appears to be that the requests version is doing a DNS lookup for every request, while the http.client version is doing so once.

# http.client
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1974    0.541    0.000    0.541    0.000 {method 'recv_into' of '_socket.socket' objects}
     1000    0.020    0.000    0.045    0.000 feedparser.py:470(_parse_headers)
    13000    0.015    0.000    0.563    0.000 {method 'readline' of '_io.BufferedReader' objects}
...

# requests
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1481    0.827    0.001    0.827    0.001 {method 'recv_into' of '_socket.socket' objects}
     1000    0.377    0.000    0.382    0.000 {built-in method _socket.gethostbyname}
     1000    0.123    0.000    0.123    0.000 {built-in method _scproxy._get_proxy_settings}
     1000    0.111    0.000    0.111    0.000 {built-in method _scproxy._get_proxies}
    92000    0.068    0.000    0.284    0.000 _collections_abc.py:675(__iter__)
...
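The two listings above appear to be cProfile output sorted by internal time (tottime). A minimal sketch for collecting a similar listing for either loop; profile_top is a hypothetical helper name, and the lambda in the usage block is only a placeholder workload.

```python
import cProfile
import io
import pstats


def profile_top(func, limit=10):
    """Run func under cProfile; return the top `limit` rows sorted by tottime."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    # Placeholder workload: swap in one of the request loops from the question.
    print(profile_top(lambda: sorted(str(i) for i in range(100000)), limit=5))
```

From the command line, `python -m cProfile -s tottime script.py` prints comparable output for a whole script.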

You're providing the hostname to http.client.HTTPConnection() once, so it makes sense it would call gethostbyname once. requests.Session probably could cache hostname lookups, but it apparently does not.
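To make the caching idea concrete, here is a minimal sketch of a memoized lookup built with functools.lru_cache. cached_gethostbyname is a hypothetical name; this is not something requests provides, and as the edit below notes, the real cause turned out to be more than missing caching.

```python
import functools
import socket


# Hypothetical memoized wrapper: only the first lookup per hostname reaches
# socket.gethostbyname; later calls are answered from the cache.
@functools.lru_cache(maxsize=128)
def cached_gethostbyname(hostname):
    return socket.gethostbyname(hostname)


if __name__ == "__main__":
    for _ in range(1000):
        cached_gethostbyname("localhost")
    # expected: hits=999, misses=1
    print(cached_gethostbyname.cache_info())
```

lru_cache never expires entries, so a cached answer like this would ignore DNS TTLs.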

Edit: After some further research, it's not just a simple matter of caching. There's a function that determines whether to bypass proxies, and it ends up invoking gethostbyname regardless of the actual request itself.