Python 3 urllib vs requests performance

Question

I'm using Python 3.5 and I'm comparing the performance of the urllib module with that of the requests module. I wrote two clients in Python: the first uses the urllib module and the second uses the requests module. Both generate binary data, which I send to a Flask-based server, and the Flask server returns binary data to the client. I found that sending the data from the client to the server takes about the same time with both modules (urllib, requests), but returning the data from the server to the client is more than twice as fast with urllib as with requests. I'm working on localhost.
My question is: why?
What am I doing wrong with the requests module that makes it slower?

Here is the server code:

from flask import Flask, request
app = Flask(__name__)
import os

@app.route('/onStringSend', methods=['GET', 'POST'])
def onStringSend():
    return data

if __name__ == '__main__':
    data_size = int(1e7)
    data = os.urandom(data_size)    
    app.run(host="0.0.0.0", port=8080)

Here is the urllib-based client code:

import urllib.request
from timeit import default_timer as timer
import os

data_size = int(1e7)
num_of_runs = 20
url = 'http://127.0.0.1:8080/onStringSend'

def send_binary_data():
    data = os.urandom(data_size)
    headers = {'User-Agent': 'Mozilla/5.0 (compatible; Chrome/22.0.1229.94; Windows NT)',
               'Content-Length': '%d' % len(data),
               'Content-Type': 'application/octet-stream'}
    req = urllib.request.Request(url, data, headers)
    round_trip_time_msec = [0] * num_of_runs
    for i in range(num_of_runs):
        t1 = timer()
        resp = urllib.request.urlopen(req)
        response_data = resp.read()
        t2 = timer()
        round_trip_time_msec[i] = (t2 - t1) * 1000

    t_max = max(round_trip_time_msec)
    t_min = min(round_trip_time_msec)
    t_average = sum(round_trip_time_msec) / len(round_trip_time_msec)

    print('max round trip time [msec]: ', t_max)
    print('min round trip time [msec]: ', t_min)
    print('average round trip time [msec]: ', t_average)


send_binary_data()

Here is the requests-based client code:

import requests
import os
from timeit import default_timer as timer


url = 'http://127.0.0.1:8080/onStringSend'
data_size = int(1e7)
num_of_runs = 20


def send_binary_data():
    data = os.urandom(data_size)
    s = requests.Session()
    s.headers['User-Agent'] = 'Mozilla/5.0 (compatible; Chrome/22.0.1229.94;Windows NT)'
    s.headers['Content-Type'] = 'application/octet-stream'
    s.headers['Content-Length'] = '%d' % len(data)

    round_trip_time_msec = [0] * num_of_runs
    for i in range(num_of_runs):
        t1 = timer()
        response_data = s.post(url=url, data=data, stream=False, verify=False)
        t2 = timer()
        round_trip_time_msec[i] = (t2 - t1) * 1000

    t_max = max(round_trip_time_msec)
    t_min = min(round_trip_time_msec)
    t_average = sum(round_trip_time_msec)/len(round_trip_time_msec)

    print('max round trip time [msec]: ', t_max)
    print('min round trip time [msec]: ', t_min)
    print('average round trip time [msec]: ', t_average)

send_binary_data()

Thanks very much.

Answer

First of all, to reproduce the problem, I had to add the following line to your onStringSend function:

request.get_data()

Otherwise, I was getting "connection reset by peer" errors because the server’s receive buffer kept filling up.
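
For reference, a minimal version of the question's server with that line added looks like this:

from flask import Flask, request
import os

app = Flask(__name__)
data = os.urandom(int(1e7))

@app.route('/onStringSend', methods=['GET', 'POST'])
def onStringSend():
    # Drain the request body; otherwise the server's receive buffer
    # fills up and later POSTs fail with "connection reset by peer".
    request.get_data()
    return data

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8080)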

Now, the immediate reason for this problem is that Response.content (which is called implicitly when stream=False) iterates over the response data in chunks of 10240 bytes:

self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()

Therefore, the easiest way to solve the problem is to use stream=True, thus telling Requests that you will be reading the data at your own pace:

response_data = s.post(url=url, data=data, stream=True, verify=False).raw.read()

With this change, the performance of the Requests version becomes more or less the same as that of the urllib version.
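
Concretely, only the post call in the timing loop of the requests client needs to change; a sketch:

for i in range(num_of_runs):
    t1 = timer()
    # stream=True defers the body transfer; raw.read() then pulls the
    # whole payload in one call instead of many 10240-byte chunks.
    response_data = s.post(url=url, data=data, stream=True, verify=False).raw.read()
    t2 = timer()
    round_trip_time_msec[i] = (t2 - t1) * 1000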

Please also see the "Raw Response Content" section in the Requests docs for useful advice.
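
If you prefer not to work with Response.raw directly, that section also suggests reading through iter_content, where you control the chunk size yourself. A sketch along those lines; the 1 MiB chunk size is an arbitrary value picked for illustration:

resp = s.post(url=url, data=data, stream=True, verify=False)
# Reassemble the body from 1 MiB chunks instead of the 10240-byte
# default that Response.content uses internally.
response_data = b''.join(resp.iter_content(chunk_size=1024 * 1024))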

Now, the interesting question remains: why is Response.content iterating in such small chunks? After talking to Cory Benfield, a core developer of Requests, it looks like there may be no particular reason. I filed issue #3186 in Requests to look further into this.
