What can I do to improve socket performance in Python 3?



Initial Post

I have a long-running program in which about 97% of the run time is spent in socket objects created by ftp.retrlines and ftp.retrbinary calls. I have already used processes and threads to parallelize the program. Is there anything else I can do to eke out some more speed?

Example code:

# Get file list
ftpfilelist = []
ftp.retrlines('NLST %s' % ftp_directory, ftpfilelist.append)
# ... filter file list; this part takes almost no time ...
# Download a file
with open(path, 'wb') as fout:
    ftp.retrbinary('RETR %s' % ftp_path, fout.write)

Output from cProfile:

5890792 function calls (5888775 primitive calls) in 548.883 seconds

Ordered by: internal time
List reduced from 843 to 50 due to restriction <50>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  9166  249.154    0.027  249.154    0.027 {method 'recv_into' of '_socket.socket' objects}
 99573  230.489    0.002  230.489    0.002 {method 'recv' of '_socket.socket' objects}
  1767   53.113    0.030   53.129    0.030 {method 'connect' of '_socket.socket' objects}
 98808    2.839    0.000    2.839    0.000 {method 'write' of '_io.BufferedWriter' objects}
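
A report like the one above can be produced with the standard library's cProfile and pstats modules. The sketch below is one way to do it, not necessarily how the original run was set up; `profile_call` and its `limit` parameter are hypothetical names. Sorting by `tottime` is what the report header calls "internal time", and passing 50 to `print_stats` yields the `<50>` restriction.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=50, **kwargs):
    """Run func under cProfile and return (result, report text).

    Hypothetical helper: the name and the limit parameter are assumptions.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" excludes time spent in sub-calls; pstats labels it "internal time".
    stats.sort_stats("tottime").print_stats(limit)
    return result, buffer.getvalue()
```

Calling `profile_call(main)` on the program's entry point (here an assumed function named `main`) returns the report as a string, which can then be printed or saved.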


Follow Up

Results for a gevent fork (https://github.com/fantix/gevent) that supports Python 3.4.1:

7645675 function calls (7153156 primitive calls) in 301.813 seconds

Ordered by: internal time
List reduced from 948 to 50 due to restriction <50>

ncalls       tottime  percall  cumtime  percall filename:lineno(function)
107541/4418  281.228    0.003  296.499    0.067 gevent/hub.py:354(wait)
99885/59883    4.466    0.000  405.922    0.007 gevent/_socket3.py:248(recv)
99097          2.244    0.000    2.244    0.000 {method 'write' of '_io.BufferedWriter' objects}
111125/2796    1.036    0.000    0.017    0.000 gevent/hub.py:345(switch)
107543/2788    1.000    0.000    0.039    0.000 gevent/hub.py:575(get)

Results for concurrent.futures.ThreadPoolExecutor:

5319963 function calls (5318875 primitive calls) in 359.541 seconds

Ordered by: internal time
List reduced from 872 to 50 due to restriction <50>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    31  349.876   11.286  349.876   11.286 {method 'acquire' of '_thread.lock' objects}
  2652    3.293    0.001    3.293    0.001 {method 'recv' of '_socket.socket' objects}
310270    0.790    0.000    0.790    0.000 {method 'timetuple' of 'datetime.date' objects}
    25    0.661    0.026    0.661    0.026 {method 'recv_into' of '_socket.socket' objects}
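
For reference, a thread pool download along these lines might look like the sketch below. It is an illustration under assumptions, not the code that was profiled: `fetch_one`, `download_all`, the placeholder host `ftp.example.com`, the anonymous login, and the worker count are all hypothetical. Note also that cProfile records only the thread it was started in, which is probably why the table above is dominated by lock `acquire`: the main thread waits on the workers while most of their `recv` calls go unrecorded.

```python
import ftplib
import os
from concurrent.futures import ThreadPoolExecutor


def fetch_one(ftp_path, host="ftp.example.com", dest_dir="."):
    """Download one file over its own FTP connection.

    The host and the anonymous login are placeholders (assumptions).
    """
    local_path = os.path.join(dest_dir, os.path.basename(ftp_path))
    with ftplib.FTP(host) as ftp, open(local_path, "wb") as fout:
        ftp.login()
        ftp.retrbinary("RETR %s" % ftp_path, fout.write)
    return local_path


def download_all(paths, fetch=fetch_one, max_workers=8):
    """Call fetch(path) for every path on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps the input order and re-raises any worker exception.
        return list(pool.map(fetch, paths))
```

Passing `fetch` as a parameter keeps the pool logic separate from the FTP details, so it can be exercised with a stand-in function without touching the network.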

Conclusion: For my use case, gevent improved performance by about 20% (301.8 seconds versus 359.5 seconds for the thread pool run)!

Solution

Take a look at gevent. It can monkey-patch the libraries you are using (such as your FTP library) to improve socket performance by using cooperative threads.

The general premise is that threaded programs aren't very efficient for I/O-heavy workloads because the scheduler doesn't know whether a thread is waiting on a network operation, so the current thread may be scheduled yet spend its time waiting on I/O while other threads could actually be doing work.

With gevent, as soon as your thread (called a greenlet) hits a blocking network call, it automatically switches to another greenlet. Through this mechanism, your threads/greenlets are used to their fullest potential.
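
A minimal sketch of this approach follows, assuming gevent is installed. `monkey.patch_all()` and `gevent.pool.Pool` are part of gevent; `fetch_one`, `download_all`, the placeholder host `ftp.example.com`, the anonymous login, and the concurrency value are hypothetical. Patching happens before `ftplib` is imported so that its sockets become cooperative. The sequential fallback when gevent is missing is a convenience of this sketch, not something gevent provides.

```python
try:
    from gevent import monkey
    monkey.patch_all()  # patch socket etc. before ftplib is imported
    from gevent.pool import Pool
except ImportError:
    Pool = None  # gevent not installed: fall back to sequential downloads

import ftplib
import os


def fetch_one(ftp_path, host="ftp.example.com", dest_dir="."):
    """Download one file; host and anonymous login are placeholders."""
    local_path = os.path.join(dest_dir, os.path.basename(ftp_path))
    with ftplib.FTP(host) as ftp, open(local_path, "wb") as fout:
        ftp.login()
        ftp.retrbinary("RETR %s" % ftp_path, fout.write)
    return local_path


def download_all(paths, fetch=fetch_one, concurrency=10):
    """Run fetch(path) for every path, one greenlet per path when possible."""
    if Pool is None:
        return [fetch(path) for path in paths]
    # At most `concurrency` greenlets run at once; whenever one blocks on
    # the network, the gevent hub switches to another that is ready.
    return Pool(concurrency).map(fetch, paths)
```

The existing `ftp.retrbinary` code does not need to change: once the socket module is patched, each blocking `recv` yields to the other greenlets instead of stalling the whole program.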

Here's a great introduction to this library: http://www.gevent.org/intro.html#example
