Parallelism in Python isn't working right


Problem description

I was developing an app on GAE using Python 2.7. An AJAX call requests some data from an API, and a single request takes ~200 ms. However, when I open two browsers and make two requests at nearly the same time, they take more than double that. I've tried putting everything in threads, but it didn't help. (This happens when the app is online, not just on the dev server.)

So I wrote this simple test to see whether this is a problem in Python in general (in the case of a busy wait). Here is the code and the result:

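import threading
from datetime import datetime

def work():
    # Pure CPU work (a busy loop); the thread never waits on I/O.
    t = datetime.now()
    print threading.currentThread(), t
    i = 0
    while i < 100000000:
        i += 1
    t2 = datetime.now()
    print threading.currentThread(), t2, t2 - t

if __name__ == '__main__':
    # Baseline: one thread running the loop on its own.
    print "single threaded:"
    t1 = threading.Thread(target=work)
    t1.start()
    t1.join()

    # Two threads running the same loop at the same time.
    print "multi threaded:"
    t1 = threading.Thread(target=work)
    t1.start()
    t2 = threading.Thread(target=work)
    t2.start()
    t1.join()
    t2.join()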

The result on Mac OS X, Core i7 (4 cores, 8 threads), Python 2.7:

single threaded:
<Thread(Thread-1, started 4315942912)> 2011-12-06 15:38:07.763146
<Thread(Thread-1, started 4315942912)> 2011-12-06 15:38:13.091614 0:00:05.328468

multi threaded:
<Thread(Thread-2, started 4315942912)> 2011-12-06 15:38:13.091952
<Thread(Thread-3, started 4323282944)> 2011-12-06 15:38:13.102250
<Thread(Thread-3, started 4323282944)> 2011-12-06 15:38:29.221050 0:00:16.118800
<Thread(Thread-2, started 4315942912)> 2011-12-06 15:38:29.237512 0:00:16.145560

This is pretty shocking! If a single thread takes 5 seconds to do this, I expected two threads started at the same time to finish both tasks in about the same 5 seconds, but they take almost triple that (about 16 seconds). That makes the whole threading idea seem useless, since running the tasks sequentially would be faster!

What am I missing here?

Solution

David Beazley gave a talk about this issue at PyCon 2010. As others have already stated, for some tasks, using threading, especially with multiple cores, can lead to slower performance than the same task performed by a single thread. The problem, Beazley found, had to do with multiple cores having a "GIL battle": threads on different cores contend for the Global Interpreter Lock (GIL), which lets only one thread execute Python bytecode at a time.

To avoid GIL contention, you may get better results by running the tasks in separate processes instead of separate threads. The multiprocessing module provides a convenient way to do that, especially since the multiprocessing API is very similar to the threading API.

import multiprocessing as mp
import datetime as dt
def work():
    t = dt.datetime.now()
    print mp.current_process().name, t
    i = 0
    while i < 100000000:
        i+=1
    t2 = dt.datetime.now()
    print mp.current_process().name, t2, t2-t

if __name__ == '__main__': 
    print "single process:"
    t1 = mp.Process(target=work)
    t1.start()
    t1.join()

    print "multi process:"
    t1 = mp.Process(target=work)
    t1.start()
    t2 = mp.Process(target=work)
    t2.start()
    t1.join()
    t2.join()

yields

single process:
Process-1 2011-12-06 12:34:20.611526
Process-1 2011-12-06 12:34:28.494831 0:00:07.883305
multi process:
Process-3 2011-12-06 12:34:28.497895
Process-2 2011-12-06 12:34:28.503433
Process-2 2011-12-06 12:34:36.458354 0:00:07.954921
Process-3 2011-12-06 12:34:36.546656 0:00:08.048761
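
When there are more than a couple of tasks, the same idea could probably be written with a process pool instead of individual Process objects. The following is only a sketch under stated assumptions, not part of the original answer: `count_to` and `run_in_processes` are hypothetical helper names, and the code sticks to syntax that should work on both Python 2.7 and Python 3.

```python
import multiprocessing as mp
import time


def count_to(n):
    """CPU-bound busy loop, like work() above; returns the final counter."""
    i = 0
    while i < n:
        i += 1
    return i


def run_in_processes(n_tasks, n_iterations):
    """Run n_tasks copies of count_to, each in its own worker process.

    Hypothetical helper for illustration only.
    """
    pool = mp.Pool(processes=n_tasks)
    try:
        # map() blocks until every worker has returned its result.
        return pool.map(count_to, [n_iterations] * n_tasks)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    start = time.time()
    results = run_in_processes(2, 10000000)
    print("%s in %.2f s" % (results, time.time() - start))
```

Each worker is a separate interpreter with its own GIL, so the two loops should be able to run on different cores. The `if __name__ == '__main__':` guard matters here, because on some platforms the worker processes re-import the main module.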

PS. As zeekay pointed out in the comments, the GIL battle is only severe for CPU-bound tasks. It should not be a problem for I/O-bound tasks.
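
To illustrate the I/O-bound case, here is a minimal sketch in which `time.sleep` stands in for a blocking call such as an API request (an assumption for illustration; `fake_io` and `timed_threads` are hypothetical names, not from the original post). A thread that is blocked in `time.sleep` releases the GIL, so the waits can overlap.

```python
import threading
import time


def fake_io(delay):
    """Stand-in for a blocking I/O call; the GIL is released while sleeping."""
    time.sleep(delay)


def timed_threads(n_threads, delay):
    """Start n_threads threads that each wait `delay` seconds; return wall time."""
    threads = [threading.Thread(target=fake_io, args=(delay,))
               for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == '__main__':
    # The four 0.2 s waits overlap, so the total should be far below 0.8 s.
    print("4 threads took %.2f s" % timed_threads(4, 0.2))
```

If a real request behaves like the sleep here, threads should help it, unlike the busy loop in the question.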
