Why isn't numpy.mean multithreaded?

Question

I've been looking for ways to easily multithread some of my simple analysis code, since I noticed that numpy was only using one core, despite the fact that it is supposed to be multithreaded.

I know that numpy is configured for multiple cores, since I can see that tests using numpy.dot use all my cores, so I just reimplemented mean as a dot product, and it runs way faster. Is there some reason mean can't run this fast on its own? I find similar behavior for larger arrays, although the ratio is closer to 2 than the 3 shown in my example.

I've been reading a bunch of posts on similar numpy speed issues, and apparently it's way more complicated than I would have thought. Any insight would be helpful. I'd prefer to just use mean since it's more readable and less code, but I might switch to dot-based means.

In [27]: data = numpy.random.rand(10,10)

In [28]: a = numpy.ones(10)

In [29]: %timeit numpy.dot(data,a)/10.0
100000 loops, best of 3: 4.8 us per loop

In [30]: %timeit numpy.mean(data,axis=1)
100000 loops, best of 3: 14.8 us per loop

In [31]: numpy.dot(data,a)/10.0 - numpy.mean(data,axis=1)
Out[31]: 
array([  0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
         0.00000000e+00,   1.11022302e-16,   0.00000000e+00,
         0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
        -1.11022302e-16])
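
For reference, the dot-based mean timed above can be wrapped in a small helper. The following is only a sketch of that workaround (the name dot_mean is invented here, and it only handles 2-D arrays):

import numpy

def dot_mean(data, axis=1):
    # Mean of a 2-D array along one axis via a BLAS-backed dot product.
    # dot dispatches to BLAS (often multithreaded); numpy.mean does not.
    n = data.shape[axis]
    weights = numpy.ones(n) / n      # fold the division into the vector
    if axis == 1:
        return data.dot(weights)     # mean of each row
    return weights.dot(data)         # axis == 0: mean of each column

data = numpy.random.rand(10, 10)
assert numpy.allclose(dot_mean(data, axis=1), numpy.mean(data, axis=1))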

Answer

I've been looking for ways to easily multithread some of my simple analysis code, since I noticed that numpy was only using one core, despite the fact that it is supposed to be multithreaded.

Who says it's supposed to be multithreaded?

numpy is primarily designed to be as fast as possible on a single core, and to be as parallelizable as possible if you need to do so. But you still have to parallelize it.

In particular, you can operate on independent sub-objects at the same time, and slow operations release the GIL when possible—although "when possible" may not be nearly enough. Also, numpy objects are designed to be shared or passed between processes as easily as possible, to facilitate using multiprocessing.
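
To illustrate the last point, a numpy array can be backed by shared memory so that worker processes can read it without copying. This is only a sketch, assuming a fork-based start method; on spawn-based platforms the RawArray handle has to be handed to each worker explicitly:

import numpy
from multiprocessing import RawArray

# Allocate a shared buffer of doubles and wrap it in a numpy array (no copy).
shared = RawArray('d', 1000 * 1000)
data = numpy.frombuffer(shared, dtype=numpy.float64).reshape(1000, 1000)
data[:] = numpy.random.rand(1000, 1000)
# Worker processes forked after this point see the same underlying buffer,
# so large arrays do not have to be pickled and copied to each worker.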

There are some specialized methods that are automatically parallelized, but most of the core methods are not. In particular, dot is implemented on top of BLAS when possible, and BLAS is automatically parallelized on most platforms, but mean is implemented in plain C code.
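
One way to check which BLAS a particular numpy build dispatches dot to (the relevant thread-count environment variable depends on the library reported):

import numpy

# Print the BLAS/LAPACK libraries this numpy build was linked against; a
# threaded BLAS (OpenBLAS, MKL, a threaded ATLAS) is what makes dot parallel.
numpy.show_config()

# The BLAS thread count is usually controlled by an environment variable,
# depending on which library show_config() reports, for example:
#   OPENBLAS_NUM_THREADS  (OpenBLAS)
#   MKL_NUM_THREADS       (Intel MKL)
#   OMP_NUM_THREADS       (OpenMP-based builds)
# Setting it to 1 and re-running the %timeit comparison above is a quick way
# to confirm that dot's advantage over mean comes from BLAS threading.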

See Parallel Programming with numpy and scipy for more details.

So, how do you know which methods are parallelized and which aren't? And, of those which aren't, how do you know which ones can be nicely manually-threaded and which need multiprocessing?

There's no good answer to that. You can make educated guesses (X seems like it's probably implemented on top of ATLAS, and my copy of ATLAS is implicitly threaded), or you can read the source.

But usually, the best thing to do is try it and test. If the code is using 100% of one core and 0% of the others, add manual threading. If it's now using 100% of one core and 10% of the others and barely running faster, change the multithreading to multiprocessing. (Fortunately, Python makes this pretty easy, especially if you use the Executor classes from concurrent.futures or the Pool classes from multiprocessing. But you still often need to put some thought into it, and test the relative costs of sharing vs. passing if you have large arrays.)
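
As a concrete starting point for that experiment, here is a minimal sketch of a manually parallelized row mean (chunked_row_means and n_workers are names invented here). Whether it actually beats the single-threaded call depends on the array size, on whether the reduction loop releases the GIL, and on memory bandwidth, so time it before adopting it:

import numpy
from concurrent.futures import ThreadPoolExecutor  # or ProcessPoolExecutor

def _row_means(chunk):
    # Top-level function so it can also be pickled for a process pool.
    return chunk.mean(axis=1)

def chunked_row_means(data, n_workers=4):
    # Split into independent row blocks (views, no copy) and reduce each
    # block in its own worker; swap in ProcessPoolExecutor if threads
    # don't help (that copies each chunk to a worker process).
    chunks = numpy.array_split(data, n_workers, axis=0)
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        parts = list(ex.map(_row_means, chunks))
    return numpy.concatenate(parts)

data = numpy.random.rand(4000, 4000)
assert numpy.allclose(chunked_row_means(data), numpy.mean(data, axis=1))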

Also, as kwatford points out, just because some method doesn't seem to be implicitly parallel doesn't mean it won't be parallel in the next version of numpy, or the next version of BLAS, or on a different platform, or even on a machine with slightly different stuff installed on it. So, be prepared to re-test. And do something like my_mean = numpy.mean and then use my_mean everywhere, so you can just change one line to my_mean = pool_threaded_mean.
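
A minimal illustration of that aliasing pattern (pool_threaded_mean stands in for whatever parallel replacement you end up with, such as the chunked sketch above):

import numpy

# Route every call through one alias so the implementation can be swapped
# in a single place once profiling justifies it.
my_mean = numpy.mean

result = my_mean(numpy.random.rand(10, 10), axis=1)

# later, change just this one line:
# my_mean = pool_threaded_mean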
