What am I missing in python-multiprocessing/multithreading?


Question

I am creating, multiplying, and then summing all elements of two big matrices in numpy. I do this a few hundred times with two methods, a plain loop and with the help of the multiprocessing module (see the snippet below).

import numpy as np
from multiprocessing.pool import ThreadPool

def worker_loop(n):
    # Plain loop: for each size i, multiply two random i x i matrices
    # elementwise and sum the result.
    for i in n:
        mul = np.sum(np.random.normal(size=[i, i]) * np.random.normal(size=[i, i]))

def worker(i):
    # Same work for a single matrix size, used as the pool task.
    mul = np.sum(np.random.normal(size=[i, i]) * np.random.normal(size=[i, i]))

n = range(100, 300)

# Thread-pool version: two worker threads share the sizes in n.
pool = ThreadPool(2)
pool.map(worker, n)
pool.close()
pool.join()

# Loop version: same work, done sequentially.
worker_loop(n)

Measuring the time shows that the loop is faster than the pool-based version. I have also tried the threading module with no success (then I read that this was a bad idea; read more here).
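
For reference, this is a minimal sketch of how such a comparison could be timed, assuming the worker and worker_loop functions from the snippet above:

import time
from multiprocessing.pool import ThreadPool

# Time the thread-pool version.
start = time.perf_counter()
pool = ThreadPool(2)
pool.map(worker, n)
pool.close()
pool.join()
print("ThreadPool: %.3f s" % (time.perf_counter() - start))

# Time the sequential loop.
start = time.perf_counter()
worker_loop(n)
print("Loop:       %.3f s" % (time.perf_counter() - start))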

I started experimenting with multithreading because I need to convert images, labels, bounding boxes, ... into tfrecords. For that I am studying a file from tensorflow/inception (if you want to dwell on it: build_imagenet_data.py, line 453). I believe that multithreading works there, and that is why they use it.

Having said this, my question can be put as follows:

  • What am I missing in my code; is it possible to achieve anything with small modifications?
  • Since tensorflow is written in C++ and CUDA, does the initial example even apply to it?
  • When is it advisable to use multiprocessing or multithreading with numpy, tensorflow, and the like?

Answer

There is always some overhead (synchronization, data preparation, data copies, and so on).

But: given a good setup, your matrix-vector and vector-vector operations in numpy are already multithreaded, using BLAS (which is the state-of-the-art standard used everywhere, including numpy, MATLAB, and probably tensorflow's CPU backend; there are different implementations, though).

So if BLAS is able to occupy all your cores (easier with big dimensions), you are only seeing the overhead.
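
As a quick check, you can inspect which BLAS numpy was linked against and how many threads it is allowed to use. The threadpoolctl package below is a third-party tool, not part of numpy itself:

import numpy as np

# Show which BLAS implementation numpy was built against.
np.show_config()

# Inspect the BLAS thread pool at runtime.
# threadpoolctl is a third-party package (pip install threadpoolctl).
from threadpoolctl import threadpool_info
for pool in threadpool_info():
    print(pool["internal_api"], pool["num_threads"])

If BLAS already reports as many threads as you have cores, adding your own thread pool on top can only add overhead.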

And yes, tensorflow at its core will be implemented in at least one of C/C++/Fortran, plus BLAS for its CPU backend and some CUDA libs when targeting the GPU. This also means that the core algorithms, such as gradient calculations and optimization calculations, should never need external parallelization (in 99.9% of all use cases).
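
Where multiprocessing does pay off is Python-level work that BLAS never sees, such as decoding and preprocessing many independent files before writing tfrecords. A minimal sketch, assuming a hypothetical encode_example function standing in for that per-file work:

from multiprocessing import Pool

def encode_example(path):
    # Hypothetical per-file work: read and process one image file.
    # Pure-Python/IO-heavy tasks like this parallelize well across processes.
    with open(path, "rb") as f:
        data = f.read()
    return len(data)  # placeholder for a serialized example

if __name__ == "__main__":
    paths = ["img_%d.jpg" % i for i in range(8)]  # hypothetical file list
    with Pool(4) as pool:
        results = pool.map(encode_example, paths)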

