Python - parallelize a python loop for 2D masked array?


Question

Probably a commonplace question, but how can I parallelize this loop in Python?

for i in range(0,Nx.shape[2]):
  for j in range(0,Nx.shape[2]):
    NI=Nx[:,:,i]; NJ=Nx[:,:,j]
    Ku[i,j] = (NI[mask!=True]*NJ[mask!=True]).sum()

So my question: what's the easiest way to parallelize this code?

---------- EDIT (added later) ----------

Data example

import random
import numpy as np
import numpy.ma as ma
from numpy import unravel_index    

#my input
Nx = np.random.rand(5,5,5)  

#mask creation
# list() is needed on Python 3, where zip returns a lazy iterator
mask_positions = list(zip(*np.where((Nx[:,:,0] < 0.4))))
mask_array_positions = np.asarray(mask_positions)
i, j = mask_array_positions.T
mask = np.zeros(Nx[:,:,0].shape, bool)
mask[i,j] = True

I want to calculate Ku in parallel. My aim is to use the Ku array to solve a linear problem, so I have to set the masked values aside (they represent nearly half of my array).

Recommended answer

I think you want to 'vectorize', to use numpy terminology, not parallelize in the multiprocess way.

Your calculation is essentially a dot (matrix) product. Apply the mask once to the whole array to get a 2-D array, NIJ. Its shape will be (N,5), where N is the number of True values in ~mask. Then a (5,N) array is 'dotted' with an (N,5) array, i.e. the product is summed over the N dimension, leaving you with a (5,5) array.

NIJ = Nx[~mask,:]
Ku = np.dot(NIJ.T,NIJ)

In quick tests it matches the Ku produced by your double loop. Depending on the underlying library used for np.dot there might be some multicore calculation, but that's usually not a priority issue for numpy users.
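
The equivalence described above can be checked with a short script. This is only a sketch: the fixed seed, the `default_rng` generator and the names `Ku_loop` and `Ku_dot` are assumptions added for illustration, and the mask is built directly from the comparison instead of via index positions.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed: an assumption, for reproducibility
Nx = rng.random((5, 5, 5))
mask = Nx[:, :, 0] < 0.4             # True marks the positions to leave out

# Reference result: the double loop from the question.
n = Nx.shape[2]
Ku_loop = np.empty((n, n))
for i in range(n):
    for j in range(n):
        NI = Nx[:, :, i]
        NJ = Nx[:, :, j]
        Ku_loop[i, j] = (NI[~mask] * NJ[~mask]).sum()

# Vectorized version: apply the mask once, then take one matrix product.
NIJ = Nx[~mask, :]                   # shape (N, 5), N = number of unmasked positions
Ku_dot = np.dot(NIJ.T, NIJ)          # (5, N) dot (N, 5) -> (5, 5)

print(np.allclose(Ku_loop, Ku_dot))
```

The loop indexes with the mask 2 * 5 * 5 times, while the vectorized version indexes once, which matters given that masking dominates the cost.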

Applying the large boolean mask is the most time-consuming part of these calculations, in both the vectorized and the iterative versions.

For a mask with 400,000 True values, compare these two indexing times:

In [195]: timeit (NI[:400,:1000],NJ[:400,:1000])
100000 loops, best of 3: 4.87 us per loop
In [196]: timeit (NI[mask],NJ[mask])
10 loops, best of 3: 98.8 ms per loop

Selecting the same number of items with basic (slice) indexing is several orders of magnitude faster than advanced indexing with the mask.
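
One likely reason for the gap is that basic slicing returns a view on the existing data, while boolean indexing has to build a new array. The sketch below illustrates this; the 1000 x 1000 array size and the block-shaped mask are assumptions, since the answer does not state the shapes it used, and the timings will vary by machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
NI = rng.random((1000, 1000))        # assumed size, not given in the answer
mask = np.zeros(NI.shape, dtype=bool)
mask[:400, :1000] = True             # 400,000 True values, laid out as one block

sliced = NI[:400, :1000]             # basic (slice) indexing
masked = NI[mask]                    # advanced (boolean) indexing

print(sliced.size, masked.size)      # both select 400,000 items
print(np.shares_memory(NI, sliced))  # the slice is a view on NI
print(np.shares_memory(NI, masked))  # the masked selection is a copy

# Per-call times in seconds; absolute numbers depend on the machine.
t_slice = timeit.timeit(lambda: NI[:400, :1000], number=1000) / 1000
t_mask = timeit.timeit(lambda: NI[mask], number=10) / 10
print(t_slice, t_mask)
```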

Substituting np.dot(NI[mask],NJ[mask]) for (NI[mask]*NJ[mask]).sum() only saves a few ms.
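
For reference, the two reductions compute the same number, because a boolean-masked selection is 1-D and np.dot of two 1-D arrays is their inner product. A minimal sketch, with assumed array sizes and a random mask chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
NI = rng.random((1000, 1000))        # assumed sizes, for illustration only
NJ = rng.random((1000, 1000))
mask = rng.random(NI.shape) < 0.4    # hypothetical mask

a = NI[mask]                         # boolean indexing flattens to 1-D
b = NJ[mask]

s_sum = (a * b).sum()                # elementwise product, then sum
s_dot = np.dot(a, b)                 # inner product of two 1-D arrays

print(np.isclose(s_sum, s_dot))
```

In both forms the two masked selections are computed first, so the choice of reduction changes little of the total time.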
