Numba - 如何并行填充二维数组 [英] Numba - How to fill 2D array in parallel

查看:111
本文介绍了Numba - 如何并行填充二维数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个对 float64(x,y) 上的二维矩阵进行运算的函数.基本概念:对于每个行的组合(行数选择 2),计算减法后正值的数量(行 1 - 行 2).在 int64(y,y) 的 2Dmatrix 中,如果值高于某个阈值,则将该值存储在索引 [row1,row2] 中,如果低于某个阈值,则将该值存储在索引 [row2,row1] 中.

I have a function that operates on a 2D matrix on float64(x,y). Basic concept: for each combination of rows (no. rows choose 2) count the number of positiv values after subtraction (row1 - row2). In a 2Dmatrix of int64(y,y) store this value in index [row1,row2] if value is above a certain threshold and [row2,row1] if below.

我已经实现并用@njit(parallel=False) 装饰它,效果很好@njit(parallel=True) 似乎没有加速.为了加快整个过程,我查看了@guvectorize,它也有效.但是,在这种情况下,我也无法弄清楚如何将 @guvectorize 与 parallel true 一起使用.

I've implemented that and decorated it with @njit(parallel=False), that works fine @njit(parallel=True) seems to give no speedup. Trying to speed up the whole thing I had a look at @guvectorize, that works as well. However I'm not able to figure out how to use @guvectorize with parallel true in this case either.

我查看了 numba guvectorize target='parallel' 慢比 target='cpu' ,其中解决方案是使用 @vecorize,但我无法将解决方案转移到我的问题,因此我现在正在寻求帮助 :)

I had a look at numba guvectorize target='parallel' slower than target='cpu' , where the solution was to use @vecorize instead, but I can not transfer the solution to my problem, therefore I am now seeking help :)

基本的jitted和guvectorized实现

Basic jitted and guvectorized implementation

import numpy as np
from numba import jit, guvectorize, prange
import timeit

@jit(parallel=False)
def check_pairs_sg(raw_data):
    # 2D array to be filled
    result = np.full((len(raw_data), len(raw_data)), -1)

    # Iterate over all possible gene combinations
    for r1 in range(0, len(raw_data)):
        for r2 in range(r1+1, len(raw_data)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])

            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5: 
               result[r1,r2] = num_pos
            else:
               result[r2,r1] = num_pos

    return result

@jit(parallel=True)
def check_pairs_multi(raw_data):
    # 2D array to be filled
    result = np.full((len(raw_data), len(raw_data)), -1)

    # Iterate over all possible gene combinations
    for r1 in range(0, len(raw_data)):
        for r2 in prange(r1+1, len(raw_data)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])

            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5: 
               result[r1,r2] = num_pos
            else:
               result[r2,r1] = num_pos

    return result

@guvectorize(["void(float64[:,:], int64[:,:])"],
             "(n,m)->(m,m)", target='cpu')
def check_pairs_guvec_sg(raw_data, result):
    for r1 in range(0, len(result)):
        for r2 in range(r1+1, len(result)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])

            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5: 
               result[r1,r2] = num_pos
            else:
               result[r2,r1] = num_pos

@guvectorize(["void(float64[:,:], int64[:,:])"],
             "(n,m)->(m,m)", target='parallel')
def check_pairs_guvec_multi(raw_data, result):
    for r1 in range(0, len(result)):
        for r2 in range(r1+1, len(result)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])

            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5: 
               result[r1,r2] = num_pos
            else:
               result[r2,r1] = num_pos

if __name__=="__main__":
     np.random.seed(404)
     a = np.random.random((512,512)).astype(np.float64)
     res = np.full((len(a), len(a)), -1)

并用

%timeit check_pairs_sg(a)
%timeit check_pairs_multi(a)
%timeit check_pairs_guvec_sg(a, res)
%timeit check_pairs_guvec_multi(a, res)

导致:

614 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
507 ms ± 6.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
622 ms ± 3.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
671 ms ± 4.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

我一直在思考如何将其实现为 @vectorized 或适当的并行 @guvectorize 以真正并行地填充生成的二维数组.

I cat wrap my head around on how to implement this as @vectorized or a proper parallel @guvectorize to fill the resulting 2D array truely in parallel.

我想这是我尝试将其进一步推向 gpu 之前的第一步.

I guess this is my first step before trying to taking this further to gpu.

非常感谢任何帮助.

推荐答案

写Numba代码时考虑其他编译语言

例如,想想这些行的或多或少完全等效的实现

Think of other compiled languages when writing Numba code

For example think of a more or less exact equivalent implementation of the lines

diff = np.subtract(raw_data[:, r1], raw_data[:, r2])
num_pos = len(np.where(diff > 0)[0])

在 C++ 中.

伪代码

  • 分配一个数组差异,循环raw_data[i*size_dim_1+r1](循环索引为i)
  • 分配一个布尔数组,循环遍历整个数组 diff 并检查是否 diff[i]>0
  • 循环遍历布尔数组,获取 b_arr==True 的索引,并通过 vector::push_back() 将它们保存到向量中.
  • 检查向量的大小

您代码中的主要问题是:

  • 为简单操作创建临时数组
  • 非连续内存访问

移除临时数组并简化

@nb.njit(parallel=False)
def check_pairs_simp(raw_data):
    # 2D array to be filled
    result = np.full((raw_data.shape[0],raw_data.shape[1]), -1)
    
    # Iterate over all possible gene combinations
    for r1 in range(0, raw_data.shape[1]):
        for r2 in range(r1+1, raw_data.shape[1]):
            num_pos=0
            for i in range(raw_data.shape[0]):
                if (raw_data[i,r1]>raw_data[i,r2]):
                    num_pos+=1
            
            # Arbitrary check to illustrate
            if num_pos >= 5: 
               result[r1,r2] = num_pos
            else:
               result[r2,r1] = num_pos
    
    return result

移除临时数组和简化 + 连续内存访问

@nb.njit(parallel=False)
def check_pairs_simp_rev(raw_data_in):
    #Create a transposed array not just a view 
    raw_data=np.ascontiguousarray(raw_data_in.T)
    
    # 2D array to be filled
    result = np.full((raw_data.shape[0],raw_data.shape[1]), -1)
    
    # Iterate over all possible gene combinations
    for r1 in range(0, raw_data.shape[0]):
        for r2 in range(r1+1, raw_data.shape[0]):
            num_pos=0
            for i in range(raw_data.shape[1]):
                if (raw_data[r1,i]>raw_data[r2,i]):
                    num_pos+=1
            
            # Arbitrary check to illustrate
            if num_pos >= 5: 
               result[r1,r2] = num_pos
            else:
               result[r2,r1] = num_pos
    
    return result

移除临时数组和简化 + 连续内存访问 + 并行化

@nb.njit(parallel=True,fastmath=True)
def check_pairs_simp_rev_p(raw_data_in):
    #Create a transposed array not just a view 
    raw_data=np.ascontiguousarray(raw_data_in.T)
    
    # 2D array to be filled
    result = np.full((raw_data.shape[0],raw_data.shape[1]), -1)
    
    # Iterate over all possible gene combinations
    for r1 in nb.prange(0, raw_data.shape[0]):
        for r2 in range(r1+1, raw_data.shape[0]):
            num_pos=0
            for i in range(raw_data.shape[1]):
                if (raw_data[r1,i]>raw_data[r2,i]):
                    num_pos+=1
            
            # Arbitrary check to illustrate
            if num_pos >= 5: 
               result[r1,r2] = num_pos
            else:
               result[r2,r1] = num_pos
    
    return result

时间

%timeit check_pairs_sg(a)
488 ms ± 8.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit check_pairs_simp(a)
186 ms ± 3.83 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit check_pairs_simp_rev(a)
12.1 ms ± 226 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit check_pairs_simp_rev_p(a)
5.43 ms ± 49.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

这篇关于Numba - 如何并行填充二维数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆