Convert 1D array with coordinates into 2D array in numpy


Question

I have an array of values arr with shape (N,) and an array of coordinates coords with shape (2,N), one row per axis. I want to represent this in an (M,M) array grid such that grid is 0 at coordinates that do not appear in coords, and at each coordinate that does appear it stores the sum of all values in arr that have that coordinate. So if M=3, arr = np.arange(4)+1, and coords = np.array([[0,0,1,2],[0,0,2,2]]), then grid should be:

array([[3., 0., 0.],
       [0., 0., 3.],
       [0., 0., 4.]])
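
To make the requirement concrete, a plain Python loop that produces this grid might look like the sketch below. The name accumulate_loop and the explicit M parameter are illustrative assumptions rather than part of the original post, and the sketch assumes coords holds row indices in its first row and column indices in its second.

```python
import numpy as np

def accumulate_loop(coords, arr, M):
    """Reference (non-vectorized) version: add each value into its grid cell."""
    grid = np.zeros((M, M))
    # coords.T has shape (N, 2), so each row is one (i, j) pair
    for (i, j), value in zip(coords.T, arr):
        grid[i, j] += value
    return grid

arr = np.arange(4) + 1
coords = np.array([[0, 0, 1, 2], [0, 0, 2, 2]])
grid = accumulate_loop(coords, arr, M=3)
print(grid)
```

A loop like this is easy to check by hand, which makes it a convenient baseline for validating any vectorized version.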

The reason this is nontrivial is that I need to repeat this step many times, and the values in arr change each time, as can the coordinates. Ideally I am looking for a vectorized solution. I suspect that I might be able to use np.where somehow, but it's not immediately obvious how.

Timing the solutions

I have timed the solutions posted so far. The accumulator method appears slightly faster than the sparse matrix method, and the second accumulation method is the slowest, for the reasons explained in the comments:

%timeit for x in range(100): accumulate_arr(np.random.randint(100,size=(2,10000)),np.random.normal(0,1,10000))
%timeit for x in range(100): accumulate_arr_v2(np.random.randint(100,size=(2,10000)),np.random.normal(0,1,10000))
%timeit for x in range(100): sparse.coo_matrix((np.random.normal(0,1,10000),np.random.randint(100,size=(2,10000))),(100,100)).A
47.3 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
103 ms ± 255 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
48.2 ms ± 36 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
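
The sparse matrix method in the third timing line is not shown as a standalone function in this post. A minimal sketch of it, assuming SciPy is installed, could look like this. The function name accumulate_sparse and its shape parameter are hypothetical, chosen here for illustration.

```python
import numpy as np
from scipy import sparse

def accumulate_sparse(coords, arr, shape):
    """Build a COO matrix from (value, (row, col)) triplets and densify it."""
    # COO format permits duplicate (row, col) entries;
    # they are summed when the matrix is converted to a dense array.
    coo = sparse.coo_matrix((arr, (coords[0], coords[1])), shape=shape)
    return coo.toarray()

arr = np.arange(4) + 1
coords = np.array([[0, 0, 1, 2], [0, 0, 2, 2]])
grid_sparse = accumulate_sparse(coords, arr, (3, 3))
print(grid_sparse)
```

Here the output shape is passed explicitly, so the grid keeps its size even when the largest coordinate does not appear in coords.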


Recommended answer

Using np.bincount -

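import numpy as np

def accumulate_arr(coords, arr):
    # Get output array shape from the largest row and column index
    # (coords has shape (2, N): row 0 holds row indices, row 1 column indices)
    m,n = coords.max(1)+1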

    # Get linear indices to be used as IDs with bincount
    lidx = np.ravel_multi_index(coords, (m,n))
    # Or lidx = coords[0]*(coords[1].max()+1) + coords[1]

    # Accumulate arr with IDs from lidx
    return np.bincount(lidx,arr,minlength=m*n).reshape(m,n)

Sample run -

In [58]: arr
Out[58]: array([1, 2, 3, 4])

In [59]: coords
Out[59]: 
array([[0, 0, 1, 2],
       [0, 0, 2, 2]])

In [60]: accumulate_arr(coords, arr)
Out[60]: 
array([[3., 0., 0.],
       [0., 0., 3.],
       [0., 0., 4.]])
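
Note that accumulate_arr infers the output shape from the largest coordinates, so the result can be smaller than the (M,M) grid the question asks for when the last row or column is unused. A possible variant that takes the shape explicitly is sketched below. The name accumulate_arr_fixed and the shape argument are assumptions added here, not part of the original answer.

```python
import numpy as np

def accumulate_arr_fixed(coords, arr, shape):
    """Same bincount idea, but with a caller-supplied (rows, cols) shape."""
    # Flatten each (row, col) pair into a single index into the raveled grid
    lidx = np.ravel_multi_index(tuple(coords), shape)
    # Sum arr per linear index; minlength pads unused trailing cells with 0
    flat = np.bincount(lidx, weights=arr, minlength=shape[0] * shape[1])
    return flat.reshape(shape)

arr = np.arange(4) + 1
# No coordinate touches row 2 here, so inferring the shape would give (2, 3)
coords = np.array([[0, 0, 1, 1], [0, 0, 2, 2]])
grid_fixed = accumulate_arr_fixed(coords, arr, (3, 3))
print(grid_fixed)
```

With a fixed shape, np.ravel_multi_index also raises an error for any coordinate outside the grid, which can help catch bad input early.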

Another approach, using np.add.at along similar lines, which might be easier to follow -

def accumulate_arr_v2(coords, arr):
    m,n = coords.max(1)+1
    out = np.zeros((m,n), dtype=arr.dtype)
    np.add.at(out, tuple(coords), arr)
    return out
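
It may not be obvious why np.add.at is needed instead of a plain indexed +=. The short sketch below, using the example data from the question, illustrates the difference: indexed in-place addition is buffered, so repeated coordinates are not accumulated, while np.add.at applies the addition once per element. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(4) + 1
coords = np.array([[0, 0, 1, 2], [0, 0, 2, 2]])
idx = tuple(coords)  # (row_indices, col_indices)

# Buffered: the duplicate coordinate (0, 0) is written rather than summed
buffered = np.zeros((3, 3), dtype=arr.dtype)
buffered[idx] += arr

# Unbuffered: every element of arr is added, including duplicates
unbuffered = np.zeros((3, 3), dtype=arr.dtype)
np.add.at(unbuffered, idx, arr)

print(buffered)
print(unbuffered)
```

Only the unbuffered result contains the full sum at the repeated coordinate (0, 0).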
