Fill 1D numpy array from arrays with indices

Problem description

Background

I have one 1D NumPy array initialized with zeroes.

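import numpy as np
# The array must be at least as long as the largest end index used below (12000);
# with a shorter array, out-of-range slices would silently set nothing.
section = np.zeros(12000)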

Then I have a Pandas DataFrame where I have indices in two columns:

import pandas as pd

d = {'start': {0: 7200, 1: 7500, 2: 7560, 3: 8100, 4: 11400},
     'end': {0: 10800, 1: 8100, 2: 8100, 3: 8150, 4: 12000}}

df = pd.DataFrame(data=d, columns=['start', 'end'])

For each pair of indices, I want to set the value of the corresponding indices in the numpy array to True.

My current solution

I can do this by applying a function to the DataFrame:

def fill_array(row):
    section[row.start:row.end] = True

df.apply(fill_array, axis=1)

I want to vectorize the operation

This works as I expect, but for the fun of it, I would like to vectorize the operation. I'm not very proficient with this, and my searching online has not put me on the right track.

I would really appreciate any suggestions on how to make this into a vector operation, if at all possible.

Recommended answer

The trick is to put 1s at every start point and -1s at every end point of a zero-initialized int array. Taking the cumulative sum of that array then gives, at each position, the number of start-end pairs that cover it, so the value is non-zero exactly where at least one pair applies. The final step is to test for non-zero values, which yields the output as a boolean array. This leads to two vectorized solutions, implemented as follows:

def filled_array(start, end, length):
    out = np.zeros(length, dtype=int)
    # np.add.at accumulates correctly even when an index occurs more than once
    np.add.at(out, start, 1)
    np.add.at(out, end, -1)
    return out.cumsum() > 0

def filled_array_v2(start, end, length):  # Using @Daniel's suggestion
    # bincount counts how many pairs start (or end) at each position
    out = np.bincount(start, minlength=length) - np.bincount(end, minlength=length)
    return out.cumsum().astype(bool)
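
To see why the cumulative sum works, it can help to look at the intermediate arrays. The following is a minimal sketch on a small made-up input; the values of `start`, `end`, and `length`, and the names `delta`, `coverage`, and `mask`, are illustrative assumptions and not part of the original answer.

```python
import numpy as np

# Illustrative input: two overlapping intervals [2, 6) and [4, 8)
start = np.array([2, 4])
end = np.array([6, 8])
length = 10

# +1 where an interval opens, -1 where one closes
delta = np.zeros(length, dtype=int)
np.add.at(delta, start, 1)
np.add.at(delta, end, -1)

# Running total = number of intervals covering each position
coverage = delta.cumsum()
mask = coverage > 0

print(delta)     # [ 0  0  1  0  1  0 -1  0 -1  0]
print(coverage)  # [0 0 1 1 2 2 1 1 0 0]
print(mask)
```

Positions 2 through 7 end up `True`: the count rises to 2 where the intervals overlap and only returns to 0 once both have closed, which is why overlapping or nested pairs are handled without special cases.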

Sample run -

In [2]: start
Out[2]: array([ 4,  7,  5, 15])

In [3]: end
Out[3]: array([12, 12,  7, 17])

In [4]: out = filled_array(start, end, length=20)

In [7]: pd.DataFrame(out) # print as dataframe for easy verification
Out[7]: 
        0
0   False
1   False
2   False
3   False
4    True
5    True
6    True
7    True
8    True
9    True
10   True
11   True
12  False
13  False
14  False
15   True
16   True
17  False
18  False
19  False
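
The same idea can be tried on the index values from the question. The sketch below is not part of the original answer: `filled_array_padded` is a hypothetical variant that allocates one extra slot so that an end index equal to `length` stays in bounds, and `length = 12000` is an assumed array size chosen to cover the largest end index. The result is checked against the row-by-row slicing approach.

```python
import numpy as np

def filled_array_padded(start, end, length):
    # Hypothetical variant: one extra slot so that end == length is a valid index
    out = np.zeros(length + 1, dtype=int)
    np.add.at(out, start, 1)
    np.add.at(out, end, -1)
    # Drop the padding slot before converting to a boolean mask
    return out.cumsum()[:length] > 0

# Index values copied from the question's DataFrame columns
start = np.array([7200, 7500, 7560, 8100, 11400])
end = np.array([10800, 8100, 8100, 8150, 12000])
length = 12000  # assumed: large enough to hold the largest end index

vectorized = filled_array_padded(start, end, length)

# Reference result from the slice-assignment loop used in the question
expected = np.zeros(length, dtype=bool)
for s, e in zip(start, end):
    expected[s:e] = True

print(np.array_equal(vectorized, expected))  # True
print(vectorized.sum())                      # 4200
```

With the DataFrame from the question, the two index arrays could be obtained with `df['start'].to_numpy()` and `df['end'].to_numpy()`. The padding matters because `np.add.at` raises an `IndexError` for an index equal to the array length, which is the case for the last end value here.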
