Fill 1D numpy array from arrays with indices
Question
Background
I have a 1D NumPy array initialized with zeros.
import numpy as np
import pandas as pd
section = np.zeros(1000)
Then I have a Pandas DataFrame that holds indices in two columns:
d = {'start': {0: 7200, 1: 7500, 2: 7560, 3: 8100, 4: 11400},
     'end': {0: 10800, 1: 8100, 2: 8100, 3: 8150, 4: 12000}}
df = pd.DataFrame(data=d, columns=['start', 'end'])
For each pair of indices, I want to set the values at the corresponding positions in the NumPy array to True.
My current solution
I can do this by applying a function to the DataFrame:
def fill_array(row):
    section[row.start:row.end] = True

df.apply(fill_array, axis=1)
I want to vectorize the operation
This works as I expect, but for the fun of it, I would like to vectorize the operation. I'm not very proficient with this, and my searching online has not put me on the right track.
I would really appreciate any suggestions on how to turn this into a vectorized operation, if that is possible at all.
Recommended answer
The trick is to start from a zero-initialized int array and add 1 at every start index and -1 at every end index. Taking the cumulative sum of that array then yields non-zero values exactly at the positions covered by at least one start-end pair, because the running total counts how many ranges are currently open. The final step is to test for non-zero values to get the output as a boolean array. This gives two vectorized solutions, implemented as follows:
def filled_array(start, end, length):
    out = np.zeros(length, dtype=int)
    # np.add.at accumulates correctly even when an index is repeated
    np.add.at(out, start, 1)
    np.add.at(out, end, -1)
    return out.cumsum() > 0

def filled_array_v2(start, end, length):  # Using @Daniel's suggestion
    # bincount tallies how many ranges open and close at each position
    out = np.bincount(start, minlength=length) - np.bincount(end, minlength=length)
    return out.cumsum().astype(bool)
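To make the intermediate steps visible, here is a minimal trace of the same idea on a tiny input. The arrays `start`, `end` and the names `delta` and `running` are illustrative choices, not part of the original answer. The first two ranges overlap on purpose, to show that the running total can exceed 1.

```python
import numpy as np

# Illustrative inputs (not from the original post); ranges [1, 5) and [2, 4) overlap.
start = np.array([1, 2, 6])
end = np.array([5, 4, 7])
length = 8

# Step 1: +1 where a range opens, -1 where it closes.
delta = np.zeros(length, dtype=int)
np.add.at(delta, start, 1)
np.add.at(delta, end, -1)
print(delta)    # [ 0  1  1  0 -1 -1  1 -1]

# Step 2: the running total is the number of ranges open at each position.
running = delta.cumsum()
print(running)  # [0 1 2 2 1 0 1 0]

# Step 3: a position is covered when at least one range is open.
mask = running > 0
print(mask)     # [False  True  True  True  True False  True False]
```

Each end index is exclusive, matching the slice `section[start:end]` from the question.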
Sample run:
In [2]: start
Out[2]: array([ 4, 7, 5, 15])
In [3]: end
Out[3]: array([12, 12, 7, 17])
In [4]: out = filled_array(start, end, length=20)
In [7]: pd.DataFrame(out) # print as dataframe for easy verification
Out[7]:
0
0 False
1 False
2 False
3 False
4 True
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 False
13 False
14 False
15 True
16 True
17 False
18 False
19 False
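As a rough check against the question's data, the sketch below applies the same cumulative-sum idea to the DataFrame columns and compares the result with plain slice assignment. Two things here are assumptions rather than part of the original post: the array length of 12000, and the helper `filled_array_safe`, a hypothetical variant that allocates one extra slot so that an end index equal to the array length does not raise an IndexError in `np.add.at`.

```python
import numpy as np
import pandas as pd

def filled_array_safe(start, end, length):
    # Hypothetical variant: one extra slot makes end == length a valid index.
    delta = np.zeros(length + 1, dtype=int)
    np.add.at(delta, start, 1)
    np.add.at(delta, end, -1)
    # Drop the extra slot before accumulating.
    return delta[:length].cumsum() > 0

df = pd.DataFrame({'start': [7200, 7500, 7560, 8100, 11400],
                   'end': [10800, 8100, 8100, 8150, 12000]})
length = 12000  # assumed size, large enough to hold every index in df

vectorized = filled_array_safe(df['start'].to_numpy(), df['end'].to_numpy(), length)

# Reference result using the slice assignment from the question.
looped = np.zeros(length, dtype=bool)
for s, e in zip(df['start'], df['end']):
    looped[s:e] = True

print(np.array_equal(vectorized, looped))  # True
```

The comparison relies on every start being less than or equal to its end, so the running total never drops below zero.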