Access multiple elements of an array

Problem description

Is there a way to get array elements in one operation, given the known rows and columns of those elements? In each row I would like to access the elements from col_start to col_end (each row has a different starting and ending index). The number of elements is the same for each row, and the elements are consecutive. Example:

[ . . . . | | | . . . . . ]
[ | | | . . . . . . . . . ]
[ . . | | | . . . . . . . ]
[ . . . . . . . . | | | . ]

One solution would be to get the indices (row-column pairs) of the elements and then use my_array[row_list, col_list].
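A minimal sketch of that row/column-pair approach, for reference. The 4x12 integer array and the start columns are assumptions read off the diagram above, and the names col_start, width and picked are illustrative only:

```python
import numpy as np

# Assumed example data: a 4x12 array matching the diagram above.
A = np.arange(48).reshape(4, 12)
col_start = np.array([4, 0, 2, 8])   # first selected column in each row
width = 3                            # same number of elements in every row

# Build one (row, col) pair per wanted element.
row_list = np.repeat(np.arange(A.shape[0]), width)           # 0,0,0,1,1,1,...
col_list = (col_start[:, None] + np.arange(width)).ravel()   # 4,5,6,0,1,2,...

# Paired integer indexing returns a 1-D result; reshape to one row per span.
picked = A[row_list, col_list].reshape(A.shape[0], width)
print(picked)
```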

Is there any other (simpler) way that does not use for loops?

Recommended answer

import numpy as np

A = np.arange(40).reshape(4,10)*.1
startend = [[2,5],[3,6],[4,7],[5,8]]
# flat (1d) positions: column range plus the row offset i*A.shape[1]
index_list = [np.arange(v[0],v[1]) + i*A.shape[1]
                 for i,v in enumerate(startend)]
# [array([2, 3, 4]), array([13, 14, 15]), array([24, 25, 26]), array([35, 36, 37])]
A.flat[index_list]

which produces

array([[ 0.2,  0.3,  0.4],
       [ 1.3,  1.4,  1.5],
       [ 2.4,  2.5,  2.6],
       [ 3.5,  3.6,  3.7]])

This still has an iteration, but it's a rather basic one over a list. I'm indexing the flattened, 1d, version of A. np.take(A, index_list) also works.
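As a rough check of that equivalence, the sketch below builds the flat indices as a single 2-D integer array (an assumption made here for clarity; the answer uses a list of arrays) and compares A.flat, np.take with its default axis=None, and indexing A.ravel() directly:

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * 0.1
startend = [[2, 5], [3, 6], [4, 7], [5, 8]]

# Flat (1-D) positions: column index plus the row offset i * ncols.
index_arr = np.array([np.arange(s, e) + i * A.shape[1]
                      for i, (s, e) in enumerate(startend)])

via_flat = A.flat[index_arr]
via_take = np.take(A, index_arr)    # axis=None indexes the flattened array
via_ravel = A.ravel()[index_arr]    # explicit flattening, same positions
```

All three results have the shape of index_arr, here (4, 3).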

If the row intervals differ in size, I can use np.r_ to concatenate them. It's not absolutely necessary, but it is a convenience when building up indices from multiple intervals and values.

A.flat[np.r_[tuple(index_list)]]
# array([ 0.2,  0.3,  0.4,  1.3,  1.4,  1.5,  2.4,  2.5,  2.6,  3.5,  3.6, 3.7])
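To illustrate why np.r_ is convenient here: it accepts slice notation and single values as well as arrays, and joins them into one 1-D index array. The particular ranges below are made up for the example and are not from the answer:

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * 0.1

# Slices, arrays and single values can be mixed in one np.r_ expression.
flat_idx = np.r_[2:5, np.arange(13, 16), 39]

# For a plain list of arrays, np.r_[tuple(...)] matches np.concatenate.
pieces = [np.arange(2, 5), np.arange(13, 16)]
same = np.array_equal(np.r_[tuple(pieces)], np.concatenate(pieces))

values = A.flat[flat_idx]
```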

The idx that ajcr uses can also be used without choose:

idx = [np.arange(v[0], v[1]) for i,v in enumerate(startend)]
A[np.arange(A.shape[0])[:,None], idx]

idx is like my index_list except that it doesn't add the row offset (i*A.shape[1]), so it holds plain column indices.

np.array(idx)

array([[2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7]])

Since each arange has the same length, idx can be generated without iteration:

col_start = np.array([2,3,4,5])
idx = col_start[:,None] + np.arange(3)

The first index is a column array that broadcasts to match this idx.

np.arange(A.shape[0])[:,None] 
array([[0],
       [1],
       [2],
       [3]])
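Put together, the broadcasting step can be sketched as follows. The cross-check with np.take_along_axis is not part of the original answer; it is a NumPy function (available since 1.15) that performs the same per-row selection without an explicit row index:

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * 0.1
col_start = np.array([2, 3, 4, 5])
idx = col_start[:, None] + np.arange(3)    # shape (4, 3), column indices
rows = np.arange(A.shape[0])[:, None]      # shape (4, 1), row indices

# (4, 1) and (4, 3) broadcast to (4, 3): element [i, j] is A[i, idx[i, j]].
picked = A[rows, idx]

# Alternative not used in the answer: select along axis 1 per row.
picked_alt = np.take_along_axis(A, idx, axis=1)
```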

With this A and idx I get the following timings:

In [515]: timeit np.choose(idx,A.T[:,:,None])
10000 loops, best of 3: 30.8 µs per loop

In [516]: timeit A[np.arange(A.shape[0])[:,None],idx]
100000 loops, best of 3: 10.8 µs per loop

In [517]: timeit A.flat[idx+np.arange(A.shape[0])[:,None]*A.shape[1]]
10000 loops, best of 3: 24.9 µs per loop

The flat indexing itself is faster, but calculating the fancier index takes up some time, so for this small array the flat version is slower overall.

For large arrays, the speed of flat indexing dominates.

A=np.arange(4000).reshape(40,100)*.1
col_start=np.arange(20,60)
idx=col_start[:,None]+np.arange(30)

In [536]: timeit A[np.arange(A.shape[0])[:,None],idx]
10000 loops, best of 3: 108 µs per loop

In [537]: timeit A.flat[idx+np.arange(A.shape[0])[:,None]*A.shape[1]]
10000 loops, best of 3: 59.4 µs per loop
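Outside IPython, a comparison along these lines could be run with the standard timeit module. This is only a sketch: the function names fancy and flat are made up here, and the absolute numbers will depend on the machine and the NumPy version, so no timings are quoted:

```python
import timeit
import numpy as np

A = np.arange(4000).reshape(40, 100) * 0.1
col_start = np.arange(20, 60)
idx = col_start[:, None] + np.arange(30)

def fancy():
    # 2-D advanced indexing with a broadcast row index
    return A[np.arange(A.shape[0])[:, None], idx]

def flat():
    # flat indexing; the index computation is part of the timed work
    return A.flat[idx + np.arange(A.shape[0])[:, None] * A.shape[1]]

# Both approaches should select exactly the same elements.
same = np.array_equal(fancy(), flat())

t_fancy = min(timeit.repeat(fancy, number=1000, repeat=3))
t_flat = min(timeit.repeat(flat, number=1000, repeat=3))
print(f"fancy: {t_fancy:.4f} s, flat: {t_flat:.4f} s (per 1000 calls)")
```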

The np.choose method runs into a hardcoded limit with this larger array: "Need between 2 and (32) array objects (inclusive)."

What about an out-of-bounds idx?

col_start=np.array([2,4,6,8])
idx=col_start[:,None]+np.arange(3)
A[np.arange(A.shape[0])[:,None], idx]

produces an error because the last idx value is 10, which is too large for an array with 10 columns.

You could clip idx

idx=idx.clip(0,A.shape[1]-1)

which produces a repeated value in the last row

[ 3.8,  3.9,  3.9]

You could also pad A before indexing. See np.pad for more options.

np.pad(A,((0,0),(0,2)),'edge')[np.arange(A.shape[0])[:,None], idx]
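The two options can be compared side by side. In the sketch below, clipping and 'edge' padding give the same result for this idx, while the constant-NaN padding is an extra variation (not in the answer) that marks the missing position instead of repeating a value:

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * 0.1
col_start = np.array([2, 4, 6, 8])
idx = col_start[:, None] + np.arange(3)     # last row asks for columns 8, 9, 10
rows = np.arange(A.shape[0])[:, None]

# Option 1: clip the indices, so column 10 becomes column 9.
clipped = A[rows, idx.clip(0, A.shape[1] - 1)]

# Option 2: pad A on the right by repeating the edge value, then index.
padded = np.pad(A, ((0, 0), (0, 2)), mode="edge")[rows, idx]

# Variation: pad with NaN so out-of-range positions are visible.
nan_padded = np.pad(A, ((0, 0), (0, 2)), mode="constant",
                    constant_values=np.nan)[rows, idx]
```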

Another option is to remove the out-of-bounds values. idx would then become a ragged list of lists (or array of lists). The flat approach can handle this, though the result will not be a matrix.

startend = [[2,5],[4,7],[6,9],[8,10]]
index_list = [np.arange(v[0],v[1]) + i*A.shape[1] 
                 for i,v in enumerate(startend)]
# [array([2, 3, 4]), array([14, 15, 16]), array([26, 27, 28]), array([38, 39])]

A.flat[np.r_[tuple(index_list)]]
# array([ 0.2,  0.3,  0.4,  1.4,  1.5,  1.6,  2.6,  2.7,  2.8,  3.8,  3.9])
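If the per-row grouping is still needed after the ragged selection, one possibility (an addition here, not part of the answer) is to split the flat result back by the interval lengths with np.split:

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * 0.1
startend = [[2, 5], [4, 7], [6, 9], [8, 10]]
index_list = [np.arange(s, e) + i * A.shape[1]
              for i, (s, e) in enumerate(startend)]

values = A.flat[np.concatenate(index_list)]    # 1-D result, 11 values

# Split positions are the cumulative lengths, excluding the final total.
lengths = [len(ix) for ix in index_list]       # 3, 3, 3, 2
per_row = np.split(values, np.cumsum(lengths)[:-1])
```

per_row is then a list of four 1-D arrays, one per row, with the last one shorter than the others.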
