Split a large numpy array into separate arrays with a list of grouped indices
Question
Given two arrays: one holding a master dataset, and a second holding a list of grouped indices that reference the master dataset. What is the fastest way to generate new arrays from the given index data?
Here's my current solution for generating 2 arrays from a list of double keys:
# Let's make a large point cloud with 1 million entries and a list of random paired indices
import numpy as np
COUNT = 1000000
POINT_CLOUD = np.random.rand(COUNT,3) * 100
INDICES = (np.random.rand(COUNT,2)*COUNT).astype(int) # (1,10),(233,12),...
# Split into sublists; np.squeeze is needed here because I don't want arrays of single elements.
LIST1 = POINT_CLOUD[np.squeeze(INDICES[:,[0]])]
LIST2 = POINT_CLOUD[np.squeeze(INDICES[:,[1]])]
This works, but it's a little slow, and it only handles generating 2 lists. It would be great to have a solution that could handle index groups of any size (e.g. ((1,2,3,4),(8,4,5,3),...)).
Something like this:
# PSEUDO CODE using quadruple keys
INDICES = (np.random.rand(COUNT,4)*COUNT).astype(int)
SPLIT = POINT_CLOUD[<some pythonic magic>[INDICES]]
SPLIT[0] = np.array([points from INDEX #1])
SPLIT[1] = np.array([points from INDEX #2])
SPLIT[2] = np.array([points from INDEX #3])
SPLIT[3] = np.array([points from INDEX #4])
Recommended answer
You just have to transpose the index array:
>>> result = POINT_CLOUD[INDICES.T]
>>> np.allclose(result[0], LIST1)
True
>>> np.allclose(result[1], LIST2)
True
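To see why indexing with INDICES.T works, here is a minimal sketch on a tiny array. The names `points` and `pairs` are illustrative stand-ins for POINT_CLOUD and INDICES, not part of the original post. When an array is indexed with an integer array, the result takes the shape of the index array followed by the remaining axes of the indexed array, so an index of shape (2, N) applied to data of shape (M, 3) gives a result of shape (2, N, 3).

```python
import numpy as np

# Tiny stand-in for POINT_CLOUD: 5 points with 3 coordinates each.
points = np.arange(15, dtype=float).reshape(5, 3)

# Three index pairs, shape (3, 2), laid out like INDICES in the question.
pairs = np.array([[0, 4], [2, 2], [3, 1]])

# pairs.T has shape (2, 3). Indexing rows of `points` with it returns
# an array of shape (2, 3, 3): the index shape, then the coordinate axis.
result = points[pairs.T]

# Unpacking along the first axis gives one array per index column.
first, second = result
print(result.shape)
print(np.array_equal(first, points[pairs[:, 0]]))
print(np.array_equal(second, points[pairs[:, 1]]))
```

Note that integer-array indexing returns a copy of the selected rows, not a view, so the result needs its own memory.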
If you know the number of sub-arrays, you can also unpack the result:
>>> result.shape
(2, 1000000, 3)
>>> L1, L2 = result
>>> np.allclose(L1, LIST1)
True
>>> # etc
This works for larger index groups. For the second example in your question:
>>> INDICES = (np.random.rand(COUNT,4)*COUNT).astype(int)
>>> SPLIT = POINT_CLOUD[INDICES.T]
>>> SPLIT.shape
(4, 1000000, 3)
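If the group size is not known in advance, the same idea can be wrapped in a small helper. This is only a sketch: `split_by_index_groups` is a hypothetical name, and the smaller `count` and the use of `np.random.default_rng` are assumptions made to keep the example quick to run.

```python
import numpy as np

def split_by_index_groups(points, index_groups):
    """Return one array per column of index_groups (hypothetical helper)."""
    index_groups = np.asarray(index_groups)
    # Shape (N, k) -> (k, N); integer-array indexing then yields (k, N, D).
    stacked = points[index_groups.T]
    # Iterating over the first axis gives k arrays of shape (N, D).
    return list(stacked)

rng = np.random.default_rng(0)
count = 1000  # smaller than the 1,000,000 used in the question
cloud = rng.random((count, 3)) * 100
groups = rng.integers(0, count, size=(count, 4))  # quadruple keys

parts = split_by_index_groups(cloud, groups)
print(len(parts), parts[0].shape)
```

Each element of `parts` holds the same values as indexing the point cloud with the matching column of `groups`, so the helper works for pairs, quadruples, or any other group width.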