Split a large numpy array into separate arrays with a list of grouped indices

Question

Given 2 arrays: one holding a master dataset, and a second holding a list of grouped indices that reference the master dataset. What is the fastest way to generate new arrays from the given index data?

Here's my current solution for generating 2 arrays from a list of index pairs:

# Let's make a large point cloud with 1 million entries and a list of random paired indices
import numpy as np
COUNT = 1000000
POINT_CLOUD = np.random.rand(COUNT,3) * 100
INDICES = (np.random.rand(COUNT,2)*COUNT).astype(int)  # (1,10),(233,12),...

# Split into sublists; np.squeeze is needed here because I don't want arrays of single elements.
LIST1 = POINT_CLOUD[np.squeeze(INDICES[:,[0]])]
LIST2 = POINT_CLOUD[np.squeeze(INDICES[:,[1]])]

This works, but it's a little slow, and it only generates 2 lists. It would be great to have a solution that can handle index groups of any size (e.g. ((1,2,3,4),(8,4,5,3),...)).

Something like this:

# PSEUDO CODE using quadruple keys
INDICES = (np.random.rand(COUNT,4)*COUNT).astype(int)
SPLIT = POINT_CLOUD[<some pythonic magic>[INDICES]]
SPLIT[0] = np.array([points from INDEX #1])
SPLIT[1] = np.array([points from INDEX #2])
SPLIT[2] = np.array([points from INDEX #3])
SPLIT[3] = np.array([points from INDEX #4])

Recommended answer

You just have to transpose the index array, so that each group position becomes one row of indices:

>>> result = POINT_CLOUD[INDICES.T]
>>> np.allclose(result[0], LIST1)
True
>>> np.allclose(result[1], LIST2)
True
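
To see why the transpose is enough, here is a minimal sketch on a tiny array. The names `points` and `indices` are illustrative stand-ins for `POINT_CLOUD` and `INDICES`, not part of the original post. Indexing with a 2-D integer array picks whole rows of the source, so the result has the shape of the index array followed by the remaining source dimensions.

```python
import numpy as np

# Small stand-in for POINT_CLOUD: 5 points with 3 coordinates each.
points = np.arange(15).reshape(5, 3)

# Three index pairs, one pair per row (same layout as INDICES in the question).
indices = np.array([[0, 4],
                    [2, 1],
                    [3, 3]])

# Indexing with a 2-D integer array returns an array of shape
# indices.T.shape + points.shape[1:], which is (2, 3, 3) here.
result = points[indices.T]

# Each slice along the first axis matches plain column indexing,
# so np.squeeze(indices[:, [0]]) can also be written as indices[:, 0].
first = points[indices[:, 0]]
second = points[indices[:, 1]]

print(result.shape)                      # (2, 3, 3)
print(np.array_equal(result[0], first))  # True
print(np.array_equal(result[1], second)) # True
```

Note that this kind of indexing returns a copy, so for `COUNT = 1000000` and k indices per group the result holds k * COUNT * 3 floats.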

If you know the number of sub-arrays, you can also unpack the result:

>>> result.shape
(2, 1000000, 3)
>>> L1, L2 = result
>>> np.allclose(L1, LIST1)
True
>>> # etc

This also works for larger index groups. For the second example in your question:

>>> INDICES = (np.random.rand(COUNT,4)*COUNT).astype(int)
>>> SPLIT = POINT_CLOUD[INDICES.T]
>>> SPLIT.shape
(4, 1000000, 3)
>>> 
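
If you want the pieces as separate arrays regardless of group size, the same idea can be wrapped in a small function. This is only a sketch: `split_by_groups` is a hypothetical helper name, and it uses `np.take` along axis 0, which for this case should be equivalent to `points[indices.T]`. The data below is a smaller random stand-in for the point cloud in the question.

```python
import numpy as np

def split_by_groups(points, indices):
    """Hypothetical helper: return one array per column of `indices`."""
    # Gather rows of `points`; the result has shape (k, n, points.shape[1])
    # for an index array of shape (n, k).
    gathered = np.take(points, indices.T, axis=0)
    # Iterating over the first axis yields k arrays of shape (n, 3).
    return list(gathered)

rng = np.random.default_rng(0)
count = 1000  # smaller than the question's COUNT, just for illustration
point_cloud = rng.random((count, 3)) * 100
indices = rng.integers(0, count, size=(count, 4))

a, b, c, d = split_by_groups(point_cloud, indices)
print(a.shape)  # (1000, 3)
```

The arrays in the returned list are views into one gathered block, so no additional copies are made when unpacking.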
