Reorganizing an MxN 2D array of datapoints into an N-dimensional array


Question

I've got a series of measurements in a 2D array such as

T    mu1  mu2  mu3  a    b    c    d    e
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  2.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  3.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  1.0  2.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  1.0  3.0  0.0  0.0  0.0  0.0  0.0
0.0  1.0  2.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  1.0  2.0  1.0  0.0  0.0  0.0  0.0  0.0
0.0  1.0  2.0  2.0  0.0  0.0  0.0  0.0  0.0
0.0  1.0  2.0  3.0  0.0  0.0  0.0  0.0  0.0
0.0  1.0  3.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  1.0  3.0  1.0  0.0  0.0  0.0  0.0  0.0
0.0  1.0  3.0  2.0  0.0  0.0  0.0  0.0  0.0
0.0  1.0  3.0  3.0  0.0  0.0  0.0  0.0  0.0
1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
1.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0
1.0  0.0  0.0  2.0  0.0  0.0  0.0  0.0  0.0
1.0  0.0  0.0  3.0  0.0  0.0  0.0  0.0  0.0
1.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
1.0  0.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0
1.0  0.0  1.0  2.0  0.0  0.0  0.0  0.0  0.0
1.0  0.0  1.0  3.0  0.0  0.0  0.0  0.0  0.0
1.0  1.0  2.0  0.0  0.0  0.0  0.0  0.0  0.0
1.0  1.0  2.0  1.0  0.0  0.0  0.0  0.0  0.0
1.0  1.0  2.0  2.0  0.0  0.0  0.0  0.0  0.0
1.0  1.0  2.0  3.0  0.0  0.0  0.0  0.0  0.0
1.0  1.0  3.0  0.0  0.0  0.0  0.0  0.0  0.0
1.0  1.0  3.0  1.0  0.0  0.0  0.0  0.0  0.0
1.0  1.0  3.0  2.0  0.0  0.0  0.0  0.0  0.0
1.0  1.0  3.0  3.0  0.0  0.0  0.0  0.0  0.0

where T, mu1, mu2 and mu3 are the 4 axes of the variables I control (independent variables). a, b, c, d and e are the measurements I've made (dependent variables).

I would like to convert this 2D array into a 5D array in numpy. By specifying T, mu1, mu2 and mu3 (or at least their 4 indexes) I want to be able to retrieve the corresponding a, b, c, d and e values.

Is there a straightforward way to reshape this kind of array by specifying what columns the axes correspond to? The MultiIndex in Pandas seemed to smartly organize it in a table, but seems ill-suited for high dimensional arrays. I won't necessarily know ahead of time what the shape of the ndarray should be, but it seems to me that based on the values it should be possible to reshape the array properly. The increment values for each axis might also be different, but they will always be uniform.

My current idea involves ignoring the mu1, mu2 and mu3 columns, and stacking sets of T data into a 3D array. From there I would stack sets of 3D mu1 data into a 4D array, and repeat the process with mu2 and mu3. This seems like a tedious process that should have a simple solution though.

Recommended answer

First, let's make some fake data:

import numpy as np

# an N x 5 array containing a regular mesh representing the stimulus params
stim_params = np.mgrid[:2, :3, :4, :5, :6].reshape(5, -1).T

# an N x 3 array representing the output values for each simulation run
output_vals = np.arange(720 * 3).reshape(720, 3)

# shuffle the rows for a bit of added realism
shuf = np.random.permutation(stim_params.shape[0])
stim_params = stim_params[shuf]
output_vals = output_vals[shuf]

Now you can use np.lexsort to get the set of indices that sorts the rows of your 2D array of simulation parameters lexicographically, so that the first column varies slowest and the last column varies fastest. Having done that, you can apply the same indices to the rows of simulation output values.

# get the number of unique values for each stimulus parameter
params_shape = tuple(np.unique(col).shape[0] for col in stim_params.T)

# get the set of row indices that will sort the stimulus parameters in ascending
# order. np.lexsort treats its last key as the primary one, so the columns are
# reversed to make the first column the primary sort key
idx = np.lexsort(stim_params[:, ::-1].T)

# sort and reshape the stimulus parameters:
sorted_params = stim_params[idx].T.reshape((5,) + params_shape)

# sort and reshape the output values
sorted_output = output_vals[idx].T.reshape((3,) + params_shape)
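
The column reversal in the np.lexsort call above is easy to get wrong, so here is a minimal sketch on a tiny array (the names rows, order and naive_order are made up for illustration). It shows that np.lexsort uses its last key as the primary sort key:

```python
import numpy as np

# three rows, two columns: (col0, col1)
rows = np.array([[1, 0],
                 [0, 1],
                 [0, 0]])

# reversed columns: col0 becomes the last key, i.e. the primary sort key
order = np.lexsort(rows[:, ::-1].T)
sorted_rows = rows[order]
print(sorted_rows)
# [[0 0]
#  [0 1]
#  [1 0]]

# without the reversal, col1 would be the primary key instead
naive_order = np.lexsort(rows.T)
print(rows[naive_order])
# [[0 0]
#  [1 0]
#  [0 1]]
```

Only the first ordering matches the C-order layout that reshape expects, where the last axis varies fastest.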

I find that the hardest part is often just trying to wrap your head around what all the different dimensions of the outputs correspond to:

# array of stimulus parameters, with dimensions (n_params, p1, p2, p3, p4, p5)
print(sorted_params.shape)
# (5, 2, 3, 4, 5, 6)

# to check that the sorting worked as expected, we can look at the values of the 
# 5th parameter when all the others are held constant at 0:
print(sorted_params[4, 0, 0, 0, 0, :])
# [0 1 2 3 4 5]

# ... and the 1st parameter when we hold all the others constant:
print(sorted_params[0, :, 0, 0, 0, 0])
# [0 1]

# ... now let the 1st and 2nd parameters covary:
print(sorted_params[:2, :, :, 0, 0, 0])
# [[[0 0 0]
#   [1 1 1]]

#  [[0 1 2]
#   [0 1 2]]]

Hopefully you get the idea. The same indexing logic applies to the sorted simulation outputs:

# array of outputs, with dimensions (n_outputs, p1, p2, p3, p4, p5)
print(sorted_output.shape)
# (3, 2, 3, 4, 5, 6)

# the first output variable whilst holding the first 4 simulation parameters
# constant at 0:
print(sorted_output[0, 0, 0, 0, 0, :])
# [ 0  3  6  9 12 15]
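
For a single table laid out like the one in the question (axis columns followed by measurement columns), a different technique may be more convenient: np.unique with return_inverse gives each row's index along every axis, and the measurements can then be scattered onto the grid. This is only a sketch. The helper name table_to_grid and its arguments are hypothetical, the measurement axis is placed last rather than first, and combinations missing from the table are left as NaN:

```python
import numpy as np

def table_to_grid(table, axis_cols, value_cols):
    """Hypothetical helper: scatter the rows of a 2D table onto an N-d grid."""
    table = np.asarray(table, dtype=float)
    axes, index = [], []
    for col in axis_cols:
        # sorted unique values of this axis, plus each row's position among them
        values, inverse = np.unique(table[:, col], return_inverse=True)
        axes.append(values)
        index.append(inverse.ravel())
    shape = tuple(len(v) for v in axes)
    # NaN marks combinations of axis values that never appear in the table
    grid = np.full(shape + (len(value_cols),), np.nan)
    grid[tuple(index)] = table[:, value_cols]
    return axes, grid

# small made-up table with two axes (T, mu) and two measurements (a, b)
table = np.array([
    # T    mu    a     b
    [0.0, 0.5, 10.0, 11.0],
    [1.0, 0.0, 20.0, 21.0],
    [0.0, 0.0, 30.0, 31.0],
    [1.0, 0.5, 40.0, 41.0],
])
axes, grid = table_to_grid(table, axis_cols=[0, 1], value_cols=[2, 3])
print(grid.shape)
# (2, 2, 2)

# look up by value: find the position of each value along its axis
i = np.searchsorted(axes[0], 1.0)
j = np.searchsorted(axes[1], 0.5)
print(grid[i, j])
# [40. 41.]
```

The row order of the table does not matter here, and the grid shape is inferred from the data. Note that np.searchsorted returns an insertion point, so the lookup is only meaningful for values that actually occur on the axis.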
