Transforming a MultiIndex to a row-wise multi-dimensional NumPy array
Question
Suppose I have a MultiIndex DataFrame similar to an example from the MultiIndex docs.
>>> df
               0   1   2   3
first second
bar   one      0   1   2   3
      two      4   5   6   7
baz   one      8   9  10  11
      two     12  13  14  15
foo   one     16  17  18  19
      two     20  21  22  23
qux   one     24  25  26  27
      two     28  29  30  31
I want to generate a NumPy array from this DataFrame with a 3-dimensional structure like
>>> desired_arr
array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]],

       [[16, 20],
        [17, 21],
        [18, 22],
        [19, 23]],

       [[24, 28],
        [25, 29],
        [26, 30],
        [27, 31]]])
How can I do this?
Hopefully it is clear what is happening here: I am effectively unstacking the DataFrame on the second index level, and then trying to turn each row of the result (one per value of the first level) into its own 2-dimensional array.
I can get the unstacked frame with
>>> df.unstack(1)
         0       1       2       3
second one two one two one two one two
first
bar      0   4   1   5   2   6   3   7
baz      8  12   9  13  10  14  11  15
foo     16  20  17  21  18  22  19  23
qux     24  28  25  29  26  30  27  31
but then I am struggling to find a nice way to turn each column into a 2-dimensional array and then join them together, beyond doing so explicitly with loops and lists.
I feel like there should be some way to specify the shape of my desired NumPy array beforehand, fill it with np.nan, and then use a specific iteration order to fill in the values from my DataFrame, but I have not managed to solve the problem with this approach yet.
To generate the sample DataFrame:
import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8*4).reshape((8, 4)), index=ind)
Recommended answer
Some swapaxes magic:
df.values.reshape(4,2,-1).swapaxes(1,2)
Generalized to:
m,n = len(df.index.levels[0]), len(df.index.levels[1])
arr = df.values.reshape(m,n,-1).swapaxes(1,2)
Basically, this splits the first axis into two axes of lengths 4 and 2, creating a 3D array, and then swaps the last two axes, i.e. it pushes the axis of length 2 to the back (as the last one).
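To make the axis bookkeeping concrete, here is a minimal sketch that tracks the shape after each step and cross-checks the result against the unstack route mentioned in the question. The intermediate names (flat, split, via_unstack) are illustrative only and do not come from the original answer. It assumes the index is the full Cartesian product of both levels and is sorted with 'first' as the outer level, as in the sample frame.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8 * 4).reshape((8, 4)), index=ind)

m, n = len(df.index.levels[0]), len(df.index.levels[1])  # 4 and 2 here

flat = df.to_numpy()             # shape (8, 4): axes (first*second, column)
split = flat.reshape(m, n, -1)   # shape (4, 2, 4): axes (first, second, column)
arr = split.swapaxes(1, 2)       # shape (4, 4, 2): axes (first, column, second)

# Cross-check: unstacking level 1 interleaves 'second' inside each column,
# so a plain reshape of the unstacked values yields the same layout.
via_unstack = df.unstack(1).to_numpy().reshape(m, -1, n)

print(flat.shape, split.shape, arr.shape)
print(np.array_equal(arr, via_unstack))
```

If some (first, second) combinations are missing, or the rows are not sorted by the index, the reshape either fails or silently places rows in the wrong slots, so sorting with df.sort_index() and checking len(df) == m * n beforehand is a reasonable precaution.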
Sample output:
In [35]: df.values.reshape(4,2,-1).swapaxes(1,2)
Out[35]:
array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]],

       [[16, 20],
        [17, 21],
        [18, 22],
        [19, 23]],

       [[24, 28],
        [25, 29],
        [26, 30],
        [27, 31]]])
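The preallocate-and-fill idea raised in the question can also be made to work, and it may be the safer choice when some (first, second) combinations are absent, since the plain reshape needs a complete index. The following is only a sketch, not part of the original answer: it uses the integer codes of the MultiIndex as positions along the first and last axes, and the names partial and filled are hypothetical. It assumes the index contains no missing labels (which would show up as a code of -1).

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8 * 4).reshape((8, 4)), index=ind)

# Simulate an incomplete index by leaving out row 3, i.e. ('baz', 'two').
partial = df.iloc[[0, 1, 2, 4, 5, 6, 7]]

idx = partial.index.remove_unused_levels()
i = idx.codes[0]  # position of each row within level 'first'
j = idx.codes[1]  # position of each row within level 'second'

m, n = len(idx.levels[0]), len(idx.levels[1])

# Preallocate with NaN, then scatter each row into slot [first, :, second].
filled = np.full((m, partial.shape[1], n), np.nan)
filled[i, :, j] = partial.to_numpy()

print(filled[1])  # the 'baz' block: second column is NaN
```

Because NaN requires a floating-point dtype, the result is a float array even though the source values are integers.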