Transforming a MultiIndex into a row-wise multi-dimensional NumPy array

Question

Suppose I have a MultiIndex DataFrame similar to an example from the MultiIndex docs.

>>> df 
               0   1   2   3
first second                
bar   one      0   1   2   3
      two      4   5   6   7
baz   one      8   9  10  11
      two     12  13  14  15
foo   one     16  17  18  19
      two     20  21  22  23
qux   one     24  25  26  27
      two     28  29  30  31

I want to generate a NumPy array from this DataFrame with a 3-dimensional structure like this:

>>> desired_arr
array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]],

       [[16, 20],
        [17, 21],
        [18, 22],
        [19, 23]],

       [[24, 28],
        [25, 29],
        [26, 30],
        [27, 31]]])

How can I do this?

Hopefully it is clear what is happening here: I am effectively unstacking the DataFrame's second level (keeping the first level as the row index) and then trying to turn each top level in the resulting column MultiIndex into its own 2-dimensional array.

I can get as far as

>>> df.unstack(1)
         0       1       2       3    
second one two one two one two one two
first                                 
bar      0   4   1   5   2   6   3   7
baz      8  12   9  13  10  14  11  15
foo     16  20  17  21  18  22  19  23
qux     24  28  25  29  26  30  27  31

But then I am struggling to find a nice way to turn each column into a 2-dimensional array and then join them together, beyond doing so explicitly with loops and lists.
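Since each row of the unstacked frame above is already laid out as column 0/one, column 0/two, column 1/one, and so on, one loop-free option is to reshape its rows directly. This is a minimal sketch that assumes the column order shown in the `unstack` output above (original column outer, `second` level inner) and that no (first, second) pair is missing; `wide`, `ncols`, `m`, and `n` are illustrative names:

```python
import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8 * 4).reshape((8, 4)), index=ind)

wide = df.unstack(1)            # rows: 'first'; columns: (original column, 'second')
m = len(wide)                   # number of first-level labels (4 here)
ncols = df.shape[1]             # number of original columns (4 here)
n = len(wide.columns) // ncols  # number of second-level labels (2 here)

# Each row is ordered c0/one, c0/two, c1/one, c1/two, ..., so a C-order
# reshape turns every row into an (ncols, n) block.
arr = wide.to_numpy().reshape(m, ncols, n)
```

If some pairs were missing, `unstack` would insert NaN for them and the result would become a float array.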

I feel like there should be some way to specify the shape of the desired NumPy array beforehand, fill it with np.nan, and then use a specific iteration order to fill in the values from my DataFrame, but I have not managed to solve the problem with this approach yet.

To generate the example DataFrame:

import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8*4).reshape((8, 4)), index=ind)
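The preallocate-with-np.nan idea from the question can be sketched using the integer `codes` of the MultiIndex, which give each row's position within each level. Because the target position comes from the codes, no explicit loop and no particular row order are needed. This is a minimal sketch assuming a two-level index and pandas 0.24 or later (where `MultiIndex.codes` exists); `i`, `j`, `m`, and `n` are illustrative names:

```python
import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8 * 4).reshape((8, 4)), index=ind)

i, j = df.index.codes           # position of each row's label in level 0 and level 1
m, n = len(df.index.levels[0]), len(df.index.levels[1])

# Preallocate the target shape (first, column, second) filled with NaN,
# so any (first, second) pair absent from the index stays NaN.
arr = np.full((m, df.shape[1], n), np.nan)

# The advanced indices on axes 0 and 2 are separated by a slice, so the
# selected block has shape (len(df), ncols), matching df.to_numpy().
arr[i, :, j] = df.to_numpy()
```

Because of the NaN fill value the result is float64, and the first and third axes follow the label order stored in `df.index.levels`.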

Answer

Some swapaxes magic:

df.values.reshape(4,2,-1).swapaxes(1,2)

Generalized:

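# number of labels in the first and second index levels (4 and 2 here)
m, n = len(df.index.levels[0]), len(df.index.levels[1])
# split the rows into (m, n) blocks, then move the length-n axis to the end
arr = df.values.reshape(m, n, -1).swapaxes(1, 2)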
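The generalized line assumes the rows are sorted by `first` and then `second`, and that every (first, second) pair appears exactly once; otherwise the reshape either raises or silently mixes groups. It also counts every label stored in `df.index.levels`, which can include labels left over after filtering rows. A hedged sketch of a guarded version follows; the helper name `to_3d` is hypothetical, and it relies on the pandas `MultiIndex.levshape` property and `remove_unused_levels()` method:

```python
import numpy as np
import pandas as pd

def to_3d(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: reshape a two-level MultiIndex frame to (m, ncols, n)."""
    # Sort rows lexicographically by (first, second) so the reshape groups correctly.
    df = df.sort_index()
    # Drop level labels that no longer occur, so levshape counts only present labels.
    idx = df.index.remove_unused_levels()
    m, n = idx.levshape
    # Unique rows numbering m * n means every (first, second) pair is present once.
    if len(df) != m * n or not idx.is_unique:
        raise ValueError("index is not a complete product of its two levels")
    return df.to_numpy().reshape(m, n, -1).swapaxes(1, 2)

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8 * 4).reshape((8, 4)), index=ind)
arr = to_3d(df)
```

Sorting first means shuffled input gives the same array; the explicit check turns an incomplete index into an error instead of a wrongly grouped result.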

This splits the first axis (length 8) into two axes of lengths 4 and 2, creating a 3D array of shape (4, 2, 4), and then swaps the last two axes, i.e. it pushes the axis of length 2 to the back (as the last one), giving shape (4, 4, 2).
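To make the two steps concrete, here is a trace of the shapes on a plain 8-by-4 array standing in for `df.values` in the example (the names `flat`, `step1`, and `step2` are illustrative):

```python
import numpy as np

flat = np.arange(8 * 4).reshape(8, 4)        # stands in for df.values in the example
step1 = flat.reshape(4, 2, -1)               # split the 8 rows into 4 groups of 2
step2 = step1.swapaxes(1, 2)                 # move the length-2 axis to the end
print(flat.shape, step1.shape, step2.shape)  # (8, 4) (4, 2, 4) (4, 4, 2)

# swapaxes(1, 2) is the same axis permutation as transpose(0, 2, 1)
assert np.array_equal(step2, step1.transpose(0, 2, 1))
```

`swapaxes` returns a view rather than a copy, so the result is not C-contiguous; wrap it in `np.ascontiguousarray` if a later step needs contiguous memory.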

Sample output:

In [35]: df.values.reshape(4,2,-1).swapaxes(1,2)
Out[35]: 
array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]],

       [[16, 20],
        [17, 21],
        [18, 22],
        [19, 23]],

       [[24, 28],
        [25, 29],
        [26, 30],
        [27, 31]]])
