Constructing 3D Pandas DataFrame

Question

I'm having difficulty constructing a 3D DataFrame in Pandas. I want something like this:

A               B               C
start    end    start    end    start    end ...
7        20     42       52     90       101
11       21                     213      34
56       74                     9        45
45       12

Here A, B, etc. are the top-level descriptors, and start and end are subdescriptors. The numbers that follow come in start/end pairs, and the number of pairs differs between A, B, etc. Observe that A has four such pairs, B has only one, and C has three.

I'm not sure how to proceed in constructing this DataFrame. Modifying this example didn't give me the desired output:

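import numpy as np
import pandas as pd

# Top-level labels and sub-labels, one entry per (label, start/end) combination
A = np.array(['one', 'one', 'two', 'two', 'three', 'three'])
B = np.array(['start', 'end']*3)
# Note: this repeats the same random array six times
C = [np.random.randint(10, 99, 6)]*6
# list(...) materializes the zip iterator before it is passed to the constructor
df = pd.DataFrame(list(zip(A, B, C)), columns=['A', 'B', 'C'])
df.set_index(['A', 'B'], inplace=True)
df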

Yields:

                C
 A          B   
 one        start   [22, 19, 16, 20, 63, 54]
              end   [22, 19, 16, 20, 63, 54]
 two        start   [22, 19, 16, 20, 63, 54]
              end   [22, 19, 16, 20, 63, 54]
 three      start   [22, 19, 16, 20, 63, 54]
              end   [22, 19, 16, 20, 63, 54]

Is there any way of breaking up the lists in C into their own columns?

The structure of my C is important. It looks like the following:

 C = [[7,11,56,45], [20,21,74,12], [42], [52], [90,213,9], [101, 34, 45]]

The desired output is the one at the top. It represents the starting and ending points of subsequences within a certain sequence (A, B, C are the different sequences). Depending on the sequence itself, a different number of subsequences satisfy the condition I'm looking for. As a result, there are a differing number of start:end pairs for A, B, etc.

Recommended answer

First, I think you need to pad the sublists of C to a common length so that missing values are represented explicitly:

In [341]: max_len = max(len(sublist) for sublist in C)
In [344]: for sublist in C:
     ...:     sublist.extend([np.nan] * (max_len - len(sublist)))

In [345]: C
Out[345]: 
[[7, 11, 56, 45],
 [20, 21, 74, 12],
 [42, nan, nan, nan],
 [52, nan, nan, nan],
 [90, 213, 9, nan],
 [101, 34, 45, nan]]
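
The loop above pads the sublists of C in place. As an alternative sketch (not part of the original answer), the standard library's itertools.zip_longest can pad and transpose in one step without mutating C. The name rows is only illustrative.

```python
from itertools import zip_longest

import numpy as np

C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]

# zip_longest fills the shorter sublists with fillvalue and yields one tuple
# per position, so the result is already transposed: each tuple is one row
# of the target table and each sublist of C becomes one column.
rows = list(zip_longest(*C, fillvalue=np.nan))

for row in rows:
    print(row)
```

Because the result is already row-oriented, it could be passed to the DataFrame constructor without the transpose step, and C itself keeps its original ragged shape.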

Then convert it to a numpy array, transpose it, and pass it to the DataFrame constructor along with the MultiIndex columns.

In [288]: C = np.array(C)
In [289]: df = pd.DataFrame(data=C.T, columns=pd.MultiIndex.from_tuples(list(zip(A, B))))

In [349]: df
Out[349]: 
     one         two       three     
   start  end  start  end  start  end
0      7   20     42   52     90  101
1     11   21    NaN  NaN    213   34
2     56   74    NaN  NaN      9   45
3     45   12    NaN  NaN    NaN  NaN
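
For reference, the two steps can be combined into one self-contained sketch that uses the labels A, B, C from the question instead of one, two, three. It assumes the sublists of C are ordered as A start, A end, B start, B end, C start, C end, which matches the C shown in the question. The names padded, columns and pairs_b are illustrative only.

```python
import numpy as np
import pandas as pd

C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]

# Pad every sublist with NaN up to the length of the longest one
# (new lists are built here, so C itself is left unchanged).
max_len = max(len(sublist) for sublist in C)
padded = [sublist + [np.nan] * (max_len - len(sublist)) for sublist in C]

# Two-level column index: (A, start), (A, end), (B, start), ...
columns = pd.MultiIndex.from_product([["A", "B", "C"], ["start", "end"]])

# Transpose so that each padded sublist becomes one column.
df = pd.DataFrame(np.array(padded).T, columns=columns)
print(df)

# Selecting a top-level label returns its start/end sub-table;
# dropna() removes the padding rows again.
pairs_b = df["B"].dropna()
print(pairs_b)
```

Since NaN is a floating-point value, the padded array is stored as floats. Selecting a top-level label and calling dropna() recovers just the pairs that belong to that sequence.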
