How can I summarize several pandas dataframe columns into a parent column name?


Question

I have a dataframe that looks like this:

        some feature  another feature  label
sample
0       ...           ...              ...

and I'd like to get a dataframe with MultiIndex columns like this:

        features            label
sample  some       another
0       ...        ...      ...

From the API it's not clear to me how to use from_arrays(), from_product(), from_tuples() or from_frame() correctly. The solution must not depend on string parsing of the feature column names (some feature, another feature). The label is always the last column, and its column name, label, may be used. How can I get what I want?

Recommended answer

"From the API it's not clear to me how to use from_arrays(), from_product(), from_tuples() or from_frame() correctly."

These constructors are mainly used to build a new MultiIndex that is independent of the original column names.

So if you need a completely new MultiIndex, you can build it from lists or arrays, for example:

a = ['a','a','b']
b = ['x','y','z']
df.columns = pd.MultiIndex.from_arrays([a,b])
print (df)
        a     b
        x  y  z
sample         
0       2  3  5
1       4  5  7
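For comparison, the same column index can be sketched with the other constructors named in the question. The frame below is hypothetical sample data chosen to match the printed output above; the original column names c1, c2, c3 are assumptions, since the answer does not show them.

```python
import pandas as pd

# Hypothetical sample data; the column names c1..c3 are placeholders.
df = pd.DataFrame(
    {"c1": [2, 4], "c2": [3, 5], "c3": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

a = ["a", "a", "b"]
b = ["x", "y", "z"]

# from_arrays takes one list per level; from_tuples takes one tuple per column.
from_arrays = pd.MultiIndex.from_arrays([a, b])
from_tuples = pd.MultiIndex.from_tuples(list(zip(a, b)))

# from_product builds every combination of the given levels, so it only
# fits when the columns form a full grid (here 2 x 2 = 4 columns).
grid = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]])

df.columns = from_arrays
print(df)
```

from_arrays and from_tuples describe the same three columns here, so either can be assigned to df.columns; from_product would not fit this frame because it yields four columns.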

If you want to put all columns except the last one under a common parent level:

a = ['parent'] * (len(df.columns) - 1) + ['label']
b = df.columns[:-1].tolist() + ['val']
df.columns = pd.MultiIndex.from_arrays([a,b])
print (df)
          parent           label
       feature a feature b   val
sample                          
0              2         3     5
1              4         5     7
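Applied to the frame from the question, the same idea might look like the sketch below. The helper name add_parent_level, its parameters and the sample values are illustrative assumptions, not part of pandas or of the original answer. It selects the label column by name rather than by parsing the feature names, and uses an empty string as the second level of the label column.

```python
import pandas as pd


def add_parent_level(df, parent="features", label="label"):
    """Return a copy of df whose columns are a two-level MultiIndex.

    Hypothetical helper: every column except `label` is grouped
    under `parent`; `label` gets an empty second level.
    """
    top = [label if col == label else parent for col in df.columns]
    bottom = ["" if col == label else col for col in df.columns]
    out = df.copy()
    out.columns = pd.MultiIndex.from_arrays([top, bottom])
    return out


# Assumed sample data using the column names from the question.
df = pd.DataFrame(
    {"some feature": [2, 4], "another feature": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

result = add_parent_level(df)
print(result)
```

Selecting result["features"] then returns only the feature columns, with their original names as the remaining column level.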


Splitting the column names is also possible, but any column without the separator gets NaN as its second level. This happens because MultiIndex and non-MultiIndex columns cannot be combined in one frame (strictly speaking they can, but the MultiIndex columns are then stored as plain tuples):

print (df)
        feature_a  feature_b  label
sample                             
0               2          3      5
1               4          5      7

df.columns = df.columns.str.split('_', expand=True)
print (df)
       feature    label
             a  b   NaN
sample                 
0            2  3     5
1            4  5     7
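To see which columns would end up with a missing second level before reassigning df.columns, the split can be inspected on the column index alone. This is a minimal sketch that assumes "_" as the separator and the three column names shown above.

```python
import pandas as pd

cols = pd.Index(["feature_a", "feature_b", "label"])

# Splitting an Index with expand=True returns a MultiIndex.
split = cols.str.split("_", expand=True)

# Flag the original columns whose second level is missing after the split.
missing = pd.isna(split.get_level_values(1))
print(list(cols[missing]))
```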

So it is better to first move all columns without the separator into the Index/MultiIndex with DataFrame.set_index:

df = df.set_index('label')
df.columns = df.columns.str.split('_', expand=True)
print (df)
      feature   
            a  b
label           
5           2  3
7           4  5

To keep the original index instead of replacing it, use the append=True parameter:

df = df.set_index('label', append=True)
df.columns = df.columns.str.split('_', expand=True)
print (df)
             feature   
                   a  b
sample label           
0      5           2  3
1      7           4  5
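Put together as a self-contained script, the last variant might look like this. The sample values are taken from the printed frames above, and the "_" separator is assumed from the column names feature_a and feature_b.

```python
import pandas as pd

# Sample data reconstructed from the printed output above.
df = pd.DataFrame(
    {"feature_a": [2, 4], "feature_b": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

# Move the column without a separator into the row index first,
# keeping the existing "sample" index thanks to append=True.
out = df.set_index("label", append=True)

# Every remaining column contains "_", so the split yields a
# complete two-level MultiIndex without NaN entries.
out.columns = out.columns.str.split("_", expand=True)
print(out)
```

Note that this variant does parse the column names, so it only applies when the feature columns share a reliable separator.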
