How can I summarize several pandas dataframe columns into a parent column name?
Question
I have a dataframe that looks like this:
some feature another feature label
sample
0 ... ... ...
I'd like to get a dataframe with MultiIndex columns like this:
features label
sample some another
0 ... ... ...
From the API it's not clear to me how to use from_arrays(), from_product(), from_tuples() or from_frame() correctly. The solution should not depend on string parsing of the feature columns (some feature, another feature). The label column is always the last column, and its column name label may be used. How can I get what I want?
Recommended answer
"From the API it's not clear to me how to use from_arrays(), from_product(), from_tuples() or from_frame() correctly."
These constructors are mainly used when you want to build a new DataFrame whose MultiIndex is independent of the original column names. In other words, use them when you need a completely new MultiIndex, e.g. from lists or arrays:
a = ['a','a','b']
b = ['x','y','z']
df.columns = pd.MultiIndex.from_arrays([a,b])
print (df)
a b
x y z
sample
0 2 3 5
1 4 5 7
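For reference, here is a self-contained sketch of the from_arrays() step. The frame construction is an assumption: the original column names and the values are made up to match the printed output above.

```python
import pandas as pd

# Hypothetical sample data; values chosen to match the printed output.
df = pd.DataFrame(
    {"some feature": [2, 4], "another feature": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

# One list per level; position i in each list describes column i.
top = ["a", "a", "b"]
bottom = ["x", "y", "z"]
df.columns = pd.MultiIndex.from_arrays([top, bottom])

# Selecting a top-level key returns the sub-frame of its child columns.
sub = df["a"]
print(df)
```

Each list passed to from_arrays() must have one entry per column, so the original column names are simply overwritten.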
If you want to place every column except the last one under a common parent level:
a = ['parent'] * (len(df.columns) - 1) + ['label']
b = df.columns[:-1].tolist() + ['val']
df.columns = pd.MultiIndex.from_arrays([a,b])
print (df)
parent label
feature a feature b val
sample
0 2 3 5
1 4 5 7
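The same result can probably be reached with from_tuples(), which may read more naturally when each column is described by one (parent, child) pair. This is only a sketch: the sample frame is hypothetical, and using an empty string as the second level for label is an assumption made to mirror the layout asked for in the question, not something the answer above prescribes.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame(
    {"some feature": [2, 4], "another feature": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

# Every column except the last goes under "features"; no string parsing
# of the feature names is involved. The empty second level for "label"
# is an assumption for illustration.
tuples = [("features", col) for col in df.columns[:-1]] + [("label", "")]
df.columns = pd.MultiIndex.from_tuples(tuples)

features = df["features"]
print(df)
```

Because the tuples are built from df.columns[:-1], the sketch only relies on the label being the last column.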
It is also possible with str.split, but any column without the separator gets NaN for the second level. MultiIndex and non-MultiIndex columns cannot be mixed in one column index (strictly speaking they can, but then the MultiIndex columns become plain tuples):
print (df)
feature_a feature_b label
sample
0 2 3 5
1 4 5 7
df.columns = df.columns.str.split('_', expand=True)
print (df)
feature label
a b NaN
sample
0 2 3 5
1 4 5 7
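A self-contained sketch of the split behaviour, assuming underscore-separated names as in the printed frame (the data values are made up). It passes '_' explicitly, since str.split without a pattern splits on whitespace.

```python
import pandas as pd

# Hypothetical sample data with underscore-separated feature names.
df = pd.DataFrame(
    {"feature_a": [2, 4], "feature_b": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

# expand=True on an Index returns a MultiIndex; names without the
# separator have nothing for the second level and are padded as missing.
cols = df.columns.str.split("_", expand=True)

first_level = cols.get_level_values(0).tolist()
second_of_label = cols.get_level_values(1)[2]
print(cols)
```

The missing second level for label is what makes that column awkward to select afterwards.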
So it is better to first move every column without the separator into the Index/MultiIndex with DataFrame.set_index:
df = df.set_index('label')
df.columns = df.columns.str.split('_', expand=True)
print (df)
feature
a b
label
5 2 3
7 4 5
To keep the original index as well, pass the append=True parameter:
df = df.set_index('label', append=True)
df.columns = df.columns.str.split('_', expand=True)
print (df)
feature
a b
sample label
0 5 2 3
1 7 4 5
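Put together, the set_index variant might look like the following runnable sketch. The frame is hypothetical sample data, and the underscore separator is an assumption based on the column names printed above.

```python
import pandas as pd

# Hypothetical sample data with underscore-separated feature names.
df = pd.DataFrame(
    {"feature_a": [2, 4], "feature_b": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

# Move the separator-free column into the row index first, keeping the
# existing "sample" index thanks to append=True.
df = df.set_index("label", append=True)

# Only separator-bearing columns remain, so no NaN level is produced.
df.columns = df.columns.str.split("_", expand=True)
print(df)
```

The label values now live in the row index rather than in a column, which is the trade-off of this approach.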