Set column names when stacking pandas DataFrame

Problem description

When stacking a pandas DataFrame, a Series is returned. Normally after I stack a DataFrame, I convert it back into a DataFrame. However, the default names coming from the stacked data make renaming the columns a bit hacky. What I'm looking for is an easier/built-in way to give columns sensible names after stacking.

For example, with the following DataFrame:

In [64]: df = pd.DataFrame({'id':[1,2,3], 
    ...:                    'date':['2015-09-31']*3, 
    ...:                    'value':[100, 95, 42], 
    ...:                    'value2':[200, 57, 27]}).set_index(['id','date'])

In [65]: df
Out[65]: 
               value  value2
id date                     
1  2015-09-31    100     200
2  2015-09-31     95      57
3  2015-09-31     42      27

I stack and convert it back to a DataFrame like so:

In [68]: df.stack().reset_index()
Out[68]: 
   id        date level_2    0
0   1  2015-09-31   value  100
1   1  2015-09-31  value2  200
2   2  2015-09-31   value   95
3   2  2015-09-31  value2   57
4   3  2015-09-31   value   42
5   3  2015-09-31  value2   27
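
The `level_2` and `0` headers come from two missing names. `stack()` names the new innermost index level after `df.columns.name`, which is unset here. The resulting Series is also unnamed. As a result, `reset_index()` falls back to a positional `level_<n>` label for that level and to `0` for the values. Here is a quick check, rebuilding the same `df` as above:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'date': ['2015-09-31'] * 3,
                   'value': [100, 95, 42],
                   'value2': [200, 57, 27]}).set_index(['id', 'date'])

stacked = df.stack()
# The new innermost level takes its name from df.columns.name, which is None here
print(list(stacked.index.names))            # ['id', 'date', None]
# The stacked Series has no name either
print(stacked.name)                         # None
# reset_index() therefore uses 'level_2' for the unnamed level and 0 for the values
print(list(stacked.reset_index().columns))  # ['id', 'date', 'level_2', 0]
```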

So in order to name these columns appropriately I would need to do something like this:

In [72]: stacked = df.stack()

In [73]: stacked
Out[73]: 
id  date              
1   2015-09-31  value     100
                value2    200
2   2015-09-31  value      95
                value2     57
3   2015-09-31  value      42
                value2     27
dtype: int64

In [74]: stacked.index.set_names('var_name', level=len(stacked.index.names)-1, inplace=True)

In [88]: stacked.reset_index().rename(columns={0:'value'})
Out[88]: 
   id        date var_name  value
0   1  2015-09-31    value    100
1   1  2015-09-31   value2    200
2   2  2015-09-31    value     95
3   2  2015-09-31   value2     57
4   3  2015-09-31    value     42
5   3  2015-09-31   value2     27

Ideally, the solution would look something like this:

df.stack(new_index_name='var_name', new_col_name='value')

But looking at the docs it doesn't look like stack takes any such arguments. Is there an easier/built-in way in pandas to deal with this workflow?

Recommended answer

pd.melt is often useful for converting DataFrames from "wide" to "long" format. You could use pd.melt here if you convert the id and date index levels to columns first:

In [56]: pd.melt(df.reset_index(), id_vars=['id', 'date'], value_vars=['value', 'value2'], var_name='var_name', value_name='value')
Out[56]: 
   id        date var_name  value
0   1  2015-09-31    value    100
1   2  2015-09-31    value     95
2   3  2015-09-31    value     42
3   1  2015-09-31   value2    200
4   2  2015-09-31   value2     57
5   3  2015-09-31   value2     27
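
One caveat applies to the call above. It passes `'value'` as `value_name`, yet `'value'` is already a column of `df.reset_index()`. Recent pandas releases (2.x, as far as I can tell) reject that combination with a `ValueError`.

The sketch below sidesteps the clash by melting into a placeholder column and renaming it afterwards. The name `'_value'` is an arbitrary choice for illustration. Because `melt` groups rows by variable rather than by `id`, the last step optionally sorts them into the order `stack()` produces:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'date': ['2015-09-31'] * 3,
                   'value': [100, 95, 42],
                   'value2': [200, 57, 27]}).set_index(['id', 'date'])

melted = (
    pd.melt(df.reset_index(), id_vars=['id', 'date'],
            value_vars=['value', 'value2'],
            var_name='var_name',
            value_name='_value')          # hypothetical placeholder name, avoids clashing with 'value'
      .rename(columns={'_value': 'value'})  # safe now: the original 'value' column is gone
)

# Optional: order rows like df.stack() does (by id, then by variable)
melted_sorted = melted.sort_values(['id', 'var_name']).reset_index(drop=True)
print(melted_sorted)
```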