Set column names when stacking a pandas DataFrame
Question
When stacking a pandas DataFrame, a Series is returned. Normally, after I stack a DataFrame, I convert it back into a DataFrame. However, the default names that come from the stacked data make renaming the columns a bit hacky. What I'm looking for is an easier, built-in way to give the columns sensible names after stacking.
For example, with the following DataFrame:
In [64]: df = pd.DataFrame({'id':[1,2,3],
    ...:                    'date':['2015-09-31']*3,
    ...:                    'value':[100, 95, 42],
    ...:                    'value2':[200, 57, 27]}).set_index(['id','date'])
In [65]: df
Out[65]:
               value  value2
id date
1  2015-09-31    100     200
2  2015-09-31     95      57
3  2015-09-31     42      27
I stack it and convert it back to a DataFrame like so:
In [68]: df.stack().reset_index()
Out[68]:
   id        date level_2    0
0   1  2015-09-31   value  100
1   1  2015-09-31  value2  200
2   2  2015-09-31   value   95
3   2  2015-09-31  value2   57
4   3  2015-09-31   value   42
5   3  2015-09-31  value2   27
So in order to name these columns appropriately, I would need to do something like this:
In [72]: stacked = df.stack()
In [73]: stacked
Out[73]:
id  date
1   2015-09-31  value     100
                value2    200
2   2015-09-31  value      95
                value2     57
3   2015-09-31  value      42
                value2     27
dtype: int64
In [74]: stacked.index.set_names('var_name', level=len(stacked.index.names)-1, inplace=True)
In [88]: stacked.reset_index().rename(columns={0:'value'})
Out[88]:
   id        date var_name  value
0   1  2015-09-31    value    100
1   1  2015-09-31   value2    200
2   2  2015-09-31    value     95
3   2  2015-09-31   value2     57
4   3  2015-09-31    value     42
5   3  2015-09-31   value2     27
Ideally, the solution would look something like this:
df.stack(new_index_name='var_name', new_col_name='value')
But looking at the docs, it doesn't look like stack takes any such arguments. Is there an easier, built-in way in pandas to deal with this workflow?
Recommended answer
pd.melt is often useful for converting DataFrames from "wide" to "long" format. You could use pd.melt here if you convert the id and date index levels to columns first:
In [56]: pd.melt(df.reset_index(), id_vars=['id', 'date'], value_vars=['value', 'value2'], var_name='var_name', value_name='value')
Out[56]:
   id        date var_name  value
0   1  2015-09-31    value    100
1   2  2015-09-31    value     95
2   3  2015-09-31    value     42
3   1  2015-09-31   value2    200
4   2  2015-09-31   value2     57
5   3  2015-09-31   value2     27