Adding and Renaming a Column in a MultiIndex DataFrame
Question
The purpose of this post is to understand how to add a column to a level of a MultiIndex DataFrame using apply() and shift().
Creating the DataFrame
import pandas as pd
df = pd.DataFrame(
    [
        [5777, 100, 5385, 200, 5419, 4887, 100, 200],
        [4849, 0, 4539, 0, 3381, 0, 0, ],  # only seven values appear in the source for this row
        [4971, 0, 3824, 0, 4645, 3424, 0, 0, ],
        [4827, 200, 3459, 300, 4552, 3153, 100, 200, ],
        [5207, 0, 3670, 0, 4876, 3358, 0, 0, ],
    ],
    index=pd.to_datetime(['2010-01-01',
                          '2010-01-02',
                          '2010-01-03',
                          '2010-01-04',
                          '2010-01-05']),
    columns=pd.MultiIndex.from_tuples(
        [('Portfolio A', 'GBP', 'amount'), ('Portfolio A', 'GBP', 'injection'),
         ('Portfolio B', 'EUR', 'amount'), ('Portfolio B', 'EUR', 'injection'),
         ('Portfolio A', 'USD', 'amount'), ('Portfolio A', 'USD', 'injection'),
         ('Portfolio B', 'JPY', 'amount'), ('Portfolio B', 'JPY', 'injection')])
).sort_index(axis=1)  # sort_index replaces the deprecated sortlevel
print(df)
I would like to use the following method to add a new column named daily_added_value to each currency at level 2:
def do_nothing(group):
    return group

def calc_daily_added_value(group):
    g = (group['amount'] - group['amount'].shift(periods=1, freq=None, axis=0)
         - df['injection'].shift(periods=1, freq=None, axis=0)).round(decimals=2)
    g.index = ['daily_added_value']
    return g

pd.concat([df.T.groupby(level=0).apply(f).T for f in [calc_daily_added_value, do_nothing]], axis=1).sort_index(axis=1)
However, this throws a key error: KeyError: 'amount'
What is the correct syntax for the method calc_daily_added_value()?
A further issue follows on from the answer below.
Adding the daily return
dav = df.loc[:, pd.IndexSlice[:, :, 'daily_added_value']]
amount = df.loc[:, pd.IndexSlice[:, :, 'amount']]
dr = (dav.values / amount.shift()) * 100
dr.columns.set_levels(['daily_return'], level=2, inplace=True)
df = pd.concat([df, dr], axis=1).sort_index(axis=1)
Adding the cumulative compounded return FAILS
dr = df.loc[:, pd.IndexSlice[:, :, 'daily_return']]
drc = 100*((1+dr / 100).cumprod()-1)
drc.columns.set_levels(['daily_return_cumulative'], level=2, inplace=True)
df = pd.concat([df, drc], axis=1).sort_index(axis=1)
df.head()
This fails because it is missing the .values, but if I add it the result becomes an array?
What is strange, though, is that drc is in fact a DataFrame of the correct shape and appears to contain the correct results.
This line fails:
drc.columns.set_levels(['daily_return_cumulative'], level=2, inplace=True)
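The error is ValueError: On level 2, label max (2) >= length of level (1). NOTE: this index is in an inconsistent state
How can the index be put back into a consistent state?
Recommended answer
Skip the groupby; it is not needed.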
amount = df.loc[:, pd.IndexSlice[:, :, 'amount']]
inject = df.loc[:, pd.IndexSlice[:, :, 'injection']]
dav = amount - amount.shift() - inject.shift().values
#dav.columns.set_levels(['daily_added_value'], level=2, inplace=True)
pd.concat([df, dav], axis=1).sort_index(axis=1).T
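The role of .values in the line that computes dav can be seen on a smaller frame. The sketch below is illustrative only: the frame small and its numbers are made up for the example, not taken from the question. It shows that subtracting the injection slice directly aligns on column labels, which differ at level 2, so every cell comes out as NaN, whereas .values makes the subtraction positional.

```python
import pandas as pd

# Hypothetical two-currency frame, used only for illustration.
cols = pd.MultiIndex.from_tuples([
    ("Portfolio A", "GBP", "amount"), ("Portfolio A", "GBP", "injection"),
    ("Portfolio B", "EUR", "amount"), ("Portfolio B", "EUR", "injection"),
])
small = pd.DataFrame(
    [[100.0, 0.0, 200.0, 0.0],
     [110.0, 5.0, 190.0, 10.0],
     [130.0, 0.0, 205.0, 0.0]],
    index=pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03"]),
    columns=cols,
).sort_index(axis=1)

idx = pd.IndexSlice
amount = small.loc[:, idx[:, :, "amount"]]
inject = small.loc[:, idx[:, :, "injection"]]

# Label-based alignment: 'amount' and 'injection' columns never match,
# so the result is the union of both column sets filled with NaN.
aligned = amount - inject.shift()

# Positional subtraction: .values drops the labels of the injection slice,
# so the result keeps the 'amount' columns.
dav = amount - amount.shift() - inject.shift().values
print(dav)
```

The first row of dav is NaN because shift() has no previous day to compare against.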
Note: I used T to get a picture that would fit easily.
There appears to be a bug in set_levels, and as such it is not advised to use it.
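One possible reading of the ValueError, offered as an assumption rather than a confirmed diagnosis: a slice of a MultiIndex keeps every label of the original level, including the unused ones, so replacing a level with a single-item list can leave the integer codes pointing past its end. The sketch below reproduces this on a made-up frame (frame, sub and out are hypothetical names) and trims the level with remove_unused_levels() before renaming.

```python
import pandas as pd

# Hypothetical frame with three labels at level 2.
cols = pd.MultiIndex.from_product(
    [["Portfolio A"], ["GBP", "USD"], ["amount", "daily_return", "injection"]]
)
frame = pd.DataFrame([[1.0] * 6, [2.0] * 6], columns=cols)

sub = frame.loc[:, pd.IndexSlice[:, :, "daily_return"]]

# The slice still carries all three level-2 labels; only the codes changed.
print(list(sub.columns.levels[2]))

try:
    # A one-item level cannot satisfy codes that point at position 1.
    sub.columns.set_levels(["daily_return_cumulative"], level=2)
    failed = False
except ValueError as exc:
    failed = True
    print(exc)

# Dropping unused labels first makes the one-item replacement valid.
trimmed = sub.columns.remove_unused_levels()
out = sub.copy()
out.columns = trimmed.set_levels(["daily_return_cumulative"], level=2)
print(out)
```

Under this reading, the daily_return step happened to succeed because the slice it renamed carried code 0 at level 2, while the daily_return slice carries a higher code.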
Workaround for renaming the MultiIndex columns in the DataFrame dav
def map_level(df, dct, level=2):
    index = df.index
    index.set_levels([[dct.get(item, item) for item in names] if i==level else names
                      for i, names in enumerate(index.levels)], inplace=True)

dct = {'amount':'daily_added_value'}
map_level(dav.T, dct, level=2)
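As far as I know, recent pandas releases no longer accept inplace in set_levels, so the workaround above may not run unchanged. A possible alternative, sketched here on a made-up frame (dav_demo is a hypothetical stand-in for dav), is DataFrame.rename with its level argument, which returns a renamed copy and leaves the original untouched.

```python
import pandas as pd

# Hypothetical stand-in for dav: only 'amount' labels at level 2.
cols = pd.MultiIndex.from_tuples([
    ("Portfolio A", "GBP", "amount"),
    ("Portfolio B", "EUR", "amount"),
])
dav_demo = pd.DataFrame([[10.0, -10.0], [15.0, 5.0]], columns=cols)

# Map labels at level 2 only; labels missing from the dict are kept as-is.
renamed = dav_demo.rename(columns={"amount": "daily_added_value"}, level=2)
print(renamed)
```

The renamed frame can then be passed to pd.concat([df, renamed], axis=1).sort_index(axis=1) in the same way as in the answer.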