Adding and Renaming a Column in a MultiIndex DataFrame


Problem description

The purpose of this post is to understand how to add a column to a level in a MultiIndex DataFrame using apply() and shift().

Create the DataFrame

import pandas as pd

df = pd.DataFrame(
[
    [5777, 100, 5385, 200, 5419, 4887, 100, 200],
    [4849, 0, 4539, 0, 3381, 0, 0, ],  # only seven values: the eighth is missing here
    [4971, 0, 3824, 0, 4645, 3424, 0, 0, ],
    [4827, 200, 3459, 300, 4552, 3153, 100, 200, ],
    [5207, 0, 3670, 0, 4876, 3358, 0, 0, ],
],
index=pd.to_datetime(['2010-01-01',
                      '2010-01-02',
                      '2010-01-03',
                      '2010-01-04',
                      '2010-01-05']),
columns=pd.MultiIndex.from_tuples(
    [('Portfolio A', 'GBP', 'amount'), ('Portfolio A', 'GBP', 'injection'),
     ('Portfolio B', 'EUR', 'amount'), ('Portfolio B', 'EUR', 'injection'),
     ('Portfolio A', 'USD', 'amount'), ('Portfolio A', 'USD', 'injection'),
     ('Portfolio B', 'JPY', 'amount'), ('Portfolio B', 'JPY', 'injection')])
).sort_index(axis=1)

print(df)

I would like to use the following method to add a new column named daily_added_value to each currency at level 2:

def do_nothing(group):
   return group

def calc_daily_added_value(group):
    g = (group['amount'] - group['amount'].shift(periods=1, freq=None, axis=0)
          -df['injection'].shift(periods=1, freq=None, axis=0)).round(decimals=2)
    g.index = ['daily_added_value']
    return g

pd.concat([df.T.groupby(level=0).apply(f).T for f in [calc_daily_added_value,do_nothing ]], axis=1).sort_index(axis=1)

However, this throws a key error: KeyError: 'amount'

What is the correct syntax for the method calc_daily_added_value()?

There is a further issue arising from the answer below.

Adding the daily return

dav = df.loc[:, pd.IndexSlice[:, :, 'daily_added_value']]
amount = df.loc[:, pd.IndexSlice[:, :, 'amount']]
dr = (dav.values / amount.shift()) * 100
dr.columns.set_levels(['daily_return'], level=2, inplace=True)
df = pd.concat([df, dr], axis=1).sort_index(axis=1)

Adding the cumulative compounded return FAILS

dr = df.loc[:, pd.IndexSlice[:, :, 'daily_return']]
drc = 100*((1+dr / 100).cumprod()-1)
drc.columns.set_levels(['daily_return_cumulative'], level=2, inplace=True)
df = pd.concat([df, drc], axis=1).sort_index(axis=1)
df.head()

This fails because it is missing the .values, but if I add this it becomes an array?

What is strange here, though, is that drc is in fact a DataFrame of the correct shape and appears to contain correct results.

This line fails:

drc.columns.set_levels(['daily_return_cumulative'], level=2, inplace=True)

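The error is ValueError: On level 2, label max (2) >= length of level (1). NOTE: this index is in an inconsistent state

How can the index be put back into a consistent state?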

Recommended answer

Skip the groupby; it is not needed.

amount = df.loc[:, pd.IndexSlice[:, :, 'amount']]
inject = df.loc[:, pd.IndexSlice[:, :, 'injection']]
dav = amount - amount.shift() - inject.shift().values
#dav.columns.set_levels(['daily_added_value'], level=2, inplace=True)

pd.concat([df, dav], axis=1).sort_index(axis=1).T



Note: I used T to get a picture that would easily fit.
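To illustrate why the answer subtracts inject.shift().values rather than inject.shift(), here is a minimal sketch on a hypothetical two-currency frame (the name small and its numbers are made up for illustration and are not the data from the question). It assumes a reasonably recent pandas.

```python
import pandas as pd

# Hypothetical frame, much smaller than the one in the question.
cols = pd.MultiIndex.from_tuples(
    [('Portfolio A', 'GBP', 'amount'), ('Portfolio A', 'GBP', 'injection'),
     ('Portfolio A', 'USD', 'amount'), ('Portfolio A', 'USD', 'injection')])
small = pd.DataFrame(
    [[100.0, 10.0, 200.0, 0.0],
     [115.0, 0.0, 210.0, 20.0],
     [120.0, 0.0, 240.0, 0.0]],
    index=pd.to_datetime(['2010-01-01', '2010-01-02', '2010-01-03']),
    columns=cols)

idx = pd.IndexSlice
amount = small.loc[:, idx[:, :, 'amount']]
inject = small.loc[:, idx[:, :, 'injection']]

# Label-aligned subtraction: the two frames share no column label
# ('amount' vs 'injection'), so every cell of the result is NaN.
aligned = amount - inject.shift()

# Positional subtraction: .values drops the labels, so the columns
# pair up by order and the result keeps the labels of `amount`.
dav = amount - amount.shift() - inject.shift().values
print(dav)
```

The result still carries 'amount' at level 2, which is why a renaming step follows in the answer.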

There appears to be a bug in set_levels, and as such it is not advised to use it.
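One possible reading of the ValueError, offered as an assumption rather than something the original answer confirms, is that a column slice keeps every level value of the parent index, including values it no longer uses. Replacing those values with a shorter list then leaves codes pointing past the end of the level. The sketch below uses a hypothetical one-row frame and the existing MultiIndex.remove_unused_levels() method to show the effect.

```python
import pandas as pd

# Hypothetical frame with two level-2 labels per currency.
cols = pd.MultiIndex.from_tuples(
    [('Portfolio A', 'GBP', 'amount'), ('Portfolio A', 'GBP', 'daily_return'),
     ('Portfolio A', 'USD', 'amount'), ('Portfolio A', 'USD', 'daily_return')])
frame = pd.DataFrame([[100.0, 1.0, 200.0, 2.0]], columns=cols)

dr = frame.loc[:, pd.IndexSlice[:, :, 'daily_return']]

# The slice still lists 'amount' as a level value, although no column uses it.
print(list(dr.columns.levels[2]))

# Replacing two level values with one fails, because a code of 1 still exists.
try:
    dr.columns.set_levels(['daily_return_cumulative'], level=2)
    failed = False
except ValueError:
    failed = True

# Dropping the unused level values first leaves a single value to replace.
trimmed = dr.columns.remove_unused_levels()
drc = dr.copy()
drc.columns = trimmed.set_levels(['daily_return_cumulative'], level=2)
print(drc.columns.tolist())
```

Here set_levels is used without inplace, so it returns a new index and the original columns are never left half-modified.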

Workaround for renaming the MultiIndex columns in the DataFrame dav

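def map_level(df, dct, level=2):
    # Map the labels of one index level through dct, leaving the other levels as they are.
    index = df.index
    new_levels = [[dct.get(item, item) for item in names] if i == level else names
                  for i, names in enumerate(index.levels)]
    # set_levels returns a new MultiIndex, so assign it back rather than relying on inplace=True
    df.index = index.set_levels(new_levels)
    return df

dct = {'amount': 'daily_added_value'}
dav = map_level(dav.T, dct, level=2).T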
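As an alternative to a helper that rebuilds the levels, DataFrame.rename accepts a level argument in reasonably recent pandas versions, which may be simpler here. This is a sketch on a hypothetical frame named dav_example, not the method used in the original answer.

```python
import pandas as pd

# Hypothetical stand-in for dav: every column is labelled 'amount' at level 2.
cols = pd.MultiIndex.from_tuples(
    [('Portfolio A', 'GBP', 'amount'), ('Portfolio A', 'USD', 'amount')])
dav_example = pd.DataFrame([[5.0, 10.0], [5.0, 10.0]], columns=cols)

# Relabel only level 2; labels missing from the mapping are left unchanged.
renamed = dav_example.rename(columns={'amount': 'daily_added_value'}, level=2)
print(renamed.columns.tolist())
```

rename returns a new DataFrame, so dav_example keeps its original labels unless the result is assigned back.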
