Pandas Design Considerations for MultiIndexed Dataframes

Question

The purpose of this question is to further explore MultiIndex dataframes and to ask about the best approach for various tasks.

Create the DataFrame

import pandas as pd

# Source data: one row per portfolio, all reported in GBP.
df = pd.DataFrame({'index_date': ['12/07/2016', '12/07/2016', '12/07/2016', '12/07/2016', '12/07/2016'],
                   'portfolio': ['A', 'B', 'C', 'D', 'E'],
                   'reporting_ccy': ['GBP', 'GBP', 'GBP', 'GBP', 'GBP'],
                   'portfolio_ccy': ['JPY', 'USD', 'USD', 'EUR', 'EUR'],
                   'amount': [100, 200, 300, 400, 500],
                   'injection': [1, 2, 3, 4, 5],
                   'to_usd': [1.3167, 1.3167, 1.3167, 1.3167, 1.3167],
                   'to_ccy': [0.009564, 1, 1, 1.1093, 1.1093],
                   'm5': [2, 4, 6, 8, 10],
                   'm6': [1, 3, 5, 7, 9]})

Pivot the DataFrame

# sortlevel() was removed in pandas 1.0; sort_index(axis=1) is the replacement.
df_pivot = df.pivot_table(index='index_date', columns=['portfolio', 'portfolio_ccy', 'reporting_ccy']).swaplevel(0, 1, axis=1).sort_index(axis=1)

Rename the column levels

df_pivot.columns.names = ['portfolio','measures', 'portfolio_ccy', 'reporting_ccy']

This yields a pivoted representation of the data such that:

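  1. A portfolio may have one or more measures.
  2. The portfolio's default currency is shown.
  3. The portfolio's reporting currency is shown.
  4. A measure may have one or more reporting currencies.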

In terms of point 4, what is the best approach for implementation, given that we have the exchange rates (xRates) for the currencies?

For example, we could create a second dataframe in which each portfolio is reported in its own currency, such as the one derived here:

Create the DataFrame

# Same portfolios, with reporting_ccy equal to portfolio_ccy and amounts converted.
df1 = pd.DataFrame({'index_date': ['12/07/2016', '12/07/2016', '12/07/2016', '12/07/2016', '12/07/2016'],
                    'portfolio': ['A', 'B', 'C', 'D', 'E'],
                    'reporting_ccy': ['JPY', 'USD', 'USD', 'EUR', 'EUR'],
                    'portfolio_ccy': ['JPY', 'USD', 'USD', 'EUR', 'EUR'],
                    'amount': [13767.2522, 263.34, 395.01, 474.785901, 593.4823763],
                    'injection': [1, 2, 3, 4, 5],
                    'to_usd': [0.009564, 1, 1, 1.1093, 1.1093],
                    'to_ccy': [1.3167, 1.3167, 1.3167, 1.3167, 1.3167],
                    'm5': [2, 4, 6, 8, 10],
                    'm6': [1, 3, 5, 7, 9]})

Concatenate and pivot the DataFrames

df_concat = pd.concat([df, df1])
# sortlevel() was removed in pandas 1.0; sort_index(axis=1) is the replacement.
df_pivot1 = df_concat.pivot_table(index='index_date', columns=['portfolio', 'portfolio_ccy', 'reporting_ccy']).swaplevel(0, 1, axis=1).sort_index(axis=1)
df_pivot1.columns.names = ['portfolio', 'measures', 'portfolio_ccy', 'reporting_ccy']

This now shows one measure having several reporting currencies:

df_pivot1.xs(('amount', 'A'), level=('measures','portfolio'), drop_level=False, axis=1)
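The same selection can also be written with `.loc` and `pd.IndexSlice`, which some readers find easier to follow than `xs` with a `level` tuple. The sketch below is only an illustration: `frame` is a small hypothetical stand-in for `df_pivot1`, and it assumes the columns are lexsorted, which is what `sort_index(axis=1)` provides.

```python
import pandas as pd

# Hypothetical stand-in for df_pivot1 with the same four column levels.
cols = pd.MultiIndex.from_tuples(
    [
        ("A", "amount", "JPY", "GBP"),
        ("A", "amount", "JPY", "JPY"),
        ("A", "m5", "JPY", "GBP"),
        ("B", "amount", "USD", "GBP"),
    ],
    names=["portfolio", "measures", "portfolio_ccy", "reporting_ccy"],
)
frame = pd.DataFrame(
    [[100.0, 13767.2522, 2.0, 200.0]],
    index=pd.Index(["12/07/2016"], name="index_date"),
    columns=cols,
)

# Cross-section as written in the question.
via_xs = frame.xs(("amount", "A"), level=("measures", "portfolio"),
                  drop_level=False, axis=1)

# Equivalent label-based slice; levels are given in column order.
idx = pd.IndexSlice
via_loc = frame.loc[:, idx["A", "amount", :, :]]
```

Both results keep all four column levels and contain only the two `amount` columns of portfolio A, one per reporting currency.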

Questions

Is there a better way, such as adding data directly to the MultiIndexed dataframe at level 3 (df_pivot1.columns.get_level_values(3).unique())?

I would like to be able to iterate through each level and add new measures, either derived from other measures using df.assign() or by other methods.

The use case here is to add other currencies to the measures where applicable. The concatenation and re-pivot shown above does not seem optimal.
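One possible alternative to concatenating and re-pivoting is to assign new columns directly, using the full four-level tuple as the column key. The sketch below is not from the original post. It uses a two-portfolio subset of the question's data and assumes the conversion is `amount * to_usd / to_ccy`, which reproduces the amounts listed in `df1` (for example 200 * 1.3167 / 1 = 263.34).

```python
import pandas as pd

# Two-portfolio subset of the question's data.
df = pd.DataFrame({
    "index_date": ["12/07/2016", "12/07/2016"],
    "portfolio": ["A", "B"],
    "reporting_ccy": ["GBP", "GBP"],
    "portfolio_ccy": ["JPY", "USD"],
    "amount": [100, 200],
    "to_usd": [1.3167, 1.3167],
    "to_ccy": [0.009564, 1],
})

pivot = (
    df.pivot_table(index="index_date",
                   columns=["portfolio", "portfolio_ccy", "reporting_ccy"])
      .swaplevel(0, 1, axis=1)
      .sort_index(axis=1)
)
pivot.columns.names = ["portfolio", "measures", "portfolio_ccy", "reporting_ccy"]

# Unique (portfolio, portfolio_ccy, reporting_ccy) combinations, taken
# before the loop so the new columns do not affect the iteration.
combos = pivot.columns.droplevel("measures").unique()

for portfolio, pccy, rccy in combos:
    # Assumed conversion: reporting currency -> USD -> portfolio currency.
    rate = (pivot[(portfolio, "to_usd", pccy, rccy)]
            / pivot[(portfolio, "to_ccy", pccy, rccy)])
    # A full-length tuple key creates a new column at the innermost level.
    pivot[(portfolio, "amount", pccy, pccy)] = (
        pivot[(portfolio, "amount", pccy, rccy)] * rate
    )

# New columns are appended at the end, so restore the sorted order.
pivot = pivot.sort_index(axis=1)
```

After the loop, each portfolio's `amount` measure has two entries at the `reporting_ccy` level: the original GBP column and a new column in the portfolio's own currency.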

Recommended answer

I'm very confused by the information overload.
However, if I understand correctly:

What I am implying is that there is not an easy way of adding to a lower level in a Multi-Index data frame.


Consider df:

import numpy as np
import pandas as pd

# 8x8 frame with row labels a..h and column labels A..H.
df = pd.DataFrame(np.arange(64).reshape(-1, 8), list('abcdefgh'), list('ABCDEFGH'))
df

We can easily add an inner level to the index:

df.index = [df.index, list('XY') * 4]
df
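The same result can be reached with `set_index(..., append=True)`, which also lets the new level carry a name. This is a sketch of an alternative spelling rather than part of the original answer; the level name `group` is a hypothetical choice made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(64).reshape(-1, 8), list("abcdefgh"), list("ABCDEFGH"))

# Append a second, innermost index level alternating X, Y, X, Y, ...
# "group" is a hypothetical level name used only for this example.
inner = pd.Index(list("XY") * 4, name="group")
df = df.set_index(inner, append=True)

# Rows are now addressed by (outer, inner) tuples.
first_label = df.index[0]
value = df.loc[("b", "Y"), "A"]
```

Because `append=True` keeps the existing index as the outer level, the original row labels remain intact and the new labels sit beneath them.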
