Selecting a new dataframe via a multi-indexed frame in Pandas using index names

Question

The following extract from a dataframe (at the bottom) has a multi-index with region and Quradate as the index levels. I want to get a new dataframe with the same index that holds only the row with the max date per region. I can't figure out how to access the dataframe index values. I want something like:

most_recent_date=totRegscore.region.Quradate.max()

That just tells me 'DataFrame' object has no attribute 'region'. I'll also want to do:

last_quarter = most_recent_date - relativedelta(months=3)

and then something like:

quarter_score_diff = [most_recent_date, last_quarter].diff()

This is a variant of an answer here that I can't get working: Summary calculations on a Pandas Dataframe.

Sample input:

                                                              Score1      Score2  
region                                           Quradate           
North_Central-Birmingham-Tuscaloosa-Anniston 2010-01-15             47           50
                                             2010-04-15             45           60
                                             2010-07-15             45           40

I think at this point my main issue is not being able to select specific rows of the multi-indexed dataframe using index names. Once I have one dataframe with just the current date and one with just last quarter's date, the diff of the two dataframes would look like this.

Sample output:

                                                                      Score1      Score2  
              region                                        Quradate           
quarterly_diff North_Central-Birmingham-Tuscaloosa-Anniston 2010-07-15     7           6
quarterly_diff Huntsville                                   2010-07-15     6           5

Recommended answer

This is what I mean by showing a generating function. This creates sample data like yours and presents an answer; now it is easy for you to say, hey, I want this (and create an example output).

In [39]: import pandas as pd
         from pandas import DataFrame, MultiIndex, date_range

In [40]: df = DataFrame({ 'Score1' : [47,45,45,37,35,35],
                          'Score2' : [50,60,40,50,60,40] },
              index=MultiIndex.from_tuples([ (r,t) for t in date_range('2010-1-1',periods=3,freq='QS')+pd.offsets.Day(14)
                for r in ['R1','R2'] ], names=['region','date'])).sort_index()  # sort_index() replaces the removed sortlevel()

In [41]: df
Out[41]: 
                   Score1  Score2
region date                      
R1     2010-01-15      47      50
       2010-04-15      45      40
       2010-07-15      35      60
R2     2010-01-15      45      60
       2010-04-15      37      50
       2010-07-15      35      40
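
On the original point about not being able to reach the index values: index levels are not attributes of the frame, so `df.region` fails, but they can be read by name. A minimal sketch, assuming the sample frame above (rebuilt here so the snippet runs on its own, with `sort_index()` in place of `sortlevel()`); the variable names `all_dates` and `latest` are only illustrative:

```python
import pandas as pd

# Rebuild the sample frame from the answer.
dates = pd.date_range('2010-1-1', periods=3, freq='QS') + pd.offsets.Day(14)
df = pd.DataFrame(
    {'Score1': [47, 45, 45, 37, 35, 35],
     'Score2': [50, 60, 40, 50, 60, 40]},
    index=pd.MultiIndex.from_tuples(
        [(r, t) for t in dates for r in ['R1', 'R2']],
        names=['region', 'date']),
).sort_index()

# Read one index level by its name instead of as a frame attribute.
all_dates = df.index.get_level_values('date')
most_recent_date = all_dates.max()

# Cross-section by level name: each region's row for that date.
# drop_level=False keeps both index levels in the result.
latest = df.xs(most_recent_date, level='date', drop_level=False)
print(latest)
```

Note that this takes the overall maximum date, which matches the per-region maximum only when every region has a row for that date, as in this sample.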

I find it helpful to see what is actually passed to the apply, which in this case is a frame:

In [5]: def f(x):
   ...:     print(x)
   ...:     return x
   ...: 

In [6]: df.groupby(level='region').apply(f)
                   Score1  Score2
region date                      
R1     2010-01-15      47      50
       2010-04-15      45      40
       2010-07-15      35      60
                   Score1  Score2
region date                      
R2     2010-01-15      45      60
       2010-04-15      37      50
       2010-07-15      35      40
Out[6]: 
                   Score1  Score2
region date                      
R1     2010-01-15      47      50
       2010-04-15      45      40
       2010-07-15      35      60
R2     2010-01-15      45      60
       2010-04-15      37      50
       2010-07-15      35      40

For each region, show the column-wise diff from 2 periods ago among the scores:

In [16]: df.groupby(level='region').apply(lambda x: x.diff(2))
Out[16]: 
                   Score1  Score2
region date                      
R1     2010-01-15     NaN     NaN
       2010-04-15     NaN     NaN
       2010-07-15     -12      10
R2     2010-01-15     NaN     NaN
       2010-04-15     NaN     NaN
       2010-07-15     -10     -20
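
As a side note, the grouped object also has its own `diff` method, so the lambda may not be needed for this step. A small sketch under the same assumptions as the sample data above (the frame is rebuilt so the snippet is self-contained; `two_back` is just an illustrative name):

```python
import pandas as pd

# Rebuild the sample frame from the answer.
dates = pd.date_range('2010-1-1', periods=3, freq='QS') + pd.offsets.Day(14)
df = pd.DataFrame(
    {'Score1': [47, 45, 45, 37, 35, 35],
     'Score2': [50, 60, 40, 50, 60, 40]},
    index=pd.MultiIndex.from_tuples(
        [(r, t) for t in dates for r in ['R1', 'R2']],
        names=['region', 'date']),
).sort_index()

# Difference from 2 rows earlier, computed separately within each region.
# The result keeps the original (region, date) index.
two_back = df.groupby(level='region').diff(2)
print(two_back)
```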

Diff from 2 quarters ago, returning just the last value per region:

In [17]: df.groupby(level='region').apply(lambda x: x.diff(2).iloc[-1])
Out[17]: 
        Score1  Score2
region                
R1         -12      10
R2         -10     -20
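
If the goal is to keep the full (region, date) index, as in the sample output in the question, one possible variation is to take the last row of each group rather than calling `iloc[-1]` inside `apply`. This is only a sketch: it assumes the frame is sorted by region and date, uses a one-period diff for "last quarter", and the names `latest` and `quarterly_diff` are illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the answer.
dates = pd.date_range('2010-1-1', periods=3, freq='QS') + pd.offsets.Day(14)
df = pd.DataFrame(
    {'Score1': [47, 45, 45, 37, 35, 35],
     'Score2': [50, 60, 40, 50, 60, 40]},
    index=pd.MultiIndex.from_tuples(
        [(r, t) for t in dates for r in ['R1', 'R2']],
        names=['region', 'date']),
).sort_index()

# Rows are sorted by (region, date), so the last row of each group
# is that region's most recent date; tail() keeps the original index.
latest = df.groupby(level='region').tail(1)

# Change versus the previous quarter within each region,
# keeping only each region's most recent row.
quarterly_diff = (
    df.groupby(level='region').diff(1)
      .groupby(level='region').tail(1)
)
print(latest)
print(quarterly_diff)
```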
