Trouble working with date indexes with Multi-Index


Question

I am trying to understand how the date-related indexing features in pandas work.

If I have this dataframe:

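import numpy as np
import pandas as pd
dates = pd.date_range('6/1/2000', periods=12, freq='M')  # month-end dates; newer pandas spells this alias 'ME'
df1 = pd.DataFrame(np.random.randn(12, 2), index=dates, columns=['A', 'B'])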

I know that we can extract the records from 2000 using df1['2000'], or a range of dates using df1['2000-09':'2001-03'].

But suppose instead I have a dataframe with a multi-index:

index = pd.MultiIndex.from_arrays([dates, list('HIJKHIJKHIJK')], names=['date', 'id'])
df2 = pd.DataFrame(np.random.randn(12, 2), index=index, columns=['C', 'D'])

Is there a way to extract the rows for the year 2000, as we did with a single index? It appears that df2.xs('2000-06-30') works for accessing a particular date, but df2.xs('2000') does not return anything. Is xs not the right way to go about this?

Recommended answer

You don't need xs for this; you can index using .loc instead.
One of the examples you tried would then look like df2.loc['2000-09':'2001-03']. The only problem is that the "partial string parsing" feature does not work yet when using a multi-index, so you have to provide actual datetimes:

In [17]: df2.loc[pd.Timestamp('2000-09'):pd.Timestamp('2001-04')]
Out[17]:
                      C         D
date       id
2000-09-30 K  -0.441505  0.364074
2000-10-31 H   2.366365 -0.404136
2000-11-30 I   0.371168  1.218779
2000-12-31 J  -0.579180  0.026119
2001-01-31 K   0.450040  1.048433
2001-02-28 H   1.090321  1.676140
2001-03-31 I  -0.272268  0.213227

But note that in this case pd.Timestamp('2001-03') is interpreted as 2001-03-01 00:00:00 (an actual moment in time), not as the whole month of March. Therefore, you have to adjust the start/stop values a little, which is why the slice above stops at pd.Timestamp('2001-04').
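To make the point about start/stop values concrete, here is a minimal sketch of how the month strings turn into exact moments. The variable names are only illustrative, and using pd.offsets.MonthEnd(0) to move the stop to the end of the month is one possible adjustment, not the only one.

```python
import pandas as pd

# A month string becomes the first instant of that month, not the whole month.
start = pd.Timestamp('2000-09')
stop = pd.Timestamp('2001-03')

# The month-end row for March 2001 lies after this stop value,
# so a slice ending at `stop` would leave it out.
march_end = pd.Timestamp('2001-03-31')
stop_excludes_march = march_end > stop

# One possible adjustment: roll the stop forward to the end of its month.
adjusted_stop = stop + pd.offsets.MonthEnd(0)

print(start, stop, adjusted_stop)
```

Passing pd.Timestamp('2001-04') as the stop, as in the example above, has the same effect for month-end data.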

A selection for a full year (e.g. df1['2000']) would then become df2.loc[pd.Timestamp('2000'):pd.Timestamp('2001')] or df2.loc[pd.Timestamp('2000-01-01'):pd.Timestamp('2000-12-31')].
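As a rough, self-contained sketch of the full-year selection, the snippet below rebuilds a frame shaped like df2 and slices it with explicit timestamps. Two things here are assumptions that are not part of the answer: the dates are built from month starts plus pd.offsets.MonthEnd(0) so the example does not depend on a particular frequency alias, and the boolean mask on the 'date' level is shown only as an alternative way to get the same rows.

```python
import numpy as np
import pandas as pd

# Twelve month-end dates starting at 2000-06-30, as in the question.
dates = pd.date_range('2000-06-01', periods=12, freq='MS') + pd.offsets.MonthEnd(0)
index = pd.MultiIndex.from_arrays(
    [dates, list('HIJKHIJKHIJK')], names=['date', 'id']
)
df2 = pd.DataFrame(np.random.randn(12, 2), index=index, columns=['C', 'D'])

# Slice the first index level with actual timestamps (both ends inclusive).
year_2000 = df2.loc[pd.Timestamp('2000-01-01'):pd.Timestamp('2000-12-31')]

# Alternative (an assumption, not from the answer): a boolean mask
# built from the year of the 'date' level.
mask = df2.index.get_level_values('date').year == 2000
year_2000_mask = df2[mask]

print(year_2000)
```

Both selections should return the seven rows from 2000-06-30 through 2000-12-31. The mask variant does not rely on the index being sorted, which the slice does.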
