Subset Pandas DataFrame based on an annual recurring period covering multiple months

Question

This question is similar to "Selecting Pandas DataFrame records for many years based on month & day range", but neither that question nor its answer seems to cover my case.

import pandas as pd
import numpy as np

rng = pd.date_range('2010-1-1', periods=1000, freq='D')
df = pd.DataFrame(np.random.randn(len(rng)), index=rng, columns=['A'])
df.head()

                   A
2010-01-01  1.098302
2010-01-02 -1.384821
2010-01-03 -0.426329
2010-01-04 -0.587967
2010-01-05 -0.853374

Now I would like to subset my DataFrame based on an annual recurring period, applied to every year. For example, a period can be defined as running from February 15th to October 3rd:

startMM, startdd = (2,15)
endMM, enddd = (10,3)

I then tried to slice my multi-year DataFrame based on this period:

subset = df[((df.index.month == startMM) & (startdd <= df.index.day) 
             | (df.index.month == endMM) & (df.index.day <= enddd))]

but this returns only rows from the two months defined by startMM and endMM, not the actual period between the two dates. Any help is appreciated.

subset.index.month.unique()

Int64Index([2, 10], dtype='int64')
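The mask above only tests the two boundary months, so every month strictly between them is dropped. A minimal sketch of the missing piece, assuming the period does not wrap past 31 December (true for the February to October example), adds a third clause that keeps all whole months between startMM and endMM:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2010-1-1', periods=1000, freq='D')
df = pd.DataFrame(np.random.randn(len(rng)), index=rng, columns=['A'])

startMM, startdd = (2, 15)
endMM, enddd = (10, 3)

month = df.index.month
day = df.index.day

# Keep the tail of the start month, every whole month in between,
# and the head of the end month. Assumes startMM < endMM (no year wrap).
mask = (
    ((month == startMM) & (day >= startdd))
    | ((month > startMM) & (month < endMM))
    | ((month == endMM) & (day <= enddd))
)
subset = df[mask]
print(subset.index.month.unique())
```

With the middle clause in place, the unique months run from 2 through 10 instead of only 2 and 10.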

Answer

I would create a column of (month, day) tuples:

month_day = pd.concat([
                df.index.to_series().dt.month, 
                df.index.to_series().dt.day
            ], axis=1).apply(tuple, axis=1)
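The pd.concat/apply construction above calls tuple once per row. A lighter-weight sketch (an alternative, not the answer's own code) builds the same Series of (month, day) tuples with zip, and shows why comparing tuples works here: Python compares tuples lexicographically, so the month decides first and the day only breaks ties.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2010-1-1', periods=1000, freq='D')
df = pd.DataFrame(np.random.randn(len(rng)), index=rng, columns=['A'])

# Series of (month, day) tuples indexed like df, without a row-wise apply.
month_day = pd.Series(list(zip(df.index.month, df.index.day)), index=df.index)

# Lexicographic tuple ordering:
print((3, 1) > (2, 15))    # True: a later month wins regardless of day
print((2, 10) < (2, 15))   # True: same month, so the day decides
print((10, 3) <= (10, 3))  # True: equal tuples satisfy <=
```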

You can then compare them directly:

df[(month_day >= (startMM, startdd)) & (month_day <= (endMM, enddd))]
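The tuple comparison assumes the start date comes before the end date within one calendar year. A minimal sketch of a vectorized alternative, assuming it is acceptable to encode each date as the integer month * 100 + day (15 February becomes 215, 3 October becomes 1003), preserves the same ordering and also handles a period that wraps past 31 December, such as 15 November to 28 February. The helper name in_annual_period is hypothetical.

```python
import numpy as np
import pandas as pd


def in_annual_period(index, start, end):
    """Boolean mask for dates whose (month, day) lies in [start, end].

    start and end are (month, day) tuples. If start is later in the year
    than end, the period is assumed to wrap over New Year.
    """
    key = index.month * 100 + index.day  # e.g. 15 Feb -> 215
    lo = start[0] * 100 + start[1]
    hi = end[0] * 100 + end[1]
    if lo <= hi:
        # Ordinary period inside one calendar year.
        return (key >= lo) & (key <= hi)
    # Wrapping period: late in one year OR early in the next.
    return (key >= lo) | (key <= hi)


rng = pd.date_range('2010-1-1', periods=1000, freq='D')
df = pd.DataFrame(np.random.randn(len(rng)), index=rng, columns=['A'])

subset = df[in_annual_period(df.index, (2, 15), (10, 3))]
winter = df[in_annual_period(df.index, (11, 15), (2, 28))]
```

Switching from & to | when the start is later than the end is the only change the wrapping case needs. The tuple version with & would return an empty result there, because no (month, day) can be both >= (11, 15) and <= (2, 28).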
