Date ranges in Pandas
Question
After fighting with NumPy and dateutil for days, I recently discovered the amazing Pandas library. I've been poring over the documentation and source code, but I can't figure out how to get date_range() to generate indices at the right breakpoints.
from datetime import date
import pandas as pd
start = date(2012, 1, 15)
end = date(2012, 9, 20)
# 'M' is month-end; instead I need same-day-of-month
pd.date_range(start, end, freq='M')
What I want:
2012-01-15
2012-02-15
2012-03-15
...
2012-09-15
What I get:
2012-01-31
2012-02-29
2012-03-31
...
2012-08-31
I need month-sized chunks that account for the variable number of days in a month. This is possible with dateutil.rrule:
rrule(freq=MONTHLY, dtstart=start, bymonthday=(start.day, -1), bysetpos=1)
Ugly and illegible, but it works. How can I do this with pandas? I've played with both date_range() and period_range(), so far with no luck.
My actual goal is to use groupby, crosstab and/or resample to calculate values for each period based on sums, means, etc. of the individual entries within the period. In other words, I want to transform data from:
total
2012-01-10 00:01 50
2012-01-15 01:01 55
2012-03-11 00:01 60
2012-04-28 00:01 80
#Hypothetical usage
dataframe.resample('total', how='sum', freq='M', start='2012-01-09', end='2012-04-15')
to:
total
2012-01-09 105 # Values summed
2012-02-09 0 # Missing from dataframe
2012-03-09 60
2012-04-09 0 # Data past end date, not counted
Given that Pandas originated as a financial analysis tool, I'm virtually certain that there's a simple and fast way to do this. Help appreciated!
Recommended answer
freq='M' is for month-end frequencies (see the offset aliases in the pandas documentation). But you can use .shift to shift it by any number of days (or by any other frequency, for that matter):
# pd.datetools no longer exists in current pandas; the 'D' (calendar day) alias replaces pd.datetools.day
pd.date_range(start, end, freq='M').shift(15, freq='D')
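Note that shifting month-ends by 15 days always lands on the 15th of the following month, so the first stamp is 2012-02-15 rather than 2012-01-15. As a possible alternative (not part of the original answer), a minimal sketch using pd.DateOffset(months=1) as the frequency steps one calendar month at a time from the start date itself. The variable name same_day is only illustrative.

```python
import pandas as pd

start = pd.Timestamp("2012-01-15")
end = pd.Timestamp("2012-09-20")

# DateOffset(months=1) advances one calendar month per step from `start`,
# so each stamp keeps the start date's day-of-month (the 15th here).
same_day = pd.date_range(start, end, freq=pd.DateOffset(months=1))

print(same_day.strftime("%Y-%m-%d").tolist())
```

For these dates the range runs from 2012-01-15 to 2012-09-15. A start day of 29, 30 or 31 gets clipped in shorter months, so the output is worth checking in that case.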
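The answer covers only the index generation. For the aggregation goal in the question (sums per month-sized period anchored on the start day), one possible sketch is to build the period starts with date_range, locate each row's period with searchsorted, and group on that. This is an assumption about what the asker wants, not something the answer states. The names starts, edges, pos, keep and result are illustrative, and periods are treated as half-open, so a row exactly at the end date is not counted.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"total": [50, 55, 60, 80]},
    index=pd.to_datetime(
        ["2012-01-10 00:01", "2012-01-15 01:01", "2012-03-11 00:01", "2012-04-28 00:01"]
    ),
)
start, end = pd.Timestamp("2012-01-09"), pd.Timestamp("2012-04-15")

# Period starts share the start date's day-of-month (the 9th here).
starts = pd.date_range(start, end, freq=pd.DateOffset(months=1))
# Bin edges: every period start, plus the overall end date as the final edge.
edges = pd.DatetimeIndex(list(starts) + [end])

# Position of the period each row falls into; rows before `start` give -1,
# rows at or after `end` give len(starts), and both are dropped below.
pos = edges.searchsorted(df.index, side="right") - 1
keep = (pos >= 0) & (pos < len(starts))

# Sum per period, then reindex so periods without data show up as 0.
sums = df.loc[keep, "total"].groupby(starts[pos[keep]]).sum()
result = sums.reindex(starts, fill_value=0)

print(result)
```

With the sample data this gives 105, 0, 60 and 0 for the periods starting 2012-01-09, 2012-02-09, 2012-03-09 and 2012-04-09, matching the table the question asks for. Swapping .sum() for .mean() or another reducer follows the same pattern, though the fill value for empty periods would need rethinking.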