Given a date range, how can we break it up into N contiguous sub-intervals?
Problem description
I am accessing some data through an API where I need to provide the date range for my request, e.g. start='20100101', end='20150415'. I thought I could speed this up by breaking the date range into non-overlapping intervals and using multiprocessing on each interval.
My problem is that the way I am breaking up the date range does not consistently give me the expected result. Here is what I have done:
from datetime import date
begin = '20100101'
end = '20101231'
Suppose we want to break this up into quarters. First I convert the strings into dates:
def get_yyyy_mm_dd(yyyymmdd):
    # given string 'yyyymmdd' return (yyyy, mm, dd)
    year = yyyymmdd[0:4]
    month = yyyymmdd[4:6]
    day = yyyymmdd[6:]
    return int(year), int(month), int(day)
y1, m1, d1 = get_yyyy_mm_dd(begin)
d1 = date(y1, m1, d1)
y2, m2, d2 = get_yyyy_mm_dd(end)
d2 = date(y2, m2, d2)
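For reference, the manual slicing in `get_yyyy_mm_dd` can also be done with the standard library's `datetime.strptime`, which parses the `'YYYYMMDD'` string directly and raises `ValueError` on an impossible date. A minimal sketch; `parse_yyyymmdd` is a hypothetical helper name, not part of the original code:

```python
from datetime import datetime

def parse_yyyymmdd(s):
    """Parse a 'YYYYMMDD' string into a datetime.date (hypothetical helper)."""
    # %Y = 4-digit year, %m = 2-digit month, %d = 2-digit day
    return datetime.strptime(s, "%Y%m%d").date()

d1 = parse_yyyymmdd('20100101')
d2 = parse_yyyymmdd('20101231')
print(d1, d2)  # 2010-01-01 2010-12-31
```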
Then divide this range into sub-intervals:
def remove_tack(dates_list):
    # given a list of dates in form YYYY-MM-DD return a list of strings in form 'YYYYMMDD'
    tackless = []
    for d in dates_list:
        s = str(d)
        tackless.append(s[0:4]+s[5:7]+s[8:])
    return tackless

def divide_date(date1, date2, intervals):
    dates = [date1]
    for i in range(0, intervals):
        dates.append(dates[i] + (date2 - date1)/intervals)
    return remove_tack(dates)
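Similarly, `remove_tack` rebuilds the string by slicing the hyphens out of `str(d)`; `date.strftime("%Y%m%d")` produces the same `'YYYYMMDD'` form in one call. A sketch of a possible drop-in replacement; the name `to_yyyymmdd` is hypothetical:

```python
from datetime import date

def to_yyyymmdd(dates_list):
    """Return each date formatted as a 'YYYYMMDD' string (hypothetical helper)."""
    return [d.strftime("%Y%m%d") for d in dates_list]

print(to_yyyymmdd([date(2010, 1, 1), date(2010, 4, 2)]))  # ['20100101', '20100402']
```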
Using begin and end from above we get:
listdates = divide_date(d1, d2, 4)
print(listdates)  # ['20100101', '20100402', '20100702', '20101001', '20101231'] looks correct
But if instead I use the dates:
begin = '20150101'
end = '20150228'
...
listdates = divide_date(d1, d2, 4)
print(listdates)  # ['20150101', '20150115', '20150129', '20150212', '20150226']
I am missing two days at the end of February. I don't need time or timezone for my application and I don't mind installing another library.
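A minimal reproduction of where the two days go (assuming Python 3 and only the standard `datetime` module). From 2015-01-01 to 2015-02-28 is 58 days, so `(d2 - d1)/4` is 14.5 days. Adding a `timedelta` to a `date` uses only its whole `days` component, so `divide_date`, which adds the step to the *previous* boundary each time, drops 12 hours on every addition; four additions lose two days. In the first example 364/4 = 91 whole days, so nothing was dropped. Multiplying the step first and adding it to the start once keeps the total:

```python
from datetime import date

d1 = date(2015, 1, 1)
d2 = date(2015, 2, 28)

step = (d2 - d1) / 4   # 14.5 days
print(step)            # 14 days, 12:00:00

# Accumulating, as divide_date does: each date + timedelta ignores the 12 hours
current = d1
for _ in range(4):
    current = current + step
print(current)         # 2015-02-26, two days short

# Scaling first, then adding to the start once: 4 * step is exactly 58 days
print(d1 + 4 * step)   # 2015-02-28
```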
Recommended answer
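I would actually follow a different approach and rely on timedelta arithmetic to determine the non-overlapping ranges: compute the width of one sub-interval once, then obtain each boundary by adding a multiple of that width to the start date instead of adding it to the previous boundary.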
Implementation
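from datetime import datetime

def date_range(start, end, intv):
    # Parse 'YYYYMMDD' strings into datetimes (midnight of each day)
    start = datetime.strptime(start, "%Y%m%d")
    end = datetime.strptime(end, "%Y%m%d")
    # Width of one sub-interval; may include a fractional day
    diff = (end - start) / intv
    for i in range(intv):
        # Each boundary is computed from start, so rounding never accumulates;
        # strftime drops the time of day
        yield (start + diff * i).strftime("%Y%m%d")
    yield end.strftime("%Y%m%d")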
Usage
>>> begin = '20150101'
>>> end = '20150228'
>>> list(date_range(begin, end, 4))
['20150101', '20150115', '20150130', '20150213', '20150228']
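Since the goal is one API request per sub-interval, the boundaries still have to be paired into `(start, end)` requests, and consecutive boundaries share an endpoint. If the API treats `end` as inclusive (an assumption; check its documentation), one option is to end each chunk the day before the next one starts. A sketch; `to_intervals` is a hypothetical helper, and it assumes the boundaries are strictly increasing, which holds for `date_range` as long as `intv` does not exceed the number of days in the range:

```python
from datetime import datetime, timedelta

def to_intervals(boundaries):
    """Turn sorted 'YYYYMMDD' boundaries into non-overlapping (start, end) pairs.

    Hypothetical helper. Assumes request end dates are inclusive, so every
    chunk except the last stops one day before the next chunk starts.
    """
    intervals = []
    for i in range(len(boundaries) - 1):
        start = boundaries[i]
        if i + 2 < len(boundaries):
            nxt = datetime.strptime(boundaries[i + 1], "%Y%m%d")
            end = (nxt - timedelta(days=1)).strftime("%Y%m%d")
        else:
            end = boundaries[-1]  # the last chunk keeps the overall end date
        intervals.append((start, end))
    return intervals

bounds = ['20150101', '20150115', '20150130', '20150213', '20150228']
print(to_intervals(bounds))
# [('20150101', '20150114'), ('20150115', '20150129'), ('20150130', '20150212'), ('20150213', '20150228')]
```

Each pair could then be handed to a worker, for example with `multiprocessing.Pool.starmap` and a function that takes `start` and `end`.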