Pandas date_range - subtracting numpy timedelta gives odd result, time becomes not 0:00:00


Problem description

I am trying to generate a set of dates with pandas' date_range function. I then want to iterate over this range and subtract several months from each date (the exact number of months is determined in the loop) to get a new date.

I get some very odd results when I do this.

Minimal example:

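import numpy as np
import pandas as pd

test_size = 3  # step in months; matches the freq='3MS' shown in the output

# get date range (pandas >= 1.4 spells closed='left' as inclusive='left')
dates = pd.date_range(start='1/1/2013', end='1/1/2018', freq=str(test_size)+'MS', closed='left', normalize=True)
# take first date as example
date = dates[0]
date
# Timestamp('2013-01-01 00:00:00', freq='3MS')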

So far so good.

Now let's say I want to go back just one month from this date. I define a numpy timedelta (it accepts months as a unit, while pandas' Timedelta doesn't):

# get timedelta of 1 month
deltaGap = np.timedelta64(1, 'M')
# subtract one month from date
date - deltaGap
# Timestamp('2012-12-01 13:30:54', freq='3MS')

Why is that? Why do I get 13:30:54 in the time component instead of midnight?

Moreover, if I subtract more than one month, the shift becomes so large that the result is off by a whole day:

# let's say I want to subtract both 2 years and then 1 month
deltaTrain = np.timedelta64(2, 'Y')
# subtract 2 years and then subtract 1 month
date - deltaTrain - deltaGap
# Timestamp('2010-12-02 01:52:30', freq='3MS')

Solution

I've had similar issues with timedelta, and the solution I ended up with was relativedelta from dateutil, which is built specifically for this kind of application (it takes into account calendar quirks such as leap years, weekdays, etc.). For example:

from dateutil.relativedelta import relativedelta

date = dates[0]

>>> date
Timestamp('2013-01-01 00:00:00', freq='10MS')

deltaGap = relativedelta(months=1)

>>> date-deltaGap
Timestamp('2012-12-01 00:00:00', freq='10MS')

deltaGap = relativedelta(years=2, months=1)

>>> date-deltaGap
Timestamp('2010-12-01 00:00:00', freq='10MS')

Check out the documentation for more info on relativedelta
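As an aside that is not part of the original answer: pandas ships its own calendar-aware offset, pd.DateOffset, which behaves much like relativedelta here and can also be applied to a whole DatetimeIndex. The sketch below assumes a small three-date range built with periods=3 instead of the start/end range from the question, purely to keep the example short.

```python
import pandas as pd

# Three dates spaced three month-starts apart, similar to the 3MS range in the question.
dates = pd.date_range(start="2013-01-01", periods=3, freq="3MS")

# Calendar-aware subtraction of one month from a single Timestamp.
one_back = dates[0] - pd.DateOffset(months=1)

# Years and months can be combined in a single offset.
far_back = dates[0] - pd.DateOffset(years=2, months=1)

# The same offset can be applied to every date in the index at once.
shifted = dates - pd.DateOffset(months=1)

print(one_back)
print(far_back)
print(list(shifted))
```

In both single-date cases the time component stays at midnight, because the offset moves the calendar month rather than adding a fixed number of seconds.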

The issues with numpy.timedelta64

I think the problem with np.timedelta64 is revealed in these two parts of the docs:

There are two Timedelta units (‘Y’, years and ‘M’, months) which are treated specially, because how much time they represent changes depending on when they are used. While a timedelta day unit is equivalent to 24 hours, there is no way to convert a month unit into days, because different months have different numbers of days.

and

The length of the span is the range of a 64-bit integer times the length of the date or unit. For example, the time span for ‘W’ (week) is exactly 7 times longer than the time span for ‘D’ (day), and the time span for ‘D’ (day) is exactly 24 times longer than the time span for ‘h’ (hour).

So timedeltas are fine for hours, days and weeks, because these are fixed-length timespans. Months and years, however, vary in length (think of leap years), so when such a timedelta is applied to a timestamp, some sort of average has to be used. One numpy "year" comes out as 365 days, 5 hours, 49 minutes and 12 seconds, which is the mean Gregorian year of 365.2425 days, while one numpy "month" comes out as 30 days, 10 hours, 29 minutes and 6 seconds, which is one twelfth of that.

# Adding one numpy month adds 30 days + 10:29:06:
deltaGap = np.timedelta64(1,'M')
date+deltaGap
# Timestamp('2013-01-31 10:29:06', freq='10MS')

# Adding one numpy year adds 1 year + 05:49:12:
deltaGap = np.timedelta64(1,'Y')
date+deltaGap
# Timestamp('2014-01-01 05:49:12', freq='10MS')
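The two durations above can be checked with plain arithmetic. This sketch uses only the standard library and relies on the fact that 400 Gregorian years contain exactly 146097 days. The variable names are illustrative only. It also reproduces the two timestamps reported in the question.

```python
from datetime import datetime, timedelta

# 400 Gregorian years contain exactly 146097 days, i.e. 365.2425 days per year on average.
mean_year = timedelta(days=146097) / 400
mean_month = mean_year / 12

print(mean_year)
print(mean_month)

start = datetime(2013, 1, 1)

# Subtracting one "average month" lands in the middle of the day, not at midnight.
minus_month = start - mean_month

# Subtracting two "average years" and one "average month" drifts past midnight into the next day.
minus_two_years_and_month = start - 2 * mean_year - mean_month

print(minus_month)
print(minus_two_years_and_month)
```

The fractional part of the average is what produces the non-midnight time, and the drift grows with every additional year or month subtracted.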

This is not so easy to work with, which is why I would just go to relativedelta, which is much more intuitive (to me).
