Pandas number of business days between a DatetimeIndex and a Timestamp

Question

This is quite similar to the question here, but I'm wondering if there is a clean way in pandas to make a business-day-aware TimedeltaIndex. Ultimately I am trying to get the number of business days (no holiday calendar) between a DatetimeIndex and a Timestamp. As per the referenced question, something like this works:

import pandas as pd
import numpy as np
drg = pd.date_range('2015-07-31', '2015-08-05', freq='B')
A = [d.date() for d in drg]
B = pd.Timestamp('2015-08-05').date()
np.busday_count(A, B)

gives

array([3, 2, 1, 0], dtype=int64)

But this seems a bit kludgy. If I try something like

drg - pd.Timestamp('2015-08-05')

I get a TimedeltaIndex, but the business-day frequency is dropped:

TimedeltaIndex(['-5 days', '-2 days', '-1 days', '0 days'], dtype='timedelta64[ns]', freq=None)

Just wondering if there is a more elegant way to go about this.

Recommended answer

TimedeltaIndexes represent fixed spans of time. They can be added to Pandas Timestamps to increment them by fixed amounts. Their behavior is never dependent on whether or not the Timestamp is a business day. The TimedeltaIndex itself is never business-day aware.
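
To illustrate the point, here is a minimal sketch contrasting a fixed span with a calendar-aware offset. The variable names are illustrative only, and `pd.offsets.BDay` is shown purely for contrast; it is not part of the original answer's approach.

```python
import pandas as pd

friday = pd.Timestamp("2015-07-31")  # a Friday

# A Timedelta is a fixed span of time: one day after Friday is Saturday,
# regardless of whether Saturday is a business day.
plus_timedelta = friday + pd.Timedelta(days=1)

# A business-day offset is calendar-aware: one business day after Friday
# is the following Monday.
plus_bday = friday + pd.offsets.BDay(1)

print(plus_timedelta.day_name())  # Saturday
print(plus_bday.day_name())       # Monday
```

The offset knows about weekends, but it is not a span that can be stored in a TimedeltaIndex, which is why subtracting a Timestamp from a DatetimeIndex yields plain calendar-day differences.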

Since the ultimate goal is to count the number of business days between a DatetimeIndex and a Timestamp, I would look in a direction other than conversion to a TimedeltaIndex.

Unfortunately, date calculations are rather complicated, and a number of data structures have sprung up to deal with them -- Python datetime.dates, datetime.datetimes, Pandas Timestamps, NumPy datetime64s.

They each have their strengths, but no one of them is good for all purposes. To take advantage of their strengths, it is sometimes necessary to convert between these types.

To use np.busday_count you need to convert the DatetimeIndex and Timestamp to some type np.busday_count understands. What you call kludginess is the code required to convert types. There is no way around that assuming we want to use np.busday_count -- and I know of no better tool for this job than np.busday_count.

So, although I don't think there is a more succinct way to count business days than the method you propose, there is a far more performant way: convert to datetime64[D] values instead of Python datetime.date objects:

import pandas as pd
import numpy as np
drg = pd.date_range('2000-07-31', '2015-08-05', freq='B')
timestamp = pd.Timestamp('2015-08-05')

def using_astype(drg, timestamp):
    # Convert directly to NumPy datetime64[D], the type np.busday_count works with
    A = drg.values.astype('<M8[D]')
    B = timestamp.asm8.astype('<M8[D]')
    return np.busday_count(A, B)

def using_datetimes(drg, timestamp):
    # Go through Python datetime.date objects (the slower route)
    A = [d.date() for d in drg]
    B = timestamp.date()
    return np.busday_count(A, B)


This is over 100x faster for the example above (where len(drg) is close to 4000):

In [88]: %timeit using_astype(drg, timestamp)
10000 loops, best of 3: 95.4 µs per loop

In [89]: %timeit using_datetimes(drg, timestamp)
100 loops, best of 3: 10.3 ms per loop

np.busday_count converts its input to datetime64[D]s anyway, so avoiding this extra conversion to and from datetime.dates is far more efficient.
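
As a usage sketch on the small date range from the question, the datetime64[D] conversion can be followed by wrapping the counts in a Series so each count stays aligned with its date. The names `start`, `end`, and `business_days` are illustrative, and the Series wrapper is an optional convenience rather than part of the answer above.

```python
import numpy as np
import pandas as pd

drg = pd.date_range("2015-07-31", "2015-08-05", freq="B")
timestamp = pd.Timestamp("2015-08-05")

# Convert both sides to day-resolution datetime64 values.
start = drg.values.astype("datetime64[D]")
end = timestamp.to_datetime64().astype("datetime64[D]")

# Business days in the half-open interval [start, end), Monday to Friday by default.
counts = np.busday_count(start, end)

# Optional: keep the counts aligned with the original dates.
business_days = pd.Series(counts, index=drg, name="business_days")
print(business_days)
```

For this range the counts match the output shown in the question (3, 2, 1, 0).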
