pd.Timestamp versus np.datetime64: are they interchangeable for selected uses?

Question

This question is motivated by an answer to a question on improving performance when performing comparisons with DatetimeIndex in pandas.

The solution converts the DatetimeIndex to a numpy array via df.index.values and compares the array to a np.datetime64 object. This appears to be the most efficient way to retrieve the Boolean array from this comparison.
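
A minimal sketch of the comparison described above, assuming a small hourly frame (the names `idx` and `df` and the column `x` are made up for illustration and are not from the linked question). It builds the Boolean mask both from the raw numpy array and from the DatetimeIndex itself, and checks that the two agree for this data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 72 hourly rows starting on 2011-01-01 (an assumption)
idx = pd.date_range("2011-01-01", periods=72, freq="h")
df = pd.DataFrame({"x": np.arange(72)}, index=idx)

lo = np.datetime64("2011-01-02")
hi = np.datetime64("2011-01-03")

# numpy route: compare the underlying datetime64 array with np.datetime64 scalars
values = df.index.values
mask_np = (values >= lo) & (values < hi)

# pandas route: compare the DatetimeIndex with pd.Timestamp scalars
mask_pd = (df.index >= pd.Timestamp("2011-01-02")) & (df.index < pd.Timestamp("2011-01-03"))

same = bool((mask_np == mask_pd).all())
n_selected = int(mask_np.sum())
print(same, n_selected)
```

This only shows agreement for a timezone-naive index; it says nothing about the cases the pandas developer had in mind.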

The feedback on this question from one of the developers of pandas was: "These are not the same generally. Offering up a numpy solution is often a special case and not recommended."

My questions are:

  1. Are they interchangeable for a subset of operations? I appreciate DatetimeIndex offers more functionality, but I require only basic functionality such as slicing and indexing.
  2. Are there any documented differences in result for operations that are translatable to numpy?

In my research, I found some posts which mention "not always compatible" - but none of them seem to have any conclusive references / documentation, or specify why/when generally they are incompatible. Many other posts use the numpy representation without comment.

Recommended answer

In my opinion, you should always prefer using a Timestamp: it can easily be converted back into a numpy datetime64 when that is needed.

numpy.datetime64 is essentially a thin wrapper for int64. It has almost no date/time specific functionality.
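
To illustrate the "thin wrapper" point, here is a small sketch (variable names are my own) that casts a nanosecond-resolution `np.datetime64` to its integer count and back, and checks that the scalar has no `year` attribute:

```python
import numpy as np

d = np.datetime64("2011-01-02T00:00:00", "ns")

# The scalar is stored as an integer count of units since the Unix epoch
as_int = d.astype("int64")

# Rebuilding the scalar from the integer and the unit gives the same value
round_trip = np.datetime64(int(as_int), "ns")

# No calendar accessors on the numpy scalar
has_year = hasattr(d, "year")
print(as_int, round_trip == d, has_year)
```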

pd.Timestamp is a wrapper around a numpy.datetime64. It is backed by the same int64 value, but supports the entire datetime.datetime interface, along with useful pandas-specific functionality.
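
A short sketch of what that means in practice (the example date is arbitrary): the Timestamp passes an `isinstance` check against `datetime.datetime`, exposes calendar fields and pandas helpers, and converts to a numpy scalar with `to_datetime64()`:

```python
import datetime

import numpy as np
import pandas as pd

ts = pd.Timestamp("2011-01-02 03:04:05")

# Timestamp subclasses datetime.datetime, so the standard interface is available
is_datetime = isinstance(ts, datetime.datetime)
year, weekday = ts.year, ts.weekday()

# pandas-specific extras
day = ts.day_name()
ns_since_epoch = ts.value  # integer nanoseconds since the Unix epoch

# Conversion to the numpy scalar and back
back = ts.to_datetime64()
restored = pd.Timestamp(back)
print(is_datetime, year, weekday, day, restored == ts)
```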

The in-array representation of these two is identical: it is a contiguous array of int64s. pd.Timestamp is a scalar box that makes working with individual values easier.
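
As a rough check of this, assuming a three-day index made up for the purpose, the sketch below compares the integer view of `idx.values` with `idx.asi8` and shows which scalar type each kind of element access returns:

```python
import numpy as np
import pandas as pd

# Hypothetical index for illustration
idx = pd.date_range("2011-01-01", periods=3, freq="D")

arr = idx.values              # numpy datetime64 array
ints_from_numpy = arr.view("int64")
ints_from_pandas = idx.asi8   # the index's integer representation

same_ints = np.array_equal(ints_from_numpy, ints_from_pandas)

first_boxed = idx[0]          # element access on the index gives a pd.Timestamp
first_raw = arr[0]            # element access on the array gives an np.datetime64
print(same_ints, type(first_boxed).__name__, type(first_raw).__name__)
```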

Going back to the linked answer, you could write it like this, which is shorter and happens to be faster.

%timeit (df.index.values >= pd.Timestamp('2011-01-02').to_datetime64()) & \
        (df.index.values < pd.Timestamp('2011-01-03').to_datetime64())
192 µs ± 6.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
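
For anyone wanting to try this outside IPython, here is a self-contained sketch of the same mask. The frame is hypothetical (hourly data, column `x`), since the original `df` is not shown, and no timing is claimed. As a sanity check, it compares the rows selected by the mask with label-based selection via `df.loc`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the df in the linked answer
idx = pd.date_range("2011-01-01", periods=72, freq="h")
df = pd.DataFrame({"x": np.arange(72)}, index=idx)

start = pd.Timestamp("2011-01-02").to_datetime64()
stop = pd.Timestamp("2011-01-03").to_datetime64()

# Boolean mask built on the raw numpy array, as in the answer
mask = (df.index.values >= start) & (df.index.values < stop)
by_mask = df[mask]

# Label-based selection of the same day using a partial date string
by_label = df.loc["2011-01-02"]

same_rows = np.array_equal(by_mask["x"].to_numpy(), by_label["x"].to_numpy())
print(len(by_mask), same_rows)
```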
