pd.Timestamp versus np.datetime64: are they interchangeable for selected uses?
Problem description
This question is motivated by an answer to a question on improving performance when performing comparisons with DatetimeIndex in pandas.
The solution converts the DatetimeIndex to a numpy array via df.index.values and compares the array to a np.datetime64 object. This appears to be the most efficient way to retrieve the Boolean array from this comparison.
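For reference, the approach described above can be sketched roughly as follows. The frame, its column name, and the date bounds are hypothetical sample data, not taken from the linked answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: one row per minute over three days.
idx = pd.date_range("2011-01-01", periods=3 * 24 * 60, freq="min")
df = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)

# Drop down to the underlying numpy datetime64 array.
values = df.index.values

# Compare against np.datetime64 scalars to build a Boolean mask.
lower = np.datetime64("2011-01-02")
upper = np.datetime64("2011-01-03")
mask = (values >= lower) & (values < upper)

# The mask can be used directly for Boolean indexing.
selected = df[mask]
```

Here mask is a plain numpy Boolean array, so it carries no index information of its own.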
The feedback on this question from one of the developers of pandas was: "These are not the same generally. Offering up a numpy solution is often a special case and not recommended."
My questions are:
- Are they interchangeable for a subset of operations? I appreciate DatetimeIndex offers more functionality, but I require only basic functionality such as slicing and indexing.
- Are there any documented differences in result for operations that are translatable to numpy?
In my research, I found some posts which mention "not always compatible", but none of them seem to have any conclusive references or documentation, or specify why or when they are generally incompatible. Many other posts use the numpy representation without comment.
Recommended answer
In my opinion, you should always prefer using a Timestamp: it can easily be converted back into a numpy datetime when that is needed.
numpy.datetime64 is essentially a thin wrapper for int64. It has almost no date/time-specific functionality.
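A small sketch of what "thin wrapper for int64" means in practice. The sample values are hypothetical, and the dtype is pinned to nanoseconds so the integers are predictable.

```python
import numpy as np

# Two instants one second apart, stored at nanosecond resolution.
stamps = np.array(
    ["1970-01-01T00:00:00", "1970-01-01T00:00:01"],
    dtype="datetime64[ns]",
)

# Reinterpret the same buffer as int64 (no copy): nanoseconds since the epoch.
as_ints = stamps.view("int64")

# A datetime64 scalar exposes no calendar attributes such as .year.
scalar = np.datetime64("2011-01-02")
has_year = hasattr(scalar, "year")
```

The view shares memory with the original array, which illustrates that the datetime64 dtype is a labelled interpretation of 64-bit integers.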
pd.Timestamp is a wrapper around a numpy.datetime64. It is backed by the same int64 value, but supports the entire datetime.datetime interface, along with useful pandas-specific functionality.
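As an illustration of the point above, the following sketch shows a Timestamp behaving as a datetime.datetime, using a pandas-specific method, and round-tripping through numpy. The date used is an arbitrary example.

```python
import datetime

import numpy as np
import pandas as pd

ts = pd.Timestamp("2011-01-02 03:04:05")

# Timestamp subclasses datetime.datetime, so the standard interface is available.
is_datetime = isinstance(ts, datetime.datetime)
year = ts.year

# A pandas-specific convenience that plain datetime64 does not offer.
weekday = ts.day_name()

# Convert to a numpy scalar and back again.
raw = ts.to_datetime64()
back = pd.Timestamp(raw)
```

The round trip through to_datetime64() preserves the value, which is why moving between the two representations is cheap when a numpy scalar is really required.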
The in-array representation of these two is identical: it is a contiguous array of int64s. pd.Timestamp is a scalar box that makes working with individual values easier.
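A short sketch of the distinction between the shared array storage and the two scalar "boxes". The index below is hypothetical sample data.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2011-01-01", periods=3, freq="D")

# The underlying storage is a numpy datetime64 array.
arr = idx.values
ints = arr.view("int64")  # same buffer, read as raw integers

# Indexing the pandas object boxes the value as a Timestamp;
# indexing the numpy array yields a bare np.datetime64 scalar.
boxed = idx[0]
unboxed = arr[0]

# Both scalars describe the same instant.
same_instant = pd.Timestamp(unboxed) == boxed
```

Only the scalar type differs depending on where the element is pulled from; the stored integers are the same.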
Going back to the linked answer, you could write it like this, which is shorter and happens to be faster.
%timeit (df.index.values >= pd.Timestamp('2011-01-02').to_datetime64()) & \
(df.index.values < pd.Timestamp('2011-01-03').to_datetime64())
192 µs ± 6.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
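The timed snippet above still goes through df.index.values. One way a shorter form could look, assuming the comparison is done directly on the DatetimeIndex with date strings, is sketched here. The frame is hypothetical sample data and no timing is claimed for it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: one row per minute over three days.
idx = pd.date_range("2011-01-01", periods=3 * 24 * 60, freq="min")
df = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)

# Compare the DatetimeIndex directly; pandas parses the date strings.
short_mask = (df.index >= "2011-01-02") & (df.index < "2011-01-03")

# The numpy-based form from the snippet above, for comparison.
numpy_mask = (
    (df.index.values >= pd.Timestamp("2011-01-02").to_datetime64())
    & (df.index.values < pd.Timestamp("2011-01-03").to_datetime64())
)

# Both forms select the same rows.
masks_agree = bool(np.array_equal(short_mask, numpy_mask))
```

Relative speed depends on the pandas version and the data size, so it is worth measuring both forms with %timeit on your own frame.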