pd.Timestamp versus np.datetime64: are they interchangeable for selected uses?

Problem description

This question is motivated by an answer to a question on improving performance when performing comparisons with DatetimeIndex in pandas.

The solution converts the DatetimeIndex to a numpy array via df.index.values and compares the array to a np.datetime64 object. This appears to be the most efficient way to retrieve the Boolean array from this comparison.

The feedback on this question from one of the developers of pandas was: "These are not the same generally. Offering up a numpy solution is often a special case and not recommended."

My questions are:

  1. Are they interchangeable for a subset of operations? I appreciate DatetimeIndex offers more functionality, but I require only basic functionality such as slicing and indexing.
  2. Are there any documented differences in result for operations that are translatable to numpy?

In my research, I found some posts which mention "not always compatible" - but none of them seem to have any conclusive references / documentation, or specify why/when generally they are incompatible. Many other posts use the numpy representation without comment.

Solution

In my opinion, you should always prefer a Timestamp: it can easily be converted back to a numpy datetime when that is needed.

numpy.datetime64 is essentially a thin wrapper around an int64. It has almost no date/time-specific functionality.
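As a rough illustration of the "thin wrapper" point, the sketch below casts a datetime64 scalar to its underlying integer and back. The variable names are made up for this example, and the nanosecond unit is chosen explicitly rather than assumed from any default.

```python
import numpy as np

# A datetime64 scalar with an explicit nanosecond unit (chosen for this example).
d = np.datetime64("2011-01-02T00:00:00", "ns")

# The underlying value is an integer count of units since the Unix epoch.
as_int = d.astype("int64")

# Rebuilding the scalar from that integer and the same unit gives the same instant.
rebuilt = np.datetime64(int(as_int), "ns")

# Viewed at day resolution, the integer is simply a day count since 1970-01-01.
days_since_epoch = d.astype("datetime64[D]").astype("int64")

print(as_int, rebuilt, days_since_epoch)
```

Beyond casting, comparison and arithmetic with timedelta64, the scalar offers little; there is no `.year` or `.weekday()` on it, which is what the answer means by "almost no date/time-specific functionality".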

pd.Timestamp is a wrapper around a numpy.datetime64. It is backed by the same int64 value, but supports the entire datetime.datetime interface, along with useful pandas-specific functionality.
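A minimal sketch of the round trip between the two types, assuming a reasonably recent pandas. The names `ts`, `d64` and `back` are illustrative only. The cast to `datetime64[ns]` is there because the resolution of the value returned by `to_datetime64()` may differ between pandas versions, while `Timestamp.value` is documented as nanoseconds.

```python
import datetime

import numpy as np
import pandas as pd

ts = pd.Timestamp("2011-01-02")

# Timestamp -> numpy scalar, and back again.
d64 = ts.to_datetime64()
back = pd.Timestamp(d64)

# Both refer to the same instant; compare the integer values at nanosecond resolution.
same_instant = ts.value == d64.astype("datetime64[ns]").astype("int64")

# Timestamp also behaves like a datetime.datetime, which the numpy scalar does not.
is_datetime = isinstance(ts, datetime.datetime)

print(type(d64).__name__, back == ts, same_instant, is_datetime, ts.year)
```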

The in-array representation of these two is identical: a contiguous array of int64s. pd.Timestamp is a scalar box that makes working with individual values easier.
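The following sketch shows one way to see this for yourself, using a small made-up DatetimeIndex. It checks that the numpy view of the index and pandas' own integer view (`asi8`) hold the same numbers, and that only scalar access differs in the type it hands back.

```python
import numpy as np
import pandas as pd

# A small example index; the dates are arbitrary.
idx = pd.date_range("2011-01-01", periods=3, freq="D")

# .values exposes the underlying numpy datetime64 array.
arr = idx.values

# Reinterpreting that array as int64 gives the same integers pandas stores.
same_ints = np.array_equal(arr.view("int64"), idx.asi8)

# Scalar access is where the two differ: pandas boxes the value as a Timestamp,
# numpy returns a bare datetime64.
boxed = idx[0]
unboxed = arr[0]

print(arr.dtype, same_ints, type(boxed).__name__, type(unboxed).__name__)
```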

Going back to the linked answer, you could write it like this, which is shorter and happens to be faster.

%timeit (df.index.values >= pd.Timestamp('2011-01-02').to_datetime64()) & \
        (df.index.values < pd.Timestamp('2011-01-03').to_datetime64())
192 µs ± 6.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
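For the basic comparison in question, it is straightforward to check that the numpy route and the pandas route produce the same Boolean mask. The sketch below does so on a tiny made-up frame. The column name `x` and the dates are assumptions for illustration, and it says nothing about timing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: four timestamps straddling the half-open interval
# [2011-01-02, 2011-01-03).
idx = pd.to_datetime([
    "2011-01-01 12:00",
    "2011-01-02 00:00",
    "2011-01-02 18:30",
    "2011-01-03 00:00",
])
df = pd.DataFrame({"x": range(4)}, index=idx)

start = pd.Timestamp("2011-01-02")
end = pd.Timestamp("2011-01-03")

# numpy route: raw datetime64 array compared against datetime64 scalars.
mask_np = (df.index.values >= start.to_datetime64()) & \
          (df.index.values < end.to_datetime64())

# pandas route: DatetimeIndex compared against Timestamps.
mask_pd = (df.index >= start) & (df.index < end)

print(np.array_equal(mask_np, mask_pd), df.loc[mask_np, "x"].tolist())
```

This only covers timezone-naive data. With a timezone-aware index, `.values` returns UTC-based datetime64 values without the timezone, so the two routes should not be assumed equivalent there.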
