datetime.timestamp returns different values in pandas apply and dataframe selection

Question

The code below demonstrates the issue. A simple pandas dataframe is created with one row and one column containing a single datetime instance. Calling timestamp() directly on the datetime object returns 1581894000.0. Selecting the same value through the dataframe and calling timestamp() gives 1581897600.0. Using the pandas apply function to call datetime.timestamp on each row of column 'date' returns 1581894000.0 again. I would expect to get the same timestamp value in all three cases.

In[19]: d = datetime(2020, 2, 17)
In[20]: d.timestamp()
Out[20]: 1581894000.0 <----------------------------------+
In[21]: df = pd.DataFrame({'date': [d]})                 |
In[22]: df                                               |
Out[22]:                                                 |
        date                                             |
0 2020-02-17                                             |
In[23]: df['date'][0]                                    |
Out[23]: Timestamp('2020-02-17 00:00:00')                |
In[24]: df['date'][0].timestamp()                        |
Out[24]: 1581897600.0 <---------------------- These should be the same
In[25]: df['date'].apply(datetime.timestamp)             |
Out[25]:                                                 | 
0    1.581894e+09                                        |
Name: date, dtype: float64                               |
In[26]: df['date'].apply(datetime.timestamp)[0]          |
Out[26]: 1581894000.0 <----------------------------------+

Edit

Thanks to input from @ALollz: using to_datetime and Timestamp from pandas, as shown below, seems to fix the problem.

In[15]: d = pd.to_datetime(datetime(2020,2,17))
In[16]: d.timestamp()
Out[16]: 1581897600.0
In[17]: df = pd.DataFrame({'date': [d]}) 
In[18]: df
Out[18]: 
        date
0 2020-02-17
In[19]: df['date'][0]
Out[19]: Timestamp('2020-02-17 00:00:00')
In[20]: df['date'][0].timestamp()
Out[20]: 1581897600.0
In[21]: df['date'].apply(pd.Timestamp.timestamp)
Out[21]: 
0    1.581898e+09
Name: date, dtype: float64
In[22]: df['date'].apply(pd.Timestamp.timestamp)[0]
Out[22]: 1581897600.0

Recommended answer

The problem is timezone awareness. pandas doesn't always play well with the datetime module, and some of its design decisions diverge from the standard library; in this case, the difference is how timezone-naive datetime objects are handled. The standard library's datetime.timestamp interprets a naive datetime as local time, whereas pandas does not.

This specific behavior seems to have been a design choice, based on this open issue:

Yah, for tz-naive we implement timestamp as if it were UTC. Among other things, this ensures that we get the same behavior regardless of where the code is running.
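To make the quoted trade-off concrete, here is a minimal standard-library sketch (no pandas needed) that simulates running the same code on machines in different timezones. The helper name timestamps_in is hypothetical, the POSIX TZ strings "CET-1" (UTC+1) and "EST5" (UTC-5) are my own choices for illustration, and time.tzset() is only available on Unix, so treat this as an illustration under those assumptions rather than a portable recipe.

```python
import os
import time
from datetime import datetime, timezone

d = datetime(2020, 2, 17)  # naive, as in the question


def timestamps_in(tz):
    """Return (naive-as-local, naive-as-UTC) timestamps with TZ set to `tz`.

    Hypothetical helper for illustration only.
    """
    previous = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()  # Unix only: make this process re-read the TZ variable
    try:
        as_local = d.timestamp()  # standard library: naive means local time
        as_utc = d.replace(tzinfo=timezone.utc).timestamp()  # read as UTC instead
        return as_local, as_utc
    finally:
        # Restore the original setting so the rest of the process is unaffected.
        if previous is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = previous
        time.tzset()


for tz in ("UTC0", "CET-1", "EST5"):
    print(tz, timestamps_in(tz))
```

Under "CET-1" the first value is 1581894000.0, the number from the question, and under "EST5" it is 18000 seconds above the UTC reading. The second value stays at 1581897600.0 in every case, which is the location-independent behavior the quote describes.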

So to get a consistent answer you'd need a UTC-localized datetime, so that datetime.timestamp uses UTC instead of your machine's local timezone.

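from datetime import datetime

import pandas as pd
import pytz

my_date = datetime(2020, 2, 17)             # naive: no tzinfo attached
my_date_aware = pytz.utc.localize(my_date)  # same wall-clock time, marked as UTC

# UTC aware is the same as pandas
datetime.timestamp(my_date_aware) - pd.to_datetime(my_date).timestamp()
# 0.0

# Naive uses the machine's local timezone, so this value depends on where it runs
datetime.timestamp(my_date) - pd.to_datetime(my_date).timestamp()
# 18000.0 (on a machine at UTC-5)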
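If pytz is not available, the same idea can be sketched with the standard library alone. The helper utc_timestamp below is a hypothetical name, not part of pandas or datetime; it assumes that reading every naive value as UTC is what you want, which matches the convention described above but may not suit data that was recorded in local time.

```python
from datetime import datetime, timedelta, timezone


def utc_timestamp(dt):
    """Return POSIX seconds, reading a naive datetime as UTC.

    Hypothetical helper: aware datetimes are left alone, naive ones are
    tagged as UTC so the result does not depend on the machine's timezone.
    """
    if dt.tzinfo is None or dt.utcoffset() is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


dates = [
    datetime(2020, 2, 17),                                          # naive
    datetime(2020, 2, 17, tzinfo=timezone.utc),                     # aware, UTC
    datetime(2020, 2, 17, 1, tzinfo=timezone(timedelta(hours=1))),  # aware, UTC+1, same instant
]

stamps = [utc_timestamp(dt) for dt in dates]
print(stamps)
```

All three entries describe the same instant, so the list contains 1581897600.0 three times, whatever timezone the machine is set to.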
