datetime.timestamp returns different values in pandas apply and dataframe selection
Question
See the code below, which demonstrates the issue. A simple pandas dataframe is created with one row and one column containing a single datetime instance. Calling timestamp() on the datetime object returns 1581894000.0. Selecting the datetime object through the dataframe and calling timestamp() gives 1581897600.0. When the pandas apply function is used to call datetime.timestamp on each row of column 'date', the return value is 1581894000.0 again. I would expect to get the same timestamp value in all situations.
In[19]: d = datetime(2020, 2, 17)
In[20]: d.timestamp()
Out[20]: 1581894000.0 <----------------------------------+
In[21]: df = pd.DataFrame({'date': [d]}) |
In[22]: df |
Out[22]: |
date |
0 2020-02-17 |
In[23]: df['date'][0] |
Out[23]: Timestamp('2020-02-17 00:00:00') |
In[24]: df['date'][0].timestamp() |
Out[24]: 1581897600.0 <---------------------- These should be the same
In[25]: df['date'].apply(datetime.timestamp) |
Out[25]: |
0 1.581894e+09 |
Name: date, dtype: float64 |
In[26]: df['date'].apply(datetime.timestamp)[0] |
Out[26]: 1581894000.0 <----------------------------------+
Edit
Thanks to input from @ALollz, using to_datetime and Timestamp from pandas, as shown below, seems to fix the problem.
In[15]: d = pd.to_datetime(datetime(2020,2,17))
In[16]: d.timestamp()
Out[16]: 1581897600.0
In[17]: df = pd.DataFrame({'date': [d]})
In[18]: df
Out[18]:
date
0 2020-02-17
In[19]: df['date'][0]
Out[19]: Timestamp('2020-02-17 00:00:00')
In[20]: df['date'][0].timestamp()
Out[20]: 1581897600.0
In[21]: df['date'].apply(pd.Timestamp.timestamp)
Out[21]:
0 1.581898e+09
Name: date, dtype: float64
In[22]: df['date'].apply(pd.Timestamp.timestamp)[0]
Out[22]: 1581897600.0
Recommended answer
The problem is timezone awareness. pandas doesn't always play well with the datetime module, and some of its decisions diverge from the standard library; in this case, the difference is in how timezone-naive datetime objects are handled.
This specific behavior seems to have been a design choice, based on this comment in an open issue:
Yah, for tz-naive we implement timestamp as if it were UTC. Among other things, this ensures that we get the same behavior regardless of where the code is running.
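The quoted behavior can be checked directly. The following is a minimal sketch, assuming pandas is installed; it compares a tz-naive pd.Timestamp with the same wall-clock time explicitly marked as UTC through the standard library. The variable names are illustrative and do not come from the original post.

```python
from datetime import datetime, timezone

import pandas as pd

# tz-naive pandas Timestamp for midnight on 2020-02-17
naive = pd.Timestamp("2020-02-17")

# The same wall-clock time, explicitly marked as UTC with the standard library
utc_midnight = datetime(2020, 2, 17, tzinfo=timezone.utc)

# pandas treats the naive value as if it were UTC, so both epoch values
# agree no matter which local timezone the machine is configured with.
pandas_epoch = naive.timestamp()
utc_epoch = utc_midnight.timestamp()
print(pandas_epoch, utc_epoch)
```

Both values should equal the 1581897600.0 that the question reports for the dataframe selection.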
So to get a consistent answer, you need a UTC-localized datetime, so that datetime.timestamp uses UTC instead of your machine's local timezone.
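from datetime import datetime
import pandas as pd
import pytz
my_date = datetime(2020, 2, 17)
my_date_aware = pytz.utc.localize(my_date)
# A UTC-aware datetime gives the same value as pandas
datetime.timestamp(my_date_aware) - pd.to_datetime(my_date).timestamp()
# 0.0
# A naive datetime is interpreted in the machine's local timezone
datetime.timestamp(my_date) - pd.to_datetime(my_date).timestamp()
# 18000.0 (on a machine five hours behind UTC; the value depends on the local timezone)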
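If pytz is not available, the same idea can be sketched with the standard library's timezone.utc. This is an illustrative variant under that assumption, not the answerer's code: it marks the naive datetime from the question as UTC and compares the result with the two pandas code paths shown in the question's edit.

```python
from datetime import datetime, timezone

import pandas as pd

d = datetime(2020, 2, 17)  # naive datetime, as in the question

# Mark the naive value as UTC without shifting the wall-clock time.
d_utc = d.replace(tzinfo=timezone.utc)

df = pd.DataFrame({"date": [d]})

# Standard library, explicitly UTC
direct = d_utc.timestamp()
# pandas Timestamp selected from the dataframe (naive, treated as UTC)
selected = df["date"][0].timestamp()
# pandas apply using the pandas method rather than datetime.timestamp
applied = df["date"].apply(pd.Timestamp.timestamp)[0]

print(direct, selected, applied)
```

All three values should agree, because none of the three calls falls back to the machine's local timezone.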