Pandas `.to_pydatetime()` not working inside a DataFrame
Problem description
I have strings like `'03-21-2019'` that I want to convert to the native Python datetime object: that is, of the `datetime.datetime` type. The conversion is easy enough through pandas:
```python
import pandas as pd
import datetime as dt

date_str = '03-21-2019'
pd_Timestamp = pd.to_datetime(date_str)
py_datetime_object = pd_Timestamp.to_pydatetime()
print(type(py_datetime_object))
```

results in

```
<class 'datetime.datetime'>
```
This is precisely what I want, since I want to compute `timedelta`s by subtracting one of these from another, which is perfectly well-defined in the native Python `datetime.datetime` class. However, my data is in a `pd.DataFrame`. When I try the following code:
```python
import pandas as pd
import datetime as dt

df = pd.DataFrame(columns=['Date'])
df.loc[0] = ['03-21-2019']
df['Date'] = df['Date'].apply(lambda x:
                              pd.to_datetime(x).to_pydatetime())
print(type(df['Date'].iloc[0]))
```

the result is

```
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
```
This is the WRONG type, and I can't for the life of me figure out why only part of the `lambda` expression is getting evaluated (that is, string-to-pandas-Timestamp), and not the last part (that is, pandas-Timestamp-to-`datetime.datetime`). It doesn't work if I define the function explicitly instead of using a `lambda` expression, either:
```python
import pandas as pd
import datetime as dt

def to_native_datetime(date_str: str) -> dt.datetime:
    return pd.to_datetime(date_str).to_pydatetime()

df = pd.DataFrame(columns=['Date'])
df.loc[0] = ['03-21-2019']
df['Date'] = df['Date'].apply(to_native_datetime)
print(type(df['Date'].iloc[0]))
```
The result is the same as before. It's definitely doing part of the function, as the result is not a string anymore. But I want the native Python `datetime.datetime` object, and I see no way of getting it. This looks like a bug in pandas, but I'm certainly willing to see it as user error on my part.
Why can't I get the native `datetime.datetime` object out of a `pandas.DataFrame` string column?
I have looked at this thread and this one, but neither of them answers my question.
[EDIT]: Here's something even stranger:
```python
import pandas as pd
import datetime as dt

def to_native_datetime(date_str: str) -> dt.datetime:
    return dt.datetime.strptime(date_str, '%m-%d-%Y')

df = pd.DataFrame(columns=['Date'])
df.loc[0] = ['03-21-2019']
df['Date'] = df['Date'].apply(to_native_datetime)
print(type(df['Date'].iloc[0]))
```
Here I'm not even using pandas to convert the string, and I STILL get a

```
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
```
Say what?!
Thanks very much for your time!
[FURTHER EDIT]: Apparently, in this thread, in Nehal J Wani's answer, it comes out that pandas automatically converts back to its native datetime format when you assign into a `pd.DataFrame`. This is not what I wanted to hear, but apparently I'm going to have to convert on the fly when I read out of the `pd.DataFrame`.
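A minimal sketch of the behaviour described in the edit above, assuming pandas 1.x or 2.x: a column built from native `datetime.datetime` objects is inferred as a `datetime64` column, so elements come back out as `pd.Timestamp`, and converting on the fly means calling `.to_pydatetime()` on each value as it is read. The exact dtype string (for example its time resolution) may differ between pandas versions.

```python
import datetime as dt
import pandas as pd

native = dt.datetime.strptime('03-21-2019', '%m-%d-%Y')
df = pd.DataFrame({'Date': [native]})

# pandas infers a datetime64 column from the datetime.datetime objects,
# so reading an element back yields a pandas Timestamp, not the original object.
stored = df['Date'].iloc[0]
print(df['Date'].dtype, type(stored))

# Convert on the fly when reading a value out of the DataFrame.
back = stored.to_pydatetime()
print(type(back))
```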
Recommended answer
Depending on what your actual goal is, you have a couple of options you didn't mention directly.
1) If you have a static datetime object or a column of (pandas) Timestamps, and you're willing to deal with the pandas version of a Timedelta (`pandas._libs.tslibs.timedeltas.Timedelta`), you can do the subtraction directly in pandas:
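```python
import pandas as pd
from datetime import datetime

df = pd.DataFrame(columns=['Date'])
df.loc[0] = [pd.to_datetime('03-21-2019')]
# Column minus column: element-wise (pandas) Timedelta values
df.loc[:, 'Offset'] = pd.Series([datetime.now()])
df.loc[:, 'Diff1'] = df['Offset'] - df['Date']
# Column minus a single static datetime object works too
df.loc[:, 'Diff2'] = df['Date'] - datetime.now()
```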
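If the end goal is only the timedelta arithmetic, the pandas scalar types may be enough on their own: `pd.Timestamp` subclasses `datetime.datetime` and `pd.Timedelta` subclasses `datetime.timedelta`, and `Timedelta.to_pytimedelta()` returns a plain `datetime.timedelta` when one is really needed. A small sketch; the second date is an arbitrary example value, not from the question:

```python
import datetime as dt
import pandas as pd

start = pd.to_datetime('03-21-2019')
end = pd.to_datetime('04-01-2019')  # arbitrary example date

diff = end - start  # a pandas Timedelta

# The pandas scalar types subclass the native ones, so they can be passed
# wherever a datetime.datetime / datetime.timedelta is expected.
print(isinstance(start, dt.datetime), isinstance(diff, dt.timedelta))

# Explicit conversion to a plain datetime.timedelta, if required.
native_diff = diff.to_pytimedelta()
print(type(native_diff), native_diff)
```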
2) If you don't care about DataFrames, but are willing to deal with lists / numpy arrays, you can convert the datetimes to Python-native datetimes by operating on the series rather than on individual elements. Below, `arr` is a `numpy.ndarray` of `datetime.datetime` objects. You can change it to a regular list of datetimes with `list(arr)`:
```python
arr = df['Date'].dt.to_pydatetime()
```
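A slightly fuller sketch of option 2, assuming a `datetime64` column (the second date is just an example value). Note that pandas 2.1 deprecated the ndarray return type of `Series.dt.to_pydatetime()` (a future version returns a `Series` of `datetime.datetime` objects instead), so the call may emit a `FutureWarning`; wrapping the result in `list()` gives the same plain list either way.

```python
import datetime as dt
import warnings

import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['03-21-2019', '04-01-2019'])})

with warnings.catch_warnings():
    # Silence the pandas 2.1+ FutureWarning about the return type changing.
    warnings.simplefilter('ignore', FutureWarning)
    arr = df['Date'].dt.to_pydatetime()

dates = list(arr)            # plain Python list of datetime.datetime objects
delta = dates[1] - dates[0]  # native datetime.timedelta
print([type(d) for d in dates], delta)
```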