How to convert string to datetime with nulls - python, pandas?

Question

I have a series with some datetimes (as strings) and some nulls stored as the string 'nan':

import pandas as pd, numpy as np, datetime as dt
df = pd.DataFrame({'Date':['2014-10-20 10:44:31', '2014-10-23 09:33:46', 'nan', '2014-10-01 09:38:45']})

I'm trying to convert these to datetime:

df['Date'] = df['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))

but I get the error:

time data 'nan' does not match format '%Y-%m-%d %H:%M:%S'

So I try to turn these into actual nulls:

df.loc[df['Date'] == 'nan', 'Date'] = np.nan

and repeat:

df['Date'] = df['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))

but I get the error:

must be string, not float

What is the quickest way to solve this problem?

Recommended answer

Just use to_datetime and set errors='coerce', so that values that cannot be parsed become NaT instead of raising:

In [321]:

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df
Out[321]:
                 Date
0 2014-10-20 10:44:31
1 2014-10-23 09:33:46
2                 NaT
3 2014-10-01 09:38:45

In [322]:

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 1 columns):
Date    3 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 64.0 bytes
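
As a minimal sketch of the same call outside an IPython session, the snippet below rebuilds the sample frame from the question and also lists the rows that ended up as NaT. Passing format= is an assumption added here (every valid row is taken to use the '%Y-%m-%d %H:%M:%S' layout); the original answer relies on format inference. The names parsed and bad_rows are illustrative only.

```python
import pandas as pd

# Same sample data as in the question; 'nan' is a plain string here.
df = pd.DataFrame({'Date': ['2014-10-20 10:44:31', '2014-10-23 09:33:46',
                            'nan', '2014-10-01 09:38:45']})

# errors='coerce' turns anything unparseable into NaT instead of raising.
# format= is optional and assumed here; it makes the expected layout explicit.
parsed = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S', errors='coerce')

# The NaT mask shows which original strings did not become timestamps.
bad_rows = df.loc[parsed.isna(), 'Date']

print(parsed)
print(bad_rows.tolist())  # ['nan']
```

Inspecting the original values behind the NaT entries is a quick way to confirm that only the expected placeholders were coerced, rather than genuinely malformed dates.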

The problem with calling strptime is that it raises an error if the string does not match the format, or if the value is not a string at all.
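
The two failures from the question can be reproduced without pandas. This is a small sketch; describe_failure is a hypothetical helper written for this illustration, not part of any library.

```python
import datetime as dt

FMT = '%Y-%m-%d %H:%M:%S'

def describe_failure(value):
    """Return the name of the exception strptime raises for value, or None."""
    try:
        dt.datetime.strptime(value, FMT)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None

print(describe_failure('2014-10-20 10:44:31'))  # None
print(describe_failure('nan'))                  # ValueError: string does not match the format
print(describe_failure(float('nan')))           # TypeError: argument is a float, not a string
```

The string 'nan' fails on the format check, while a real NaN fails earlier on the type check, which is why replacing one with the other only changes the error message.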

If you wrapped the call like this, it would work:

In [324]:

def func(x):
    try:
        return dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    except (TypeError, ValueError):  # non-string value or format mismatch
        return pd.NaT

df['Date'].apply(func)
Out[324]:
0   2014-10-20 10:44:31
1   2014-10-23 09:33:46
2                   NaT
3   2014-10-01 09:38:45
Name: Date, dtype: datetime64[ns]

but it is faster to use the built-in to_datetime than to call apply, which essentially just loops over your series in Python.

Timings

In [326]:

%timeit pd.to_datetime(df['Date'], errors='coerce')
%timeit df['Date'].apply(func)
10000 loops, best of 3: 65.8 µs per loop
10000 loops, best of 3: 186 µs per loop

We see here that to_datetime is roughly 3x faster (65.8 µs versus 186 µs), even on this four-row series.
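
To check the comparison outside IPython, something like the following could be used. It is a sketch under stated assumptions: the series big (the four sample values repeated 2,500 times) and the helper parse_or_nat are made up for this illustration, and the actual timings will depend on the machine and the pandas version.

```python
import datetime as dt
import timeit

import pandas as pd

FMT = '%Y-%m-%d %H:%M:%S'

def parse_or_nat(x):
    """Parse one value with strptime, returning NaT when it cannot be parsed."""
    try:
        return dt.datetime.strptime(x, FMT)
    except (TypeError, ValueError):
        return pd.NaT

# Hypothetical larger input: the question's four values repeated 2,500 times.
values = ['2014-10-20 10:44:31', '2014-10-23 09:33:46', 'nan', '2014-10-01 09:38:45']
big = pd.Series(values * 2500)

# Time five runs of each approach on the same data.
t_vectorized = timeit.timeit(lambda: pd.to_datetime(big, errors='coerce'), number=5)
t_apply = timeit.timeit(lambda: big.apply(parse_or_nat), number=5)

print(f'to_datetime: {t_vectorized:.4f} s, apply: {t_apply:.4f} s')
```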
