Python pandas integer YYYYMMDD to datetime


Problem description

Apologies in advance for this, but after two hours of searching and trying I cannot find the right answer. I have a data frame populated via pandas.io.sql.read_frame(). The column that is proving too much for me is of dtype int64, and its integers are in the format YYYYMMDD, for example 20070530 for 30 May 2007. I have tried a range of approaches, the most obvious being:

pd.to_datetime(dt['Date'])
pd.to_datetime(str(dt['Date']))

with multiple variations on the functions' parameters.

The result has been, at best, that the date is interpreted as a time: the date is set to 1970-01-01, and the example above comes out as 1970-01-01 00:00:00.020070530.
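A minimal sketch of why both attempts go wrong, using a small hypothetical frame named dt that stands in for the real data: without format or unit, pandas reads integers as nanoseconds since the Unix epoch, and str() on a whole Series returns its printed representation rather than one string per value.

```python
import pandas as pd

# Hypothetical stand-in for the question's int64 'Date' column.
dt = pd.DataFrame({'Date': [20070530, 20070620]})

# No format/unit given: integers are treated as nanoseconds since 1970-01-01,
# so 20070530 ns lands about 0.02 s after the epoch.
as_ns = pd.to_datetime(dt['Date'])
print(as_ns[0])  # 1970-01-01 00:00:00.020070530

# str() on a Series yields its printed form (index, values, name, dtype line),
# not an element-wise conversion, so it is not a parseable date string.
series_text = str(dt['Date'])
print(series_text)
```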

I also tried various .map() functions found in similar posts.

I have noticed that pd.date_range() can interpret string values in the format YYYYMMDD, but that is the closest I have come to a solution.
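The clue here is that the digits parse fine once they are a string. A small sketch (the values are illustrative only) showing pandas reading an 8-digit YYYYMMDD string as a calendar date:

```python
import pandas as pd

# As a string, '20070530' is parsed as year 2007, month 05, day 30.
ts = pd.Timestamp('20070530')
print(ts)  # 2007-05-30 00:00:00

# date_range accepts the same string as its start point (daily frequency by default).
rng = pd.date_range('20070530', periods=3)
print(rng)
```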

If anyone has an answer, I would be very grateful!

In view of the answer from Ed Chum, the problem is most likely related to encoding. repr() on a subset of the DataFrame yields:

OrdNo LstInvDt\n0
9 20070620\n1
11 20070830\n2
19 20070719\n3
21 20070719\n4
23 20070719\n5
26 20070911\n7
29 20070918\n8
31 0070816\n9
34 20070925\n10

This is when LstInvDt is dtype int64.
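One row above (0070816) has only seven digits, so a column like this may contain values that will not parse as intended. A minimal sketch, assuming such bad rows should become missing rather than abort the conversion: errors='coerce' turns anything that does not match %Y%m%d into NaT. The value 20071345 (month 13) is a hypothetical stand-in for a malformed entry.

```python
import pandas as pd

# Sample values modelled on LstInvDt; 20071345 is a deliberately invalid date.
df = pd.DataFrame({'LstInvDt': [20070620, 20070830, 20071345]})

# astype(str) gives one string per row; errors='coerce' maps unparseable
# values to NaT instead of raising an exception.
df['InvDate'] = pd.to_datetime(
    df['LstInvDt'].astype(str), format='%Y%m%d', errors='coerce'
)
print(df)

# Rows that failed to parse can be inspected separately.
bad_rows = df[df['InvDate'].isna()]
print(bad_rows)
```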

Recommended answer

to_datetime accepts a format string:

In [92]:

t = 20070530
pd.to_datetime(str(t), format='%Y%m%d')
Out[92]:
Timestamp('2007-05-30 00:00:00')

Example:

In [94]:

t = 20070530
df = pd.DataFrame({'date':[t]*10})
df
Out[94]:
       date
0  20070530
1  20070530
2  20070530
3  20070530
4  20070530
5  20070530
6  20070530
7  20070530
8  20070530
9  20070530
In [98]:

df['DateTime'] = df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
df
Out[98]:
       date   DateTime
0  20070530 2007-05-30
1  20070530 2007-05-30
2  20070530 2007-05-30
3  20070530 2007-05-30
4  20070530 2007-05-30
5  20070530 2007-05-30
6  20070530 2007-05-30
7  20070530 2007-05-30
8  20070530 2007-05-30
9  20070530 2007-05-30
In [99]:

df.dtypes
Out[99]:
date                 int64
DateTime    datetime64[ns]
dtype: object

Edit

Actually, it's quicker to convert the whole column to strings with astype(str) and pass the entire Series to to_datetime in one call, rather than calling apply on every value:

In [102]:

df['DateTime'] = pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
df
Out[102]:
       date   DateTime
0  20070530 2007-05-30
1  20070530 2007-05-30
2  20070530 2007-05-30
3  20070530 2007-05-30
4  20070530 2007-05-30
5  20070530 2007-05-30
6  20070530 2007-05-30
7  20070530 2007-05-30
8  20070530 2007-05-30
9  20070530 2007-05-30
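A different technique, not part of the original answer: the integers can be split into year, month and day with integer arithmetic, and pd.to_datetime can assemble datetimes from a DataFrame whose columns are named year, month and day. This is a sketch on hypothetical sample values; whether it beats astype(str) on a given dataset is an assumption worth measuring.

```python
import pandas as pd

# Hypothetical sample column of YYYYMMDD integers.
s = pd.Series([20070530, 20070620, 20071231])

# 20070530 // 10000 -> 2007, (20070530 // 100) % 100 -> 5, 20070530 % 100 -> 30
parts = pd.DataFrame({
    'year': s // 10000,
    'month': (s // 100) % 100,
    'day': s % 100,
})

# to_datetime builds one datetime per row from the year/month/day columns.
dates = pd.to_datetime(parts)
print(dates)
```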

Timings

In [104]:

%timeit df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))

100 loops, best of 3: 2.55 ms per loop
In [105]:

%timeit pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
1000 loops, best of 3: 396 µs per loop
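The %timeit figures above come from an IPython session. A plain-Python sketch for re-running the comparison outside IPython, assuming a hypothetical 1,000-row column of the same value; absolute times will differ by machine and pandas version, so only the relative difference is of interest.

```python
import timeit

import pandas as pd

# Hypothetical sample: one YYYYMMDD value repeated 1,000 times.
df = pd.DataFrame({'date': [20070530] * 1_000})

def with_apply():
    # Parses one value at a time through a Python-level lambda.
    return df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))

def vectorized():
    # Converts the whole column to strings, then parses it in a single call.
    return pd.to_datetime(df['date'].astype(str), format='%Y%m%d')

# Both approaches should yield the same timestamps.
assert list(with_apply()) == list(vectorized())

for fn in (with_apply, vectorized):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f'{fn.__name__}: {seconds * 1000:.2f} ms per call')
```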
