Python pandas integer YYYYMMDD to datetime
Question
Apologies in advance for this, but after two hours of searching and trying I cannot get the right answer here. I have a data frame, populated via pandas io sql.read_frame().
The column that is proving to be too much for me is of dtype int64. The integers are of the format YYYYMMDD, for example 20070530 for 30 May 2007. I have tried a range of approaches, the most obvious being
pd.to_datetime(dt['Date'])
and
pd.to_datetime(str(dt['Date']))
with multiple variations on the functions' different parameters.
The result has been, at best, that the date is interpreted as the time. The date is set to 1970-01-01, so the example above comes out as 1970-01-01 00:00:00.020070530.
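A small sketch of why both attempts fail, assuming a reasonably recent pandas version; the sample values are made up for illustration. An integer passed to to_datetime without a format is treated as a count of nanoseconds since 1970-01-01, and str() applied to a whole Series returns the printed representation of the Series rather than one string per value.

```python
import pandas as pd

# Hypothetical sample data standing in for the int64 column from the question.
dates = pd.Series([20070530, 20070620], name="Date")

# Without a format, an integer is read as nanoseconds since the Unix epoch,
# so 20070530 lands about 20 milliseconds after 1970-01-01.
as_epoch = pd.to_datetime(20070530)
print(as_epoch)

# str() on the Series returns one multi-line string (index, values, dtype),
# not a string per element, so it cannot be parsed as a date.
whole_series_text = str(dates)
print(repr(whole_series_text))
```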
I also tried various .map() functions found in similar posts.
I have noticed that pd.date_range() can interpret string values of the format YYYYMMDD, but that is the closest I have come to seeing a solution.
If anyone has an answer, I would be very grateful!
In view of the answer from Ed Chum, the problem is most likely related to encoding. repr() on a subset of the dataFrame yields:
OrdNo LstInvDt\n0
9 20070620\n1
11 20070830\n2
19 20070719\n3
21 20070719\n4
23 20070719\n5
26 20070911\n7
29 20070918\n8
31 0070816\n9
34 20070925\n10
This is when LstInvDt is dtype int64.
Recommended answer
to_datetime accepts a format string:
In [92]:
t = 20070530
pd.to_datetime(str(t), format='%Y%m%d')
Out[92]:
Timestamp('2007-05-30 00:00:00')
Example:
In [94]:
t = 20070530
df = pd.DataFrame({'date':[t]*10})
df
Out[94]:
date
0 20070530
1 20070530
2 20070530
3 20070530
4 20070530
5 20070530
6 20070530
7 20070530
8 20070530
9 20070530
In [98]:
df['DateTime'] = df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
df
Out[98]:
date DateTime
0 20070530 2007-05-30
1 20070530 2007-05-30
2 20070530 2007-05-30
3 20070530 2007-05-30
4 20070530 2007-05-30
5 20070530 2007-05-30
6 20070530 2007-05-30
7 20070530 2007-05-30
8 20070530 2007-05-30
9 20070530 2007-05-30
In [99]:
df.dtypes
Out[99]:
date int64
DateTime datetime64[ns]
dtype: object
Edit
Actually, it's quicker to convert the column to string and then convert the entire series to a datetime, rather than calling apply on every value:
In [102]:
df['DateTime'] = pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
df
Out[102]:
date DateTime
0 20070530 2007-05-30
1 20070530 2007-05-30
2 20070530 2007-05-30
3 20070530 2007-05-30
4 20070530 2007-05-30
5 20070530 2007-05-30
6 20070530 2007-05-30
7 20070530 2007-05-30
8 20070530 2007-05-30
9 20070530 2007-05-30
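The subset printed in the question includes one value (0070816) that is not eight digits long, which would make a strict conversion raise. As a hedged sketch, not part of the original answer: the frame below is illustrative, storing that entry as the integer 70816 is an assumption, and LstInvDate is a made-up column name. Passing errors='coerce' turns values that do not match the format into NaT so they can be inspected afterwards.

```python
import pandas as pd

# Illustrative frame; 70816 is an assumed stand-in for the short value.
df = pd.DataFrame(
    {"OrdNo": [9, 11, 31], "LstInvDt": [20070620, 20070830, 70816]}
)

# Convert via strings with an explicit format; unparseable entries become NaT.
df["LstInvDate"] = pd.to_datetime(
    df["LstInvDt"].astype(str), format="%Y%m%d", errors="coerce"
)

# Rows whose date could not be parsed, kept aside for inspection.
bad_rows = df[df["LstInvDate"].isna()]
print(bad_rows)
```

Whether to drop, repair, or report such rows depends on where the bad values come from, so the sketch only isolates them.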
Timings
In [104]:
%timeit df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
100 loops, best of 3: 2.55 ms per loop
In [105]:
%timeit pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
1000 loops, best of 3: 396 µs per loop
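The %timeit lines above need IPython. A rough way to repeat the comparison in a plain script is sketched below with the standard timeit module; the function names per_value and whole_series are made up here, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Same toy frame as in the answer: ten copies of one YYYYMMDD integer.
df = pd.DataFrame({"date": [20070530] * 10})


def per_value():
    # Parse each element separately through apply.
    return df["date"].apply(lambda x: pd.to_datetime(str(x), format="%Y%m%d"))


def whole_series():
    # Cast the column to str once, then parse the Series in a single call.
    return pd.to_datetime(df["date"].astype(str), format="%Y%m%d")


# Both approaches should produce the same dates.
same_result = (per_value() == whole_series()).all()

# Total seconds for 100 runs of each approach.
t_apply = timeit.timeit(per_value, number=100)
t_vector = timeit.timeit(whole_series, number=100)
print(same_result, t_apply, t_vector)
```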