pandas.to_datetime inconsistent time string format

Problem description

I am attempting to convert the index of a pandas.DataFrame from string format to a datetime index, using pandas.to_datetime().

Import pandas:

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.10.1'

Create an example DataFrame:

In [3]: d = {'data' : pd.Series([1.,2.], index=['26/12/2012', '10/01/2013'])}

In [4]: df=pd.DataFrame(d)

Look at indices. Note that the date format is day/month/year:

In [5]: df.index
Out[5]: Index([26/12/2012, 10/01/2013], dtype=object)

Convert index to datetime:

In [6]: pd.to_datetime(df.index)
Out[6]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-12-26 00:00:00, 2013-10-01 00:00:00]
Length: 2, Freq: None, Timezone: None

Already at this stage, you can see that the two entries have been parsed differently. The first is correct, but the second has its month and day swapped.

This is what I want to write, but avoiding the inconsistent formatting of date strings:

In [7]: df.set_index(pd.to_datetime(df.index))
Out[7]: 
data
2012-12-26   1
2013-10-01   2

I guess the first entry is correct because the function 'knows' there aren't 26 months, and so does not choose the default month/day/year format.

Is there another/better way to do this? Can I pass the format into the to_datetime() function?

Thanks.

I have found a way to do this, without pandas.to_datetime:

from datetime import datetime as dt
# Parse each index label explicitly as day/month/year
df.index = [dt.strptime(s, '%d/%m/%Y') for s in df.index]

But it's a bit messy. Any improvements are welcome.

Recommended answer

There is a (hidden?) dayfirst argument to to_datetime:

In [23]: pd.to_datetime(df.index, dayfirst=True)
Out[23]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-12-26 00:00:00, 2013-01-10 00:00:00]
Length: 2, Freq: None, Timezone: None
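For readers on a more recent pandas, the same idea can be sketched as a self-contained script. This is only an illustration under assumptions: the frame is rebuilt from the question's data, and the result is assigned back to df.index rather than just displayed. Note that the pandas documentation describes dayfirst=True as a preference rather than a strict rule, so it is best suited to input that is known to be day-first throughout.

```python
import pandas as pd

# Rebuild the example frame from the question: the index holds
# day/month/year strings.
df = pd.DataFrame({"data": [1.0, 2.0]}, index=["26/12/2012", "10/01/2013"])

# dayfirst=True tells the parser to prefer day-before-month when a
# string such as "10/01/2013" could be read either way.
parsed = pd.to_datetime(df.index, dayfirst=True)

# Replace the string index with the parsed DatetimeIndex.
df.index = parsed
print(df)
```

Both labels should now be read as day/month/year, so the second row lands on 10 January 2013 rather than 1 October 2013.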

From pandas 0.11 onwards, you can use the format argument:

In [24]: pd.to_datetime(df.index, format='%d/%m/%Y')
Out[24]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-12-26 00:00:00, 2013-01-10 00:00:00]
Length: 2, Freq: None, Timezone: None
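As a rough sketch of why an explicit format is the safer choice, the script below (assuming a reasonably recent pandas; the variable names raised and coerced are illustrative only) parses the question's index with format='%d/%m/%Y' and then shows what happens when a string does not match the given format. Instead of silently swapping day and month, the parser either raises or, with errors='coerce', yields NaT.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({"data": [1.0, 2.0]}, index=["26/12/2012", "10/01/2013"])

# With an explicit format, every label is parsed as day/month/year.
df.index = pd.to_datetime(df.index, format="%d/%m/%Y")
print(df)

# A string that cannot match the format is rejected rather than reinterpreted:
# "26/12/2012" has no valid reading as month/day/year.
try:
    pd.to_datetime(["26/12/2012"], format="%m/%d/%Y")
    raised = False
except ValueError:
    raised = True
print("mismatch raised ValueError:", raised)

# errors="coerce" turns unparseable entries into NaT instead of raising.
coerced = pd.to_datetime(["26/12/2012"], format="%m/%d/%Y", errors="coerce")
print(coerced)
```

Raising on a mismatch is usually preferable for data cleaning, since a swapped day and month would otherwise go unnoticed.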
