Pandas to_datetime ValueError: Unknown string format
Question
I have a column in my (pandas) dataframe:
data['Start Date'].head()
type(data['Start Date'])
Output:
1/7/13
1/7/13
1/7/13
16/7/13
16/7/13
<class 'pandas.core.series.Series'>
When I convert it to a datetime as follows, I get the error ValueError: Unknown string format:
data['Start Date'] = pd.to_datetime(data['Start Date'], dayfirst=True)
...
...
/Library/Python/2.7/site-packages/pandas/tseries/tools.pyc in _convert_listlike(arg, box, format, name)
381 return DatetimeIndex._simple_new(values, name=name, tz=tz)
382 except (ValueError, TypeError):
--> 383 raise e
384
385 if arg is None:
ValueError: Unknown string format
What am I missing here?
Recommended answer
I think the problem is in the data: it contains a problematic string. You can try checking the length of the strings in the column Start Date:
import pandas as pd
import io
temp=u"""Start Date
1/7/13
1/7/1
1/7/13 12 17
16/7/13
16/7/13"""
data = pd.read_csv(io.StringIO(temp), sep=";", parse_dates=False)
#data['Start Date']= pd.to_datetime(data['Start Date'],dayfirst=True)
print(data)
Start Date
0 1/7/13
1 1/7/1
2 1/7/13 12 17
3 16/7/13
4 16/7/13
# check whether the length is more than 7 characters
print(data[data['Start Date'].str.len() > 7])
Start Date
2 1/7/13 12 17
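The length check is a heuristic: a too-short value such as 1/7/1 passes it. A slightly stricter variant, sketched below under the assumption that every valid value looks like day/month/two-digit-year (the regex pattern is an illustrative choice, not something from the question), uses Series.str.fullmatch (pandas 1.1 or later), which is True only when the whole string matches. Passing na=False treats missing values as invalid too.

```python
import pandas as pd

# Sample data mirroring the example above
data = pd.DataFrame(
    {"Start Date": ["1/7/13", "1/7/1", "1/7/13 12 17", "16/7/13", "16/7/13"]}
)

# Assumed layout: 1-2 digit day / 1-2 digit month / 2-digit year
pattern = r"\d{1,2}/\d{1,2}/\d{2}"

# True only if the entire string matches; missing values count as invalid
is_valid = data["Start Date"].str.fullmatch(pattern, na=False)

# Rows that do not look like a plain date
print(data[~is_valid])
```

With this sample, rows 1 and 2 are reported, so the short 1/7/1 is caught as well as the value with trailing time parts.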
Alternatively, you can find the problematic rows another way, e.g. parse only part of the data and check whether the conversion succeeds:
# take only the first 3 rows (copy to avoid a SettingWithCopyWarning)
data = data.iloc[:3].copy()
data['Start Date'] = pd.to_datetime(data['Start Date'], dayfirst=True)
But this is only a hint.
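Instead of shrinking the data by hand, each value can be probed on its own and the failures collected with their error messages. This is a minimal sketch; find_unparseable is a hypothetical helper, and it assumes the expected layout is day/month/two-digit-year, passed as an explicit format ("%d/%m/%y") rather than relying on dayfirst=True.

```python
import pandas as pd

data = pd.DataFrame(
    {"Start Date": ["1/7/13", "1/7/1", "1/7/13 12 17", "16/7/13", "16/7/13"]}
)


def find_unparseable(series, fmt):
    """Hypothetical helper: map index -> error message for values that fail fmt."""
    failures = {}
    for idx, value in series.items():
        try:
            pd.to_datetime(value, format=fmt)
        except ValueError as exc:  # a format mismatch raises ValueError
            failures[idx] = str(exc)
    return failures


# Assumed layout: day/month/two-digit-year
bad = find_unparseable(data["Start Date"], "%d/%m/%y")
for idx, msg in bad.items():
    print(idx, repr(data.at[idx, "Start Date"]), msg)
```

Looping in Python is slow on large frames, but it reports every failing row at once together with the reason, which is often what you want while cleaning data.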
Thanks to joris for the suggestion to add the parameter errors='coerce' to to_datetime:
temp=u"""Start Date
1/7/13
1/7/1
1/7/13 12 17
16/7/13
16/7/13 12 04"""
data = pd.read_csv(io.StringIO(temp), sep=";")
#add parameter errors coerce
data['Start Date']= pd.to_datetime(data['Start Date'], dayfirst=True, errors='coerce')
print(data)
Start Date
0 2013-07-01
1 2001-07-01
2 NaT
3 2013-07-16
4 NaT
# store the index of the rows that became NaT (not parsed) in idx
idx = data[data['Start Date'].isnull()].index
print(idx)
Int64Index([2, 4], dtype='int64')
#read csv again
data = pd.read_csv(io.StringIO(temp), sep=";")
# show the problematic rows, where the datetime was not parsed (idx holds labels, so use loc)
print(data.loc[idx])
Start Date
2 1/7/13 12 17
4 16/7/13 12 04
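Re-reading the CSV can be avoided by parsing into a separate Series and keeping the raw strings. A minimal sketch, assuming the expected layout is day/month/two-digit-year given as an explicit format (a swap-in for dayfirst=True); the column name "Start Date parsed" is hypothetical. The mask excludes values that were already missing, so only genuinely unparseable strings are reported.

```python
import pandas as pd

data = pd.DataFrame(
    {"Start Date": ["1/7/13", "1/7/1", "1/7/13 12 17", "16/7/13", "16/7/13 12 04"]}
)

# Parse without touching the original column; failures become NaT
parsed = pd.to_datetime(data["Start Date"], format="%d/%m/%y", errors="coerce")

# Rows that had a value but could not be parsed
failed = parsed.isna() & data["Start Date"].notna()
print(data.loc[failed, "Start Date"])

# Keep the parsed dates next to the raw strings (hypothetical column name)
data["Start Date parsed"] = parsed
```

Because the format is explicit, 1/7/1 is flagged here as well (rows 1, 2 and 4), whereas the dayfirst=True version parsed it as 2001-07-01.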