datetime dtypes in pandas read_csv
Question
I'm reading in a csv file with multiple datetime columns. I need to set the data types when reading in the file, but datetimes appear to be a problem. For instance:
headers = ['col1', 'col2', 'col3', 'col4']
dtypes = ['datetime', 'datetime', 'str', 'float']
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)
When run, this gives an error:
TypeError: data type "datetime" not understood
Converting columns after the fact via pandas.to_datetime() isn't an option, because I can't know in advance which columns will be datetime objects. That information can change and comes from whatever informs my dtypes list.
Alternatively, I've tried to load the csv file with numpy.genfromtxt, set the dtypes in that function, and then convert to a pandas.DataFrame, but it garbles the data. Any help is greatly appreciated!
Recommended answer
Why it does not work
There is no datetime dtype to be set for read_csv, because a csv file is plain text whose fields can only be read as strings, integers and floats.
Setting a dtype to datetime will make pandas interpret the datetime as an object, meaning you will end up with a string.
The pandas.read_csv() function has a keyword argument called parse_dates.
Using this, you can convert strings, floats or integers into datetimes on the fly using the default date_parser (dateutil.parser.parser).
headers = ['col1', 'col2', 'col3', 'col4']
dtypes = {'col1': 'str', 'col2': 'str', 'col3': 'str', 'col4': 'float'}
parse_dates = ['col1', 'col2']
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes, parse_dates=parse_dates)
This will cause pandas to read col1 and col2 as strings, which they most likely are ("2016-05-05" etc.). After reading each string, the date_parser for that column acts on it and gives back whatever that function returns.
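Since the question says the column types come from a list that can change, the dtype and parse_dates arguments can be derived from that list rather than written by hand. The following is a minimal sketch, assuming the list marks date columns with the label 'datetime' as in the question; the in-memory sample data and the name dtype_map are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample standing in for the real file.
sample = io.StringIO(
    "2016-05-05\t2016-05-06\tfoo\t1.5\n"
    "2016-06-01\t2016-06-02\tbar\t2.5\n"
)

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = ['datetime', 'datetime', 'str', 'float']

# Columns labelled 'datetime' are handed to parse_dates;
# every other column keeps its label as a regular dtype.
parse_dates = [h for h, t in zip(headers, dtypes) if t == 'datetime']
dtype_map = {h: t for h, t in zip(headers, dtypes) if t != 'datetime'}

df = pd.read_csv(sample, sep='\t', header=None, names=headers,
                 dtype=dtype_map, parse_dates=parse_dates)
print(df.dtypes)
```

Here the date columns are simply left out of the dtype mapping instead of being declared as 'str'; either way parse_dates does the conversion.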
The pandas.read_csv() function also has a keyword argument called date_parser.
Setting this to a lambda function will make that particular function be used for parsing the dates.
You have to give it the function itself, not the result of calling the function, so this is correct:
date_parser = pd.to_datetime
This is incorrect:
date_parser = pd.to_datetime()
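The difference between handing over a function and handing over the result of calling it can be seen without read_csv at all. The sketch below is only an illustration: apply_parser is a hypothetical helper standing in for whatever code receives the parser, and the day-first format is an assumed example. It deliberately avoids the date_parser argument itself, which pandas 2.0 and later deprecate in favour of date_format.

```python
import pandas as pd


def apply_parser(values, parser):
    """Hypothetical stand-in for code that receives a parser callable."""
    # The receiver calls the parser; the caller only hands it over.
    return parser(values)


raw = pd.Series(["05/05/2016", "06/05/2016"])

# Correct: a function object (here a lambda) is passed, uncalled.
day_first = lambda s: pd.to_datetime(s, format="%d/%m/%Y")
parsed = apply_parser(raw, day_first)
print(parsed)

# Incorrect: calling pd.to_datetime() with no arguments fails
# before anything could be passed on.
try:
    pd.to_datetime()
except TypeError as exc:
    print("TypeError:", exc)
```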