datetime dtypes in pandas read_csv

Question

I'm reading in a CSV file with multiple datetime columns. I need to set the data types when reading in the file, but datetimes appear to be a problem. For instance:

import pandas as pd

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = ['datetime', 'datetime', 'str', 'float']
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

Running this gives the following error:

TypeError: data type "datetime" not understood

Converting columns after the fact via pandas.to_datetime() isn't an option, because I can't know in advance which columns will be datetime objects. That information can change and comes from whatever informs my dtypes list.

Alternatively, I've tried to load the CSV file with numpy.genfromtxt, set the dtypes in that function, and then convert to a pandas.DataFrame, but it garbles the data. Any help is greatly appreciated!

Answer

Why it does not work

There is no datetime dtype that can be set in read_csv, because CSV files can only contain strings, integers and floats.

Setting a dtype to datetime will make pandas interpret the datetime as an object, meaning you will end up with a string.
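A quick way to confirm that read_csv leaves dates as text unless told otherwise is to load a small sample with no dtype and no parse_dates. The tab-separated data below is made up for illustration; the numeric column is inferred as float, while the date column stays as plain strings.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample: a date column and a float column
data = "2016-05-05\t1.5\n2016-05-06\t2.5\n"

# No dtype and no parse_dates: pandas infers numbers, but not dates
df = pd.read_csv(io.StringIO(data), sep="\t", header=None, names=["when", "value"])

print(pd.api.types.is_datetime64_any_dtype(df["when"]))  # False
print(type(df["when"].iloc[0]))                          # <class 'str'>
print(pd.api.types.is_float_dtype(df["value"]))          # True
```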

The pandas.read_csv() function has a keyword argument called parse_dates.

With this you can convert strings, floats or integers into datetimes on the fly, using the default date_parser (dateutil.parser.parser):

import pandas as pd

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = {'col1': 'str', 'col2': 'str', 'col3': 'str', 'col4': 'float'}
# col1 and col2 are read as strings, then converted to datetimes by parse_dates
parse_dates = ['col1', 'col2']
df = pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes, parse_dates=parse_dates)

This causes pandas to read col1 and col2 as strings, which they most likely are ("2016-05-05" etc.), and after reading each string, the date_parser for that column acts on it and returns whatever that function returns.
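Since the question's column types arrive as a list that can change, one way to apply this is a small helper that sends every 'datetime' entry to parse_dates and every other entry to dtype. This is a sketch under stated assumptions: read_with_types is a hypothetical name, not part of pandas, the sample data is made up, and the datetime columns are simply left out of dtype (so read_csv loads them as text and then parses them) rather than being listed as 'str'.

```python
import io

import pandas as pd


def read_with_types(source, headers, types, **kwargs):
    """Hypothetical helper: turn a per-column type list into read_csv arguments.

    Columns typed 'datetime' are routed to parse_dates; every other column
    keeps its type in the dtype mapping.
    """
    dtype = {}
    parse_dates = []
    for name, kind in zip(headers, types):
        if kind == "datetime":
            parse_dates.append(name)  # let read_csv convert this column
        else:
            dtype[name] = kind        # e.g. 'str' or 'float'
    return pd.read_csv(source, header=None, names=headers,
                       dtype=dtype, parse_dates=parse_dates, **kwargs)


headers = ["col1", "col2", "col3", "col4"]
dtypes = ["datetime", "datetime", "str", "float"]
data = "2016-05-05\t2016-05-06\tabc\t1.5\n2016-06-01\t2016-06-02\tdef\t2.5\n"

df = read_with_types(io.StringIO(data), headers, dtypes, sep="\t")
print(df.dtypes)
```

Because the routing happens before read_csv is called, the dtypes list can come from any external source without converting columns after the fact.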

The pandas.read_csv() function also has a keyword argument called date_parser.

Setting this to a function (for example a lambda) makes pandas use that function to parse the dates.
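A common reason to customise date parsing is a format that month-first parsing would misread, such as day-first dates. The sketch below uses made-up day-first data and the date_format argument, which pandas 2.0 introduced when it deprecated date_parser, so it assumes pandas 2.0 or later. On older versions a callable along the lines of date_parser=lambda s: pd.to_datetime(s, format="%d/%m/%Y") plays the same role.

```python
import io

import pandas as pd

# Hypothetical day-first dates: 05/06/2016 means 5 June 2016
data = "when\tvalue\n05/06/2016\t1.5\n31/12/2016\t2.5\n"

# pandas >= 2.0: give a strptime-style format instead of a date_parser callable
df = pd.read_csv(io.StringIO(data), sep="\t",
                 parse_dates=["when"], date_format="%d/%m/%Y")

print(df["when"].iloc[0])  # 2016-06-05 00:00:00
```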

You have to pass the function itself, not the result of calling it, so this is correct:

date_parser = pd.to_datetime

This is incorrect:

date_parser = pd.to_datetime()
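The difference between passing a function and calling it can be checked directly. This sketch uses pd.to_datetime, the name under which the function is available in current pandas (the pd.datetools module used in older answers was deprecated and is no longer available). Without parentheses you get a callable that pandas can invoke later on each column; with empty parentheses Python calls it immediately with no data, which fails.

```python
import pandas as pd

# Correct: a reference to the function object; read_csv would call it later
parser = pd.to_datetime
print(callable(parser))           # True
print(parser(["2016-05-05"])[0])  # 2016-05-05 00:00:00

# Incorrect: the parentheses call to_datetime right away, with no input
try:
    parser = pd.to_datetime()
except TypeError as err:
    print("TypeError:", err)
```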