How to get pandas.read_csv() to infer datetime and timedelta types from CSV file columns?


Question


pandas.read_csv() infers the types of columns, but I can't get it to infer any datetime or timedelta type (e.g. datetime64, timedelta64) for columns whose values seem like obvious datetimes and time deltas.

Here's an example CSV file:

datetime,timedelta,integer,number,boolean,string
20111230 00:00:00,one hour,10,1.6,True,Foobar

And some code to read it with pandas:

dataframe = pandas.read_csv(path)

The types of the columns on that dataframe come out as object, object, int, float, bool, object. They're all as I would expect except the first two columns, which I want to be datetime and timedelta.

Is it possible to get pandas to automatically detect datetime and timedelta columns?

(I don't want to have to tell pandas which columns are datetimes and timedeltas or tell it the formats. I want it to try to detect them automatically, as it does for int, float and bool columns.)

Solution

One thing you can do is define your own date parser using strptime. This handles your date format, although it isn't automatic:

In [59]:

import pandas as pd
import datetime as dt

def parse_dates(x):
    # Parse strings such as '20111230 00:00:00'
    return dt.datetime.strptime(x, '%Y%m%d %H:%M:%S')

# dict for word lookup, conversion
word_to_int = {'zero': 0,
               'one': 1,
               'two': 2,
               'three': 3,
               'four': 4,
               'five': 5,
               'six': 6,
               'seven': 7,
               'eight': 8,
               'nine': 9}


def str_to_time_delta(x):
    # Handles only strings of the form '<number word> hour(s)', e.g. 'one hour'
    if 'hour' not in x.lower():
        raise ValueError('unsupported time delta: %r' % x)
    num = x[0:x.find(' ')].lower()
    return dt.timedelta(hours=word_to_int[num])

df = pd.read_csv(r'c:\temp1.txt', parse_dates=[0], date_parser=parse_dates)
df.dtypes
Out[59]:
datetime     datetime64[ns]
timedelta            object
integer               int64
number              float64
boolean                bool
string               object
dtype: object
Then convert the timedelta column by mapping the parsing function over it:

In [60]:

df['timedelta'] = df['timedelta'].map(str_to_time_delta)

In [61]:

df.dtypes
Out[61]:
datetime      datetime64[ns]
timedelta    timedelta64[ns]
integer                int64
number               float64
boolean                 bool
string                object
dtype: object
In [62]:

df
Out[62]:
             datetime  timedelta  integer  number boolean  string
0 2011-12-30 00:00:00   01:00:00       10     1.6    True  Foobar

[1 rows x 6 columns]
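Note that the date_parser argument has been deprecated since pandas 2.0. A version-independent alternative, offered here as a sketch rather than as part of the original answer, is to read the column as plain strings and convert it afterwards with pd.to_datetime and an explicit format. The in-memory CSV built with io.StringIO is only a stand-in for the file on disk.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file from the question.
csv_text = (
    "datetime,timedelta,integer,number,boolean,string\n"
    "20111230 00:00:00,one hour,10,1.6,True,Foobar\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Convert after reading, using the same format string as parse_dates().
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y%m%d %H:%M:%S')

print(df['datetime'].iloc[0])
```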

To answer your principal question: I don't know of a way to do this automatically.
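One way to approximate automatic detection is to post-process the frame after reading it: try each string column against a list of candidate datetime formats, then against pd.to_timedelta, and keep the first conversion that succeeds for every value. The sketch below is a heuristic under stated assumptions, not a pandas feature. The helper infer_time_columns and the list CANDIDATE_FORMATS are hypothetical names introduced for illustration, and the sample row uses "1 hour" rather than "one hour" so that pd.to_timedelta can parse it.

```python
import io
import pandas as pd

# Hypothetical list of candidate formats; extend it to match your data.
CANDIDATE_FORMATS = ['%Y%m%d %H:%M:%S', '%Y-%m-%d %H:%M:%S', '%Y-%m-%d']


def infer_time_columns(df):
    """Return a copy of df in which all-string columns are converted to
    datetime or timedelta when every value parses cleanly."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        # Only consider non-empty columns made up entirely of strings.
        if col.empty or not col.map(lambda v: isinstance(v, str)).all():
            continue
        converted = None
        for fmt in CANDIDATE_FORMATS:
            try:
                converted = pd.to_datetime(col, format=fmt)
                break
            except (ValueError, TypeError):
                continue
        if converted is None:
            try:
                converted = pd.to_timedelta(col)
            except (ValueError, TypeError):
                pass
        if converted is not None:
            out[name] = converted
    return out


csv_text = (
    "datetime,timedelta,integer,number,boolean,string\n"
    "20111230 00:00:00,1 hour,10,1.6,True,Foobar\n"
)
result = infer_time_columns(pd.read_csv(io.StringIO(csv_text)))
print(result.dtypes)
```

Using explicit formats keeps the detection strict: a column is converted only when every value matches, so ordinary text columns such as "string" are left alone.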

EDIT

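Instead of my convoluted mapping function, you can do just this, provided the column holds strings that pandas can parse as time deltas (for example "1 hour" or "01:00:00"):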

df['timedelta'] = pd.to_timedelta(df['timedelta'])
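As far as I can tell, pd.to_timedelta understands forms such as "1 hour" and "01:00:00" but not spelled-out numbers such as "one hour". The sketch below shows one way to combine the two approaches: normalise a leading number word with a small lookup, then let pd.to_timedelta do the parsing. The helper words_to_digits and the sample values are assumptions made for illustration, and errors='coerce' turns anything unparseable into NaT instead of raising.

```python
import pandas as pd

# Same idea as the word_to_int lookup, shortened for the example.
word_to_int = {'zero': 0, 'one': 1, 'two': 2, 'three': 3}

raw = pd.Series(['one hour', '2 hours', '00:30:00', 'soon'])


def words_to_digits(text):
    # Replace a leading number word with digits: 'one hour' -> '1 hour'.
    first, _, rest = text.partition(' ')
    if first.lower() in word_to_int:
        return '%d %s' % (word_to_int[first.lower()], rest)
    return text


# Unparseable entries (here 'soon') become NaT rather than raising.
deltas = pd.to_timedelta(raw.map(words_to_digits), errors='coerce')
print(deltas)
```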

Further edit

As noted by @Jeff, in version 0.13.1 and above you can let read_csv infer the datetime format instead of using strptime:

df = pd.read_csv(r'c:\temp1.txt', parse_dates=[0], infer_datetime_format=True)
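The infer_datetime_format argument was itself deprecated in pandas 2.0, where a stricter version of the same inference became the default. A minimal sketch, assuming a recent pandas and using an in-memory CSV in place of the file: passing parse_dates alone should be enough for this date format. You still have to name the column, so this is not the fully automatic detection the question asks for.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file from the question.
csv_text = (
    "datetime,timedelta,integer,number,boolean,string\n"
    "20111230 00:00:00,one hour,10,1.6,True,Foobar\n"
)

# Column 0 is parsed as a date; the format is inferred by pandas.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[0])
print(df.dtypes)
```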
