Reading a CSV file in pandas with historical dates

Problem description

I'm trying to read in a file with dates in the (UK) format 13/01/1800. However, some of the dates are before 1677, so they cannot be represented by the nanosecond-resolution timestamp (see http://pandas.pydata.org/pandas-docs/stable/gotchas.html#gotchas-timestamp-limits). I understand from that page that I need to create my own PeriodIndex to cover the range I need (see http://pandas.pydata.org/pandas-docs/stable/timeseries.html#timeseries-oob), but I can't work out how to convert the strings from the CSV reader into dates in this PeriodIndex.

So far I have:

import pandas as pd

span = pd.period_range('1000-01-01', '2100-01-01', freq='D')
df_earliest = pd.read_csv("objects.csv", index_col=0, names=['Object Id', 'Earliest Date'],
                          parse_dates=[1], infer_datetime_format=True, dayfirst=True)

How do I apply the span to the date reader/converter so that I can create a PeriodIndex / DatetimeIndex column in the dataframe?
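For context, the timestamp limit mentioned in the question can be checked directly. The following is a minimal sketch, not part of the original post; the variable names lower, upper and early are illustrative. It shows that pd.Timestamp covers only a few centuries, while a daily pd.Period can hold a medieval date:

```python
import pandas as pd

# Bounds of the nanosecond-resolution Timestamp type.
lower = pd.Timestamp.min   # falls in the year 1677
upper = pd.Timestamp.max   # falls in the year 2262

# A daily Period is not restricted to that range.
early = pd.Period('1267-03-01', freq='D')

# The span from the question, covering 1000-01-01 through 2100-01-01.
span = pd.period_range('1000-01-01', '2100-01-01', freq='D')

print(lower, upper)
print(early, span[0], span[-1])
```

The span itself is only a range of valid periods; the strings in the CSV still have to be turned into Period values one by one.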

Recommended answer

You can try it this way:

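import pandas as pd

fn = r'D:\temp\.data\36987699.csv'

def dt_parse(s):
    # Split a DD/MM/YYYY string and build a daily Period, which is not
    # limited to the nanosecond Timestamp range.
    d, m, y = s.split('/')
    return pd.Period(year=int(y), month=int(m), day=int(d), freq='D')

df = pd.read_csv(fn, parse_dates=[0], date_parser=dt_parse)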

Input file:

Date,col1
13/01/1800,aaa
25/12/1001,bbb
01/03/1267,ccc

Test:

In [16]: df
Out[16]:
        Date col1
0 1800-01-13  aaa
1 1001-12-25  bbb
2 1267-03-01  ccc

In [17]: df.dtypes
Out[17]:
Date    object
col1    object
dtype: object

In [18]: df['Date'].dt.year
Out[18]:
0    1800
1    1001
2    1267
Name: Date, dtype: int64
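On recent pandas releases the same idea can be written without date_parser, which, as far as I know, has been deprecated since pandas 2.0. The following is a minimal sketch under that assumption. The inline csv_text stands in for the input file above, and the use of converters and pd.PeriodIndex is a choice made for this illustration, not part of the original answer:

```python
import io
import pandas as pd

# Inline copy of the sample input so the example is self-contained.
csv_text = """Date,col1
13/01/1800,aaa
25/12/1001,bbb
01/03/1267,ccc
"""

def dt_parse(s):
    # Same parser as in the answer: DD/MM/YYYY string -> daily Period.
    d, m, y = s.split('/')
    return pd.Period(year=int(y), month=int(m), day=int(d), freq='D')

# converters applies dt_parse to every cell of the Date column.
df = pd.read_csv(io.StringIO(csv_text), converters={'Date': dt_parse})

# Build a PeriodIndex from the parsed values and use it as the index.
idx = pd.PeriodIndex(df['Date'], freq='D')
df.index = idx

print(df)
print(list(idx.year))
```

With a PeriodIndex in place, attributes such as idx.year are available on the index itself.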

PS: you may want to add a try ... except block to the dt_parse() function to catch the ValueError exceptions that int() raises on malformed input.
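A minimal sketch of that suggestion follows. The name dt_parse_safe and the choice to return pd.NaT for unparseable values are assumptions made for this illustration; the original answer does not say what should happen on failure:

```python
import pandas as pd

def dt_parse_safe(s):
    """Parse a DD/MM/YYYY string into a daily Period, or return pd.NaT."""
    try:
        d, m, y = s.split('/')
        return pd.Period(year=int(y), month=int(m), day=int(d), freq='D')
    except (ValueError, AttributeError):
        # ValueError: wrong number of '/'-separated parts, or int() failed.
        # AttributeError: s is not a string (for example a missing value).
        return pd.NaT

print(dt_parse_safe('13/01/1800'))
print(dt_parse_safe('not a date'))
```

Returning pd.NaT keeps the rest of the file loadable; raising a more descriptive error instead would be a reasonable alternative when bad rows should stop the import.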
