Parsing DD MM YY HH MM SS columns from TXT file using Python's pandas


Problem description

Thank you all for your time in advance. I have a number of space-delimited text files in the following format:

    29 04 13 18 15 00    7.667
    29 04 13 18 30 00    7.000
    29 04 13 18 45 00    7.000
    29 04 13 19 00 00    7.333
    29 04 13 19 15 00    7.000

Each row is DD MM YY HH MM SS followed by my result value. I am trying to read the txt file using Python's pandas. I did quite a bit of research before posting this question, so I hope I am not covering well-trodden ground.

Based on trial and error and research I have come up with:

    import pandas as pd
    from cStringIO import StringIO
    def parse_all_fields(day_col, month_col, year_col, hour_col, minute_col, second_col):
        day_col = _maybe_cast(day_col)
        month_col = _maybe_cast(month_col)
        year_col = _maybe_cast(year_col)
        hour_col = _maybe_cast(hour_col)
        minute_col = _maybe_cast(minute_col)
        second_col = _maybe_cast(second_col)
        return lib.try_parse_datetime_components(day_col, month_col, year_col, hour_col, minute_col, second_col)
    ##Read the .txt file
    data1 = pd.read_table('0132_3.TXT', sep='\s+', names=['Day','Month','Year','Hour','Min','Sec','Value'])
    data1[:10]

    Out[21]: 

    Day,Month,Year,Hour, Min, Sec, Value
    29 04 13 18 15 00    7.667
    29 04 13 18 30 00    7.000
    29 04 13 18 45 00    7.000
    29 04 13 19 00 00    7.333
    29 04 13 19 15 00    7.000

    data2 = pd.read_table(StringIO(data1), parse_dates={'datetime':['Day','Month','Year','Hour''Min','Sec']}, date_parser=parse_all_fields, dayfirst=True)


    TypeError                                 Traceback (most recent call last)
    <ipython-input-22-8ee408dc19c3> in <module>()
    ----> 1 data2 = pd.read_table(StringIO(data1), parse_dates={'datetime':   ['Day','Month','Year','Hour''Min','Sec']}, date_parser=parse_all_fields, dayfirst=True)

    TypeError: expected read buffer, DataFrame found

At this point I am stuck. Firstly the expected read buffer error confuses me. Do I need to do more pre-processing of the .txt file to get the dates into a readable format? Note - the parse_function of read_table does not work on its own on this date format.

I am a beginner - trying to learn. Sorry if the code is wrong/basic/confusing. Would be very appreciative if someone could help. Many thanks in advance.

Solution

I think it's going to be easier to parse the dates while reading the csv:

In [1]: df = pd.read_csv('0132_3.TXT', header=None, sep='\s+\s', parse_dates=[[0]])

In [2]: df
Out[2]:
                    0      1
0 2013-04-29 00:00:00  7.667
1 2013-04-29 00:00:00  7.000
2 2013-04-29 00:00:00  7.000
3 2013-04-29 00:00:00  7.333
4 2013-04-29 00:00:00  7.000
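
The separator used above, `\s+\s`, only matches runs of two or more whitespace characters, so the single spaces inside the date stay in column 0 and only the wider gap before the value acts as a delimiter. A minimal sketch of that splitting behaviour with the standard `re` module (the sample line is copied from the question; the variable names are just for illustration):

```python
import re

# One data row from the question: single spaces inside the date,
# a wider gap before the value.
line = "29 04 13 18 15 00    7.667"

# r'\s+\s' needs at least two whitespace characters to match,
# so the single spaces between the date fields are left alone.
parts = re.split(r'\s+\s', line)
print(parts)
```

This only works if the gap before the value is always at least two characters wide and the fields inside the date are always separated by exactly one space.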

Since you're using an unusual date format, you need to specify a date parser too:

In [11]: def date_parser(ss):
             day, month, year, hour, min, sec = ss.split()
             return pd.Timestamp('20%s-%s-%s %s:%s:%s' % (year, month, day, hour, min, sec))

In [12]: df = pd.read_csv('0132_3.TXT', header=None, sep='\s+\s', parse_dates=[[0]], date_parser=date_parser)

In [13]: df
Out[13]:
                    0      1
0 2013-04-29 18:15:00  7.667
1 2013-04-29 18:30:00  7.000
2 2013-04-29 18:45:00  7.000
3 2013-04-29 19:00:00  7.333
4 2013-04-29 19:15:00  7.000
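
As for the `TypeError` in the question: `StringIO(data1)` fails because `data1` is already a DataFrame, not text, so there is nothing left for `read_table` to read. An alternative sketch, which does not rely on the `date_parser` argument (deprecated in recent pandas releases), is to read the seven columns once and then build the timestamp with `pd.to_datetime` and an explicit format string. The column names follow the question; the inline sample data and the `raw`, `stamp`, and `result` names are assumptions for illustration, and a file path could be passed to `read_csv` instead.

```python
import io
import pandas as pd

# Sample rows copied from the question; pass a file path instead for real data.
raw = """29 04 13 18 15 00    7.667
29 04 13 18 30 00    7.000
29 04 13 18 45 00    7.000
29 04 13 19 00 00    7.333
29 04 13 19 15 00    7.000
"""

cols = ['Day', 'Month', 'Year', 'Hour', 'Min', 'Sec', 'Value']
date_cols = cols[:-1]

# Read the six date/time fields as strings so leading zeros ("04", "00") survive.
df = pd.read_csv(io.StringIO(raw), sep=r'\s+', header=None, names=cols,
                 dtype={c: str for c in date_cols})

# Join the fields back into one string per row, then parse with an explicit format.
stamp = df[date_cols].apply(' '.join, axis=1)
df['datetime'] = pd.to_datetime(stamp, format='%d %m %y %H %M %S')

result = df[['datetime', 'Value']]
print(result)
```

Spelling out the format (`%d %m %y ...`) avoids any day-first versus month-first guessing, and `%y` maps the two-digit year `13` to 2013.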
