Parsing DD MM YY HH MM SS columns from TXT file using Python's pandas
Problem description
Thank you all for your time in advance. I have a number of space-delimited text files in the following format:
29 04 13 18 15 00 7.667
29 04 13 18 30 00 7.000
29 04 13 18 45 00 7.000
29 04 13 19 00 00 7.333
29 04 13 19 15 00 7.000
The columns are DD MM YY HH MM SS followed by my result value. I am trying to read the txt file using Python's pandas. I did quite a bit of research before posting this question, so I hope I am not covering well-trodden ground.
Based on trial and error and research I have come up with:
import pandas as pd
from cStringIO import StringIO

def parse_all_fields(day_col, month_col, year_col, hour_col, minute_col,second_col):
    day_col = _maybe_cast(day_col)
    month_col = _maybe_cast(month_col)
    year_col = _maybe_cast(year_col)
    hour_col = _maybe_cast(hour_col)
    minute_col = _maybe_cast(minute_col)
    second_col = _maybe_cast(second_col)
    return lib.try_parse_datetime_components(day_col, month_col, year_col, hour_col, minute_col, second_col)

##Read the .txt file
data1 = pd.read_table('0132_3.TXT', sep='\s+', names=['Day','Month','Year','Hour','Min','Sec','Value'])
data1[:10]

Out[21]:
Day,Month,Year,Hour, Min, Sec, Value
29 04 13 18 15 00 7.667
29 04 13 18 30 00 7.000
29 04 13 18 45 00 7.000
29 04 13 19 00 00 7.333
29 04 13 19 15 00 7.000

data2 = pd.read_table(StringIO(data1), parse_dates={'datetime':['Day','Month','Year','Hour''Min','Sec']}, date_parser=parse_all_fields, dayfirst=True)
TypeError                                 Traceback (most recent call last)
<ipython-input-22-8ee408dc19c3> in <module>()
----> 1 data2 = pd.read_table(StringIO(data1), parse_dates={'datetime': ['Day','Month','Year','Hour''Min','Sec']}, date_parser=parse_all_fields, dayfirst=True)

TypeError: expected read buffer, DataFrame found
At this point I am stuck. Firstly, the "expected read buffer" error confuses me. Do I need to do more pre-processing of the .txt file to get the dates into a readable format? Note: the parse_function of read_table does not work on its own with this date format.
I am a beginner - trying to learn. Sorry if the code is wrong/basic/confusing. Would be very appreciative if someone could help. Many thanks in advance.
Solution
I think it's going to be easier just to parse the dates when reading the csv:
In [1]: df = pd.read_csv('0132_3.TXT', header=None, sep='\s+\s', parse_dates=[[0]])

In [2]: df
Out[2]:
                    0      1
0 2013-04-29 00:00:00  7.667
1 2013-04-29 00:00:00  7.000
2 2013-04-29 00:00:00  7.000
3 2013-04-29 00:00:00  7.333
4 2013-04-29 00:00:00  7.000
Since you're using an unusual date format, you need to specify a date parser too:
In [11]: def date_parser(ss):
   ....:     day, month, year, hour, min, sec = ss.split()
   ....:     return pd.Timestamp('20%s-%s-%s %s:%s:%s' % (year, month, day, hour, min, sec))

In [12]: df = pd.read_csv('0132_3.TXT', header=None, sep='\s+\s', parse_dates=[[0]], date_parser=date_parser)

In [13]: df
Out[13]:
                    0      1
0 2013-04-29 18:15:00  7.667
1 2013-04-29 18:30:00  7.000
2 2013-04-29 18:45:00  7.000
3 2013-04-29 19:00:00  7.333
4 2013-04-29 19:15:00  7.000
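
On the error itself: read_table already returns a DataFrame, while StringIO only wraps text, which is why passing data1 to it raises the TypeError. As a possible alternative, here is a minimal sketch for Python 3 that avoids date_parser altogether (that argument is deprecated in recent pandas releases). It assumes every field is separated by whitespace; the helper name read_readings and the inline SAMPLE text are made up for illustration, and in practice you would pass the file path instead.

```python
import io

import pandas as pd

# Sample rows copied from the question; stands in for the real file here.
SAMPLE = """29 04 13 18 15 00 7.667
29 04 13 18 30 00 7.000
29 04 13 18 45 00 7.000
29 04 13 19 00 00 7.333
29 04 13 19 15 00 7.000
"""

# Column names taken from the question's own read_table call.
DATE_COLS = ['Day', 'Month', 'Year', 'Hour', 'Min', 'Sec']


def read_readings(source):
    """Hypothetical helper: read the file and build one datetime column."""
    # Read everything as text so leading zeros such as "04" and "00" survive.
    raw = pd.read_csv(source, sep=r'\s+', header=None,
                      names=DATE_COLS + ['Value'], dtype=str)
    # Join the six date/time fields back into one string per row.
    stamp = raw[DATE_COLS].apply(lambda row: ' '.join(row), axis=1)
    return pd.DataFrame({
        # An explicit format states that the day comes first and the year has two digits.
        'datetime': pd.to_datetime(stamp, format='%d %m %y %H %M %S'),
        'Value': raw['Value'].astype(float),
    })


df = read_readings(io.StringIO(SAMPLE))
print(df)
```

Spelling out the format string means pandas does not have to guess whether 29 04 13 is day-first, and calling df.set_index('datetime') afterwards would give a time-indexed frame if that is what you need.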