Text File Parsing with Python


Question

I am trying to parse a series of text files and save them as CSV files using Python (2.7.3). All the text files have a 4-line header that needs to be stripped out. The data lines use various delimiters, including " (quote), - (dash), : (colon), and blank space. I found it painful to code this in C++ with all these different delimiters, so I decided to try Python, having heard that it is relatively easier than C/C++ for this kind of task.

I wrote a piece of code to test it on a single line of data and it works; however, I could not make it work for the actual file. To parse a single line I used a string object and its "replace" method. It looks like my current implementation reads the text file as a list, and list objects have no replace method.

Being a novice in Python, I got stuck at this point. Any input would be appreciated!

Thanks!

# function for parsing the data
def data_parser(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

# open input/output files

inputfile = open('test.dat')
outputfile = open('test.csv', 'w')

my_text = inputfile.readlines()[4:]  # reads the whole text file, skipping the first 4 lines


# sample text string, just for demonstration, to show what the data looks like
# my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

# dictionary definition: the 0-, 1-, etc. entries split the dash-delimited date block while leaving negative numbers unaffected
reps = {'"NAN"':'NAN', '"':'', '0-':'0,','1-':'1,','2-':'2,','3-':'3,','4-':'4,','5-':'5,','6-':'6,','7-':'7,','8-':'8,','9-':'9,', ' ':',', ':':',' }

txt = data_parser(my_text, reps)
outputfile.writelines(txt)

inputfile.close()
outputfile.close()

Solution

I would use a for loop to iterate over the lines in the text file:

for line in my_text:
    outputfile.write(data_parser(line, reps))
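
To see why the loop fixes the original error: readlines() returns a list of strings, and replace is a string method, so data_parser has to be called once per line rather than on the whole list. Below is a minimal self-contained sketch of that idea. It uses an in-memory list as a stand-in for the file contents, and it uses dic.items() (which runs on both Python 2 and 3) instead of the question's iteritems(); the two sample lines are shortened from the question's sample and are only illustrative.

```python
def data_parser(text, dic):
    # Apply every replacement rule, one after another, to a single string.
    for old, new in dic.items():
        text = text.replace(old, new)
    return text

# Same replacement rules as in the question.
reps = {'"NAN"': 'NAN', '"': '',
        '0-': '0,', '1-': '1,', '2-': '2,', '3-': '3,', '4-': '4,',
        '5-': '5,', '6-': '6,', '7-': '7,', '8-': '8,', '9-': '9,',
        ' ': ',', ':': ','}

# Stand-in (assumption) for inputfile.readlines()[4:]: a list of strings.
my_text = [
    '"2012-06-23 03:09:13.23",4323584,-1.911224,"NAN",-0.3489428\n',
    '"2012-06-23 03:09:14.23",4323585,0.256485,"NAN",-0.130449\n',
]

# Calling my_text.replace(...) would fail, because lists have no replace
# method. Parsing each element separately works.
parsed = [data_parser(line, reps) for line in my_text]

for row in parsed:
    print(row.rstrip('\n'))
```

Negative numbers survive because their minus sign follows a comma, not a digit, so none of the digit-dash rules match them.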

If you want to read the file line-by-line instead of loading the whole thing at the start of the script you could do something like this:

inputfile = open('test.dat')
outputfile = open('test.csv', 'w')

# sample text string, just for demonstration, to show what the data looks like
# my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

# dictionary definition: the 0-, 1-, etc. entries split the dash-delimited date block while leaving negative numbers unaffected
reps = {'"NAN"':'NAN', '"':'', '0-':'0,','1-':'1,','2-':'2,','3-':'3,','4-':'4,','5-':'5,','6-':'6,','7-':'7,','8-':'8,','9-':'9,', ' ':',', ':':',' }

for _ in range(4):
    next(inputfile)  # skip the four header lines
for line in inputfile:
    outputfile.write(data_parser(line, reps))

inputfile.close()
outputfile.close()
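
The same line-by-line approach can also be written with "with" blocks, so that both files are closed even if parsing raises an error, and with itertools.islice to skip the header. This is only a sketch under stated assumptions: the helper name convert_file and its header_lines parameter are hypothetical, and the demo writes a small temporary test.dat so the example can run on its own.

```python
import itertools
import os
import tempfile

# Same replacement rules as in the question.
REPS = {'"NAN"': 'NAN', '"': '',
        '0-': '0,', '1-': '1,', '2-': '2,', '3-': '3,', '4-': '4,',
        '5-': '5,', '6-': '6,', '7-': '7,', '8-': '8,', '9-': '9,',
        ' ': ',', ':': ','}

def data_parser(text, dic):
    # Apply every replacement rule to a single line of text.
    for old, new in dic.items():
        text = text.replace(old, new)
    return text

def convert_file(in_path, out_path, header_lines=4):
    """Hypothetical helper: stream in_path to out_path, skipping the header."""
    with open(in_path) as infile, open(out_path, 'w') as outfile:
        # islice skips the first header_lines lines without loading the file.
        for line in itertools.islice(infile, header_lines, None):
            outfile.write(data_parser(line, REPS))

# Demo with a temporary input file (file contents are made up for illustration).
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, 'test.dat')
out_path = os.path.join(tmp_dir, 'test.csv')

with open(in_path, 'w') as f:
    f.write('header 1\nheader 2\nheader 3\nheader 4\n')
    f.write('"2012-06-23 03:09:13.23",4323584,-1.911224,"NAN"\n')

convert_file(in_path, out_path)

with open(out_path) as f:
    result = f.read()
print(result)
```

Because only one line is held in memory at a time, this form should scale to large input files in the same way as the loop above.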
