Text File Parsing with Python

Views: 178 | Posted: 2017/11/4 21:44:06 | Tags: python, parsing, text, file-io, python-2.7


Problem description

I am trying to parse a series of text files and save them as CSV files using Python (2.7.3). Every text file has a 4-line header that needs to be stripped out. The data lines use various delimiters, including " (quote), - (dash), : (colon), and blank space. I found it a pain to code this in C++ with all these different delimiters, so I decided to try it in Python, having heard it is relatively easier than C/C++.

I wrote a piece of code to test it on a single line of data and it works; however, I could not make it work for the actual file. To parse a single line I used a string object and its "replace" method. It looks like my current implementation reads the text file as a list, and list objects have no replace method.

Being a novice in Python, I got stuck at this point. Any input would be appreciated!

Thanks!

    # function for parsing the data
    def data_parser(text, dic):
        for i, j in dic.iteritems():
            text = text.replace(i, j)
        return text

    # open input/output files
    inputfile = open('test.dat')
    outputfile = open('test.csv', 'w')

    my_text = inputfile.readlines()[4:]  # reads the whole text file, skipping first 4 lines

    # sample text string, just for demonstration to let you know what the data looks like
    # my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

    # dictionary definition: 0-, 1- etc. are there to parse the date block delimited with dashes,
    # and make sure the negative numbers are not affected
    reps = {'"NAN"':'NAN', '"':'', '0-':'0,', '1-':'1,', '2-':'2,', '3-':'3,', '4-':'4,',
            '5-':'5,', '6-':'6,', '7-':'7,', '8-':'8,', '9-':'9,', ' ':',', ':':','}

    txt = data_parser(my_text, reps)
    outputfile.writelines(txt)

    inputfile.close()
    outputfile.close()

Solution

I would use a for loop to iterate over the lines in the text file:

    for line in my_text:
        outputfile.writelines(data_parser(line, reps))

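
To make the failure and the fix concrete, here is a minimal sketch. It assumes Python 3, so `dict.items()` stands in for the `iteritems()` call in the question, and it uses a hypothetical in-memory list named `lines` in place of the result of `inputfile.readlines()[4:]`. The two sample rows are shortened versions of the sample string from the question, not real file contents.

```python
def data_parser(text, dic):
    # Apply each replacement in turn to a single string.
    for old, new in dic.items():
        text = text.replace(old, new)
    return text

# Same replacement table as in the question, built programmatically.
reps = {'"NAN"': 'NAN', '"': ''}
reps.update(('%d-' % d, '%d,' % d) for d in range(10))  # '0-' -> '0,', ..., '9-' -> '9,'
reps.update({' ': ',', ':': ','})

# Hypothetical stand-in for inputfile.readlines()[4:], which returns a list of strings.
lines = [
    '"2012-06-23 03:09:13.23",4323584,-1.911224,"NAN",0.256485\n',
    '"2012-06-23 03:09:14.23",4323585,-0.4657288,"NAN",-0.24823\n',
]

# A list has no replace method, which is why passing it to data_parser fails.
list_has_replace = hasattr(lines, 'replace')

# Each element of the list is a string, so parsing line by line works.
parsed = [data_parser(line, reps) for line in lines]

for row in parsed:
    print(row, end='')
```

Because every element of `lines` is a plain string, `data_parser` can be reused unchanged; only the call site moves inside a loop.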
If you want to read the file line-by-line instead of loading the whole thing at the start of the script you could do something like this:

    inputfile = open('test.dat')
    outputfile = open('test.csv', 'w')

    # sample text string, just for demonstration to let you know what the data looks like
    # my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

    # dictionary definition: 0-, 1- etc. are there to parse the date block delimited with dashes,
    # and make sure the negative numbers are not affected
    reps = {'"NAN"':'NAN', '"':'', '0-':'0,', '1-':'1,', '2-':'2,', '3-':'3,', '4-':'4,',
            '5-':'5,', '6-':'6,', '7-':'7,', '8-':'8,', '9-':'9,', ' ':',', ':':','}

    for i in range(4): inputfile.next()  # skip first four lines
    for line in inputfile:
        outputfile.writelines(data_parser(line, reps))

    inputfile.close()
    outputfile.close()
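
As a variation on the line-by-line approach, the following sketch swaps the replacement dictionary for a single regular expression and skips the header with `itertools.islice`. This is not the answer's method, only one possible alternative. It assumes Python 3, and the names `parse_line`, `convert`, `DELIMS`, and `HEADER_LINES` are hypothetical. An `io.StringIO` object stands in for the real files so the example runs without `test.dat`.

```python
import io
import re
from itertools import islice

HEADER_LINES = 4  # number of header lines to drop, as stated in the question

# Matches a dash that directly follows a digit (the date separators),
# a space, or a colon. The minus sign of a negative number follows a
# comma rather than a digit, so it is left alone.
DELIMS = re.compile(r'(?<=\d)-|[ :]')

def parse_line(line):
    # Drop the quotes first, then turn every remaining delimiter into a comma.
    return DELIMS.sub(',', line.replace('"', ''))

def convert(src, dst, skip=HEADER_LINES):
    # islice skips the header lazily and does not raise if the file is shorter.
    for line in islice(src, skip, None):
        dst.write(parse_line(line))

# In-memory stand-ins for the input and output files (hypothetical data).
src = io.StringIO(
    'header 1\nheader 2\nheader 3\nheader 4\n'
    '"2012-06-23 03:09:13.23",4323584,-1.911224,"NAN",0.256485\n'
)
dst = io.StringIO()
convert(src, dst)
result = dst.getvalue()
print(result, end='')
```

With real files, the same function could be called as `with open('test.dat') as src, open('test.csv', 'w') as dst: convert(src, dst)`, which also closes both files if an error occurs partway through.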
