Split up a text file based on its contents
Question
I have a text file that looks like this:
SYSTEM
DOF=UY,UZ,RX LENGTH=FT FORCE=Kip
JOINT
1 X=0 Y=-132.644 Z=0
2 X=0 Y=-80 Z=0
3 X=0 Y=-40 Z=0
4 X=0 Y=0 Z=0
5 X=0 Y=40 Z=0
6 X=0 Y=80 Z=0
7 X=0 Y=132.644 Z=0
and so on, for 10,000 joints.
I would like to write a script that reads this text file and outputs 4 text files of one column each, containing the joint number, x coordinate, y coordinate, and z coordinate respectively.
Is this possible? I am new to Python and tried something like this, but Python doesn't know what to do with SYSTEM in the text file, and I'm sure my method isn't correct:
import os

os.chdir('/Users/DevEnv/')
with open('RawDataFile_445.txt') as a:
    for line in a.readlines():
        j=[]
        data=line.strip()
        data1=data.split(" ")
        for i in range(0,len(data1)):
            j.append(eval(data1[i]))
        joint=j
Recommended answer
Your solution seems fragile:
- you have to filter out the header and other non-data lines
- you use eval(), which is overkill and unsafe
- readlines() can be very memory-hungry: it reads all the data into memory, which you don't need to do. Just read one line at a time.
- you are missing the code that writes to the output files
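As a rough illustration of the first two points, a data line can be parsed without eval() by splitting on whitespace and then on "=". This is only a sketch: the helper name parse_joint_line is hypothetical, and it assumes every data line has the form shown in the question (a joint number followed by X=, Y= and Z= fields).

```python
def parse_joint_line(line):
    """Return (joint, x, y, z) for a data line, or None for any other line.

    Hypothetical helper for illustration; not part of the original answer.
    """
    fields = line.split()
    # Headers such as SYSTEM, JOINT or the DOF=... line do not have
    # exactly four fields starting with a joint number.
    if len(fields) != 4 or not fields[0].isdigit():
        return None
    coords = {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if not sep:
            return None
        coords[key] = value
    try:
        # int() and float() only accept numbers, unlike eval(),
        # which would execute arbitrary expressions.
        return (int(fields[0]),
                float(coords["X"]), float(coords["Y"]), float(coords["Z"]))
    except (KeyError, ValueError):
        return None
```

Non-data lines simply yield None, so the caller can skip them instead of crashing on SYSTEM.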
My solution uses a regular expression to extract all the data from a line in one go. See the comments in the code:
import re

# regex to extract data line
r = re.compile(r"\s*(\d+)\s+X=(\S+)\s+Y=(\S+)\s+Z=(\S+)")

with open('RawDataFile_445.txt') as a:
    # open all 4 files with a meaningful name
    files=[open("file_{}.txt".format(x),"w") for x in ["J","X","Y","Z"]]
    for line in a:
        m = r.match(line)
        if m:
            # line matches: write in all 4 files (using zip to avoid doing
            # it one by one)
            for f,v in zip(files,m.groups()):
                f.write(v+"\n")
    # close all output files now that it's done
    for f in files:
        f.close()
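If you would rather not close the output files by hand, one possible variant uses contextlib.ExitStack from the standard library, so that every file is closed even if an exception occurs partway through. This is a sketch under assumptions: the function name split_joint_file and its out_dir parameter are made up for illustration, while the regex and the output file names are the same as in the code above.

```python
import re
from contextlib import ExitStack
from pathlib import Path

# Same pattern as in the answer: joint number, then X=, Y=, Z= values.
JOINT_LINE = re.compile(r"\s*(\d+)\s+X=(\S+)\s+Y=(\S+)\s+Z=(\S+)")

def split_joint_file(src, out_dir="."):
    """Write file_J.txt, file_X.txt, file_Y.txt and file_Z.txt into out_dir.

    Returns the number of joint lines written.
    Hypothetical wrapper for illustration; not part of the original answer.
    """
    out_dir = Path(out_dir)
    count = 0
    with ExitStack() as stack:
        # Every file registered on the stack is closed when the block exits.
        lines = stack.enter_context(open(src))
        outputs = [
            stack.enter_context(open(out_dir / "file_{}.txt".format(name), "w"))
            for name in "JXYZ"
        ]
        for line in lines:
            m = JOINT_LINE.match(line)
            if m:
                for out, value in zip(outputs, m.groups()):
                    out.write(value + "\n")
                count += 1
    return count
```

Calling split_joint_file('RawDataFile_445.txt') would then write the four files to the current directory.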
You can test it by replacing the with open(...) as a: line (and dedenting the block below it) with:
a="""SYSTEM
DOF=UY,UZ,RX LENGTH=FT FORCE=Kip
JOINT
1 X=0 Y=-132.644 Z=0
2 X=0 Y=-80 Z=0
3 X=0 Y=-40 Z=0
4 X=0 Y=0 Z=0
5 X=0 Y=40 Z=0
6 X=0 Y=80 Z=0
7 X=0 Y=132.644 Z=0""".splitlines().__iter__()
This emulates the lines of the input file (it's how I answer questions that involve input files, to avoid creating them on my system). You'll see that the 4 files are created and filled (don't forget the close part at the end of the code).
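Another way to emulate the input, sketched here as an assumption rather than as part of the original answer, is io.StringIO from the standard library, which behaves like an open text file and yields one line per iteration. To keep the check free of any files, this version collects the four columns in lists (the columns variable is hypothetical) instead of writing them out.

```python
import io
import re

r = re.compile(r"\s*(\d+)\s+X=(\S+)\s+Y=(\S+)\s+Z=(\S+)")

# In-memory stand-in for the input file
sample = io.StringIO("""SYSTEM
DOF=UY,UZ,RX LENGTH=FT FORCE=Kip
JOINT
1 X=0 Y=-132.644 Z=0
2 X=0 Y=-80 Z=0
3 X=0 Y=-40 Z=0
""")

# One list per output column: joint number, X, Y, Z
columns = [[], [], [], []]
for line in sample:
    m = r.match(line)
    if m:
        for col, v in zip(columns, m.groups()):
            col.append(v)

print(columns[0])  # joint numbers
print(columns[2])  # Y coordinates
```

The header lines do not match the pattern, so only the three joint lines end up in the lists.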