Split up a text file based on its contents

Question

I have a text file that looks like this:

SYSTEM

DOF=UY,UZ,RX  LENGTH=FT  FORCE=Kip

JOINT
  1  X=0  Y=-132.644  Z=0
  2  X=0  Y=-80  Z=0
  3  X=0  Y=-40  Z=0
  4  X=0  Y=0  Z=0
  5  X=0  Y=40  Z=0
  6  X=0  Y=80  Z=0
  7  X=0  Y=132.644  Z=0

and so on, for 10,000 joints.

I would like to write a script that reads this text file and outputs 4 single-column text files containing the joint number, x coordinate, y coordinate, and z coordinate, respectively.

Is this possible? I am new to Python and tried something like this, but Python doesn't know what to do with SYSTEM in the text file, and I'm sure my method isn't correct:

import os

os.chdir('/Users/DevEnv/')

with open('RawDataFile_445.txt') as a:
    for line in a.readlines():
        j=[]
        data=line.strip()
        data1=data.split(" ")
        for i in range(0,len(data1)):
            j.append(eval(data1[i]))
        joint=j

Recommended answer

Your solution seems fragile:

  • you have to filter out the headers and other non-data lines
  • you use eval(), which is overkill and unsafe
  • readlines() can be very memory-hungry: it reads the whole file into memory, which you don't need. Just iterate over the file one line at a time.
  • the code that writes to the output files is missing
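
To illustrate the first two points, here is a minimal sketch (not part of the original answer) of how a single line could be checked and parsed with float() instead of eval(). The helper name parse_joint_line is hypothetical, and the sketch assumes every data line has exactly the form "number X=... Y=... Z=..." shown in the question.

```python
def parse_joint_line(line):
    """Return (joint, x, y, z) for a data line, or None for any other line.

    Hypothetical helper for illustration only.
    """
    fields = line.split()  # split on any run of whitespace
    if len(fields) != 4 or not fields[0].isdigit():
        return None  # header, blank line, or other non-data line
    values = {}
    for field in fields[1:]:
        key, sep, raw = field.partition("=")
        if not sep:
            return None  # no "=" in this field: not a coordinate
        try:
            values[key] = float(raw)  # float() only parses numbers, unlike eval()
        except ValueError:
            return None
    if set(values) != {"X", "Y", "Z"}:
        return None
    return int(fields[0]), values["X"], values["Y"], values["Z"]
```

Lines such as SYSTEM or JOINT make the helper return None, so the caller can simply skip them.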

My solution uses regular expressions to extract all data in one go. Comments in the code:

import re

# regex to extract data line    
r = re.compile(r"\s*(\d+)\s+X=(\S+)\s+Y=(\S+)\s+Z=(\S+)")

with open('RawDataFile_445.txt') as a:

    # open all 4 files with a meaningful name
    files=[open("file_{}.txt".format(x),"w") for x in ["J","X","Y","Z"]]
    for line in a:
        m = r.match(line)
        if m:
            # line matches: write in all 4 files (using zip to avoid doing
            # it one by one)
            for f,v in zip(files,m.groups()):
                f.write(v+"\n")

    # close all output files now that it's done
    for f in files:
        f.close()
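
To see why the header lines are filtered out automatically, the following small sketch applies the same pattern to a few sample lines from the question. The extract function and the samples list are names introduced here for illustration only.

```python
import re

# Same pattern as in the answer above.
r = re.compile(r"\s*(\d+)\s+X=(\S+)\s+Y=(\S+)\s+Z=(\S+)")

def extract(line):
    """Return the four captured strings, or None if the line is not a data line."""
    m = r.match(line)
    return m.groups() if m else None

samples = [
    "SYSTEM",
    "DOF=UY,UZ,RX  LENGTH=FT  FORCE=Kip",
    "JOINT",
    "  1  X=0  Y=-132.644  Z=0",
]
for s in samples:
    print(repr(s), "->", extract(s))
```

Only the last sample starts with optional whitespace followed by digits, so it is the only one for which match() returns a match object; the others yield None and are skipped.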

You can test it by replacing the with open(...) as a: line with the following (and dedenting the block beneath it):

a="""SYSTEM

DOF=UY,UZ,RX  LENGTH=FT  FORCE=Kip

JOINT
  1  X=0  Y=-132.644  Z=0
  2  X=0  Y=-80  Z=0
  3  X=0  Y=-40  Z=0
  4  X=0  Y=0  Z=0
  5  X=0  Y=40  Z=0
  6  X=0  Y=80  Z=0
  7  X=0  Y=132.644  Z=0""".splitlines()

This emulates the lines of the input file (that's how I answer questions involving input files, to avoid creating them on my system). You'll see that the 4 files are created and filled (don't forget the close part at the end of the code).
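
On the closing step: one possible variant, sketched here as an assumption rather than as the answer's own method, uses contextlib.ExitStack from the standard library so that the four output files are closed automatically, even if an error occurs partway through. The function name split_joints and the out_dir parameter are hypothetical; io.StringIO stands in for the input file.

```python
import io
import os
import re
import tempfile
from contextlib import ExitStack

JOINT_RE = re.compile(r"\s*(\d+)\s+X=(\S+)\s+Y=(\S+)\s+Z=(\S+)")

def split_joints(lines, out_dir="."):
    """Write joint number, X, Y and Z to four one-column files; return their paths."""
    names = [os.path.join(out_dir, "file_{}.txt".format(x))
             for x in ["J", "X", "Y", "Z"]]
    with ExitStack() as stack:
        # every file registered here is closed when the with block exits
        files = [stack.enter_context(open(n, "w")) for n in names]
        for line in lines:
            m = JOINT_RE.match(line)
            if m:
                for f, v in zip(files, m.groups()):
                    f.write(v + "\n")
    return names

if __name__ == "__main__":
    sample = io.StringIO("SYSTEM\n\nJOINT\n  1  X=0  Y=-132.644  Z=0\n")
    with tempfile.TemporaryDirectory() as d:
        for path in split_joints(sample, d):
            with open(path) as f:
                print(os.path.basename(path), f.read().split())
```

Any iterable of lines works as input, so an open file object can be passed in place of the StringIO sample.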
