How to extract specific lines of data from a big text file
Question
I have a text file that contains something like this:
# Comment
# Comment
# Comment
# Comment
# Comment
Comment
# Comment
**#Raw SIFs at Crack Propagation Step: 0**
# Vertex, X, Y, Z, K_I, K_II,
0 , 2.100000e+00 , 2.000000e+00 , -1.000000e-04 , 0.000000e+00 , 0.000000e+00 ,
1 , 2.100000e+00 , 2.000000e+00 , 1.699733e-01 , 8.727065e+00 , -8.696262e-04 ,
2 , 2.100000e+00 , 2.000000e+00 , 3.367067e-01 , 8.907810e+00 , -2.548819e-04 ,
**# MLS SIFs at Crack Propagation Step: 0**
# MLS approximation:
# Sample, t, NA, NA, K_I, K_II,
# Crack front stretch: 0
0 , 0.000000e+00 , 0.000000e+00 , 0.000000e+00 , 8.446880e+00 , -1.360875e-03 ,
1 , 5.670333e-02 , 0.000000e+00 , 0.000000e+00 , 8.554168e+00 , -1.156931e-03 ,
2 , 1.134067e-01 , 0.000000e+00 , 0.000000e+00 , 8.648241e+00 , -9.755573e-04 ,
# more comments
more comments
# more comments
**# Raw SIFs at Crack Propagation Step: 1**
# Vertex, X, Y, Z, K_I, K_II,
0 , 2.186139e+00 , 2.000000e+00 , -1.688418e-03 , 0.000000e+00 , 0.000000e+00 ,
1 , 2.192003e+00 , 2.000000e+00 , 1.646902e-01 , 9.571022e+00 , 4.770358e-03 ,
2 , 2.196234e+00 , 2.000000e+00 , 3.319183e-01 , 9.693934e+00 , -9.634989e-03 ,
**# MLS SIFs at Crack Propagation Step: 1**
# MLS approximation:
# Sample, t, NA, NA, K_I, K_II,
# Crack front stretch: 0
0 , 0.000000e+00 , 0.000000e+00 , 0.000000e+00 , 9.402031e+00 , 2.097959e-02 ,
1 , 5.546786e-02 , 0.000000e+00 , 0.000000e+00 , 9.467541e+00 , 1.443546e-02 ,
2 , 1.109357e-01 , 0.000000e+00 , 0.000000e+00 , 9.525021e+00 , 8.554051e-03 ,
As you can see, the lines without the # symbol contain the data I would like to plot. I've shown only a short portion of steps 0 and 1, but the file has around 20 steps, and each step has two types of data: Raw SIFs and MLS SIFs. For each section of data, I would like to plot two line graphs: vertex (1st column) versus K_I (5th column), and vertex (1st column) versus K_II (6th column).
So in the end I would like four graphs, each with 20 curves (one per step): Raw SIFs for vertex vs K_I, Raw SIFs for vertex vs K_II, MLS SIFs for vertex vs K_I, and MLS SIFs for vertex vs K_II.
So far, I have created a separate text file that contains just one section of the original file. For the Raw SIFs at Crack Propagation Step: 0 section, my code uses numpy.loadtxt() to read the file:
import numpy

with open("numfile.txt") as RawStep0:
    Vertex, K_I, K_II = numpy.loadtxt(RawStep0, usecols=(0, 4, 5), dtype=float,
                                      delimiter=" , ", skiprows=2, unpack=True)
My output:
Vertex = array([ 0., 1., 2.])
K_I = array([0. , 8.727065, 8.90781])
How do I write the code for the original file without creating a separate file for each section? How can I skip all the rows with the # symbol and create the arrays I need to plot?
Recommended answer
Try:
from pprint import pprint


# helper function to parse a data block
def parse_SIF(lines):
    SIF = []
    while lines:
        line = lines.pop(0).lstrip()
        if line.startswith('**#'):
            # next section header: put it back for the caller and stop
            lines.insert(0, line)
            break
        if line == '' or line.startswith('#'):
            continue
        data = line.split(',')
        try:
            # pick only columns 0, 4, 5, convert them to the appropriate
            # numeric type and append to the list for the current SIF and step
            SIF.append([int(data[0]), float(data[4]), float(data[5])])
        except (ValueError, IndexError):
            # free-text comment line without '#', e.g. "more comments"
            continue
    return SIF


# your global data structure - nested lists
raw = []
mls = []

# read whole file into one list - ok if your data is not large
with open('data') as fptr:
    lines = fptr.readlines()

# global parse routine - call helper function to parse data blocks
while lines:
    line = lines.pop(0)
    if line.startswith('**#'):
        if 'Raw SIFs at Crack Propagation Step:' in line:
            raw.append(parse_SIF(lines))
        elif 'MLS SIFs at Crack Propagation Step:' in line:
            mls.append(parse_SIF(lines))

# show results for your example data
for raw_step, mls_step in zip(raw, mls):
    print('raw:')
    pprint(raw_step)
    print('mls:')
    pprint(mls_step)
This produces:
raw:
[[0, 0.0, 0.0], [1, 8.727065, -0.0008696262], [2, 8.90781, -0.0002548819]]
mls:
[[0, 8.44688, -0.001360875],
[1, 8.554168, -0.001156931],
[2, 8.648241, -0.0009755573]]
raw:
[[0, 0.0, 0.0], [1, 9.571022, 0.004770358], [2, 9.693934, -0.009634989]]
mls:
[[0, 9.402031, 0.02097959],
[1, 9.467541, 0.01443546],
[2, 9.525021, 0.008554051]]
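To get from these nested lists to the four graphs asked for in the question, each step can be transposed into columns and drawn as one curve per step. The following is a minimal sketch and not part of the original answer: it assumes matplotlib is installed, and the helper names `columns` and `plot_sifs`, along with the output file names, are made up for illustration.

```python
def columns(step):
    """Transpose one step's rows [[index, K_I, K_II], ...] into three tuples."""
    index, k_i, k_ii = zip(*step)
    return index, k_i, k_ii


def plot_sifs(steps, kind, filename_prefix):
    """Write one figure per SIF component, with one curve per propagation step.

    `steps` is a nested list shaped like `raw` or `mls` from the parser.
    Assumption: matplotlib is available; it is imported lazily so that
    `columns` stays usable without it.
    """
    import matplotlib.pyplot as plt

    for col, name in ((1, "K_I"), (2, "K_II")):
        fig, ax = plt.subplots()
        for step_no, step in enumerate(steps):
            if not step:
                continue  # nothing parsed for this step
            cols = columns(step)
            ax.plot(cols[0], cols[col], label="step %d" % step_no)
        ax.set_xlabel("Vertex / sample index")
        ax.set_ylabel(name)
        ax.set_title("%s SIFs: %s per step" % (kind, name))
        ax.legend()
        fig.savefig("%s_%s.png" % (filename_prefix, name))
        plt.close(fig)
```

Calling `plot_sifs(raw, "Raw", "raw")` and `plot_sifs(mls, "MLS", "mls")` would then write four image files, two for each type of data. If an interactive window is preferred, `plt.show()` could replace the `savefig` call.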
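Reading the whole file into a list and repeatedly calling `lines.pop(0)` is fine for small inputs but gets slow on a really large file. As an alternative sketch (not the original answer's method), the file can be processed in a single pass, line by line. The function name `parse_sif_stream` and the regular expression are assumptions chosen for illustration; the header pattern tolerates the optional `**` and the optional space after `#` that both appear in the sample.

```python
import io
import re

# Matches section headers such as "**# Raw SIFs at Crack Propagation Step: 0**"
HEADER = re.compile(r"^\**#\s*(Raw|MLS) SIFs at Crack Propagation Step:\s*(\d+)")


def parse_sif_stream(fileobj):
    """Return {"Raw": {step: rows}, "MLS": {step: rows}} from an open text file.

    Each row is [index, K_I, K_II]. Lines are consumed one at a time, so the
    whole file never has to be held in memory at once.
    """
    result = {"Raw": {}, "MLS": {}}
    current = None  # row list of the section being read, if any
    for line in fileobj:
        line = line.strip()
        match = HEADER.match(line)
        if match:
            kind, step = match.group(1), int(match.group(2))
            current = result[kind].setdefault(step, [])
            continue
        if current is None or not line or line.startswith("#"):
            continue
        fields = [f.strip() for f in line.split(",")]
        try:
            current.append([int(fields[0]), float(fields[4]), float(fields[5])])
        except (ValueError, IndexError):
            # Free-text comment without '#': treat the data section as ended.
            current = None
    return result


# Small in-memory stand-in for the real file, taken from the sample data.
sample = io.StringIO(
    "# Comment\n"
    "**#Raw SIFs at Crack Propagation Step: 0**\n"
    "# Vertex, X, Y, Z, K_I, K_II,\n"
    "0 , 2.100000e+00 , 2.000000e+00 , -1.000000e-04 , 0.000000e+00 , 0.000000e+00 ,\n"
    "1 , 2.100000e+00 , 2.000000e+00 , 1.699733e-01 , 8.727065e+00 , -8.696262e-04 ,\n"
    "more comments\n"
)
data = parse_sif_stream(sample)
print(data["Raw"][0])
```

With a real file, `parse_sif_stream(open_file)` would be called inside a `with open(...)` block. Keying the result by step number rather than by list position keeps Raw and MLS sections aligned even if one of them is missing for some step.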