How to parse block data from a text file into a 2D array in Python?
Problem description
I am trying to parse a text file with the following structure:
latitude 5.0000
number_of_data_values 9
0.1 0.2 0.3 0.4
1.1 1.2 1.3 1.4
8.1
latitude 4.3000
number_of_data_values 9
0.1 0.2 0.3 0.4
1.1 1.2 1.3 1.4
8.1
latitude 4.0000
number_of_data_values 9
0.1 0.2 0.3 0.4
1.1 1.2 1.3 1.4
8.1
...
Each latitude value corresponds to a different row of the array. number_of_data_values is the number of columns (consistent throughout the file).
For this example, I would like to read the file and output a 3-by-9 two-dimensional array like the following:
array = [[0.1,0.2,0.3,0.4,1.1,1.2,1.3,1.4,8.1],
[0.1,0.2,0.3,0.4,1.1,1.2,1.3,1.4,8.1],
[0.1,0.2,0.3,0.4,1.1,1.2,1.3,1.4,8.1]]
I tried iterating through the lines with loops, but I am looking for a more efficient way to do it, as I may have to deal with very large input files.
Recommended answer
A line-by-line implementation is rather easy and understandable. Assuming that your latitude always starts on a new line (which is not what your example gives, but it might be a typo), you could do:
latitudes = []
counts = []
blocks = []
current_block = []
# `test` must be an iterable of lines, e.g. an open file
# ("data.txt" is a placeholder name)
with open("data.txt") as test:
    for line in test:
        if line.startswith("latitude"):
            # New block: add the previous one to `blocks` and reset
            blocks.append(current_block)
            current_block = []
            latitudes.append(float(line.split()[-1]))
        elif line.startswith("number_of_data"):
            # Just append the current count to the list
            counts.append(int(line.split()[-1]))
        else:
            # Update the current block
            current_block += [float(f) for f in line.split()]
# Make sure to add the last block...
blocks.append(current_block)
# And to remove the first (empty) one
blocks.pop(0)
You can now check whether all your blocks have the proper size:
all(len(b)==c for (c,b) in zip(counts,blocks))
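As a minimal sketch, the loop and the size check above could be wrapped into a single function. The name `parse_blocks` and the in-memory `sample` lines are hypothetical, introduced only for illustration; any iterable of lines, such as an open file, should work the same way.

```python
def parse_blocks(lines):
    """Parse latitude blocks from an iterable of text lines.

    Returns (latitudes, blocks), where `blocks` is a list of lists of floats.
    The function name and return layout are illustrative assumptions.
    """
    latitudes, counts, blocks = [], [], []
    for line in lines:
        if line.startswith("latitude"):
            latitudes.append(float(line.split()[-1]))
            blocks.append([])  # start a new row
        elif line.startswith("number_of_data"):
            counts.append(int(line.split()[-1]))
        elif blocks:
            # Data line: extend the row of the current block
            blocks[-1].extend(float(tok) for tok in line.split())
    # Fail loudly if a block does not hold the announced number of values
    for lat, count, block in zip(latitudes, counts, blocks):
        if len(block) != count:
            raise ValueError(
                f"latitude {lat}: expected {count} values, got {len(block)}"
            )
    return latitudes, blocks


# Hypothetical sample input mirroring the structure from the question
sample = """\
latitude 5.0000
number_of_data_values 9
0.1 0.2 0.3 0.4
1.1 1.2 1.3 1.4
8.1
latitude 4.3000
number_of_data_values 9
0.1 0.2 0.3 0.4
1.1 1.2 1.3 1.4
8.1
""".splitlines()

latitudes, array = parse_blocks(sample)
print(latitudes)
print(len(array), len(array[0]))
```

Appending directly to the last row avoids the empty first block that otherwise has to be popped at the end.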
Alternative solution
If you're concerned about the loops, you may want to consider querying a memory-mapped version of your file. The idea is to find the positions of the lines starting with latitude. Once you find one, find the next and you have a block of text: skip the first two lines (the one starting with latitude and the one starting with number_of_data), combine the remaining ones and process.
import mmap
with open("crap.txt", "rb") as f:
    # Create the mapper (read-only; a length of 0 maps the whole file)
    mapper = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Initialize your output variables
    latitudes = []
    blocks = []
    # Find the beginning of the first block (mmap searches bytes, not str)
    position = mapper.find(b"latitude")
    # `position` will be -1 if we can't find it
    while position >= 0:
        # Move to the beginning of the block
        mapper.seek(position)
        # Read the first line
        lat_line = mapper.readline().decode("ascii").strip()
        latitudes.append(float(lat_line.split()[-1]))
        # Read the second one
        zap = mapper.readline()
        # Where are we?
        start = mapper.tell()
        # Where's the next block? (-1 if this was the last one)
        position = mapper.find(b"latitude", start)
        # Read up to the next block, or to the end of the file for the last block
        end = position if position >= 0 else mapper.size()
        current_block = mapper.read(end - start).decode("ascii")
        # Transform the string into a list of floats and update the block
        blocks.append([float(i) for i in current_block.split()])
    mapper.close()
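A variation on the memory-mapped search above, offered only as a sketch: the `re` module can scan a memory-mapped file directly, so one bytes pattern can locate each block and capture its latitude, its announced count, and its data values in a single pass. The pattern, the function name `parse_with_regex`, and the temporary sample file are assumptions made for illustration, and the pattern expects the exact header spelling shown in the question.

```python
import mmap
import os
import re
import tempfile

# One match per block: latitude value, announced count, then the data values.
# The last group stops at the "l" of the next "latitude" header.
BLOCK_RE = re.compile(
    rb"latitude\s+(\S+)\s+number_of_data_values\s+(\d+)\s+([-+0-9.eE\s]*)"
)


def parse_with_regex(path):
    """Return (latitudes, blocks) by scanning a memory-mapped file with a regex."""
    latitudes, blocks = [], []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapper:
            for match in BLOCK_RE.finditer(mapper):
                lat, count, data = (g.decode("ascii") for g in match.groups())
                row = [float(tok) for tok in data.split()]
                if len(row) != int(count):
                    raise ValueError(
                        f"latitude {lat}: expected {count} values, got {len(row)}"
                    )
                latitudes.append(float(lat))
                blocks.append(row)
    return latitudes, blocks


# Hypothetical sample written to a temporary file for the demonstration
sample = (
    b"latitude 5.0000\nnumber_of_data_values 9\n"
    b"0.1 0.2 0.3 0.4\n1.1 1.2 1.3 1.4\n8.1\n"
    b"latitude 4.3000\nnumber_of_data_values 9\n"
    b"0.1 0.2 0.3 0.4\n1.1 1.2 1.3 1.4\n8.1\n"
)
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
try:
    demo_latitudes, demo_blocks = parse_with_regex(tmp.name)
finally:
    os.remove(tmp.name)
print(demo_latitudes, len(demo_blocks), len(demo_blocks[0]))
```

Whether this is faster than the line-by-line loop depends on the file, so it is worth timing both on real data before committing to one.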