Python: How to read a data file with an uneven number of columns
Problem description
A friend of mine needs to read a lot of data (about 18000 data sets), all of it formatted annoyingly. Specifically, the data is supposed to be 8 columns and ~8000 rows, but instead it is delivered with 7 columns per line, with the last entry spilling into the first column of the next row.
In addition, every ~30 rows there is a row with only 4 columns. This is because some upstream program is reshaping a 200 x 280 array into the 7 x 8120 array.
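A quick arithmetic check of that layout, as a sketch. It assumes (an inference, not something the upstream program is known to do) that the 200 x 280 array is written one 200-value column at a time, 7 values per line. Under that assumption each column takes 29 lines (28 lines of 7 plus one line of 4, since 200 = 28 * 7 + 4), giving 280 * 29 = 8120 lines, a 4-value line every 29th line, and 56000 = 7000 * 8 values in total, which agrees with the numbers above. The function write_ragged is hypothetical.

```python
import io

import numpy as np


def write_ragged(array, per_line=7):
    """Hypothetical writer: emit each column of `array` as lines of at most
    `per_line` values, so every column ends with a shorter line."""
    out = io.StringIO()
    for column in array.T:  # one column of 200 values at a time
        for start in range(0, len(column), per_line):
            chunk = column[start:start + per_line]
            out.write(" ".join(f"{x:.6e}" for x in chunk) + "\n")
    return out.getvalue()


values = np.arange(200 * 280, dtype=float).reshape(200, 280)
text = write_ragged(values)
lines = text.splitlines()
widths = [len(line.split()) for line in lines]  # number of values per line

print(len(lines))       # 8120 lines in total
print(widths.count(4))  # 280 short lines, one per column
print(sum(widths))      # 56000 values, i.e. 7000 rows of 8
```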
My question is this: how can we read the data into an 8x7000 array (7000 rows of 8 values)? My usual arsenal of np.loadtxt and np.genfromtxt fails when the number of columns is uneven.
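The failure is easy to reproduce on a tiny in-memory sample (made up here, not taken from the real file): both np.loadtxt and np.genfromtxt expect every line to have the same number of fields, and both raise ValueError when a short line appears.

```python
import io

import numpy as np

# A tiny made-up stand-in for the real file: two lines of 7 values, then 4.
sample = (
    "1 2 3 4 5 6 7\n"
    "8 9 10 11 12 13 14\n"
    "15 16 17 18\n"
)

failed = []
for reader in (np.loadtxt, np.genfromtxt):
    try:
        reader(io.StringIO(sample))
    except ValueError as exc:
        # Both readers reject a line whose field count differs from the first.
        failed.append(reader.__name__)
        print(f"{reader.__name__} raised ValueError: {exc}")
```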
Keep in mind that performance matters, since this has to be done for ~18000 data files.
Here is a link to a typical data file: http://users-phys.au.dk/hha07/hk_L1.ref
Answer
An even easier approach I just thought of:
```python
import numpy

with open("hk_L1.ref") as f:
    # Read all values as one flat array, then reshape to 7000 rows of 8.
    data = numpy.array(f.read().split(), dtype=float).reshape(7000, 8)
```
This reads the data as a one-dimensional array first, completely ignoring all newline characters (str.split() with no argument splits on any run of whitespace, newlines included), and then reshapes it to the desired shape.
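A slightly more defensive variant of the same idea, as a sketch: instead of hard-coding 7000 rows it lets NumPy infer the row count with reshape(-1, 8) and first checks that the number of values is a multiple of 8, so a truncated file fails loudly instead of being silently misshaped. The function read_flat and its ncols parameter are hypothetical, not part of the original answer.

```python
import io

import numpy as np


def read_flat(f, ncols=8):
    """Read every whitespace-separated number from `f`, ignoring line breaks,
    and reshape the result to rows of `ncols` values."""
    # str.split() with no argument splits on any run of whitespace,
    # including newlines, so the ragged line structure does not matter.
    values = np.array(f.read().split(), dtype=float)
    if values.size % ncols:
        raise ValueError(f"{values.size} values cannot form rows of {ncols}")
    return values.reshape(-1, ncols)


# 16 values spread over lines of 7, 7 and 2: two records of 8 values.
sample = io.StringIO("1 2 3 4 5 6 7\n8 9 10 11 12 13 14\n15 16\n")
data = read_flat(sample)
print(data.shape)        # (2, 8)
print(data[1].tolist())  # [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
```

With the real file this would be called as read_flat(f) inside the same with open("hk_L1.ref") as f: block.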
I expect the task to be I/O-bound anyway, but if processor time matters, this approach should use little of it.
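If parsing time does turn out to matter across ~18000 files, one option (a swapped-in technique, not part of the answer) is numpy.fromfile with sep=" ", which parses the text straight into a float array without building an intermediate Python list of strings; per the NumPy documentation, a space in sep matches zero or more whitespace characters, so newlines and short lines are skipped over. The sketch below wraps it in a loop over many files; the function names load_one and load_all and the glob pattern are hypothetical. In text mode fromfile may stop early with only a warning on malformed input rather than raise, which is why the size check is kept.

```python
import glob
import os
import tempfile

import numpy as np


def load_one(path, ncols=8):
    """Parse one whitespace-separated text file into rows of `ncols` floats."""
    # sep=" " puts fromfile in text mode; the space matches any run of
    # whitespace, so the ragged line breaks are irrelevant.
    values = np.fromfile(path, dtype=float, sep=" ")
    if values.size % ncols:
        raise ValueError(f"{path}: {values.size} values is not a multiple of {ncols}")
    return values.reshape(-1, ncols)


def load_all(pattern, ncols=8):
    """Load every file matching `pattern` into a dict keyed by path."""
    return {path: load_one(path, ncols) for path in sorted(glob.glob(pattern))}


# Self-contained demo: write two small ragged files to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.ref", "b.ref"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write("1 2 3 4 5 6 7\n8 9 10 11 12 13 14\n15 16\n")
    arrays = load_all(os.path.join(tmp, "*.ref"))
    for path, arr in arrays.items():
        print(os.path.basename(path), arr.shape)  # each file gives (2, 8)
```

Whether this is measurably faster than the split() version on the real files would need timing on the actual data.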