Efficiently counting the number of columns of a text file
I have a bunch of large tab-delimited text files, with a format similar to:
a 0.0694892 0 0.0118814 0 -0.0275522
b 0.0227414 -0.0608639 0.0811518 -0.15216 0.111584
c 0 0.0146492 -0.103492 0.0827939 0.00631915
To count the number of columns I have always used:
>>> import numpy as np
>>> np.loadtxt('file.txt', dtype='str').shape[1]
6
However, this method is clearly inefficient for bigger files, because the entire file is loaded into an array before the shape can be read. Is there a simple method that is more efficient?
If you want to make sure you're using the exact same format as NumPy, the simplest solution is to feed it a wrapper around the first line.
If you look at the docs for loadtxt, the fname parameter can be:
File, filename, or generator to read.
In fact, it doesn't even really have to be a generator; any iterable works fine. Like, say, a list. So:
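import numpy as np

with open('file.txt', 'rb') as f:
    lines = [f.readline()]
# ndmin=2 keeps the single row 2-D; without it loadtxt returns a 1-D array and shape[1] raises IndexError
np.loadtxt(lines, dtype='str', ndmin=2).shape[1]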
In other words, we just read the first line, stick it in a one-element list, and pass that to loadtxt, which parses it as if it were a one-line file.
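If matching NumPy's parser exactly is not a requirement, the same "only look at the first line" idea can be sketched without NumPy at all. This is a swapped-in technique, not the loadtxt approach: count_columns is a hypothetical helper, and it assumes the first non-blank line is a data row (it does not skip '#' comment lines the way loadtxt does). Passing delimiter=None splits on any run of whitespace, which is also loadtxt's default.

```python
import os
import tempfile


def count_columns(path, delimiter=None):
    """Return the number of fields on the first non-blank line of a text file.

    Hypothetical helper, not part of NumPy. delimiter=None splits on any
    run of whitespace; pass "\t" to split strictly on tabs.
    """
    with open(path, "r") as f:
        for line in f:  # the file is iterated lazily, one line at a time
            if line.strip():
                # Drop the line ending so it never ends up inside a field.
                return len(line.rstrip("\r\n").split(delimiter))
    return 0  # empty file, or only blank lines


# Small demonstration on a temporary tab-delimited file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\t0.0694892\t0\t0.0118814\t0\t-0.0275522\n")
    tmp.write("b\t0.0227414\t-0.0608639\t0.0811518\t-0.15216\t0.111584\n")
    demo_path = tmp.name

n_cols = count_columns(demo_path)
os.remove(demo_path)
print(n_cols)
```

Because the function returns as soon as it has seen one non-blank line, the cost does not grow with the size of the file.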