Efficiently counting the number of columns of a text file


Problem description


I have a bunch of large tab-delimited text files, with a format similar to:

a   0.0694892   0   0.0118814   0   -0.0275522  
b   0.0227414   -0.0608639  0.0811518   -0.15216    0.111584    
c   0   0.0146492   -0.103492   0.0827939   0.00631915

To count the number of columns I have always used:

>>> import numpy as np
>>> np.loadtxt('file.txt', dtype='str').shape[1]
6

However, this is clearly inefficient for bigger files, because the entire file is loaded into an array just to read its shape. Is there a simple method that is more efficient?

Solution

If you want to make sure you're using the exact same format as NumPy, the simplest solution is to feed it a wrapper around the first line.

If you look at the docs for loadtxt, the fname parameter can be:

File, filename, or generator to read.

In fact, it doesn't even really have to be a generator; any iterable works fine. Like, say, a list. So:

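 import numpy as np

 with open('file.txt', 'rb') as f:
     lines = [f.readline()]  # read only the first line
 # ndmin=2 keeps the single row 2-D, so the shape is (1, n_columns)
 np.loadtxt(lines, dtype='str', ndmin=2).shape[1]

In other words, we just read the first line, stick it in a one-element list, and pass that to loadtxt, which parses it as if it were a one-line file. Note that loadtxt squeezes a one-row result down to a 1-D array by default, so ndmin=2 is needed for shape[1] to exist.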
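
The first-line approach can be wrapped in a small helper. The following is a minimal sketch, not a definitive implementation: the function names `count_columns` and `count_columns_plain` are hypothetical, and the second function is a swapped-in pure-Python alternative that assumes whitespace-separated fields with no empty entries, so it may disagree with NumPy's parsing on unusual files.

```python
import numpy as np


def count_columns(path):
    """Count columns by letting np.loadtxt parse only the first line."""
    with open(path, "r") as f:
        first_line = [f.readline()]  # one-element list acts as a one-line file
    # ndmin=2 stops loadtxt from squeezing the single row to 1-D,
    # so the result has shape (1, n_columns).
    return np.loadtxt(first_line, dtype=str, ndmin=2).shape[1]


def count_columns_plain(path):
    """Alternative without NumPy: split the first line on whitespace.

    Assumption: fields are separated by whitespace and none are empty.
    """
    with open(path, "r") as f:
        return len(f.readline().split())
```

Either function reads a single line regardless of file size, so the cost no longer grows with the number of rows. The NumPy version is the safer choice when the count has to match what a later `np.loadtxt` call on the full file will see.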
