Count the number of lines in a file without reading the entire file into memory?

Question

I'm processing huge data files (millions of lines each).

Before I start processing, I'd like to count the lines in the file so that I can indicate how far along the processing is.

Because of the size of the files, it would not be practical to read an entire file into memory just to count its lines. Does anyone have a good suggestion for how to do this?

Recommended answer

If you are in a Unix environment, you can just let wc -l do the work.

It will not load the whole file into memory. Because wc is built to stream a file while counting words and lines, it performs well enough that you don't need to stream the file yourself in Ruby.

SSCCE:

filename = 'a_file/somewhere.txt'
# wc -l prints "<count> <filename>"; the first field is the line count
line_count = `wc -l "#{filename}"`.strip.split(' ')[0].to_i
p line_count
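
Interpolating the filename into a backtick string sends it through the shell, so a name containing quotes or a dollar sign could break the command. One possible alternative, sketched here on the assumption that wc is on the PATH, is to pass the arguments as an array to IO.popen, which runs the program without a shell. The method name wc_line_count is made up for illustration.

```ruby
# Hypothetical helper: count lines with wc without going through a shell.
# Assumes a Unix-like system with `wc` on the PATH.
def wc_line_count(filename)
  # The array form of IO.popen executes wc directly, so the filename needs no quoting.
  # "--" keeps a filename that starts with "-" from being parsed as an option.
  output = IO.popen(['wc', '-l', '--', filename], &:read)
  # wc prints "<count> <filename>"; the first whitespace-separated field is the count.
  output.split.first.to_i
end
```

Calling wc_line_count('a_file/somewhere.txt') should then give the same number as the backtick version for ordinary filenames.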

Or, if you want the total for a collection of files passed on the command line:

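# Pass every file named on the command line to a single wc invocation
wc_output = `wc -l "#{ARGV.join('" "')}"`
# With two or more files wc appends a "<count> total" line; with one file it does not
total = wc_output.match(/^ *([0-9]+) +total$/)
line_count = (total ? total.captures[0] : wc_output.split.first).to_i
p line_count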
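
For comparison, or for environments where wc is not available, streaming the file yourself in Ruby might look like the sketch below. It is not part of the original answer. The method name count_lines and the 1 MiB default chunk size are assumptions chosen for illustration. It reads fixed-size chunks and counts newline characters, so only one chunk is held in memory at a time.

```ruby
# Hypothetical helper: count newline characters by reading fixed-size chunks,
# so memory use stays bounded regardless of file size.
def count_lines(filename, chunk_size = 1 << 20)
  count = 0
  File.open(filename, 'rb') do |file|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = file.read(chunk_size))
      count += chunk.count("\n")
    end
  end
  count
end
```

Like wc -l, this counts newline characters, so a final line with no trailing newline is not included in the result.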
