Count the number of lines in a file without reading the entire file into memory?

Views: 32 | Published: 2021/7/11 19:30:00 | ruby


Problem description

I'm processing huge data files (millions of lines each).

Before I start processing, I'd like to count the number of lines in the file so that I can indicate how far along the processing is.

Because of the size of the files, it would not be practical to read an entire file into memory just to count its lines. Does anyone have a good suggestion on how to do this?

Recommended answer

If you are in a Unix environment, you can just let wc -l do the work.

It will not load the whole file into memory. Because wc is optimized for streaming a file and counting words and lines, it usually performs well compared with streaming the file yourself in Ruby.

SSCCE:

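require 'shellwords'

filename = 'a_file/somewhere.txt'
# wc -l prints "<count> <filename>"; the first field is the line count.
line_count = `wc -l #{Shellwords.escape(filename)}`.split.first.to_i
p line_count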

Or, if you want the total for a collection of files passed on the command line:

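require 'shellwords'

wc_output = `wc -l #{ARGV.shelljoin}`
# wc prints a "total" line only for two or more files; otherwise use the single count.
line_count = (wc_output[/^ *(\d+) +total$/, 1] || wc_output).to_i
p line_count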
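
Where wc is not available (for example, outside a Unix environment), the "streaming the file yourself in Ruby" alternative mentioned above could look roughly like the sketch below. It is not part of the original answer; count_lines and CHUNK_SIZE are hypothetical names, and the 1 MiB chunk size is an arbitrary assumption. The file is read in fixed-size chunks and newline characters are counted, so memory use stays bounded no matter how large the file is.

```ruby
# Hypothetical helper: count newline characters by reading the file in
# fixed-size chunks, so only one chunk is held in memory at a time.
CHUNK_SIZE = 1 << 20 # 1 MiB per read; an arbitrary choice

def count_lines(path, chunk_size = CHUNK_SIZE)
  count = 0
  File.open(path, 'rb') do |file|
    # file.read(n) returns nil at end of file, which ends the loop.
    while (chunk = file.read(chunk_size))
      count += chunk.count("\n")
    end
  end
  count
end
```

Calling count_lines('a_file/somewhere.txt') returns an Integer. Like wc -l, this counts newline characters, so a final line without a trailing newline is not included. File.foreach(filename).count is a shorter alternative that also iterates line by line without loading the whole file, and it does count such a trailing line.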
