Modifying a text file without reading it into memory
Problem description
I am trying to figure out a way to modify a text file (specifically, to delete specific lines) without reading a large part of the file into memory or rewriting the whole file. The files in question are larger than main memory, about 15-50 GB.
P.S. I am using Linux.
Recommended answer
You aren't going to get around making a new file, so just bite the bullet and do it. Use grep with appropriate options (-v to invert the match, -f to read the patterns from a file) and redirect the result to a second file:
$ grep -v -f patternsToExcludeFromInput input > output
Another approach is to put the patterns into a hash-based container, for example a hash table (Perl), a dictionary (Python), or an unordered_map (C++), and then process the input file one line at a time, looking each line up in the container.
If there is no match, print the line to the standard output stream (which you can redirect to a regular file). Your memory usage will be limited mostly to the hash table and the single line of input you are currently checking.
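The hash-table approach could be sketched in Python roughly as follows. This is only an illustration under some assumptions that are not in the original answer: the patterns are taken to be whole lines to delete (exact match, one per line in the patterns file), a set is used in place of a dictionary since only membership matters, and the function names and file paths are hypothetical. Files are opened in binary mode so that the encoding of a very large input does not matter.

```python
def load_excluded(patterns_path):
    """Read the lines to delete into a set for constant-time lookup."""
    with open(patterns_path, "rb") as f:
        stripped = (line.rstrip(b"\r\n") for line in f)
        # Assumption: blank lines in the patterns file are ignored,
        # so blank lines in the input are not deleted by accident.
        return {s for s in stripped if s}


def filter_lines(lines, excluded):
    """Yield each line whose content (without line ending) is not excluded."""
    for line in lines:
        if line.rstrip(b"\r\n") not in excluded:
            yield line


def filter_file(patterns_path, input_path, output_path):
    """Stream input_path to output_path, dropping the excluded lines."""
    excluded = load_excluded(patterns_path)
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        # Iterating over the file object reads one line at a time,
        # so memory use stays near the size of the set plus one line.
        dst.writelines(filter_lines(src, excluded))
```

A call such as filter_file("patternsToExcludeFromInput", "input", "output") would then play the same role as the grep command above, with the difference that it only removes lines that match an entry exactly rather than lines that contain a pattern.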