Quickly find differences between two large text files
Question
I have two 3 GB text files, each with around 80 million lines. They share 99.9% of their lines (file A has 60,000 unique lines, file B has 80,000 unique lines).
How can I quickly find those unique lines in the two files? Are there any ready-to-use command-line tools for this? I'm using Python, but I suspect there is unlikely to be an efficient Pythonic way to load the files and compare them.
Any suggestions are appreciated.
Recommended answer
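If order matters, try the comm utility. If order doesn't matter, use sort file1 file2 | uniq -u.
Note that comm expects both inputs to be sorted already, and that uniq -u prints only the lines that occur exactly once in the combined sorted output, so it does not say which file a line came from.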
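Since the question mentions Python, here is a minimal sketch of the same idea without sorting. It is not part of the original answer. The function names (diff_unique, line_digests, lines_only_in) are hypothetical, and keeping only an 8-byte BLAKE2b digest per line is an assumption made to reduce memory use. Hash collisions are possible in principle, and the digest sets for 80 million lines will still need several gigabytes of RAM.

```python
import hashlib


def _digest(line: bytes) -> bytes:
    # Assumption: an 8-byte digest per line is enough to tell lines apart.
    return hashlib.blake2b(line, digest_size=8).digest()


def line_digests(path):
    """Return the set of digests of all lines in the file at `path`."""
    with open(path, "rb") as f:
        return {_digest(line.rstrip(b"\r\n")) for line in f}


def lines_only_in(path, other_digests):
    """Return lines of `path` whose digest is absent from `other_digests`.

    Each distinct line is reported once, in order of first appearance.
    """
    seen = set()
    result = []
    with open(path, "rb") as f:
        for raw in f:
            line = raw.rstrip(b"\r\n")
            d = _digest(line)
            if d not in other_digests and d not in seen:
                seen.add(d)
                result.append(line.decode("utf-8", errors="replace"))
    return result


def diff_unique(path_a, path_b):
    """Return (lines only in A, lines only in B); each file is read twice."""
    digests_a = line_digests(path_a)
    digests_b = line_digests(path_b)
    return lines_only_in(path_a, digests_b), lines_only_in(path_b, digests_a)
```

Calling diff_unique("file1", "file2") returns two lists, one per file. Unlike sort file1 file2 | uniq -u, this reports which file each line belongs to and still reports a line that is repeated within a single file. If memory is tight, the sort-based commands above are likely the safer choice.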