What is the fastest way to compare two text files, not counting moved lines as different?
Problem description
I have two very large files, about 50,000 lines each. I need to compare them and identify the changes. The catch is that a line that appears in both files, but at a different position, should not be reported as a difference.
For example, consider this:
File A.txt
xxxxx
yyyyy
zzzzz
File B.txt
zzzzz
xxxx
yyyyy
If these are the contents of the files, my code should output xxxx (or both xxxx and xxxxx).
Of course, the easiest way would be to store each line of one file in a List<String> and compare it against the other file's List<String>.
But this seems to take a lot of time. I have also tried DiffUtils in Java, but it does not recognize identical lines at different line numbers as the same. Is there another algorithm that might help?
Recommended answer
Using a Set is probably the easiest way:
import java.util.HashSet;
import java.util.Set;
import org.apache.commons.io.FileUtils; // assumption: FileUtils is Apache Commons IO

// file1 and file2 are java.io.File instances
Set<String> set1 = new HashSet<String>(FileUtils.readLines(file1));
Set<String> set2 = new HashSet<String>(FileUtils.readLines(file2));
Set<String> similars = new HashSet<String>(set1);
similars.retainAll(set2);           // lines present in both files
set1.removeAll(similars);           // set1 now holds lines found only in file1
set2.removeAll(similars);           // set2 now holds lines found only in file2
System.out.println(set1);           // prints lines found only in file1
System.out.println(set2);           // prints lines found only in file2
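If you would rather not depend on an external library, the same set-difference idea can be sketched with only the JDK. This is a minimal sketch, not the answer's own code: the class name LineSetDiff and the method onlyIn are hypothetical, and it assumes both files are UTF-8 and fit in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; the names are illustrative and not from the original answer.
final class LineSetDiff {

    // Returns the lines of `source` that never occur in `other`,
    // in the order they first appear in `source`. Line position is ignored.
    static Set<String> onlyIn(Collection<String> source, Collection<String> other) {
        Set<String> result = new LinkedHashSet<>(source);
        // Hash lookups keep this roughly linear in the total number of lines.
        result.removeAll(new HashSet<>(other));
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java LineSetDiff <fileA> <fileB>");
            return;
        }
        // Assumption: both files are UTF-8 encoded and small enough to load fully.
        Path a = Paths.get(args[0]);
        Path b = Paths.get(args[1]);
        List<String> linesA = Files.readAllLines(a, StandardCharsets.UTF_8);
        List<String> linesB = Files.readAllLines(b, StandardCharsets.UTF_8);

        System.out.println("Only in " + a + ": " + onlyIn(linesA, linesB));
        System.out.println("Only in " + b + ": " + onlyIn(linesB, linesA));
    }
}
```

For the A.txt and B.txt shown in the question, onlyIn reports xxxxx for the first file and xxxx for the second. Note that a set collapses duplicates, so a line that occurs twice in one file and once in the other is not reported by this approach.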